linux-stable/mm
Johannes Weiner 9457056ac4 mm: madvise: MADV_DONTNEED_LOCKED
MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT
and MCL_ONFAULT allowing to mlock without populating, there are valid use
cases for depopulating locked ranges as well.

Users mlock memory to protect secrets.  There are allocators for secure
buffers that want in-use memory generally mlocked, but cleared and
invalidated memory to give up the physical pages.  This could be done with
explicit munlock -> mlock calls on free -> alloc of course, but that adds
two unnecessary syscalls, heavy mmap_sem write locks, vma splits and
re-merges - only to get rid of the backing pages.

Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
okay with on-demand initial population.  It seems valid to selectively
free some memory during the lifetime of such a process, without having to
mess with its overall policy.

Why add a separate flag? Isn't this a pretty niche usecase?

- MADV_DONTNEED has been bailing on locked vmas forever. It's at least
  conceivable that someone, somewhere is relying on mlock to protect
  data from perhaps broader invalidation calls. Changing this behavior
  now could lead to quiet data corruption.

- It also clarifies expectations around MADV_FREE and maybe
  MADV_REMOVE. It avoids the situation where one quietly behaves
  different than the others. MADV_FREE_LOCKED can be added later.

- The combination of mlock() and madvise() in the first place is
  probably niche. But where it happens, I'd say that dropping pages
  from a locked region once they don't contain secrets or won't page
  anymore is much saner than relying on mlock to protect memory from
  speculative or errant invalidation calls. It's just that we can't
  change the default behavior because of the two previous points.

Given that, an explicit new flag seems to make the most sense.

[hannes@cmpxchg.org: fix mips build]

Link: https://lkml.kernel.org/r/20220304171912.305060-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:51 -07:00
..
damon Folio changes for 5.18 2022-03-22 17:03:12 -07:00
kasan kasan: disable LOCKDEP when printing reports 2022-03-24 19:06:50 -07:00
kfence kfence: allow use of a deferrable timer 2022-03-22 15:57:11 -07:00
backing-dev.c remove congestion tracking framework 2022-03-22 15:57:01 -07:00
balloon_compaction.c
bootmem_info.c bootmem: Use page->index instead of page->freelist 2022-01-06 12:27:03 +01:00
cma.c mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
cma_debug.c
cma_sysfs.c
compaction.c mm: compaction: cleanup the compaction trace events 2022-03-22 15:57:09 -07:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-04 09:25:04 -08:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c remove inode_congested() 2022-03-22 15:57:01 -07:00
failslab.c
filemap.c mm: warn on deleting redirtied only if accounted 2022-03-24 19:06:51 -07:00
folio-compat.c mm/rmap: Convert rmap_walk() to take a folio 2022-03-21 13:01:35 -04:00
frontswap.c frontswap: remove support for multiple ops 2022-01-22 08:33:38 +02:00
gup.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
gup_test.c
gup_test.h
highmem.c mm/highmem: remove unnecessary done label 2022-03-22 15:57:11 -07:00
hmm.c mm/hmm.c: remove unneeded local variable ret 2022-03-22 15:57:12 -07:00
huge_memory.c mm/huge_memory: remove stale locking logic from __split_huge_pmd() 2022-03-24 19:06:51 -07:00
hugetlb.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
hugetlb_cgroup.c hugetlb: add hugetlb.*.numa_stat file 2022-01-15 16:30:29 +02:00
hugetlb_vmemmap.c mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key 2022-03-22 15:57:08 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hwpoison-inject.c mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler 2022-03-22 15:57:07 -07:00
init-mm.c kernel/fork: Initialize mm's PASID 2022-02-14 19:51:47 +01:00
internal.h Folio changes for 5.18 2022-03-22 17:03:12 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
Kconfig mm: generalize ARCH_HAS_FILTER_PGPROT 2022-03-24 19:06:51 -07:00
Kconfig.debug mm: page table check 2022-01-15 16:30:28 +02:00
khugepaged.c mm/khugepaged: remove reuse_swap_page() usage 2022-03-24 19:06:51 -07:00
kmemleak.c mm/kmemleak: avoid scanning potential huge holes 2022-02-04 09:25:05 -08:00
ksm.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
list_lru.c mm/list_lru: optimize memcg_reparent_list_lru_node() 2022-03-22 15:57:08 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm: madvise: MADV_DONTNEED_LOCKED 2022-03-24 19:06:51 -07:00
Makefile mm: move the migrate_vma_* device migration code into its own file 2022-03-03 12:47:33 -05:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c memblock: use kfree() to release kmalloced memblock regions 2022-02-20 08:45:39 +02:00
memcontrol.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory-failure.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
memory.c mm: unmap_mapping_range_tree() with i_mmap_rwsem shared 2022-03-24 19:06:51 -07:00
memory_hotplug.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mempolicy.c mempolicy: mbind_range() set_policy() after vma_merge() 2022-03-22 15:57:09 -07:00
mempool.c mm: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
memremap.c mm: delete __ClearPageWaiters() 2022-03-24 19:06:45 -07:00
memtest.c
migrate.c mm/migration: add trace events for base page and HugeTLB migrations 2022-03-24 19:06:45 -07:00
migrate_device.c mm/migrate: Convert remove_migration_ptes() to folios 2022-03-21 13:01:35 -04:00
mincore.c
mlock.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mm_init.c
mmap.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmu_gather.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
mmu_notifier.c
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c memory tiering: skip to scan fast memory 2022-03-22 15:57:09 -07:00
mremap.c mm/mremap:: use vma_lookup() instead of find_vma() 2022-03-22 15:57:05 -07:00
msync.c
nommu.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
oom_kill.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
page-writeback.c mm: warn on deleting redirtied only if accounted 2022-03-24 19:06:51 -07:00
page_alloc.c kasan, page_alloc: allow skipping memory init for HW_TAGS 2022-03-24 19:06:47 -07:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: make some vars and functions static or __init 2022-01-15 16:30:31 +02:00
page_idle.c mm/rmap: Constify the rmap_walk_control argument 2022-03-21 13:01:35 -04:00
page_io.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
page_isolation.c Revert "mm/page_isolation: unset migratetype directly for non Buddy page" 2022-02-04 09:25:04 -08:00
page_owner.c mm/page_owner.c: record tgid 2022-03-24 19:06:44 -07:00
page_poison.c
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm/page_table_check.c: use strtobool for param parsing 2022-03-22 15:57:11 -07:00
page_vma_mapped.c mm: Convert page_vma_mapped_walk to work on PFNs 2022-03-21 12:59:02 -04:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h mm: memcg/percpu: account extra objcg space to memory cgroups 2022-01-15 16:30:31 +02:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
pgalloc-track.h
pgtable-generic.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
rmap.c mm: fix race between MADV_FREE reclaim and blkdev direct IO read 2022-03-24 19:06:51 -07:00
rodata_test.c
secretmem.c fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio 2022-03-16 13:37:05 -04:00
shmem.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
shuffle.c
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab.c mm: introduce kmem_cache_alloc_lru 2022-03-22 15:57:03 -07:00
slab.h mm: introduce kmem_cache_alloc_lru 2022-03-22 15:57:03 -07:00
slab_common.c mm/slab_common: use helper function is_power_of_2() 2022-02-21 11:38:12 +01:00
slob.c slab updates for 5.18 2022-03-23 12:33:21 -07:00
slub.c slab updates for 5.18 2022-03-23 12:33:21 -07:00
sparse-vmemmap.c mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP 2022-03-22 15:57:08 -07:00
sparse.c mm/sparse: make mminit_validate_memmodel_limits() static 2022-03-22 15:57:05 -07:00
swap.c mm: delete __ClearPageWaiters() 2022-03-24 19:06:45 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c treewide: Add missing includes masked by cgroup -> bpf dependency 2021-12-03 10:58:13 -08:00
swap_state.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
swapfile.c mm/swapfile: remove stale reuse_swap_page() 2022-03-24 19:06:51 -07:00
truncate.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
usercopy.c Merge branch 'akpm' (patches from Andrew) 2022-03-22 16:11:53 -07:00
userfaultfd.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
util.c ARM: 2022-03-24 11:58:57 -07:00
vmacache.c
vmalloc.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
vmstat.c mm: only re-generate demotion targets when a numa node changes its N_CPU state 2022-03-22 15:57:11 -07:00
workingset.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: replace get_cpu_var with local_lock 2022-01-22 08:33:37 +02:00
zswap.c mm/zswap.c: allow handling just same-value filled pages 2022-03-22 15:57:11 -07:00