linux-stable/mm
Vlastimil Babka 700d2e9a36 mm, page_alloc: reduce page alloc/free sanity checks
Historically, we have performed sanity checks on all struct pages being
allocated or freed, making sure they have no unexpected page flags or
certain field values.  This can detect insufficient cleanup and some cases
of use-after-free, although on its own it can't always identify the
culprit.  The result is a warning and the "bad page" being leaked.

The checks do need some cpu cycles, so in 4.7 with commits 479f854a20
("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
and 4db7548ccb ("mm, page_alloc: defer debugging checks of freed pages
until a PCP drain") they were no longer performed in the hot paths when
allocating and freeing from pcplists, but only when pcplists are bypassed,
refilled or drained.  For debugging purposes, with CONFIG_DEBUG_VM enabled
the checks were instead still done in the hot paths and not when refilling
or draining pcplists.

With 4462b32c92 ("mm, page_alloc: more extensive free page checking with
debug_pagealloc"), enabling debug_pagealloc also moved the sanity checks
back to hot pahs.  When both debug_pagealloc and CONFIG_DEBUG_VM are
enabled, the checks are done both in hotpaths and pcplist refill/drain.

Even though the non-debug default today might seem to be a sensible
tradeoff between overhead and ability to detect bad pages, on closer look
it's arguably not.  As most allocations go through the pcplists, catching
any bad pages when refilling or draining pcplists has only a small chance,
insufficient for debugging or serious hardening purposes.  On the other
hand the cost of the checks is concentrated in the already expensive
drain/refill batching operations, and those are done under the often
contended zone lock.  That was recently identified as an issue for page
allocation and the zone lock contention reduced by moving the checks
outside of the locked section with a patch "mm: reduce lock contention of
pcp buffer refill", but the cost of the checks is still visible compared
to their removal [1].  In the pcplist draining path free_pcppages_bulk()
the checks are still done under zone->lock.

Thus, remove the checks from pcplist refill and drain paths completely.
Introduce a static key check_pages_enabled to control checks during page
allocation a freeing (whether pcplist is used or bypassed). The static
key is enabled if either is true:

- kernel is built with CONFIG_DEBUG_VM=y (debugging)
- debug_pagealloc or page poisoning is boot-time enabled (debugging)
- init_on_alloc or init_on_free is boot-time enabled (hardening)

The resulting user visible changes:
- no checks when draining/refilling pcplists - less overhead, with
  likely no practical reduction of ability to catch bad pages
- no checks when bypassing pcplists in default config (no
  debugging/hardening) - less overhead etc. as above
- on typical hardened kernels [2], checks are now performed on each page
  allocation/free (previously only when bypassing/draining/refilling
  pcplists) - the init_on_alloc/init_on_free enabled should be sufficient
  indication for preferring more costly alloc/free operations for
  hardening purposes and we shouldn't need to introduce another toggle
- code (various wrappers) removal and simplification

[1] https://lore.kernel.org/all/68ba44d8-6899-c018-dcb3-36f3a96e6bea@sra.uni-hannover.de/
[2] https://lore.kernel.org/all/63ebc499.a70a0220.9ac51.29ea@mx.google.com/

[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: make check_pages_enabled static]
Link: https://lkml.kernel.org/r/20230216095131.17336-1-vbabka@suse.cz
Reported-by: Alexander Halbuer <halbuer@sra.uni-hannover.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:06 -07:00
..
damon mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate() 2023-03-07 17:04:55 -08:00
kasan kasan: test: fix test for new meminstrinsic instrumentation 2023-03-02 21:54:22 -08:00
kfence mm: kfence: fix handling discontiguous page 2023-03-28 15:24:32 -07:00
kmsan kmsan: disable ftrace in kmsan core code 2023-02-20 12:46:16 -08:00
backing-dev.c mm: add /sys/class/bdi/<bdi>/min_ratio_fine knob 2022-11-30 15:59:06 -08:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma.c mm/cma: fix potential memory loss on cma_declare_contiguous_nid 2023-02-02 22:33:24 -08:00
cma.h
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
compaction.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
debug.c mm: export dump_mm() 2023-02-09 16:51:40 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE 2023-02-02 22:33:11 -08:00
dmapool.c
early_ioremap.c
fadvise.c mm: support POSIX_FADV_NOREUSE 2023-01-18 17:12:57 -08:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
folio-compat.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup.c mm: change to return bool for folio_isolate_lru() 2023-02-20 12:46:17 -08:00
gup_test.c mm/gup_test: free memory allocated via kvcalloc() using kvfree() 2022-12-15 16:37:48 -08:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
highmem.c highmem: fix kmap_to_page() for kmap_local_page() addresses 2022-10-12 18:51:51 -07:00
hmm.c mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare 2023-01-18 17:12:39 -08:00
huge_memory.c mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage 2023-03-07 17:04:53 -08:00
hugetlb.c mm: hugetlb: change to return bool for isolate_hugetlb() 2023-02-20 12:46:17 -08:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: simplify hugetlb_vmemmap_init() a bit 2023-03-28 16:20:06 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
internal.h - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig zsmalloc: set default zspage chain size to 8 2023-02-02 22:33:23 -08:00
Kconfig.debug mm: move KMEMLEAK's Kconfig items from lib to mm 2023-02-02 22:33:26 -08:00
khugepaged.c mm/khugepaged: alloc_charge_hpage() take care of mem charge errors 2023-03-28 16:20:06 -07:00
kmemleak.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
ksm.c mm/ksm: fix race with VMA iteration and mm_struct teardown 2023-03-23 17:18:33 -07:00
list_lru.c
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-11 11:44:46 -08:00
madvise.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
Makefile mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol 2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
memblock.c memblock: small optimizations 2023-02-27 09:34:53 -08:00
memcontrol.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
memfd.c mm/memfd: add write seals when apply SEAL_EXEC to executable memfd 2023-01-18 17:12:37 -08:00
memory-failure.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-02-27 17:00:14 -08:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-02-09 16:51:40 -08:00
memory.c mm/uffd: fix comment in handling pte markers 2023-02-20 12:46:18 -08:00
memory_hotplug.c mm/memory_hotplug: cleanup return value handing in do_migrate_range() 2023-02-20 12:46:18 -08:00
mempolicy.c mm: hugetlb: change to return bool for isolate_hugetlb() 2023-02-20 12:46:17 -08:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c
migrate.c migrate_pages: try migrate in batch asynchronously firstly 2023-03-07 17:04:54 -08:00
migrate_device.c mm: change to return bool for isolate_lru_page() 2023-02-20 12:46:17 -08:00
mincore.c mm: teach mincore_hugetlb about pte markers 2023-03-07 17:04:53 -08:00
mlock.c mm: introduce vm_flags_reset_once to replace WRITE_ONCE vm_flags updates 2023-02-09 16:51:41 -08:00
mm_init.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap.c mm: deduplicate error handling for map_deny_write_exec 2023-03-23 17:18:32 -07:00
mmap_lock.c
mmu_gather.c mm: mmu_gather: allow more than one batch of delayed rmaps 2022-12-11 18:12:21 -08:00
mmu_notifier.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
mmzone.c mm: multi-gen LRU: groundwork 2022-09-26 19:46:09 -07:00
mprotect.c mm: fix error handling for map_deny_write_exec 2023-03-23 17:18:33 -07:00
mremap.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
oom_kill.c mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export 2023-02-02 22:32:54 -08:00
page-writeback.c fs: convert writepage_t callback to pass a folio 2023-02-02 22:33:34 -08:00
page_alloc.c mm, page_alloc: reduce page alloc/free sanity checks 2023-03-28 16:20:06 -07:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c mm/page_ext: init page_ext early if there are no deferred struct pages 2023-02-02 22:33:22 -08:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
page_poison.c
page_reporting.c mm/page_reporting: replace rcu_access_pointer() with rcu_dereference_protected() 2023-01-18 17:12:50 -08:00
page_reporting.h
page_table_check.c mm/page_ext: do not allocate space for page_ext->flags if not needed 2023-02-02 22:33:11 -08:00
page_vma_mapped.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
pagewalk.c mm/hugetlb: introduce hugetlb_walk() 2023-01-18 17:12:39 -08:00
percpu-internal.h mm: percpu: fix incorrect size in pcpu_obj_full_size() 2023-02-16 20:43:55 -08:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-03 10:13:13 -07:00
readahead.c readahead: convert readahead_expand() to use a folio 2023-02-02 22:33:21 -08:00
rmap.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-02-27 17:00:14 -08:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
shmem.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
shrinker_debug.c mm: shrinkers: fix deadlock in shrinker debugfs 2023-02-09 15:56:51 -08:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h
slab.c slab fix for 6.3-rc4 2023-03-24 10:12:14 -07:00
slab.h mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
slab_common.c mm/kasan: simplify and refine kasan_cache code 2023-01-18 17:12:55 -08:00
slob.c Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next 2022-09-29 11:30:55 +02:00
slub.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
sparse-vmemmap.c mm/sparse-vmemmap: generalise vmemmap_populate_hugepages() 2022-12-11 18:12:12 -08:00
sparse.c mm/sparse: fix "unused function 'pgdat_to_phys'" warning 2023-02-02 22:33:29 -08:00
swap.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
swap.h mm: remove the __swap_writepage return value 2023-02-02 22:33:33 -08:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c swap_state: update shadow_nodes for anonymous page 2023-02-02 22:33:24 -08:00
swapfile.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
truncate.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c mm/uffd: detect pgtable allocation failures 2023-01-18 17:12:53 -08:00
util.c mm: fix typo in __vm_enough_memory warning 2023-02-13 15:54:33 -08:00
vmalloc.c mm, vmalloc: fix high order __GFP_NOFAIL allocations 2023-03-23 17:18:31 -07:00
vmpressure.c
vmscan.c mm: change to return bool for folio_isolate_lru() 2023-02-20 12:46:17 -08:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c swap_state: update shadow_nodes for anonymous page 2023-02-02 22:33:24 -08:00
z3fold.c mm: remove PageMovable export 2023-01-18 17:12:57 -08:00
zbud.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zpool.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zsmalloc.c zsmalloc: make zspage chain size configurable 2023-02-02 22:33:23 -08:00
zswap.c zswap: fix writeback lock ordering for zsmalloc 2022-12-11 18:12:09 -08:00