linux-stable/mm
Johannes Weiner 55612e80e7 mm: page_alloc: close migratetype race between freeing and stealing
There are three freeing paths that read the page's migratetype
optimistically before grabbing the zone lock.  When this races with block
stealing, those pages go on the wrong freelist.

The paths in question are:
- when freeing >costly orders that aren't THP
- when freeing pages to the buddy upon pcp lock contention
- when freeing pages that are isolated
- when freeing pages initially during boot
- when freeing the remainder in alloc_pages_exact()
- when "accepting" unaccepted VM host memory before first use
- when freeing pages during unpoisoning

None of these are so hot that they would need this optimization at the
cost of hampering defrag efforts.  Especially when contrasted with the
fact that the most common buddy freeing path - free_pcppages_bulk - is
checking the migratetype under the zone->lock just fine.

In addition, isolated pages need to look up the migratetype under the lock
anyway, which adds branches to the locked section, and results in a double
lookup when the pages are in fact isolated.

Move the lookups into the lock.

Link: https://lkml.kernel.org/r/20240320180429.678181-8-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:03 -07:00
..
damon mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
kasan fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
kfence mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
kmsan mm: kmsan: remove runtime checks from kmsan_unpoison_memory() 2024-02-22 10:24:41 -08:00
backing-dev.c vfs-6.9.misc 2024-03-11 09:38:17 -07:00
balloon_compaction.c
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm: enable page allocation tagging 2024-04-25 20:55:54 -07:00
debug.c mm: improve dumping of mapcount and page_type 2024-04-25 20:56:00 -07:00
debug_page_alloc.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
debug_page_ref.c
debug_vm_pgtable.c fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
dmapool_test.c
early_ioremap.c
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c mm: enable page allocation tagging 2024-04-25 20:55:54 -07:00
folio-compat.c mm: remove page_add_new_anon_rmap and lru_cache_add_inactive_or_unevictable 2023-12-29 11:58:27 -08:00
gup.c mm/treewide: replace pXd_huge() with pXd_leaf() 2024-04-25 20:55:46 -07:00
gup_test.c
gup_test.h
highmem.c x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
hmm.c mm/treewide: replace pXd_huge() with pXd_leaf() 2024-04-25 20:55:46 -07:00
huge_memory.c mm: remove folio_prep_large_rmappable() 2024-04-25 20:56:00 -07:00
hugetlb.c hugetlb: remove mention of destructors 2024-04-25 20:56:01 -07:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: move mmap lock to vmemmap_remap_range() 2023-12-12 10:57:08 -08:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hwpoison-inject.c
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: remove folio_prep_large_rmappable() 2024-04-25 20:56:00 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig Kbuild updates for v6.9 2024-03-21 14:41:00 -07:00
Kconfig.debug mm/slub: unify all sl[au]b parameters with "slab_$param" 2024-01-22 10:31:08 +01:00
khugepaged.c mm: convert free_swap_cache() to take a folio 2024-03-04 17:01:26 -08:00
kmemleak.c mm/slub: avoid recursive loop with kmemleak 2024-04-25 20:55:59 -07:00
ksm.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
list_lru.c mm/zswap: stop lru list shrinking when encounter warm region 2024-02-22 10:24:54 -08:00
maccess.c
madvise.c mm/madvise: don't perform madvise VMA walk for MADV_POPULATE_(READ|WRITE) 2024-04-25 20:55:43 -07:00
Makefile kbuild: make -Woverride-init warnings more consistent 2024-03-31 11:32:26 +09:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c cxl fixes for 6.8-rc6 2024-02-24 15:53:40 -08:00
memcontrol.c mm: always initialise folio->_deferred_list 2024-04-25 20:55:59 -07:00
memfd.c mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-03-04 17:01:21 -08:00
memory-failure.c mm: free up PG_slab 2024-04-25 20:56:00 -07:00
memory-tiers.c mm/demotion: print demotion targets 2024-02-22 10:24:55 -08:00
memory.c mm/mempolicy: use numa_node_id() instead of cpu_to_node() 2024-04-25 20:55:48 -07:00
memory_hotplug.c mm/memory_hotplug: export mhp_supports_memmap_on_memory() 2024-02-22 10:24:40 -08:00
mempolicy.c mm: enable page allocation tagging 2024-04-25 20:55:54 -07:00
mempool.c mempool: hook up to memory allocation profiling 2024-04-25 20:55:56 -07:00
memremap.c mm: remove stale example from comment 2023-12-29 11:58:26 -08:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate.c merge mm-hotfixes-stable into mm-nonmm-stable to pick up stackdepot changes 2024-02-23 17:28:43 -08:00
migrate_device.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c mm: make folios_put() the basis of release_pages() 2024-03-04 17:01:22 -08:00
mm_init.c codetag: debug: mark codetags for reserved pages as empty 2024-04-25 20:55:58 -07:00
mm_slot.h
mmap.c RISC-V Patches for the 6.9 Merge Window 2024-03-22 10:41:13 -07:00
mmap_lock.c
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mprotect: use pfn_swap_entry_folio 2024-02-21 16:00:03 -08:00
mremap.c mm: abstract VMA merge and extend into vma_merge_extend() helper 2023-10-18 14:34:18 -07:00
msync.c
nommu.c mm: vmalloc: enable memory allocation profiling 2024-04-25 20:55:57 -07:00
oom_kill.c mm: update mark_victim tracepoints fields 2024-03-04 17:01:16 -08:00
page-writeback.c writeback: remove a use of write_cache_pages() from do_writepages() 2024-02-23 17:48:38 -08:00
page_alloc.c mm: page_alloc: close migratetype race between freeing and stealing 2024-04-25 20:56:03 -07:00
page_counter.c
page_ext.c mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y 2024-04-25 20:55:54 -07:00
page_idle.c
page_io.c zswap: memcontrol: implement zswap writeback disabling 2023-12-29 20:22:11 -08:00
page_isolation.c mm: page_alloc: fix freelist movement during block conversion 2024-04-25 20:56:03 -07:00
page_owner.c mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: thp: introduce multi-size THP sysfs interface 2023-12-20 14:48:12 -08:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h mm: percpu: add codetag reference into pcpuobj_ext 2024-04-25 20:55:56 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm: percpu: enable per-cpu allocation tagging 2024-04-25 20:55:56 -07:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c mm: support order-1 folios in the page cache 2024-03-04 17:01:19 -08:00
rmap.c rmap: replace two calls to compound_order with folio_order 2024-02-22 15:27:20 -08:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem.c mm/shmem: inline shmem_is_huge() for disabled transparent hugepages 2024-04-16 15:39:51 -07:00
shmem_quota.c tmpfs: fix race on handling dquot rbtree 2024-03-26 11:07:23 -07:00
show_mem.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h mm: free up PG_slab 2024-04-25 20:56:00 -07:00
slab_common.c mm/slab: enable slab allocation tagging for kmalloc and friends 2024-04-25 20:55:55 -07:00
slub.c mm/slub: avoid recursive loop with kmemleak 2024-04-25 20:55:59 -07:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers 2024-02-21 16:00:01 -08:00
swap.c mm: fix list corruption in put_pages_list 2024-03-12 13:07:16 -07:00
swap.h mm/swap: fix race when skipping swapcache 2024-02-20 14:20:48 -08:00
swap_cgroup.c
swap_slots.c mm/zswap: invalidate zswap entry when swap entry free 2024-02-22 10:24:54 -08:00
swap_state.c mm: convert free_swap_cache() to take a folio 2024-03-04 17:01:26 -08:00
swapfile.c - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
truncate.c fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
usercopy.c
userfaultfd.c userfaultfd: fix deadlock warning when locking src and dst VMAs 2024-03-26 11:07:23 -07:00
util.c mm: vmalloc: enable memory allocation profiling 2024-04-25 20:55:57 -07:00
vmalloc.c mm: vmalloc: enable memory allocation profiling 2024-04-25 20:55:57 -07:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
vmstat.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
workingset.c mm: move mapping_set_update out of <linux/swap.h> 2024-02-21 11:36:50 +05:30
z3fold.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zswap.c mm: zswap: remove unnecessary check in zswap_find_zpool() 2024-04-25 20:55:48 -07:00