linux-stable/mm
Aaron Thompson 115d9d77bb mm: Always release pages to the buddy allocator in memblock_free_late().
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.

In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().

For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.

One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.

For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:

v6.2-rc2:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

v6.2-rc2 + patch:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816   # +43,949 pages

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-01-08 18:49:33 +02:00
..
damon mm/damon: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
kasan hardening updates for v6.2-rc1 2022-12-14 12:20:00 -08:00
kfence hardening updates for v6.2-rc1 2022-12-14 12:20:00 -08:00
kmsan kmsan: export kmsan_handle_urb 2022-12-21 14:31:52 -08:00
Kconfig New Feature: 2022-12-17 14:06:53 -06:00
Kconfig.debug mm, slub: add CONFIG_SLUB_TINY 2022-11-27 23:38:02 +01:00
Makefile
backing-dev.c mm: add /sys/class/bdi/<bdi>/min_ratio_fine knob 2022-11-30 15:59:06 -08:00
balloon_compaction.c
bootmem_info.c
cma.c
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2022-11-30 15:59:07 -08:00
debug.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: remove unused savedwrite infrastructure 2022-11-30 15:58:49 -08:00
dmapool.c
early_ioremap.c
fadvise.c mm/fadvise: use LLONG_MAX instead of -1 for eof 2022-12-11 18:12:11 -08:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c filemap: convert replace_page_cache_page() to replace_page_cache_folio() 2022-12-11 18:12:12 -08:00
folio-compat.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
frontswap.c
gup.c New Feature: 2022-12-17 14:06:53 -06:00
gup_test.c mm/gup_test: free memory allocated via kvcalloc() using kvfree() 2022-12-15 16:37:48 -08:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
highmem.c
hmm.c mm: Remove pointless barrier() after pmdp_get_lockless() 2022-12-15 10:37:27 -08:00
huge_memory.c ARM64: 2022-12-15 11:12:21 -08:00
hugetlb.c hugetlb: really allocate vma lock for all sharable vmas 2022-12-21 14:31:52 -08:00
hugetlb_cgroup.c mm/hugeltb_cgroup: convert hugetlb_cgroup_commit_charge*() to folios 2022-11-30 15:58:43 -08:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: remap head page to newly allocated page 2022-11-30 15:58:47 -08:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c
internal.h mm/hwpoison: introduce per-memory_block hwpoison counter 2022-11-08 17:37:22 -08:00
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c New Feature: 2022-12-17 14:06:53 -06:00
kmemleak.c mm/kmemleak: use %pK to display kernel pointers in backtrace 2022-12-15 16:37:49 -08:00
ksm.c mm/ksm: convert break_ksm() to use walk_page_range_vma() 2022-12-11 18:12:09 -08:00
list_lru.c
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-11 11:44:46 -08:00
madvise.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
mapping_dirty_helpers.c mm: Rename pmd_read_atomic() 2022-12-15 10:37:27 -08:00
memblock.c mm: Always release pages to the buddy allocator in memblock_free_late(). 2023-01-08 18:49:33 +02:00
memcontrol.c mm: memcg: fix swapcached stat accounting 2022-12-11 18:12:20 -08:00
memfd.c
memory-failure.c mm/memory-failure.c: cleanup in unpoison_memory 2022-11-30 15:59:08 -08:00
memory-tiers.c mm/demotion: fix NULL vs IS_ERR checking in memory_tier_init 2022-11-30 15:58:54 -08:00
memory.c mm: remove VM_FAULT_WRITE 2022-12-11 18:12:08 -08:00
memory_hotplug.c
mempolicy.c mm/mempolicy: fix memory leak in set_mempolicy_home_node system call 2022-12-21 14:31:51 -08:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-08 15:57:23 -08:00
memtest.c
migrate.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
migrate_device.c mm/migrate_device: return number of migrating pages in args->cpages 2022-11-22 18:50:43 -08:00
mincore.c mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
mlock.c
mm_init.c memory: move hotplug memory notifier priority to same file for easy sorting 2022-11-08 17:37:17 -08:00
mm_slot.h
mmap.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
mmap_lock.c
mmu_gather.c mm: mmu_gather: allow more than one batch of delayed rmaps 2022-12-11 18:12:21 -08:00
mmu_notifier.c
mmzone.c
mprotect.c New Feature: 2022-12-17 14:06:53 -06:00
mremap.c mm, mremap: fix mremap() expanding vma with addr inside vma 2022-12-21 14:31:51 -08:00
msync.c
nommu.c
oom_kill.c
page-writeback.c mm: add bdi_set_min_ratio_no_scale() function 2022-11-30 15:59:06 -08:00
page_alloc.c mm/page_alloc: update comments in __free_pages_ok() 2022-12-11 18:12:17 -08:00
page_counter.c
page_ext.c Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
page_idle.c
page_io.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c
page_poison.c
page_reporting.c mm/page_reporting: Add checks for page_reporting_order param 2022-11-28 16:48:19 +00:00
page_reporting.h
page_table_check.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
page_vma_mapped.c
pagewalk.c mm/pagewalk: add walk_page_range_vma() 2022-12-11 18:12:08 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm/percpu.c: remove the lcm code since block size is fixed at page size 2022-11-07 22:59:18 -08:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
ptdump.c
readahead.c
rmap.c mm,thp,rmap: fix races between updates of subpages_mapcount 2022-12-11 18:12:20 -08:00
rodata_test.c
secretmem.c
shmem.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
shrinker_debug.c
shuffle.c
shuffle.h
slab.c Merge branch 'slab/for-6.2/kmalloc_redzone' into slab/for-next 2022-11-21 10:36:09 +01:00
slab.h Merge branch 'slub-tiny-v1r6' into slab/for-next 2022-12-01 00:14:00 +01:00
slab_common.c hardening updates for v6.2-rc1 2022-12-14 12:20:00 -08:00
slob.c
slub.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
sparse-vmemmap.c mm/sparse-vmemmap: generalise vmemmap_populate_hugepages() 2022-12-11 18:12:12 -08:00
sparse.c mm/hwpoison: introduce per-memory_block hwpoison counter 2022-11-08 17:37:22 -08:00
swap.c mm: teach release_pages() to take an array of encoded page pointers too 2022-11-30 15:58:50 -08:00
swap.h mm: convert find_get_incore_page() to filemap_get_incore_folio() 2022-11-08 17:37:18 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c mm: mmu_gather: prepare to gather encoded page pointers with flags 2022-11-30 15:58:50 -08:00
swapfile.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
truncate.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c New Feature: 2022-12-17 14:06:53 -06:00
util.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
vmalloc.c mm: vmalloc: use trace_free_vmap_area_noflush event 2022-11-08 17:37:17 -08:00
vmpressure.c
vmscan.c New Feature: 2022-12-17 14:06:53 -06:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
z3fold.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zbud.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zpool.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zsmalloc.c zsmalloc: implement writeback mechanism for zsmalloc 2022-12-11 18:12:10 -08:00
zswap.c zswap: fix writeback lock ordering for zsmalloc 2022-12-11 18:12:09 -08:00