linux-stable/mm
Vlastimil Babka af47e6a95e mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
commit 803de9000f upstream.

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries.  There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d05 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13 12:59:22 +02:00
..
kasan kasan: print the original fault addr when access invalid shadow 2023-11-08 17:30:43 +01:00
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug
Makefile mm,kmemleak-test.c: move kmemleak-test.c to samples dir 2020-10-13 18:38:27 -07:00
backing-dev.c writeback, cgroup: remove extra percpu_ref_exit() 2023-05-30 12:57:56 +01:00
balloon_compaction.c
cleancache.c
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-11-28 16:54:58 +00:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
cma_debug.c
compaction.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-13 12:59:22 +02:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:30:35 +01:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm/filemap: fix infinite loop in generic_file_buffered_read() 2023-09-23 11:01:10 +02:00
frame_vector.c media: vb2: frame_vector.c: replace WARN_ONCE with a comment 2023-10-10 21:53:33 +02:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup.c mm/migration: return errno when isolate_huge_page failed 2023-02-15 17:22:21 +01:00
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault 2022-01-27 10:54:36 +01:00
huge_memory.c mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage 2023-03-22 13:30:04 +01:00
hugetlb.c mm/hugetlb: change hugetlb_reserve_pages() to type bool 2024-03-15 10:48:22 -04:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
khugepaged.c mm/khugepaged: check again on anon uffd-wp during isolation 2023-04-26 11:27:38 +02:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 11:32:02 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-25 17:45:53 +01:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:38:40 +02:00
mapping_dirty_helpers.c
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:55:56 +01:00
memcontrol.c mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors 2023-11-28 16:55:01 +00:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 10:28:09 +02:00
memory-failure.c mm/memory-failure: fix an incorrect use of tail pages 2024-04-13 12:59:01 +02:00
memory.c mm: fix unmap_mapping_range high bits shift bug 2024-01-15 18:48:06 +01:00
memory_hotplug.c mm/memory_hotplug: use pfn math in place of direct struct page manipulation 2023-11-28 16:54:58 +00:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-15 17:22:21 +01:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-16 09:57:17 +01:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-13 12:58:39 +02:00
migrate.c mm/migrate: set swap entry values of THP tail pages properly. 2024-04-13 12:59:02 +02:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-26 13:25:11 +02:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:31:55 +01:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-27 13:53:54 +02:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:00:57 +01:00
mremap.c mm/mremap: hold the rmap lock in write mode when moving page table entries. 2022-08-21 15:15:21 +02:00
msync.c
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 13:53:54 +02:00
page-writeback.c mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again 2024-02-23 08:42:24 +01:00
page_alloc.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-13 12:59:22 +02:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c
page_idle.c
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-04-20 09:23:25 +02:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
readahead.c vfs: fix readahead(2) on block devices 2023-11-20 11:06:44 +01:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:28:56 +02:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c tmpfs: verify {g,u}id mount options correctly 2023-09-19 12:20:06 +02:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:39:19 +01:00
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 11:10:28 +02:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-02-23 08:42:00 +01:00
swap.c mm: move call to compound_head() in release_pages() 2020-10-13 18:38:33 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm: swap: get rid of livelock in swapin readahead 2022-03-23 09:13:27 +01:00
swapfile.c mm: swap: fix race between free_swap_and_cache() and swapoff() 2024-04-13 12:58:26 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-08 14:40:43 +02:00
userfaultfd.c userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb 2024-03-01 13:16:43 +01:00
util.c mm: vmalloc: introduce array allocation functions 2024-02-23 08:41:55 +01:00
vmacache.c
vmalloc.c mm: add a call to flush_cache_vmap() in vmap_pfn() 2023-08-30 16:23:14 +02:00
vmpressure.c
vmscan.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-13 12:59:22 +02:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page 2021-07-14 16:56:51 +02:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:42:43 +02:00
zswap.c