linux-stable/mm
Barry Song 2864f3d0f5 mm: madvise: pageout: ignore references rather than clearing young
While doing MADV_PAGEOUT, the current code will clear PTE young so that
vmscan won't read young flags to allow the reclamation of madvised folios
to go ahead.  It seems we can do it by directly ignoring references, thus
we can remove tlb flush in madvise and rmap overhead in vmscan.

Regarding the side effect, in the original code, if a parallel thread runs
side by side to access the madvised memory with the thread doing madvise,
folios will get a chance to be re-activated by vmscan (though the time gap
is actually quite small since checking PTEs is done immediately after
clearing PTEs young).  But with this patch, they will still be reclaimed. 
But this behaviour doing PAGEOUT and doing access at the same time is
quite silly like DoS.  So probably, we don't need to care.  Or ignoring
the new access during the quite small time gap is even better.

For DAMON's DAMOS_PAGEOUT based on physical address region, we still keep
its behaviour as is since a physical address might be mapped by multiple
processes.  MADV_PAGEOUT based on virtual address is actually much more
aggressive on reclamation.  To untouch paddr's DAMOS_PAGEOUT, we simply
pass ignore_references as false in reclaim_pages().

A microbench as below has shown 6% decrement on the latency of
MADV_PAGEOUT,

 #define PGSIZE 4096
 main()
 {
 	int i;
 #define SIZE 512*1024*1024
 	volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
 			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

 	for (i = 0; i < SIZE/sizeof(long); i += PGSIZE / sizeof(long))
 		p[i] =  0x11;

 	madvise(p, SIZE, MADV_PAGEOUT);
 }

w/o patch                    w/ patch
root@10:~# time ./a.out      root@10:~# time ./a.out
real	0m49.634s            real   0m46.334s
user	0m0.637s             user   0m0.648s
sys	0m47.434s            sys    0m44.265s

Link: https://lkml.kernel.org/r/20240226005739.24350-1-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04 17:01:18 -08:00
..
damon mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
kasan kasan: fix a2 allocation and remove explicit cast in atomic tests 2024-03-04 17:01:17 -08:00
kfence KFENCE: cleanup kfence_guarded_alloc() after CONFIG_SLAB removal 2023-12-05 11:17:58 +01:00
kmsan mm: kmsan: remove runtime checks from kmsan_unpoison_memory() 2024-02-22 10:24:41 -08:00
Kconfig Introduce cpu_dcache_is_aliasing() across all architectures 2024-02-22 15:27:19 -08:00
Kconfig.debug mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
Makefile mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
backing-dev.c blk-wbt: Fix detection of dirty-throttled tasks 2024-02-06 09:44:03 -07:00
balloon_compaction.c
bootmem_info.c
cma.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm/compaction: optimize >0 order folio compaction with free page split. 2024-02-23 17:48:33 -08:00
debug.c
debug_page_alloc.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix BUG_ON with pud advanced test 2024-02-23 17:27:13 -08:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
dmapool_test.c
early_ioremap.c
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c merge mm-hotfixes-stable into mm-nonmm-stable to pick up stackdepot changes 2024-02-23 17:28:43 -08:00
folio-compat.c mm: remove page_add_new_anon_rmap and lru_cache_add_inactive_or_unevictable 2023-12-29 11:58:27 -08:00
gup.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
gup_test.c
gup_test.h
highmem.c x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
hmm.c
huge_memory.c userfaultfd: use per-vma locks in userfaultfd operations 2024-02-22 15:27:20 -08:00
hugetlb.c hugetlb: allow faults to be handled under the VMA lock 2024-03-04 17:01:16 -08:00
hugetlb_cgroup.c
hugetlb_vmemmap.c mm: hugetlb_vmemmap: move mmap lock to vmemmap_remap_range() 2023-12-12 10:57:08 -08:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c mm/khugepaged: bypassing unnecessary scans with MMF_DISABLE_THP check 2024-02-23 17:48:25 -08:00
kmemleak.c kmemleak: avoid RCU stalls when freeing metadata for per-CPU pointers 2023-12-12 10:57:07 -08:00
ksm.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
list_lru.c mm/zswap: stop lru list shrinking when encounter warm region 2024-02-22 10:24:54 -08:00
maccess.c
madvise.c mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
mapping_dirty_helpers.c
memblock.c mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array 2024-02-20 14:20:49 -08:00
memcontrol.c mm: memcg: use larger batches for proactive reclaim 2024-02-22 10:24:52 -08:00
memfd.c
memory-failure.c mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page 2024-02-07 21:20:34 -08:00
memory-tiers.c mm/demotion: print demotion targets 2024-02-22 10:24:55 -08:00
memory.c mm/memory: change vmf_anon_prepare() to be non-static 2024-03-04 17:01:15 -08:00
memory_hotplug.c mm/memory_hotplug: export mhp_supports_memmap_on_memory() 2024-02-22 10:24:40 -08:00
mempolicy.c mm/mempolicy: protect task interleave functions with tsk->mems_allowed_seq 2024-02-22 10:24:47 -08:00
mempool.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
memremap.c mm: remove stale example from comment 2023-12-29 11:58:26 -08:00
memtest.c
migrate.c merge mm-hotfixes-stable into mm-nonmm-stable to pick up stackdepot changes 2024-02-23 17:28:43 -08:00
migrate_device.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
mincore.c
mlock.c
mm_init.c efi: disable mirror feature during crashkernel 2024-01-12 15:20:47 -08:00
mm_slot.h
mmap.c mm/mmap: pass vma to vma_merge() 2024-02-22 10:24:52 -08:00
mmap_lock.c
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mprotect: use pfn_swap_entry_folio 2024-02-21 16:00:03 -08:00
mremap.c
msync.c
nommu.c mm/vmalloc: remove vmap_area_list 2024-02-23 17:48:19 -08:00
oom_kill.c mm: update mark_victim tracepoints fields 2024-03-04 17:01:16 -08:00
page-writeback.c writeback: remove a use of write_cache_pages() from do_writepages() 2024-02-23 17:48:38 -08:00
page_alloc.c mm/page_alloc: make check_new_page() return bool 2024-03-04 17:01:15 -08:00
page_counter.c
page_ext.c
page_idle.c
page_io.c zswap: memcontrol: implement zswap writeback disabling 2023-12-29 20:22:11 -08:00
page_isolation.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_owner.c mm,page_owner: filter out stacks by a threshold 2024-02-23 17:48:17 -08:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c
page_vma_mapped.c mm: thp: introduce multi-size THP sysfs interface 2023-12-20 14:48:12 -08:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: Introduce flush_cache_vmap_early() 2023-12-14 00:23:17 -08:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c readahead: use ilog2 instead of a while loop in page_cache_ra_order() 2024-02-22 10:24:38 -08:00
rmap.c rmap: replace two calls to compound_order with folio_order 2024-02-22 15:27:20 -08:00
rodata_test.c
secretmem.c
shmem.c shmem: properly report quota mount options 2024-02-23 17:48:34 -08:00
shmem_quota.c
show_mem.c mm, treewide: introduce NR_PAGE_ORDERS 2024-01-08 15:27:15 -08:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shrinker_debug.c
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h mm/slab: move kmalloc() functions from slab_common.c to slub.c 2023-12-06 11:57:21 +01:00
slab_common.c slub: use a folio in __kmalloc_large_node 2024-01-05 10:17:46 -08:00
slub.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
sparse-vmemmap.c
sparse.c mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers 2024-02-21 16:00:01 -08:00
swap.c mm/mmu_gather: add __tlb_remove_folio_pages() 2024-02-22 15:27:17 -08:00
swap.h mm/swap: fix race when skipping swapcache 2024-02-20 14:20:48 -08:00
swap_cgroup.c
swap_slots.c mm/zswap: invalidate zswap entry when swap entry free 2024-02-22 10:24:54 -08:00
swap_state.c mm/mmu_gather: add __tlb_remove_folio_pages() 2024-02-22 15:27:17 -08:00
swapfile.c mm/swapfile:__swap_duplicate: drop redundant WRITE_ONCE on swap_map for err cases 2024-02-23 17:48:34 -08:00
truncate.c fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
usercopy.c
userfaultfd.c userfaultfd: use per-vma locks in userfaultfd operations 2024-02-22 15:27:20 -08:00
util.c mm/util.c: add byte count to __vm_enough_memory failure warning 2024-03-04 17:01:14 -08:00
vmalloc.c mm: vmalloc: refactor vmalloc_dump_obj() function 2024-02-23 17:48:21 -08:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
vmstat.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
workingset.c mm: ratelimit stat flush from workingset shrinker 2024-01-05 10:17:45 -08:00
z3fold.c mm/z3fold: fix the comment for __encode_handle() 2024-02-23 17:48:31 -08:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc: remove get_zspage_mapping() 2024-02-23 17:48:32 -08:00
zswap.c mm/zswap: change zswap_pool kref to percpu_ref 2024-03-04 17:01:13 -08:00