linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-22 18:41:06 +00:00

Yu Zhao bd74fdaea1 mm: multi-gen LRU: support page table walks

To further exploit spatial locality, the aging prefers to walk page tables to search for young PTEs and promote hot pages. A kill switch will be added in the next patch to disable this behavior. When disabled, the aging relies on the rmap only.

NB: this behavior has nothing similar with the page table scanning in the 2.4 kernel [1], which searches page tables for old PTEs, adds cold pages to swapcache and unmaps them.

To avoid confusion, the term "iteration" specifically means the traversal of an entire mm_struct list; the term "walk" will be applied to page tables and the rmap, as usual.

An mm_struct list is maintained for each memcg, and an mm_struct follows its owner task to the new memcg when this task is migrated. Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls walk_page_range() with each mm_struct on this list to promote hot pages before it increments max_seq.

When multiple page table walkers iterate the same list, each of them gets a unique mm_struct; therefore they can run concurrently. Page table walkers ignore any misplaced pages, e.g., if an mm_struct was migrated, pages it left in the previous memcg will not be promoted when its current memcg is under reclaim. Similarly, page table walkers will not promote pages from nodes other than the one under reclaim.

This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that page table walkers can skip processes that have been sleeping since the last iteration.
2. It uses generational Bloom filters to record populated branches so that page table walkers can reduce their search space based on the query results, e.g., to skip page tables containing mostly holes or misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table spanning multiple VMAs. IOW, it finishes all the VMAs within the range of the same PMD table before it returns to a PGD table. This improves the cache performance for workloads that have large numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.

Server benchmark results:
  Single workload:
    fio (buffered I/O): no change

  Single workload:
    memcached (anon): +[8, 10]%
                Ops/sec      KB/sec
      patch1-7: 1147696.57   44640.29
      patch1-8: 1245274.91   48435.66

  Configurations:
    no change

Client benchmark results:
  kswapd profiles:
    patch1-7
      48.16%  lzo1x_1_do_compress (real work)
       8.20%  page_vma_mapped_walk (overhead)
       7.06%  _raw_spin_unlock_irq
       2.92%  ptep_clear_flush
       2.53%  __zram_bvec_write
       2.11%  do_raw_spin_lock
       2.02%  memmove
       1.93%  lru_gen_look_around
       1.56%  free_unref_page_list
       1.40%  memset

    patch1-8
      49.44%  lzo1x_1_do_compress (real work)
       6.19%  page_vma_mapped_walk (overhead)
       5.97%  _raw_spin_unlock_irq
       3.13%  get_pfn_folio
       2.85%  ptep_clear_flush
       2.42%  __zram_bvec_write
       2.08%  do_raw_spin_lock
       1.92%  memmove
       1.44%  alloc_zspage
       1.36%  memset

  Configurations:
    no change

Thanks to the following developers for their efforts [3].
  kernel test robot <lkp@intel.com>

[1] https://lwn.net/Articles/23732/
[2] https://llvm.org/docs/ScudoHardenedAllocator.html
[3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/

Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-26 19:46:09 -07:00
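
The "generational Bloom filters" named in the commit message's optimization list can be illustrated with a small userspace sketch. This is not the kernel's code: the type `gen_bloom`, the helpers `bloom_keys()`, `bloom_set()`, `bloom_test()` and `bloom_reset()`, the filter size, and the hash are all hypothetical choices made for illustration. The idea shown is only that a filter is selected by a sequence number, so one generation can be queried while the next is being populated, and a stale generation can be wiped before reuse.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_FILTERS   2                     /* assumption: current + next generation */
#define FILTER_SHIFT 15                    /* assumption: 2^15 bits per filter */
#define FILTER_BITS  (1u << FILTER_SHIFT)
#define FILTER_WORDS (FILTER_BITS / 64)

struct gen_bloom {
	uint64_t bits[NR_FILTERS][FILTER_WORDS];
};

/* Derive two bit indices from one item (e.g. the address of a page table). */
static void bloom_keys(uint64_t item, uint32_t key[2])
{
	uint64_t h = item * 0x9E3779B97F4A7C15ull;    /* multiplicative hash */

	key[0] = (uint32_t)(h >> 32) & (FILTER_BITS - 1);
	key[1] = (uint32_t)(h >> 17) & (FILTER_BITS - 1);
}

/* Record an item in the filter belonging to generation seq. */
static void bloom_set(struct gen_bloom *b, unsigned long seq, uint64_t item)
{
	uint64_t *f = b->bits[seq % NR_FILTERS];
	uint32_t key[2];

	bloom_keys(item, key);
	f[key[0] / 64] |= 1ull << (key[0] % 64);
	f[key[1] / 64] |= 1ull << (key[1] % 64);
}

/* false: definitely not recorded; true: possibly recorded (false positives allowed). */
static bool bloom_test(const struct gen_bloom *b, unsigned long seq, uint64_t item)
{
	const uint64_t *f = b->bits[seq % NR_FILTERS];
	uint32_t key[2];

	bloom_keys(item, key);
	return ((f[key[0] / 64] >> (key[0] % 64)) & 1) &&
	       ((f[key[1] / 64] >> (key[1] % 64)) & 1);
}

/* Wipe the filter for generation seq before it is reused. */
static void bloom_reset(struct gen_bloom *b, unsigned long seq)
{
	memset(b->bits[seq % NR_FILTERS], 0, sizeof(b->bits[0]));
}
```

In this sketch a walker at generation `seq` would call `bloom_test(b, seq, item)` to decide whether a branch is worth descending into, and `bloom_set(b, seq + 1, item)` for branches it found populated, after first calling `bloom_reset(b, seq + 1)`. A negative answer is always safe to act on; a positive answer only means the branch may be worth a visit.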
autogroup.c	sched/autogroup: Fix sysctl move	2022-05-30 12:36:36 +02:00
autogroup.h	sched/headers: Add header guard to kernel/sched/stats.h and kernel/sched/autogroup.h	2022-02-23 08:22:00 +01:00
build_policy.c	sched: Fix missing prototype warnings	2022-05-01 10:03:43 +02:00
build_utility.c	sched: Fix missing prototype warnings	2022-05-01 10:03:43 +02:00
clock.c	sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}	2022-05-19 23:46:09 +02:00
completion.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
core.c	mm: multi-gen LRU: support page table walks	2022-09-26 19:46:09 -07:00
core_sched.c	sched/core: Fix the bug that task won't enqueue into core tree when update cookie	2022-07-21 10:39:39 +02:00
cpuacct.c	Merge branch 'sched/fast-headers' into sched/core	2022-03-15 09:05:05 +01:00
cpudeadline.c	sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
cpudeadline.h
cpufreq.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
cpufreq_schedutil.c	sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util()	2022-06-28 09:17:46 +02:00
cpupri.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
cpupri.h	sched/cpupri: Add CPUPRI_HIGHER	2020-10-29 11:00:30 +01:00
cputime.c	sched/core: add forced idle accounting for cgroups	2022-07-04 09:23:07 +02:00
deadline.c	This cycle's scheduler updates for v6.0 are:	2022-08-01 11:49:06 -07:00
debug.c	memory tiering: hot page selection with hint page fault latency	2022-09-11 20:25:54 -07:00
fair.c	memory tiering: adjust hot threshold automatically	2022-09-11 20:25:54 -07:00
features.h	sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg	2022-06-28 09:08:30 +02:00
idle.c	context_tracking: Take idle eqs entrypoints over RCU	2022-07-05 13:32:16 -07:00
isolation.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
loadavg.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
Makefile	sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
membarrier.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
pelt.c	sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
pelt.h	sched/fair: Decay task PELT values during wakeup migration	2022-06-28 09:17:46 +02:00
psi.c	sched/psi: Remove unused parameter nbytes of psi_trigger_create()	2022-08-15 12:35:25 -10:00
rt.c	nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()	2022-07-21 10:39:38 +02:00
sched-pelt.h	sched/fair: Fix "runnable_avg_yN_inv" not used warnings	2019-06-17 12:15:58 +02:00
sched.h	memory tiering: hot page selection with hint page fault latency	2022-09-11 20:25:54 -07:00
smp.h	smp: Rename flush_smp_call_function_from_idle()	2022-05-01 10:03:43 +02:00
stats.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
stats.h	sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies	2022-02-23 10:58:34 +01:00
stop_task.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
swait.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
topology.c	sched/numa: Adjust imb_numa_nr to a better approximation of memory channels	2022-06-13 10:30:00 +02:00
wait.c	sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c files there	2022-02-23 10:58:33 +01:00
wait_bit.c	wait_on_bit: add an acquire memory barrier	2022-08-26 09:30:25 -07:00