linux-stable/mm
Dave Hansen 295be91f7e mm/migrate: optimize hotplug-time demotion order updates
Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2.

This contains two fixes for the "automatic demotion" code which was
merged into 5.15:

 * Fix memory hotplug performance regression by watching
   suppressing any real action on irrelevant hotplug events.

 * Ensure CPU hotplug handler is registered when memory hotplug
   is disabled.

This patch (of 2):

== tl;dr ==

Automatic demotion opted for a simple, lazy approach to handling hotplug
events.  This noticeably slows down memory hotplug[1].  Optimize away
updates to the demotion order when memory hotplug events should have no
effect.

This has no effect on CPU hotplug.  There is no known problem on the CPU
side and any work there will be in a separate series.

== Background ==

Automatic demotion is a memory migration strategy to ensure that new
allocations have room in faster memory tiers on tiered memory systems.
The kernel maintains an array (node_demotion[]) to drive these
migrations.

The node_demotion[] path is calculated by starting at nodes with CPUs
and then "walking" to nodes with memory.  Only hotplug events which
online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will
actually affect the migration order.

== Problem ==

However, the current code is lazy.  It completely regenerates the
migration order on *any* CPU or memory hotplug event.  The logic was
that these events are extremely rare and that the overhead from
indiscriminate order regeneration is minimal.

Part of the update logic involves a synchronize_rcu(), which is a pretty
big hammer.  Its overhead was large enough to be detected by some 0day
tests that watch memory hotplug performance[1].

== Solution ==

Add a new helper (node_demotion_topo_changed()) which can differentiate
between superfluous and impactful hotplug events.  Skip the expensive
update operation for superfluous events.

== Aside: Locking ==

It took me a few moments to declare the locking to be safe enough for
node_demotion_topo_changed() to work.  It all hinges on the memory
hotplug lock:

During memory hotplug events, 'mem_hotplug_lock' is held for write.
This ensures that two memory hotplug events can not be called
simultaneously.

CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides
mutual exclusion between CPU hotplug events.  In addition, the demotion
code acquire and hold the mem_hotplug_lock for read during its CPU
hotplug handlers.  This provides mutual exclusion between the demotion
memory hotplug callbacks and the CPU hotplug callbacks.

This effectively allows treating the migration target generation code to
act as if it is single-threaded.

1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/

Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com
Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com
Fixes: 884a6e5d1f ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-18 20:22:02 -10:00
..
damon mm/damon: don't use strnlen() with known-bogus source length 2021-09-24 16:13:34 -07:00
kasan Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
kfence Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
Kconfig mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
Makefile mm: introduce Data Access MONitor (DAMON) 2021-09-08 11:50:24 -07:00
backing-dev.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
balloon_compaction.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
bootmem_info.c mm/bootmem_info.c: mark __init on register_page_bootmem_info_section 2021-09-03 09:58:14 -07:00
cleancache.c
cma.c mm: use proper type for cma_[alloc|release] 2021-05-05 11:27:24 -07:00
cma.h mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
cma_debug.c mm/cma: change cma mutex to irq safe spinlock 2021-05-05 11:27:21 -07:00
cma_sysfs.c mm: cma: support sysfs 2021-05-05 11:27:24 -07:00
compaction.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
debug.c mm/debug: sync up latest migrate_reason to migrate_reason_names 2021-09-24 16:13:35 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix corrupted page flag 2021-09-03 09:58:10 -07:00
dmapool.c mm/dmapool: use DEVICE_ATTR_RO macro 2021-06-29 10:53:52 -07:00
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-09-08 11:50:24 -07:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
frontswap.c mm/mempool: minor coding style tweaks 2021-05-05 11:27:27 -07:00
gup.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
gup_test.c selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages 2021-05-05 11:27:26 -07:00
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
highmem.c mm: in_irq() cleanup 2021-09-08 11:50:24 -07:00
hmm.c mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled 2021-09-08 18:45:52 -07:00
huge_memory.c mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code 2021-09-03 09:58:13 -07:00
hugetlb.c mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY 2021-09-03 09:58:17 -07:00
hugetlb_cgroup.c hugetlb: make free_huge_page irq safe 2021-05-05 11:27:22 -07:00
hugetlb_vmemmap.c mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON 2021-06-30 20:47:26 -07:00
hugetlb_vmemmap.h mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate 2021-06-30 20:47:25 -07:00
hwpoison-inject.c mm: hwpoison: don't drop slab caches for offlining non-LRU page 2021-09-03 09:58:15 -07:00
init-mm.c mm: add setup_initial_init_mm() helper 2021-07-08 11:48:21 -07:00
internal.h mm/numa: automatically generate node migration order 2021-09-03 09:58:16 -07:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c mm: add a io_mapping_map_user helper 2021-04-30 11:20:39 -07:00
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
khugepaged.c huge tmpfs: SGP_NOALLOC to stop collapse_file() on race 2021-09-03 09:58:12 -07:00
kmemleak.c mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp 2021-09-08 18:45:53 -07:00
ksm.c mm/ksm: remove old GCC 4.9+ check 2021-09-13 10:18:28 -07:00
list_lru.c mm: vmscan: consolidate shrinker_maps handling code 2021-05-05 11:27:23 -07:00
maccess.c ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault 2021-08-20 11:39:25 +01:00
madvise.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: remove double Note in kerneldoc 2021-07-01 11:06:02 -07:00
memblock.c memblock: exclude NOMAP regions from kmemleak 2021-10-13 08:36:59 +03:00
memcontrol.c memcg: flush lruvec stats in the refault 2021-09-23 10:09:13 -07:00
memfd.c Reimplement RLIMIT_MEMLOCK on top of ucounts 2021-04-30 14:14:02 -05:00
memory-failure.c mm/memory_failure: fix the missing pte_unmap() call 2021-09-24 16:13:35 -07:00
memory.c afs: Fix mmap coherency vs 3rd-party changes 2021-09-13 09:10:39 +01:00
memory_hotplug.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
mempolicy.c Merge branches 'akpm' and 'akpm-hotfixes' (patches from Andrew) 2021-09-08 18:52:05 -07:00
mempool.c kasan: use separate (un)poison implementation for integrated init 2021-06-04 19:32:21 +01:00
memremap.c mm/memory_hotplug: remove nid parameter from arch_remove_memory() 2021-09-08 11:50:23 -07:00
memtest.c
migrate.c mm/migrate: optimize hotplug-time demotion order updates 2021-10-18 20:22:02 -10:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm: introduce memfd_secret system call to create "secret" memory areas 2021-07-08 11:48:21 -07:00
mm_init.c include/linux/page-flags-layout.h: cleanups 2021-04-30 11:20:42 -07:00
mmap.c Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux 2021-09-04 11:35:47 -07:00
mmap_lock.c mm: mmap_lock: fix disabling preemption directly 2021-07-23 17:43:28 -07:00
mmu_gather.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-25 09:22:55 -07:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
mremap.c mm/mremap: fix memory account on do_munmap() failure 2021-09-03 09:58:14 -07:00
msync.c mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start 2021-04-30 11:20:37 -07:00
nommu.c Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux 2021-09-04 11:35:47 -07:00
oom_kill.c mm: introduce process_mrelease system call 2021-09-03 09:58:17 -07:00
page-writeback.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
page_alloc.c mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype 2021-09-08 18:45:53 -07:00
page_counter.c mm: page_counter: mitigate consequences of a page_counter underflow 2021-04-30 11:20:38 -07:00
page_ext.c mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
page_idle.c mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
page_io.c swap: fix swapfile read/write offset 2021-03-02 17:25:46 -07:00
page_isolation.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
page_owner.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-09-08 11:50:22 -07:00
page_poison.c mm: page_poison: print page info when corruption is caught 2021-04-30 11:20:36 -07:00
page_reporting.c mm/page_reporting: allow driver to specify reporting order 2021-06-29 10:53:47 -07:00
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_vma_mapped.c mm: device exclusive memory access 2021-07-01 11:06:03 -07:00
pagewalk.c mm: pagewalk: fix walk for hugepage tables 2021-06-29 10:53:49 -07:00
percpu-internal.h Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu 2021-07-01 17:17:24 -07:00
percpu-km.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu-stats.c percpu: rework memcg accounting 2021-06-05 20:43:15 +00:00
percpu-vm.c percpu: flush tlb in pcpu_reclaim_populated() 2021-07-04 18:30:17 +00:00
percpu.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-16 09:24:42 -07:00
process_vm_access.c mm/process_vm_access.c: remove duplicate include 2021-05-05 11:27:27 -07:00
ptdump.c mm: ptdump: fix build failure 2021-04-16 16:10:37 -07:00
readahead.c mm: Protect operations adding pages to page cache with invalidate_lock 2021-07-13 13:14:27 +02:00
rmap.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
rodata_test.c
secretmem.c mm/secretmem: use refcount_t instead of atomic_t 2021-09-08 11:50:24 -07:00
shmem.c mm/shmem.c: fix judgment error in shmem_is_huge() 2021-09-24 16:13:34 -07:00
shuffle.c mm: eliminate "expecting prototype" kernel-doc warnings 2021-04-16 16:10:36 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
slab.h mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() 2021-07-30 10:14:39 -07:00
slab_common.c mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context 2021-09-04 01:12:23 +02:00
slob.c mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels 2021-03-08 14:18:46 -08:00
slub.c mm, slub: convert kmem_cpu_slab protection to local_lock 2021-09-04 10:22:01 +02:00
sparse-vmemmap.c mm: sparsemem: split the huge PMD mapping of vmemmap pages 2021-06-30 20:47:26 -07:00
sparse.c mm: introduce memmap_alloc() to unify memory map allocation 2021-09-03 09:58:15 -07:00
swap.c mm: fs: invalidate bh_lrus for only cold path 2021-09-24 16:13:35 -07:00
swap_cgroup.c
swap_slots.c mm: Replace deprecated CPU-hotplug functions. 2021-08-28 01:46:17 +02:00
swap_state.c Revert "mm: swap: check if swap backing device is congested or not" 2021-08-20 11:31:42 -07:00
swapfile.c mm, memcg: inline swap-related functions to improve disabled memcg config 2021-09-03 09:58:12 -07:00
truncate.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
usercopy.c
userfaultfd.c userfaultfd: change mmap_changing to atomic 2021-09-03 09:58:16 -07:00
util.c mm: fix uninitialized use in overcommit_policy_handler 2021-09-24 16:13:35 -07:00
vmacache.c
vmalloc.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
vmpressure.c mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg() 2021-09-03 09:58:17 -07:00
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-09-08 18:45:53 -07:00
vmstat.c mm/vmstat: protect per cpu variables with preempt disable on RT 2021-09-08 15:32:34 -07:00
workingset.c memcg: flush lruvec stats in the refault 2021-09-23 10:09:13 -07:00
z3fold.c mm/z3fold: add kerneldoc fields for z3fold_pool 2021-07-01 11:06:03 -07:00
zbud.c mm/zbud: add kerneldoc fields for zbud_pool 2021-07-01 11:06:03 -07:00
zpool.c mm: fix typos in comments 2021-05-07 00:26:35 -07:00
zsmalloc.c mm/zsmalloc.c: improve readability for async_free_zspage() 2021-07-01 11:06:02 -07:00
zswap.c mm/zswap.c: fix two bugs in zswap_writeback_entry() 2021-06-30 20:47:31 -07:00