linux-stable/mm
Andrea Righi 5af2f74f90 mm: swap: properly update readahead statistics in unuse_pte_range()
commit ebc5951eea upstream.

In unuse_pte_range() we blindly swap in pages without checking whether
the swap entry is already present in the swap cache.

As a result, the hit/miss ratio used by the swap readahead heuristic is
not properly updated, which leads to suboptimal performance during
swapoff.
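
Concretely, before this patch the lookup in unuse_pte_range() went
straight to swapin_readahead() for every swapped-out PTE; a simplified
sketch of the relevant mm/swapfile.c code (not the verbatim hunk):

    /* unuse_pte_range(), before this patch (simplified sketch) */
    struct vm_fault vmf;

    vmf.vma = vma;
    vmf.address = addr;
    vmf.pmd = pmd;
    /*
     * Every entry goes through the readahead path, so pages that are
     * already in the swap cache are never accounted as readahead hits.
     */
    page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, &vmf);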

Tracing the distribution of the readahead size returned by the swap
readahead heuristic during swapoff shows that a small readahead size is
used most of the time, as if we only had misses (this happens with both
cluster and vma readahead), for example:

r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
        COUNT      EVENT
        36948      $retval = 8
        44151      $retval = 4
        49290      $retval = 1
        527771     $retval = 2

Checking whether the swap entry is already present in the swap cache,
instead, allows the readahead statistics to be properly updated, and the
heuristic behaves better during swapoff, selecting a bigger readahead
size:

r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
        COUNT      EVENT
        1618       $retval = 1
        4960       $retval = 2
        41315      $retval = 4
        103521     $retval = 8

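The fix is to consult the swap cache first and fall back to the
readahead path only on a real miss, so that lookup_swap_cache() can
account readahead hits; a simplified sketch of the new
unuse_pte_range() lookup (not the verbatim diff):

    /* unuse_pte_range(), with this patch (simplified sketch) */
    page = lookup_swap_cache(entry, vma, addr);
    if (!page) {
            struct vm_fault vmf = {
                    .vma = vma,
                    .address = addr,
                    .pmd = pmd,
            };

            /* real cache miss: fall back to the readahead path */
            page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, &vmf);
    }

lookup_swap_cache() is what bumps the readahead hit counters (e.g.
swapin_readahead_hits for the cluster readahead case) that
swapin_nr_pages() later consults when sizing the readahead window.
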
In terms of swapoff performance the result is the following:

Testing environment
===================

 - Host:
   CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache)
   HDD: PC401 NVMe SK hynix 512GB
   MEM: 16GB

 - Guest (kvm):
   8GB of RAM
   virtio block driver
   16GB swap file on ext4 (/swapfile)

Test case
=========
 - allocate 85% of memory
 - `systemctl hibernate` to force all the pages to be swapped-out to the
   swap file
 - resume the system
 - measure the time that swapoff takes to complete:
   # /usr/bin/time swapoff /swapfile

Result (swapoff time)
=====================
                  5.6 vanilla   5.6 w/ this patch
                  -----------   -----------------
cluster-readahead      22.09s              12.19s
    vma-readahead      18.20s              15.33s

Conclusion
==========

The specific use case this patch addresses is improving swapoff
performance in cloud environments when a VM has been hibernated and
resumed, and all the memory needs to be forced back to RAM by disabling
swap.

This change better exploits the advantages of the readahead heuristic
during swapoff, which speeds up the resume process of such VMs.

[andrea.righi@canonical.com: update changelog]
  Link: http://lkml.kernel.org/r/20200418084705.GA147642@xps-13
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Anchal Agarwal <anchalag@amazon.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Link: http://lkml.kernel.org/r/20200416180132.GB3352@xps-13
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:29 +01:00
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-06 07:52:50 +01:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 14:49:00 +01:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma.c cma: don't quit at first error when activating reserved areas 2020-09-03 11:26:51 +02:00
cma.h
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2023-01-18 11:41:44 +01:00
debug.c mm/debug.c: always print flags in dump_page() 2020-03-05 16:43:51 +01:00
debug_page_ref.c
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-25 17:42:22 +01:00
frame_vector.c v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails 2022-12-08 11:23:06 +01:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup.c mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-12-19 12:24:15 +01:00
gup_benchmark.c mm/gup: fix memory leak in __gup_benchmark_ioctl 2020-01-09 10:20:00 +01:00
highmem.c
hmm.c pagewalk: separate function pointers from iterator data 2019-09-07 04:28:04 -03:00
huge_memory.c mm/huge_memory.c: don't discard hugepage if other processes are mapping it 2021-07-14 16:53:47 +02:00
hugetlb.c mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-12-19 12:24:15 +01:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-15 18:34:00 -08:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch 2019-10-19 06:32:32 -04:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:52 -04:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-16 10:56:59 +01:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
khugepaged.c mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma 2023-01-24 07:18:01 +01:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 12:04:49 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:08:27 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-11 13:23:31 +01:00
maccess.c uaccess: Add non-pagefault user-space write function 2020-01-17 19:48:40 +01:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:37:43 +02:00
Makefile mm: silence -Woverride-init/initializer-overrides 2019-09-24 15:54:10 -07:00
memblock.c mm: Always release pages to the buddy allocator in memblock_free_late(). 2023-01-18 11:42:06 +01:00
memcontrol.c memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:30:43 +01:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-08 19:07:49 +01:00
memory-failure.c mm/memory-failure: make sure wait for page writeback in memory_failure 2021-06-23 14:41:23 +02:00
memory.c mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() 2022-05-15 19:54:47 +02:00
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 12:26:43 +02:00
mempolicy.c mm/mempolicy: fix uninit-value in mpol_rebind_policy() 2022-07-29 17:14:16 +02:00
mempool.c
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-09 10:19:56 +01:00
memtest.c
migrate.c mm/migrate_device.c: flush TLB while holding PTL 2022-10-05 10:37:43 +02:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() 2022-09-20 12:28:00 +02:00
mmu_context.c mm: fix kthread_use_mm() vs TLB invalidate 2020-09-03 11:26:51 +02:00
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:30:42 +01:00
mmu_notifier.c mm/mmu_notifiers: use the right return code for WARN_ON 2019-11-06 08:47:50 -08:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 19:54:46 +02:00
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-12 13:00:19 +01:00
mremap.c mm/mremap: hold the rmap lock in write mode when moving page table entries. 2022-08-25 11:17:20 +02:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:25:58 +01:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 13:50:48 +02:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:22:41 +01:00
page_alloc.c mm: prevent page_frag_alloc() from corrupting the memory 2022-10-05 10:37:43 +02:00
page_counter.c mm/page_counter.c: fix protection usage propagation 2020-08-21 13:05:27 +02:00
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-05-12 12:23:48 +02:00
page_isolation.c mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-23 12:40:47 +02:00
page_owner.c mm/page_owner: change split_page_owner to take a count 2020-10-29 09:57:52 +01:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:55 -04:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-10-15 07:54:36 +02:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-23 12:40:45 +02:00
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:52 -04:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:27:46 +02:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode 2022-01-27 09:19:29 +01:00
shuffle.c mm/shuffle: don't move pages between zones and don't read garbage memmaps 2020-09-03 11:26:51 +02:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab.c mm, debug_pagealloc: don't rely on static keys too early 2020-01-23 08:22:40 +01:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:47:21 +01:00
slab_common.c mm: slab: fix kmem_cache_create failed when sysfs node not destroyed 2021-07-25 14:35:14 +02:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 11:04:04 +02:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:44:32 +02:00
swap.c mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap_state: fix a data race in swapin_nr_pages 2020-10-01 13:18:08 +02:00
swapfile.c mm: swap: properly update readahead statistics in unuse_pte_range() 2023-02-22 12:50:29 +01:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:53 -04:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-15 14:18:30 +02:00
userfaultfd.c mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() 2022-05-15 19:54:47 +02:00
util.c random: move randomize_page() into mm where it belongs 2022-06-22 14:11:17 +02:00
vmacache.c
vmalloc.c mm/vunmap: add cond_resched() in vunmap_pmd_range 2020-09-03 11:26:52 +02:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-09-22 12:26:37 +02:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 19:54:46 +02:00
workingset.c mm: workingset: fix vmstat counters for shadow nodes 2019-08-13 16:06:52 -07:00
z3fold.c mm/z3fold: fix potential memory leak in z3fold_destroy_pool() 2021-07-14 16:53:47 +02:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:33:50 +02:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00