linux-stable/Documentation
Nhat Pham b5ba474f3f zswap: shrink zswap pool based on memory pressure
Currently, we only shrink the zswap pool when the user-defined limit is
hit.  This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory.  It is hard to predict how much zswap space will be needed ahead
of time, as this depends on the workload (specifically, on factors such as
memory access patterns and compressibility of the memory pages).

This patch implements a memcg- and NUMA-aware shrinker for zswap, that is
initiated when there is memory pressure.  The shrinker does not have any
parameter that must be tuned by the user, and can be opted in or out on a
per-memcg basis.

Furthermore, to make it more robust for many workloads and prevent
overshrinking (i.e evicting warm pages that might be refaulted into
memory), we build in the following heuristics:

* Estimate the number of warm pages residing in zswap, and attempt to
  protect this region of the zswap LRU.
* Scale the number of freeable objects by an estimate of the memory
  saving factor. The better zswap compresses the data, the fewer pages
  we will evict to swap (as we will otherwise incur IO for relatively
  small memory saving).
* During reclaim, if the shrinker encounters a page that is also being
  brought into memory, the shrinker will cautiously terminate its
  shrinking action, as this is a sign that it is touching the warmer
  region of the zswap LRU.

As a proof of concept, we ran the following synthetic benchmark: build the
linux kernel in a memory-limited cgroup, and allocate some cold data in
tmpfs to see if the shrinker could write them out and improved the overall
performance.  Depending on the amount of cold data generated, we observe
from 14% to 35% reduction in kernel CPU time used in the kernel builds.

[nphamcs@gmail.com: check shrinker enablement early, use less costly stat flushing]
  Link: https://lkml.kernel.org/r/20231206194456.3234203-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-7-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 10:57:02 -08:00
..
ABI leds: class: Don't expose color sysfs entry 2023-11-22 11:46:03 +00:00
PCI
RCU Merge branches 'rcu/torture', 'rcu/fixes', 'rcu/docs', 'rcu/refscale', 'rcu/tasks' and 'rcu/stall' into rcu/next 2023-10-23 15:24:11 +02:00
accel
accounting
admin-guide zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
arch Docs/LoongArch: Update links in LoongArch introduction.rst 2023-11-21 15:03:25 +08:00
block The number of commits for documentation is not huge this time around, but 2023-11-01 17:11:41 -10:00
bpf bpf: Add __bpf_kfunc_{start,end}_defs macros 2023-11-01 22:33:53 -07:00
cdrom
core-api maple_tree: update the documentation of maple tree 2023-12-10 16:51:32 -08:00
cpu-freq
crypto crypto: ahash - remove support for nonzero alignmask 2023-10-27 18:04:29 +08:00
dev-tools Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
devicetree Some pin control fixes for the v6.7 cycle: 2023-11-29 06:45:22 -08:00
doc-guide docs: doc-guide: mention 'make refcheckdocs' 2023-10-22 20:38:55 -06:00
driver-api media updates for v6.7-rc1 2023-11-06 15:06:06 -08:00
fault-injection
fb
features
filesystems fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
firmware-guide
firmware_class
fpga
gpu Merge tag 'drm-misc-next-2023-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next 2023-10-31 10:47:50 +10:00
hid
hwmon hwmon: (aquacomputer_d5next) Add support for Aquacomputer High Flow USB and MPS Flow 2023-10-29 22:22:48 -07:00
i2c Documentation: i2c: add fault code for not supporting 10 bit addresses 2023-10-29 21:03:35 +01:00
iio
images
infiniband
input
isdn
kbuild Kbuild updates for v6.7 2023-11-04 08:07:19 -10:00
kernel-hacking
leds
litmus-tests
livepatch
locking
maintainer
mhi
misc-devices eeprom: remove doc and MAINTAINERS section after driver was removed 2023-10-18 10:01:34 +02:00
mm Documentation/mm: drop pte_bad() descriptions from arch page table helpers 2023-12-10 16:51:40 -08:00
netlabel
netlink netlink: specs: devlink: add forgotten port function caps enum values 2023-11-01 22:13:43 -07:00
networking Including fixes from netfilter and bpf. 2023-11-09 17:09:35 -08:00
nvdimm
nvme
pcmcia
peci
power
process docs: netdev: try to guide people on dealing with silence 2023-11-21 14:35:43 -08:00
rust Rust changes for v6.7 2023-10-30 20:30:49 -10:00
scheduler asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
scsi
security
sound Linux 6.6-rc7 2023-10-23 19:38:22 +01:00
sphinx Documentation/sphinx: Remove the repeated word "the" in comments. 2023-10-22 20:33:38 -06:00
sphinx-static
spi
staging
target
timers
tools
trace Documentation: tracing: Add a note about argument and retval access 2023-11-10 19:59:03 +09:00
translations Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rst 2023-11-21 15:03:26 +08:00
usb USB/Thunderbolt changes for 6.7-rc1 2023-11-03 16:00:42 -10:00
userspace-api drm next and fixes for 6.7-rc1 2023-11-07 17:10:02 -08:00
virt KVM/arm64 updates for 6.7 2023-10-31 16:37:07 -04:00
w1
watchdog
wmi
.gitignore
Changes
CodingStyle
Kconfig
Makefile
SubmittingPatches
atomic_bitops.txt
atomic_t.txt
conf.py
docutils.conf
dontdiff
index.rst
memory-barriers.txt
subsystem-apis.rst