linux-stable/arch
David Hildenbrand 840565b135 s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
[ Upstream commit 06201e00ee ]

commit fa41ba0d08 ("s390/mm: avoid empty zero pages for KVM guests to
avoid postcopy hangs") introduced an undesired side effect when combined
with memory ballooning and VM migration: memory part of the inflated
memory balloon will consume memory.

Assuming we have a 100GiB VM and inflated the balloon to 40GiB. Our VM
will consume ~60GiB of memory. If we now trigger a VM migration,
hypervisors like QEMU will read all VM memory. As s390x does not support
the shared zeropage, we'll end up allocating for all previously-inflated
memory part of the memory balloon: 50 GiB. So we might easily
(unexpectedly) crash the VM on the migration source.

Even worse, hypervisors like QEMU optimize for zeropage migration to not
consume memory on the migration destination: when migrating a
"page full of zeroes", on the migration destination they check whether the
target memory is already zero (by reading the destination memory) and avoid
writing to the memory to not allocate memory: however, s390x will also
allocate memory here, implying that also on the migration destination, we
will end up allocating all previously-inflated memory part of the memory
balloon.

This is especially bad if actual memory overcommit was not desired, when
memory ballooning is used for dynamic VM memory resizing, setting aside
some memory during boot that can be added later on demand. Alternatives
like virtio-mem that would avoid this issue are not yet available on
s390x.

There could be ways to optimize some cases in user space: before reading
memory in an anonymous private mapping on the migration source, check via
/proc/self/pagemap if anything is already populated. Similarly check on
the migration destination before reading. While that would avoid
populating tables full of shared zeropages on all architectures, it's
harder to get right and performant, and requires user space changes.

Further, with posctopy live migration we must place a page, so there,
"avoid touching memory to avoid allocating memory" is not really
possible. (Note that a previously we would have falsely inserted
shared zeropages into processes using UFFDIO_ZEROPAGE where
mm_forbids_zeropage() would have actually forbidden it)

PV is currently incompatible with memory ballooning, and in the common
case, KVM guests don't make use of storage keys. Instead of zapping
zeropages when enabling storage keys / PV, that turned out to be
problematic in the past, let's do exactly the same we do with KSM pages:
trigger unsharing faults to replace the shared zeropages by proper
anonymous folios.

What about added latency when enabling storage kes? Having a lot of
zeropages in applicable environments (PV, legacy guests, unittests) is
unexpected. Further, KSM could today already unshare the zeropages
and unmerging KSM pages when enabling storage kets would unshare the
KSM-placed zeropages in the same way, resulting in the same latency.

[ agordeev: Fixed sparse and checkpatch complaints and error handling ]

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Fixes: fa41ba0d08 ("s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240411161441.910170-3-david@redhat.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12 11:11:33 +02:00
..
alpha rtc: Add support for configuring the UIP timeout for RTC reads 2024-01-31 16:18:56 -08:00
arc ARC: [plat-hsdk]: Remove misplaced interrupt-cells property 2024-05-02 16:32:33 +02:00
arm ARM: configs: sunxi: Enable DRM_DW_HDMI 2024-06-12 11:11:32 +02:00
arm64 arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration 2024-05-17 12:02:35 +02:00
csky work around gcc bugs with 'asm goto' with outputs 2024-02-23 09:24:47 +01:00
hexagon hexagon: vmlinux.lds.S: handle attributes section 2024-04-03 15:28:55 +02:00
ia64
loongarch LoongArch: Lately init pmu after smp is online 2024-06-12 11:11:24 +02:00
m68k mm: Introduce flush_cache_vmap_early() 2024-02-16 19:10:52 +01:00
microblaze
mips MIPS: scall: Save thread_info.syscall unconditionally on entry 2024-05-17 12:02:15 +02:00
nios2 mm: Introduce flush_cache_vmap_early() 2024-02-16 19:10:52 +01:00
openrisc
parisc parisc: add missing export of __cmpxchg_u8() 2024-06-12 11:11:31 +02:00
powerpc powerpc/crypto/chacha-p10: Fix failure on non Power10 2024-05-17 12:02:18 +02:00
riscv riscv, bpf: Fix incorrect runtime stats 2024-05-17 12:02:01 +02:00
s390 s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests 2024-06-12 11:11:33 +02:00
sh mm: Introduce flush_cache_vmap_early() 2024-02-16 19:10:52 +01:00
sparc mm, treewide: introduce NR_PAGE_ORDERS 2024-05-02 16:32:41 +02:00
um um: Fix adding '-no-pie' for clang 2024-02-23 09:25:03 +01:00
x86 crypto: x86/sha512-avx2 - add missing vzeroupper 2024-06-12 11:11:32 +02:00
xtensa xtensa: fix MAKE_PC_FROM_RA second argument 2024-05-17 12:02:32 +02:00
.gitignore
Kconfig cpu: Re-enable CPU mitigations by default for !X86 architectures 2024-05-02 16:32:44 +02:00