linux-stable/arch/x86
Sean Christopherson 9e52fd5949 KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
commit 0df9dab891 upstream.

Stop zapping invalidate TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated.  Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended during if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).

While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot.  Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost.  Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.

When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in in a timely manner.  In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.

Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs.  On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.

Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong.  Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.

Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events).  Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.

Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.

Reported-by: Pattara Teerapong <pteerapong@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Yiwei Zhang<zzyiwei@google.com>
Cc: Paul Hsia <paulhsia@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06 13:16:20 +02:00
..
boot x86/boot/compressed: Reserve more memory for page tables 2023-09-23 11:14:33 +02:00
coco - Some SEV and CC platform helpers cleanup and simplifications now that 2023-06-27 13:26:30 -07:00
configs arch/*/configs/*defconfig: Replace AUTOFS4_FS by AUTOFS_FS 2023-07-29 14:08:22 -07:00
crypto This push fixes an alignment crash in x86/aria. 2023-05-29 07:05:49 -04:00
entry x86/mm: Fix VDSO and VVAR placement on 5-level paging machines 2023-08-09 13:38:48 -07:00
events perf/x86/uncore: Correct the number of CHAs on EMR 2023-09-13 09:53:56 +02:00
hyperv hyperv-fixes for 6.5-rc5 2023-08-04 17:16:14 -07:00
ia32
include KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously 2023-10-06 13:16:20 +02:00
kernel x86/srso: Add SRSO mitigation for Hygon processors 2023-10-06 13:16:19 +02:00
kvm KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously 2023-10-06 13:16:20 +02:00
lib x86/asm: Fix build of UML with KASAN 2023-10-06 13:15:55 +02:00
math-emu x86/fpu: Include asm/fpu/regset.h 2023-05-18 11:56:18 -07:00
mm x86/sev: Make enc_dec_hypercall() accept a size instead of npages 2023-09-13 09:53:54 +02:00
net bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable. 2023-06-12 16:47:18 +02:00
pci - Address -Wmissing-prototype warnings 2023-06-26 16:43:54 -07:00
platform A single regression fix for x86: 2023-07-01 11:40:01 -07:00
power x86/topology: Remove CPU0 hotplug option 2023-05-15 13:44:49 +02:00
purgatory x86/purgatory: Remove LTO flags 2023-09-23 11:14:33 +02:00
ras
realmode x86/realmode: Make stack lock work in trampoline_compat() 2023-05-30 14:11:47 +02:00
tools
um
video drm changes for 6.5-rc1: 2023-06-29 11:00:17 -07:00
virt/vmx/tdx
xen xen: branch for v6.5-rc2 2023-07-13 13:39:36 -07:00
.gitignore
Kbuild
Kconfig efi/x86: Ensure that EFI_RUNTIME_MAP is enabled for kexec 2023-10-06 13:16:10 +02:00
Kconfig.assembler
Kconfig.cpu x86/cpu: Remove X86_FEATURE_NAMES 2023-05-15 20:03:08 +02:00
Kconfig.debug
Makefile x86/unwind/orc: Add ELF section with ORC version identifier 2023-06-16 17:17:42 +02:00
Makefile.postlink x86/build: Avoid relocation information in final vmlinux 2023-06-14 19:54:40 +02:00
Makefile.um
Makefile_32.cpu