linux-stable/arch
Sean Christopherson 9e52fd5949 KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
commit 0df9dab891 upstream.

Stop zapping invalidated TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated.  Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended duration if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).

While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot update.  Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost.  Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.

When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in promptly.  In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.

Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system.  For example, on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs.  On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.

Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong.  Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.

Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events).  Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.

Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.

Reported-by: Pattara Teerapong <pteerapong@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Yiwei Zhang <zzyiwei@google.com>
Cc: Paul Hsia <paulhsia@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06 13:16:20 +02:00
alpha Kbuild fixes for v6.5 (2nd) 2023-08-13 08:56:24 -07:00
arc ARC: atomics: Add compiler barrier to atomic operations... 2023-09-19 12:30:22 +02:00
arm ARM: dts: ti: omap: motorola-mapphone: Fix abe_clkctrl warning on boot 2023-10-06 13:16:03 +02:00
arm64 arm64: dts: imx: Add imx8mm-prt8mm.dtb to build 2023-10-06 13:16:06 +02:00
csky
hexagon
ia64 locking: remove spin_lock_prefetch 2023-08-12 09:18:47 -07:00
loongarch LoongArch: Set all reserved memblocks on Node#0 at initialization 2023-10-06 13:16:18 +02:00
m68k m68k: Fix invalid .section syntax 2023-07-24 14:50:02 +02:00
microblaze
mips MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled 2023-10-06 13:16:01 +02:00
nios2
openrisc
parisc parisc: irq: Make irq_stack_union static to avoid sparse warning 2023-10-06 13:16:09 +02:00
powerpc powerpc/watchpoints: Annotate atomic context in more places 2023-10-06 13:16:17 +02:00
riscv riscv: errata: fix T-Head dcache.cva encoding 2023-10-06 13:16:12 +02:00
s390 s390/boot: cleanup number of page table levels setup 2023-09-23 11:14:18 +02:00
sh sh: push-switch: Reorder cleanup operations to avoid use-after-free bug 2023-09-19 12:30:21 +02:00
sparc nmi_backtrace: allow excluding an arbitrary CPU 2023-09-13 09:53:08 +02:00
um um: virt-pci: fix missing declaration warning 2023-09-13 09:53:48 +02:00
x86 KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously 2023-10-06 13:16:20 +02:00
xtensa xtensa: boot/lib: fix function prototypes 2023-10-06 13:16:04 +02:00
.gitignore
Kconfig