Commit graph

548 commits

Author SHA1 Message Date
Tom Lendacky
916e3e5f26 KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX
When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B"
within the VMSA. This means that the guest value is loaded on VMRUN and
the host value is restored from the host save area on #VMEXIT.

Since the value is restored on #VMEXIT, the KVM user return MSR support
for TSC_AUX can be replaced by populating the host save area with the
current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux
post-boot, the host save area can be set once in svm_hardware_enable().
This eliminates the two WRMSR instructions associated with the user return
MSR support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23 05:35:49 -04:00
Tom Lendacky
e0096d01c4 KVM: SVM: Fix TSC_AUX virtualization setup
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.

This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.

Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.

With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.

Fixes: 296d5a17e7 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23 05:35:49 -04:00
Paolo Bonzini
6d5e3c318a KVM x86 changes for 6.6:
- Misc cleanups
 
  - Retry APIC optimized recalculation if a vCPU is added/enabled
 
  - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
    "emergency disabling" behavior to KVM actually being loaded, and move all of
    the logic within KVM
 
  - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
    ratio MSR can diverge from the default iff TSC scaling is enabled, and clean
    up related code
 
  - Add a framework to allow "caching" feature flags so that KVM can check if
    the guest can use a feature without needing to search guest CPUID
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmTueMwSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5hp4P/i/UmIJEJupryUrD/ZXcSjqmupCtv4JS
 Z2o1KIAPbM5GUX4iyF1cnZrI4Ac5zMtULN8Tp3ATOp3AqKy72AqB1Z82e+v6SKis
 KfSXlDFCPFisrwv3Ys7JEu9vIS8oqITHmSBk8OAmElwujdQ5jYLZjwGbCXbM9qas
 yCFGLqD4fjX8XqkZLmXggjT99MPSgiTPoKL592Wq4JR8mY4hyQqJzBepDjb94sT7
 wrsAv1B+BchGDguk0+nOdmHM4emGrZU7fVqi3OFPofSlwAAdkqZObleb422KB058
 5bcpNow+9VH5pzgq8XSAU7DLNgH9aXH0PcVU8ASU6P0D9fceKoOFuL47nnFbwz0t
 vKafcXNWFs8xHE4iyzvAAsZK/X8GR0ngNByPnamATMsjt2tTmsa5BOyAPkIN+GpT
 DzZCIk27SbdGC3lGYlSV+5ob/+sOr6m384DkvSZnU6JiiFLlZiTxURj1/9Zvfka8
 2co2wnf8cJxnKFUThFfuxs9XpKgvhkOE8LauwCSo4MAQM95Pen+NAK960RBWj0xl
 wof5kIGmKbwmMXyg2Sr+EKqe5KRPba22Yi3x24tURAXafKK/AW7T8dgEEXOll7dp
 pKmTPAevwUk9wYIGultjhEBXKYgMOeD2BVoTa5je5h1Da28onrSJ7aLQUixHHs0J
 gLdtzs8M9K9t
 =yGM1
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.6' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.6:

 - Misc cleanups

 - Retry APIC optimized recalculation if a vCPU is added/enabled

 - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the
   "emergency disabling" behavior to KVM actually being loaded, and move all of
   the logic within KVM

 - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC
   ratio MSR can diverge from the default iff TSC scaling is enabled, and clean
   up related code

 - Add a framework to allow "caching" feature flags so that KVM can check if
   the guest can use a feature without needing to search guest CPUID
2023-08-31 13:36:33 -04:00
Paolo Bonzini
bd7fe98b35 KVM: x86: SVM changes for 6.6:
- Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug
    registers and generate/handle #DBs
 
  - Clean up LBR virtualization code
 
  - Fix a bug where KVM fails to set the target pCPU during an IRTE update
 
  - Fix fatal bugs in SEV-ES intrahost migration
 
  - Fix a bug where the recent (architecturally correct) change to reinject
    #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmTue8YSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5aqUP/jF7DyMXyQGYMKoQhFxWyGRhfqV8Ov8i
 7sUpEKSx5WTxOsFHBgdGeNU+m9eBJHWVmrJM9imI4OCUvJmxasRRsnyhvEUvBIUE
 amQT45aVm2xqjRNRUkOCUUHiDKtUdwpSRlOSyhnDTKmlMbNoH+fO3SLJ1oB/fsae
 wnmyiF98j2vT/5mD6F/F87hlNMq4CqG/OZWJ9Kk8GfvfJpUcC8r/H0NsMgSMF2/L
 Q+Hn+r/XDfMSrBiyEzevWyPbJi7nL+WF9EQDJASf+aAkmFMHK6AU4XNITwVw3XcZ
 FGtSP/NzvnePhd5gqtbiW9hRQkWcKjqnydtyI3ZDVVBpEbJ6OJn3+UFoLZ8NoSE+
 D3EDs1PA7Qjty6kYx9/NZpXz5BAMd9mikkTL7PTrlrAZAEimToqoHx7mBjmLp4E+
 diKrpG2w1OTtO/Pafi0z0zZN6Yc9MJOyZVK78DpIiLey3rNip9SawWGh+wV14WNC
 nbn7Wpf8EGE1E8n00mwrGMRCuRm7LQhLbcVXITiGKrbpxUzam6sqDIgt73Q7xma2
 NWcPizeFNy47uurNOA2V9xHkbEAYjWaM12uyzmGzILvvmvNnpU0NuZ78cgV5ZWMk
 4US53CAQbG4+qUCJWhIDoriluaLXjL9tLiZgJW0T6cus3nL5NWYqvlq6TWYyK00J
 zjiK7vky77Pq
 =WC5V
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.6' of https://github.com/kvm-x86/linux into HEAD

KVM: x86: SVM changes for 6.6:

 - Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug
   registers and generate/handle #DBs

 - Clean up LBR virtualization code

 - Fix a bug where KVM fails to set the target pCPU during an IRTE update

 - Fix fatal bugs in SEV-ES intrahost migration

 - Fix a bug where the recent (architecturally correct) change to reinject
   #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
2023-08-31 13:32:40 -04:00
Sean Christopherson
80d0f521d5 KVM: SVM: Require nrips support for SEV guests (and beyond)
Disallow SEV (and beyond) if nrips is disabled via module param, as KVM
can't read guest memory to partially emulate and skip an instruction.  All
CPUs that support SEV support NRIPS, i.e. this is purely stopping the user
from shooting themselves in the foot.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230825013621.2845700-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-25 09:00:40 -07:00
Sean Christopherson
cb49631ad1 KVM: SVM: Don't inject #UD if KVM attempts to skip SEV guest insn
Don't inject a #UD if KVM attempts to "emulate" to skip an instruction
for an SEV guest, and instead resume the guest and hope that it can make
forward progress.  When commit 04c40f344d ("KVM: SVM: Inject #UD on
attempted emulation for SEV guest w/o insn buffer") added the completely
arbitrary #UD behavior, there were no known scenarios where a well-behaved
guest would induce a VM-Exit that triggered emulation, i.e. it was thought
that injecting #UD would be helpful.

However, now that KVM (correctly) attempts to re-inject INT3/INTO, e.g. if
a #NPF is encountered when attempting to deliver the INT3/INTO, an SEV
guest can trigger emulation without a buffer, through no fault of its own.
Resuming the guest and retrying the INT3/INTO is architecturally wrong,
e.g. the vCPU will incorrectly re-hit code #DBs, but for SEV guests there
is literally no other option that has a chance of making forward progress.

Drop the #UD injection for all "skip" emulation, not just those related to
INT3/INTO, even though that means that the guest will likely end up in an
infinite loop instead of getting a #UD (the vCPU may also crash, e.g. if
KVM emulated everything about an instruction except for advancing RIP).
There's no evidence that suggests that an unexpected #UD is actually
better than hanging the vCPU, e.g. a soft-hung vCPU can still respond to
IRQs and NMIs to generate a backtrace.

Reported-by: Wu Zongyo <wuzongyo@mail.ustc.edu.cn>
Closes: https://lore.kernel.org/all/8eb933fd-2cf3-d7a9-32fe-2a1d82eac42a@mail.ustc.edu.cn
Fixes: 6ef88d6e36 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230825013621.2845700-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-25 09:00:40 -07:00
Sean Christopherson
ee785c870d KVM: nSVM: Use KVM-governed feature framework to track "vNMI enabled"
Track "virtual NMI exposed to L1" via a governed feature flag instead of
using a dedicated bit/flag in vcpu_svm.

Note, checking KVM's capabilities instead of the "vnmi" param means that
the code isn't strictly equivalent, as vnmi_enabled could have been set
if nested=false where as that the governed feature cannot.  But that's a
glorified nop as the feature/flag is consumed only by paths that are
gated by nSVM being enabled.

Link: https://lore.kernel.org/r/20230815203653.519297-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:31 -07:00
Sean Christopherson
b89456aee7 KVM: nSVM: Use KVM-governed feature framework to track "vGIF enabled"
Track "virtual GIF exposed to L1" via a governed feature flag instead of
using a dedicated bit/flag in vcpu_svm.

Note, checking KVM's capabilities instead of the "vgif" param means that
the code isn't strictly equivalent, as vgif_enabled could have been set
if nested=false where as that the governed feature cannot.  But that's a
glorified nop as the feature/flag is consumed only by paths that are

Link: https://lore.kernel.org/r/20230815203653.519297-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:31 -07:00
Sean Christopherson
59d67fc1f0 KVM: nSVM: Use KVM-governed feature framework to track "Pause Filter enabled"
Track "Pause Filtering is exposed to L1" via governed feature flags
instead of using dedicated bits/flags in vcpu_svm.

No functional change intended.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:30 -07:00
Sean Christopherson
e183d17ac3 KVM: nSVM: Use KVM-governed feature framework to track "LBRv enabled"
Track "LBR virtualization exposed to L1" via a governed feature flag
instead of using a dedicated bit/flag in vcpu_svm.

Note, checking KVM's capabilities instead of the "lbrv" param means that
the code isn't strictly equivalent, as lbrv_enabled could have been set
if nested=false where as that the governed feature cannot.  But that's a
glorified nop as the feature/flag is consumed only by paths that are
gated by nSVM being enabled.

Link: https://lore.kernel.org/r/20230815203653.519297-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:30 -07:00
Sean Christopherson
4d2a1560ff KVM: nSVM: Use KVM-governed feature framework to track "vVM{SAVE,LOAD} enabled"
Track "virtual VMSAVE/VMLOAD exposed to L1" via a governed feature flag
instead of using a dedicated bit/flag in vcpu_svm.

Opportunistically add a comment explaining why KVM disallows virtual
VMLOAD/VMSAVE when the vCPU model is Intel.

No functional change intended.

Link: https://lore.kernel.org/r/20230815203653.519297-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:29 -07:00
Sean Christopherson
4365a45571 KVM: nSVM: Use KVM-governed feature framework to track "TSC scaling enabled"
Track "TSC scaling exposed to L1" via a governed feature flag instead of
using a dedicated bit/flag in vcpu_svm.

Note, this fixes a benign bug where KVM would mark TSC scaling as exposed
to L1 even if overall nested SVM supported is disabled, i.e. KVM would let
L1 write MSR_AMD64_TSC_RATIO even when KVM didn't advertise TSCRATEMSR
support to userspace.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:28 -07:00
Sean Christopherson
7a6a6a3bf5 KVM: nSVM: Use KVM-governed feature framework to track "NRIPS enabled"
Track "NRIPS exposed to L1" via a governed feature flag instead of using
a dedicated bit/flag in vcpu_svm.

No functional change intended.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:28 -07:00
Sean Christopherson
fe60e8f65f KVM: x86: Use KVM-governed feature framework to track "XSAVES enabled"
Use the governed feature framework to track if XSAVES is "enabled", i.e.
if XSAVES can be used by the guest.  Add a comment in the SVM code to
explain the very unintuitive logic of deliberately NOT checking if XSAVES
is enumerated in the guest CPUID model.

No functional change intended.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:38:28 -07:00
Borislav Petkov (AMD)
f58d6fbcb7 x86/CPU/AMD: Fix the DIV(0) initial fix attempt
Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.

Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.

Do the same before VMRUN too, to protect host data from leaking into the
guest too.

Fixes: 77245f1c3c ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
2023-08-14 11:02:50 +02:00
Linus Torvalds
138bcddb86 Add a mitigation for the speculative RAS (Return Address Stack) overflow
vulnerability on AMD processors. In short, this is yet another issue
 where userspace poisons a microarchitectural structure which can then be
 used to leak privileged information through a side channel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmTQs1gACgkQEsHwGGHe
 VUo1UA/8C34PwJveZDcerdkaxSF+WKx7AjOI/L2ws1qn9YVFA3ItFMgVuFTrlY6c
 1eYKYB3FS9fVN3KzGOXGyhho6seHqfY0+8cyYupR+PVLn9rSy7GqHaIMr37FdQ2z
 yb9xu26v+gsvuPEApazS6MxijYS98u71rHhmg97qsHCnUiMJ01+TaGucntukNJv8
 FfwjZJvgeUiBPQ/6IeA/O0413tPPJ9weawPyW+sV1w7NlXjaUVkNXwiq/Xxbt9uI
 sWwMBjFHpSnhBRaDK8W5Blee/ZfsS6qhJ4jyEKUlGtsElMnZLPHbnrbpxxqA9gyE
 K+3ZhoHf/W1hhvcZcALNoUHLx0CvVekn0o41urAhPfUutLIiwLQWVbApmuW80fgC
 DhPedEFu7Wp6Okj5+Bqi/XOsOOWN2WRDSzdAq10o1C+e+fzmkr6y4E6gskfz1zXU
 ssD9S4+uAJ5bccS5lck4zLffsaA03nAYTlvl1KRP4pOz5G9ln6eyO20ar1WwfGAV
 o5ZsTJVGQMyVA49QFkksj+kOI3chkmDswPYyGn2y8OfqYXU4Ip4eN+VkjorIAo10
 zIec3Z0bCGZ9UUMylUmdtH3KAm8q0wVNoFrUkMEmO8j6nn7ew2BhwLMn4uu+nOnw
 lX2AG6PNhRLVDVaNgDsWMwejaDsitQPoWRuCIAZ0kQhbeYuwfpM=
 =73JY
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/srso fixes from Borislav Petkov:
 "Add a mitigation for the speculative RAS (Return Address Stack)
  overflow vulnerability on AMD processors.

  In short, this is yet another issue where userspace poisons a
  microarchitectural structure which can then be used to leak privileged
  information through a side channel"

* tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/srso: Tie SBPB bit setting to microcode patch detection
  x86/srso: Add a forgotten NOENDBR annotation
  x86/srso: Fix return thunks in generated code
  x86/srso: Add IBPB on VMEXIT
  x86/srso: Add IBPB
  x86/srso: Add SRSO_NO support
  x86/srso: Add IBPB_BRTYPE support
  x86/srso: Add a Speculative RAS Overflow mitigation
  x86/bugs: Increase the x86 bugs vector size to two u32s
2023-08-07 16:35:44 -07:00
Sean Christopherson
223f93d4d8 KVM: nSVM: Skip writes to MSR_AMD64_TSC_RATIO if guest state isn't loaded
Skip writes to MSR_AMD64_TSC_RATIO that are done in the context of a vCPU
if guest state isn't loaded, i.e. if KVM will update MSR_AMD64_TSC_RATIO
during svm_prepare_switch_to_guest() before entering the guest.  Checking
guest_state_loaded may or may not be a net positive for performance as
the current_tsc_ratio cache will optimize away duplicate WRMSRs in the
vast majority of scenarios.  However, the cost of the check is negligible,
and the real motivation is to document that KVM needs to load the vCPU's
value only when running the vCPU.

Link: https://lore.kernel.org/r/20230729011608.1065019-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:30 -07:00
Sean Christopherson
2d63699099 KVM: x86: Always write vCPU's current TSC offset/ratio in vendor hooks
Drop the @offset and @multiplier params from the kvm_x86_ops hooks for
propagating TSC offsets/multipliers into hardware, and instead have the
vendor implementations pull the information directly from the vCPU
structure.  The respective vCPU fields _must_ be written at the same
time in order to maintain consistent state, i.e. it's not random luck
that the value passed in by all callers is grabbed from the vCPU.

Explicitly grabbing the value from the vCPU field in SVM's implementation
in particular will allow for additional cleanup without introducing even
more subtle dependencies.  Specifically, SVM can skip the WRMSR if guest
state isn't loaded, i.e. svm_prepare_switch_to_guest() will load the
correct value for the vCPU prior to entering the guest.

This also reconciles KVM's handling of related values that are stored in
the vCPU, as svm_write_tsc_offset() already assumes/requires the caller
to have updated l1_tsc_offset.

Link: https://lore.kernel.org/r/20230729011608.1065019-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:29 -07:00
Sean Christopherson
229725acfa KVM: SVM: Clean up preemption toggling related to MSR_AMD64_TSC_RATIO
Explicitly disable preemption when writing MSR_AMD64_TSC_RATIO only in the
"outer" helper, as all direct callers of the "inner" helper now run with
preemption already disabled.  And that isn't a coincidence, as the outer
helper requires a vCPU and is intended to be used when modifying guest
state and/or emulating guest instructions, which are typically done with
preemption enabled.

Direct use of the inner helper should be extremely limited, as the only
time KVM should modify MSR_AMD64_TSC_RATIO without a vCPU is when
sanitizing the MSR for a specific pCPU (currently done when {en,dis}abling
disabling SVM). The other direct caller is svm_prepare_switch_to_guest(),
which does have a vCPU, but is a one-off special case: KVM is about to
enter the guest on a specific pCPU and thus must have preemption disabled.

Link: https://lore.kernel.org/r/20230729011608.1065019-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:29 -07:00
Sean Christopherson
c0dc39bd2c KVM: nSVM: Use the "outer" helper for writing multiplier to MSR_AMD64_TSC_RATIO
When emulating nested SVM transitions, use the outer helper for writing
the TSC multiplier for L2.  Using the inner helper only for one-off cases,
i.e. for paths where KVM is NOT emulating or modifying vCPU state, will
allow for multiple cleanups:

 - Explicitly disabling preemption only in the outer helper
 - Getting the multiplier from the vCPU field in the outer helper
 - Skipping the WRMSR in the outer helper if guest state isn't loaded

Opportunistically delete an extra newline.

No functional change intended.

Link: https://lore.kernel.org/r/20230729011608.1065019-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:28 -07:00
Sean Christopherson
2e6b9bd49b KVM: SVM: Use "standard" stgi() helper when disabling SVM
Now that kvm_rebooting is guaranteed to be true prior to disabling SVM
in an emergency, use the existing stgi() helper instead of open coding
STGI.  In effect, eat faults on STGI if and only if kvm_rebooting==true.

Link: https://lore.kernel.org/r/20230721201859.2307736-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
6ae44e012f KVM: x86: Force kvm_rebooting=true during emergency reboot/crash
Set kvm_rebooting when virtualization is disabled in an emergency so that
KVM eats faults on virtualization instructions even if kvm_reboot() isn't
reached.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
76ab816108 x86/virt: KVM: Move "disable SVM" helper into KVM SVM
Move cpu_svm_disable() into KVM proper now that all hardware
virtualization management is routed through KVM.  Remove the now-empty
virtext.h.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
c4db4f20f3 KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()
Check "this" CPU instead of the boot CPU when querying SVM support so that
the per-CPU checks done during hardware enabling actually function as
intended, i.e. will detect issues where SVM isn't support on all CPUs.

Disable migration for the use from svm_init() mostly so that the standard
accessors for the per-CPU data can be used without getting yelled at by
CONFIG_DEBUG_PREEMPT=y sanity checks.  Preventing the "disabled by BIOS"
error message from reporting the wrong CPU is largely a bonus, as ensuring
a stable CPU during module load is a non-goal for KVM.

Link: https://lore.kernel.org/all/ZAdxNgv0M6P63odE@google.com
Cc: Kai Huang <kai.huang@intel.com>
Cc: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
85fd29dd5f x86/virt: KVM: Open code cpu_has_svm() into kvm_is_svm_supported()
Fold the guts of cpu_has_svm() into kvm_is_svm_supported(), its sole
remaining user.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
baeb4de7ad x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback
Use the virt callback to disable SVM (and set GIF=1) during an emergency
instead of blindly attempting to disable SVM.  Like the VMX case, if a
hypervisor, i.e. KVM, isn't loaded/active, SVM can't be in use.

Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:14 -07:00
Sean Christopherson
a85cd52d72 KVM: SVM: Use svm_get_lbr_vmcb() helper to handle writes to DEBUGCTL
Use the recently introduced svm_get_lbr_vmcb() instead an open coded
equivalent to retrieve the target VMCB when emulating writes to
MSR_IA32_DEBUGCTLMSR.

No functional change intended.

Link: https://lore.kernel.org/r/20230607203519.1570167-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:41:55 -07:00
Sean Christopherson
41dfb5f13e KVM: SVM: Clean up handling of LBR virtualization enabled
Clean up the enable_lbrv computation in svm_update_lbrv() to consolidate
the logic for computing enable_lbrv into a single statement, and to remove
the coding style violations (lack of curly braces on nested if).

No functional change intended.

Link: https://lore.kernel.org/r/20230607203519.1570167-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:41:54 -07:00
Sean Christopherson
d518f8cc10 KVM: SVM: Fix dead KVM_BUG() code in LBR MSR virtualization
Refactor KVM's handling of LBR MSRs on SVM to avoid a second layer of
case statements, and thus eliminate a dead KVM_BUG() call, which (a) will
never be hit in the current code base and (b) if a future commit breaks
things, will never fire as KVM passes "false" instead "true" or '1' for
the KVM_BUG() condition.

Reported-by: Michal Luczaj <mhal@rbox.co>
Cc: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230607203519.1570167-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:41:54 -07:00
Sean Christopherson
26a0652cb4 KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode.  The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).

Opportunistically fix a benign typo in the prototype for is_valid_cr4().

Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29 11:05:31 -04:00
Sean Christopherson
b439eb8ab5 Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows
dereferencing memslots during WRMSR emulation, drop the requirement that
"next RIP" is valid.  In hindsight, acquiring kvm->srcu would have been a
better fix than avoiding the pastpath, but at the time it was thought that
accessing SRCU-protected data in the fastpath was a one-off edge case.

This reverts commit 5c30e8101e.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29 11:05:27 -04:00
Sean Christopherson
a6bb570902 KVM: SVM: Don't try to pointlessly single-step SEV-ES guests for NMI window
Bail early from svm_enable_nmi_window() for SEV-ES guests without trying
to enable single-step of the guest, as single-stepping an SEV-ES guest is
impossible and the guest is responsible for *telling* KVM when it is ready
for an new NMI to be injected.

Functionally, setting TF and RF in svm->vmcb->save.rflags is benign as the
field is ignored by hardware, but it's all kinds of confusing.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-10-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:13:43 -07:00
Sean Christopherson
389fbbec26 KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guests
Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete)
instead of setting awaiting_iret_completion and waiting until the *next*
VM-Exit to unmask NMIs.  The whole point of "NMI complete" is that the
guest is responsible for telling the hypervisor when it's safe to inject
an NMI, i.e. there's no need to wait.  And because there's no IRET to
single-step, the next VM-Exit could be a long time coming, i.e. KVM could
incorrectly hold an NMI pending for far longer than what is required and
expected.

Opportunistically fix a stale reference to HF_IRET_MASK.

Fixes: 916b54a768 ("KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"")
Fixes: 4444dfe405 ("KVM: SVM: Add NMI support for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-9-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:13:18 -07:00
Alexey Kardashevskiy
c2690b5f01 KVM: SVM/SEV/SEV-ES: Rework intercepts
Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.

Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-6-aik@amd.com
[sean: drop comment about intercepting DR7]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:12:06 -07:00
Alexey Kardashevskiy
2837dd00f8 KVM: SEV-ES: explicitly disable debug
SVM/SEV enable debug registers intercepts to skip swapping DRs
on entering/exiting the guest. When the guest is in control of
debug registers (vcpu->guest_debug == 0), there is an optimisation to
reduce the number of context switches: intercepts are cleared and
the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
on guest enter/exit.

The same code also executes for SEV-ES, however it has no effect as
- it always takes (vcpu->guest_debug == 0) branch;
- KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
- vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
with the values from _encrypted_ VMSA.

Be explicit about SEV-ES not supporting debug:
- return right away from dr_interception() and skip unnecessary processing;
- return an error right away from the KVM_SEV_LAUNCH_UPDATE_VMSA handler
if debugging was already enabled.
KVM_SET_GUEST_DEBUG are failing already after KVM_SEV_LAUNCH_UPDATE_VMSA
is finished due to vcpu->arch.guest_state_protected set to true.

Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
guest exit) to signify that SEV-ES won't hit that path.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-5-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Alexey Kardashevskiy
29de732cc9 KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries
keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is #GP intercept which init_vmcb() skips setting for SEV guests and
then sev_es_init_vmcb() needlessly clears it.

Remove the SEV check from init_vmcb(). Clear the #GP intercept in
sev_init_vmcb(). SEV-ES will use the SEV setting.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-3-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Alexey Kardashevskiy
b265ee7bae KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
Static functions set_dr_intercepts() and clr_dr_intercepts() are only
called from SVM so move them to .c.

No functional change intended.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-2-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Borislav Petkov (AMD)
d893832d0e x86/srso: Add IBPB on VMEXIT
Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Paolo Bonzini
24975ce8b2 KVM SVM changes for 6.5:
- Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state
 
  - Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that
    are handled in the fastpath
 
  - Print more descriptive information about the status of SEV and SEV-ES during
    module load
 
  - Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible
    memory leaks
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaK5ESHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5ApAQAJSlzFh6Cg6OzlKqsrXDTTrNuP7Yu5Pe
 tCQm9uppab++TyBz00GCoaUjqXJY1j1riStbl0j1yGJ69Ocjrqj58IGGeoj4NKgt
 Dsiwgs0IWCshe7noVcYQeC4FInrNiFOog7Zog7uDyJmtHprZHorcJ9rmsBXMmedw
 OSrzoxyhVwbtbPmgMfEP+xw4wccXVioci4EOySqI0GI9QrQ+cfdafs8irxxeLG7v
 IY1qG3fwNmGp2uHdb3lG48TUbggWzKG5o1RC+fwwN132Y2fepxjcAeZ25gNms3lz
 Q1fm7vPNkGRqelqg7x+z9B10D6uJc0hngZPe6Hs8C7y1+hvTjXwmx81WXsQxM7RM
 rhhbp1o1C0xKSLzFciaZyW4lQW4cw5wxGRNoIenpHUe48bK9wjTYxez2MiQwfbNJ
 Dt9RAaBVF/UdNBZu2wtA3czgHwOHKSqUOwO2N2iBW62KgRzITQe9r9VtVikslbQD
 /nAq7PJOGz8JuJXkDWI0nLYEW6pInzsiXB21CPQrYR8XOQnnWglzmMTL/KxPeVYg
 pBHJUf6U7AdhjHMkPp2Yc1eQTNspDzRfZBGFZz1YS103JpmUIs97W0phHru/ONKh
 1cBv2N9ZrOJhuL1LAxaLp9OSvR+UQP/mMdCAEjvUTpEbmqbtEAqWMkTTwIEdIi74
 PcaJfv7GsYJV
 =rcFn
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.5:

 - Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state

 - Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that
   are handled in the fastpath

 - Print more descriptive information about the status of SEV and SEV-ES during
   module load

 - Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible
   memory leaks
2023-07-01 07:19:42 -04:00
Paolo Bonzini
751d77fefa KVM x86/pmu changes for 6.5:
- Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
    included along the way
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaHFgSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5twMP/15ZJFqZVigVQoATJeeR9tWUuyJe95xM
 lyfnTel91Sg8XOamdwBGi7jLpaDgj34Jm0cfM7/4LbJk2/taeaCLYmJd5w9FXvaw
 EkytQGO85hVNe2XuY+h+XxSIxpflKxgFuUnOwcDk2QbKgASzNSG/mJ9ZBx8PNVXD
 FnyOqpbbYDFspWWvUOAI/RkHnr/dALjXJsSUMvuh3nz5e1NTyubjCAZg+/bse2nR
 s8FrcSh4B0Lg0h4r2fdJ4sAiM/qWhcCIhq5svyTAcUG0T4rMS40LrosJOw3wkBRM
 dyZYXy6GEENeCFJPhenF1mTE1embFyZp89PV/FCNRZXODbnM4kheJFT9gucAjlKi
 ZafRcutrkYIVf4lZCMofDfQGLX/GCEJnwUPKyGygIsPoDRrdR7OLrFycON5bxocr
 9NBNG+2teQFbnt5irB/bBGojtIZtu3OEylkuRjQUQ3lJYQ5r6LddarI9acIu1SHt
 4rRfh8QN5qmMvVblaQzggOr6BPtmPr8QqMEMFncaUMCsV/82hRAEfvj2rifGFJNo
 Axz1ajMfirxyM45WzredUkzzsbphiiegPBELCLRZfHmaEhJ8P7t7wvri0bXt9YdI
 vjSfX+6ulOgDC+xAazE0gEJO4Uh5+g3Y+1e0fr43ltWzUOWdCQskzD3LE9DkqIXj
 KAaCuHYbYpIZ
 =MwqV
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86/pmu changes for 6.5:

 - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
   included along the way
2023-07-01 07:18:51 -04:00
Paolo Bonzini
36b68d360a KVM x86 changes for 6.5:
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
 
  - Fix output of PIC poll command emulation when there's an interrupt
 
  - Fix a longstanding bug in the reporting of the number of entries returned by
    KVM_GET_CPUID2
 
  - Add a maintainer's handbook to document KVM x86 processes, preferred coding
    style, testing expectations, etc.
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaGMMSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5iDIP/0PwY3J5odTEUTnAyuDFPimd5PBt9k/O
 B414wdpSKVgzq+0An4qM9mKRnklVIh2p8QqQTvDhcBUg3xb6CX9xZ4ery7hp/T5O
 tr5bAXs2AYX6jpxvsopt+w+E9j6fvkJhcJCRU9im3QbrqwUE+ecyU5OHvmv2n/GO
 syVZJbPOYuoLPKDjlSMrScE6fWEl9UOvHc5BK/vafTeyisMG3vv1BSmJj6GuiNNk
 TS1RRIg//cOZghQyDfdXt0azTmakNZyNn35xnoX9x8SRmdRykyUjQeHmeqWxPDso
 kiGO+CGancfS57S6ZtCkJjqEWZ1o/zKdOxr8MMf/3nJhv4kY7/5XtlVoACv5soW9
 bZEmNiXIaSbvKNMwAlLJxHFbLa1sMdSCb345CIuMdt5QiWJ53ZiTyIAJX6+eL+Zf
 8nkeekgPf5VUs6Zt0RdRPyvo+W7Vp9BtI87yDXm1nQKpbys2pt6CD3YB/oF4QViG
 a5cyGoFuqRQbS3nmbshIlR7EanTuxbhLZKrNrFnolZ5e624h3Cnk2hVsfTznVGiX
 vNHWM80phk1CWB9McErrZVkGfjlyVyBL13CBB2XF7Dl6PfF6/N22a9bOuTJD3tvk
 PlNx4hvZm3esvvyGpjfbSajTKYE8O7rxiE1KrF0BpZ5IUl5WSiTr6XCy/yI/mIeM
 hay2IWhPOF2z
 =D0BH
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.5:

* Move handling of PAT out of MTRR code and dedup SVM+VMX code

* Fix output of PIC poll command emulation when there's an interrupt

* Add a maintainer's handbook to document KVM x86 processes, preferred coding
  style, testing expectations, etc.

* Misc cleanups
2023-07-01 07:08:59 -04:00
Like Xu
94cdeebd82 KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022
CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new
performance monitoring features for AMD processors.

Bit 0 of EAX indicates support for Performance Monitoring Version 2
(PerfMonV2) features. If found to be set during PMU initialization,
the EBX bits of the same CPUID function can be used to determine
the number of available PMCs for different PMU types.

Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so that
guests can make use of the PerfMonV2 features.

Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230603011058.1038821-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06 17:31:44 -07:00
Like Xu
d338d8789e KVM: x86/pmu: Advertise PERFCTR_CORE iff the min nr of counters is met
Enable and advertise PERFCTR_CORE if and only if the minimum number of
required counters are available, i.e. if perf says there are less than six
general purpose counters.

Opportunistically, use kvm_cpu_cap_check_and_set() instead of open coding
the check for host support.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
[sean: massage shortlog and changelog]
Link: https://lore.kernel.org/r/20230603011058.1038821-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06 17:31:44 -07:00
Michal Luczaj
e12fa4b92a KVM: x86: Clean up: remove redundant bool conversions
As test_bit() returns bool, explicitly converting result to bool is
unnecessary. Get rid of '!!'.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230605200158.118109-1-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06 17:13:55 -07:00
Sean Christopherson
791a089861 KVM: SVM: Invoke trace_kvm_exit() for fastpath VM-Exits
Move SVM's call to trace_kvm_exit() from the "slow" VM-Exit handler to
svm_vcpu_run() so that KVM traces fastpath VM-Exits that re-enter the
guest without bouncing through the slow path.  This bug is benign in the
current code base as KVM doesn't currently support any such exits on SVM.

Fixes: a9ab13ff6e ("KVM: X86: Improve latency for single target IPI fastpath")
Link: https://lore.kernel.org/r/20230602011920.787844-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02 16:40:27 -07:00
Maciej S. Szmigiero
b2ce899788 KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
I noticed that with vCPU count large enough (> 16) they sometimes froze at
boot.
With vCPU count of 64 they never booted successfully - suggesting some kind
of a race condition.

Since adding "vnmi=0" module parameter made these guests boot successfully
it was clear that the problem is most likely (v)NMI-related.

Running kvm-unit-tests quickly showed failing NMI-related tests cases, like
"multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests
and the NMI parts of eventinj test.

The issue was that once one NMI was being serviced no other NMI was allowed
to be set pending (NMI limit = 0), which was traced to
svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
than for the "NMI pending" flag.

Fix this by testing for the right flag in svm_is_vnmi_pending().
Once this is done, the NMI-related kvm-unit-tests pass successfully and
the Windows guest no longer freezes at boot.

Fixes: fa4c027a79 ("KVM: x86: Add support for SVM's Virtual NMI")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02 16:34:20 -07:00
Sean Christopherson
dee321977a KVM: x86: Move common handling of PAT MSR writes to kvm_set_msr_common()
Move the common check-and-set handling of PAT MSR writes out of vendor
code and into kvm_set_msr_common().  This aligns writes with reads, which
are already handled in common code, i.e. makes the handling of reads and
writes symmetrical in common code.

Alternatively, the common handling in kvm_get_msr_common() could be moved
to vendor code, but duplicating code is generally undesirable (even though
the duplicatated code is trivial in this case), and guest writes to PAT
should be rare, i.e. the overhead of the extra function call is a
non-issue in practice.

Suggested-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230511233351.635053-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01 13:41:06 -07:00
Ke Guo
7aeae02761 KVM: SVM: Use kvm_pat_valid() directly instead of kvm_mtrr_valid()
Use kvm_pat_valid() directly instead of bouncing through kvm_mtrr_valid().
The PAT is not an MTRR, and kvm_mtrr_valid() just redirects to
kvm_pat_valid(), i.e. is exempt from KVM's "zap SPTEs" logic that's
needed to honor guest MTRRs when the VM has a passthrough device with
non-coherent DMA (KVM does NOT set "ignore guest PAT" in this case, and so
enables hardware virtualization of the guest's PAT, i.e. doesn't need to
manually emulate the PAT memtype).

Signed-off-by: Ke Guo <guoke@uniontech.com>
[sean: massage changelog]
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230511233351.635053-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01 13:41:05 -07:00
Mingwei Zhang
0d3518d2f8 KVM: SVM: Remove TSS reloading code after VMEXIT
Remove the dedicated post-VMEXIT TSS reloading code now that KVM uses
VMLOAD to load host segment state, which includes TSS state.

Fixes: e79b91bb3c ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state")
Reported-by: Venkatesh Srinivas <venkateshs@google.com>
Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20230523165635.4002711-1-mizhang@google.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01 13:38:16 -07:00
Linus Torvalds
c8c655c34e s390:
* More phys_to_virt conversions
 
 * Improvement of AP management for VSIE (nested virtualization)
 
 ARM64:
 
 * Numerous fixes for the pathological lock inversion issue that
   plagued KVM/arm64 since... forever.
 
 * New framework allowing SMCCC-compliant hypercalls to be forwarded
   to userspace, hopefully paving the way for some more features
   being moved to VMMs rather than be implemented in the kernel.
 
 * Large rework of the timer code to allow a VM-wide offset to be
   applied to both virtual and physical counters as well as a
   per-timer, per-vcpu offset that complements the global one.
   This last part allows the NV timer code to be implemented on
   top.
 
 * A small set of fixes to make sure that we don't change anything
   affecting the EL1&0 translation regime just after having having
   taken an exception to EL2 until we have executed a DSB. This
   ensures that speculative walks started in EL1&0 have completed.
 
 * The usual selftest fixes and improvements.
 
 KVM x86 changes for 6.4:
 
 * Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
   and by giving the guest control of CR0.WP when EPT is enabled on VMX
   (VMX-only because SVM doesn't support per-bit controls)
 
 * Add CR0/CR4 helpers to query single bits, and clean up related code
   where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
   as a bool
 
 * Move AMD_PSFD to cpufeatures.h and purge KVM's definition
 
 * Avoid unnecessary writes+flushes when the guest is only adding new PTEs
 
 * Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
   when emulating invalidations
 
 * Clean up the range-based flushing APIs
 
 * Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
   A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
   changed SPTE" overhead associated with writing the entire entry
 
 * Track the number of "tail" entries in a pte_list_desc to avoid having
   to walk (potentially) all descriptors during insertion and deletion,
   which gets quite expensive if the guest is spamming fork()
 
 * Disallow virtualizing legacy LBRs if architectural LBRs are available,
   the two are mutually exclusive in hardware
 
 * Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
   after KVM_RUN, similar to CPUID features
 
 * Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
 
 * Apply PMU filters to emulated events and add test coverage to the
   pmu_event_filter selftest
 
 x86 AMD:
 
 * Add support for virtual NMIs
 
 * Fixes for edge cases related to virtual interrupts
 
 x86 Intel:
 
 * Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
   not being reported due to userspace not opting in via prctl()
 
 * Fix a bug in emulation of ENCLS in compatibility mode
 
 * Allow emulation of NOP and PAUSE for L2
 
 * AMX selftests improvements
 
 * Misc cleanups
 
 MIPS:
 
 * Constify MIPS's internal callbacks (a leftover from the hardware enabling
   rework that landed in 6.3)
 
 Generic:
 
 * Drop unnecessary casts from "void *" throughout kvm_main.c
 
 * Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
   size by 8 bytes on 64-bit kernels by utilizing a padding hole
 
 Documentation:
 
 * Fix goof introduced by the conversion to rST
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
 Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
 WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
 huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
 XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
 j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
 =S2Hq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "s390:

   - More phys_to_virt conversions

   - Improvement of AP management for VSIE (nested virtualization)

  ARM64:

   - Numerous fixes for the pathological lock inversion issue that
     plagued KVM/arm64 since... forever.

   - New framework allowing SMCCC-compliant hypercalls to be forwarded
     to userspace, hopefully paving the way for some more features being
     moved to VMMs rather than be implemented in the kernel.

   - Large rework of the timer code to allow a VM-wide offset to be
     applied to both virtual and physical counters as well as a
     per-timer, per-vcpu offset that complements the global one. This
     last part allows the NV timer code to be implemented on top.

   - A small set of fixes to make sure that we don't change anything
     affecting the EL1&0 translation regime just after having having
     taken an exception to EL2 until we have executed a DSB. This
     ensures that speculative walks started in EL1&0 have completed.

   - The usual selftest fixes and improvements.

  x86:

   - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
     enabled, and by giving the guest control of CR0.WP when EPT is
     enabled on VMX (VMX-only because SVM doesn't support per-bit
     controls)

   - Add CR0/CR4 helpers to query single bits, and clean up related code
     where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
     return as a bool

   - Move AMD_PSFD to cpufeatures.h and purge KVM's definition

   - Avoid unnecessary writes+flushes when the guest is only adding new
     PTEs

   - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
     optimizations when emulating invalidations

   - Clean up the range-based flushing APIs

   - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
     single A/D bit using a LOCK AND instead of XCHG, and skip all of
     the "handle changed SPTE" overhead associated with writing the
     entire entry

   - Track the number of "tail" entries in a pte_list_desc to avoid
     having to walk (potentially) all descriptors during insertion and
     deletion, which gets quite expensive if the guest is spamming
     fork()

   - Disallow virtualizing legacy LBRs if architectural LBRs are
     available, the two are mutually exclusive in hardware

   - Disallow writes to immutable feature MSRs (notably
     PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features

   - Overhaul the vmx_pmu_caps selftest to better validate
     PERF_CAPABILITIES

   - Apply PMU filters to emulated events and add test coverage to the
     pmu_event_filter selftest

   - AMD SVM:
       - Add support for virtual NMIs
       - Fixes for edge cases related to virtual interrupts

   - Intel AMX:
       - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
         XTILE_DATA is not being reported due to userspace not opting in
         via prctl()
       - Fix a bug in emulation of ENCLS in compatibility mode
       - Allow emulation of NOP and PAUSE for L2
       - AMX selftests improvements
       - Misc cleanups

  MIPS:

   - Constify MIPS's internal callbacks (a leftover from the hardware
     enabling rework that landed in 6.3)

  Generic:

   - Drop unnecessary casts from "void *" throughout kvm_main.c

   - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
     struct size by 8 bytes on 64-bit kernels by utilizing a padding
     hole

  Documentation:

   - Fix goof introduced by the conversion to rST"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
  KVM: s390: pci: fix virtual-physical confusion on module unload/load
  KVM: s390: vsie: clarifications on setting the APCB
  KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
  KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
  KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
  KVM: selftests: Test the PMU event "Instructions retired"
  KVM: selftests: Copy full counter values from guest in PMU event filter test
  KVM: selftests: Use error codes to signal errors in PMU event filter test
  KVM: selftests: Print detailed info in PMU event filter asserts
  KVM: selftests: Add helpers for PMC asserts in PMU event filter test
  KVM: selftests: Add a common helper for the PMU event filter guest code
  KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
  KVM: arm64: vhe: Drop extra isb() on guest exit
  KVM: arm64: vhe: Synchronise with page table walker on MMU update
  KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
  KVM: arm64: nvhe: Synchronise with page table walker on TLBI
  KVM: arm64: Handle 32bit CNTPCTSS traps
  KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
  KVM: arm64: vgic: Don't acquire its_lock before config_lock
  KVM: selftests: Add test to verify KVM's supported XCR0
  ...
2023-05-01 12:06:20 -07:00