Commit Graph

769 Commits

Author SHA1 Message Date
Like Xu 765da7fe0e KVM: x86: Remove break statements that will never be executed
Fix compiler warnings when compiling KVM with [-Wunreachable-code-break].
No functional change intended.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230807094243.32516-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:28:00 -07:00
Sean Christopherson 65966aaca1 KVM: x86: Assert that the emulator doesn't load CS with garbage in !RM
Yell loudly if KVM attempts to load CS outside of Real Mode without an
accompanying control transfer type, i.e. on X86_TRANSFER_NONE.  KVM uses
X86_TRANSFER_NONE when emulating IRET and exceptions/interrupts for Real
Mode, but IRET emulation for Protected Mode is non-existent.  WARN instead
of trying to pass in a less-wrong type, e.g. X86_TRANSFER_RET, as
emulating IRET goes even beyond emulating FAR RET (which KVM also doesn't
fully support).

Reported-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/20230216202254.671772-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-23 16:07:52 -07:00
Linus Torvalds 49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Paolo Bonzini 4bc6dcaa15 KVM SVM changes for 6.3:
- Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
  - Move the SVM-specific "host flags" into vcpu_svm (extracted from the
    vNMI enabling series)
 
  - A handful for fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmPsH7kSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5/dQQAJSVCYA7F7LcfJf+c1ULG0XCd8rHnXdR
 EGTygTnWrzwTCRaunOBPE4AJRxKdkwTKy+yfnnQVRdYYfRe1SZKpUQ2XNEDGn0+v
 zVfOmSFFcCWXmJeY8y5n1GlDH4ENO3G7nD1ncDQ0I9PazmOsmxChoVZ9afFJ5bpo
 73hjcYVfUDYxGkeRLWSSSFtWIGguE8BkpRH3wZ8MGZi+ueoFUJPPBKHeDtxCV2/T
 KcJLne8tQVTiWCdMO3EFwxgIvsjQDoT0gZYLYNHJ6KqD9Szc3jA9v2ryTm5IYlpb
 akYUqePaD0SGrrfDBrwz3bLu3fehDu7eduXESRlzzb8S4xP7/qXeo9KeVN+b4MBb
 nmBBFncvMWbC8Po5wB5OVAfAa7ACmGiXeBV8pfgGI6FTq1fpc4VNm2PevKkDvlqN
 O2eZ1KuNkwBnbIPj3JVPPnsJcUjYXFjZyzfpMV1T+ExmL/IYceatX4S7zfgL5nUg
 3qFi5mX2Cufk2EBvBu+Dkpt/H4lze+ysZRciMC+v7Q4LWAYZ8HW1a44pnBVUJMPM
 bWiJ1/O8RIWM1tWIrlO38+ZZalbu3spIVMBXKzqEGXvpUwJ4UgZM1tFiWvISTVFe
 2X6N3d7aT/DQ1PzZU6BsyVZWAFaodHBauMcr9FUkWqqGu3HOhqC4rSJZ9eRR7V5O
 WSp1gTVY1JXy
 =AVpx
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.3' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.3:

 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()

 - Move the SVM-specific "host flags" into vcpu_svm (extracted from the
   vNMI enabling series)

 - A handful for fixes and cleanups
2023-02-15 12:23:06 -05:00
Michal Luczaj 096691e0d2 KVM: x86/emulator: Fix comment in __load_segment_descriptor()
The comment refers to the same condition twice. Make it reflect what the
code actually does.

No functional change intended.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20230126013405.2967156-3-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-03 13:34:02 -08:00
Michal Luczaj 0735d1c34e KVM: x86/emulator: Fix segment load privilege level validation
Intel SDM describes what steps are taken by the CPU to verify if a
memory segment can actually be used at a given privilege level. Loading
DS/ES/FS/GS involves checking segment's type as well as making sure that
neither selector's RPL nor caller's CPL are greater than segment's DPL.

Emulator implements Intel's pseudocode in __load_segment_descriptor(),
even quoting the pseudocode in the comments. Although the pseudocode is
correctly translated, the implementation is incorrect. This is most
likely due to SDM, at the time, being wrong.

Patch fixes emulator's logic and updates the pseudocode in the comment.
Below are historical notes.

Emulator code for handling segment descriptors appears to have been
introduced in March 2010 in commit 38ba30ba51 ("KVM: x86 emulator:
Emulate task switch in emulator.c"). Intel SDM Vol 2A: Instruction Set
Reference, A-M (Order Number: 253666-034US, _March 2010_) lists the
steps for loading segment registers in section related to MOV
instruction:

  IF DS, ES, FS, or GS is loaded with non-NULL selector
  THEN
    IF segment selector index is outside descriptor table limits
    or segment is not a data or readable code segment
    or ((segment is a data or nonconforming code segment)
    and (both RPL and CPL > DPL))   <---
      THEN #GP(selector); FI;

This is precisely what __load_segment_descriptor() quotes and
implements. But there's a twist; a few SDM revisions later
(253667-044US), in August 2012, the snippet above becomes:

  IF DS, ES, FS, or GS is loaded with non-NULL selector
  THEN
    IF segment selector index is outside descriptor table limits
    or segment is not a data or readable code segment
    or ((segment is a data or nonconforming code segment)
      [note: missing or superfluous parenthesis?]
    or ((RPL > DPL) and (CPL > DPL))   <---
      THEN #GP(selector); FI;

Many SDMs later (253667-065US), in December 2017, pseudocode reaches
what seems to be its final form:

  IF DS, ES, FS, or GS is loaded with non-NULL selector
  THEN
    IF segment selector index is outside descriptor table limits
    OR segment is not a data or readable code segment
    OR ((segment is a data or nonconforming code segment)
        AND ((RPL > DPL) or (CPL > DPL)))   <---
      THEN #GP(selector); FI;

which also matches the behavior described in AMD's APM, which states that
a #GP occurs if:

  The DS, ES, FS, or GS register was loaded and the segment pointed to
  was a data or non-conforming code segment, but the RPL or CPL was
  greater than the DPL.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20230126013405.2967156-2-mhal@rbox.co
[sean: add blurb to changelog calling out AMD agrees]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-03 13:31:59 -08:00
Ricardo Ribalda 828dfc0f7b scripts/spelling.txt: add `permitted'
Patch series "spelling: Fix some trivial typos".

Seems like permitted has two t's :), Lets add that to spellings to help
others.


This patch (of 3):

Add another common typo. Noticed when I sent a patch with the typo and
in kvm and of.

[ribalda@chromium.org: fix trivial typo]
  Link: https://lkml.kernel.org/r/20221220-permited-v1-2-52ea9857fa61@chromium.org
Link: https://lkml.kernel.org/r/20221220-permited-v1-1-52ea9857fa61@chromium.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:01 -08:00
Maxim Levitsky 32e69f232d KVM: x86: Use emulator callbacks instead of duplicating "host flags"
Instead of re-defining the "host flags" bits, just expose dedicated
helpers for each of the two remaining flags that are consumed by the
emulator.  The emulator never consumes both "is guest" and "is SMM" in
close proximity, so there is no motivation to avoid additional indirect
branches.

Also while at it, garbage collect the recently removed host flags.

No functional change is intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Santosh Shukla <Santosh.Shukla@amd.com>
Link: https://lore.kernel.org/r/20221129193717.513824-6-mlevitsk@redhat.com
[sean: fix CONFIG_KVM_SMM=n builds, tweak names of wrappers]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-31 17:29:09 -08:00
Sean Christopherson 8d20bd6381 KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code.  In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.

Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.

Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.

Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl().  But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:47:35 -05:00
Paolo Bonzini f1554150d3 KVM: x86: move SMM exit to a new file
Some users of KVM implement the UEFI variable store through a paravirtual
device that does not require the "SMM lockbox" component of edk2, and
would like to compile out system management mode.  In preparation for
that, move the SMM exit code out of emulate.c and into a new file.

The code is still written as a series of invocations of the emulator
callbacks, but the two exiting_smm and leave_smm callbacks are merged
into one, and all the code from em_rsm is now part of the callback.
This removes all knowledge of the format of the SMM save state area
from the emulator.  Further patches will clean up the code and
invoke KVM's own functions to access control registers, descriptor
caches, etc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220929172016.319443-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:18 -05:00
Paolo Bonzini b0b42197b5 KVM: x86: start moving SMM-related functions to new files
Create a new header and source with code related to system management
mode emulation.  Entry and exit will move there too; for now,
opportunistically rename put_smstate to PUT_SMSTATE while moving
it to smm.h, and adjust the SMM state saving code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220929172016.319443-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:16 -05:00
Maxim Levitsky 696db303e5 KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
On 64 bit host, if the guest doesn't have X86_FEATURE_LM, KVM will
access 16 gprs to 32-bit smram image, causing out-ouf-bound ram
access.

On 32 bit host, the rsm_load_state_64/enter_smm_save_state_64
is compiled out, thus access overflow can't happen.

Fixes: b443183a25 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221025124741.228045-15-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:30 -04:00
Maxim Levitsky ad8f9e6994 KVM: x86: emulator: update the emulation mode after CR0 write
Update the emulation mode when handling writes to CR0, because
toggling CR0.PE switches between Real and Protected Mode, and toggling
CR0.PG when EFER.LME=1 switches between Long and Protected Mode.

This is likely a benign bug because there is no writeback of state,
other than the RIP increment, and when toggling CR0.PE, the CPU has
to execute code from a very low memory address.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-14-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:30 -04:00
Maxim Levitsky 055f37f84e KVM: x86: emulator: update the emulation mode after rsm
Update the emulation mode after RSM so that RIP will be correctly
written back, because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.

This fixes a guest crash in case the #SMI is received while the guest
runs a code from an address > 32 bit.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-13-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:29 -04:00
Maxim Levitsky d087e0f79f KVM: x86: emulator: introduce emulator_recalc_and_set_mode
Some instructions update the cpu execution mode, which needs to update the
emulation mode.

Extract this code, and make assign_eip_far use it.

assign_eip_far now reads CS, instead of getting it via a parameter,
which is ok, because callers always assign CS to the same value
before calling this function.

No functional change is intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-12-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:29 -04:00
Maxim Levitsky 5015bb89b5 KVM: x86: emulator: em_sysexit should update ctxt->mode
SYSEXIT is one of the instructions that can change the
processor mode, thus ctxt->mode should be updated after it.

Note that this is likely a benign bug, because the only problematic
mode change is from 32 bit to 64 bit which can lead to truncation of RIP,
and it is not possible to do with sysexit,
since sysexit running in 32 bit mode will be limited to 32 bit version.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221025124741.228045-11-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28 06:10:28 -04:00
Linus Torvalds ef688f8b8c The first batch of KVM patches, mostly covering x86, which I
am sending out early due to me travelling next week.  There is a
 lone mm patch for which Andrew gave an informal ack at
 https://lore.kernel.org/linux-mm/20220817102500.440c6d0a3fce296fdf91bea6@linux-foundation.org.
 
 I will send the bulk of ARM work, as well as other
 architectures, at the end of next week.
 
 ARM:
 
 * Account stage2 page table allocations in memory stats.
 
 x86:
 
 * Account EPT/NPT arm64 page table allocations in memory stats.
 
 * Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses.
 
 * Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
   Hyper-V now support eVMCS fields associated with features that are
   enumerated to the guest.
 
 * Use KVM's sanitized VMCS config as the basis for the values of nested VMX
   capabilities MSRs.
 
 * A myriad event/exception fixes and cleanups.  Most notably, pending
   exceptions morph into VM-Exits earlier, as soon as the exception is
   queued, instead of waiting until the next vmentry.  This fixed
   a longstanding issue where the exceptions would incorrecly become
   double-faults instead of triggering a vmexit; the common case of
   page-fault vmexits had a special workaround, but now it's fixed
   for good.
 
 * A handful of fixes for memory leaks in error paths.
 
 * Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow.
 
 * Never write to memory from non-sleepable kvm_vcpu_check_block()
 
 * Selftests refinements and cleanups.
 
 * Misc typo cleanups.
 
 Generic:
 
 * remove KVM_REQ_UNHALT
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM2zwcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNpbwf+MlVeOlzE5SBdrJ0TEnLmKUel1lSz
 QnZzP5+D65oD0zhCilUZHcg6G4mzZ5SdVVOvrGJvA0eXh25ruLNMF6jbaABkMLk/
 FfI1ybN7A82hwJn/aXMI/sUurWv4Jteaad20JC2DytBCnsW8jUqc49gtXHS2QWy4
 3uMsFdpdTAg4zdJKgEUfXBmQviweVpjjl3ziRyZZ7yaeo1oP7XZ8LaE1nR2l5m0J
 mfjzneNm5QAnueypOh5KhSwIvqf6WHIVm/rIHDJ1HIFbgfOU0dT27nhb1tmPwAcE
 +cJnnMUHjZqtCXteHkAxMClyRq0zsEoKk0OGvSOOMoq3Q0DavSXUNANOig==
 =/hqX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
     accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known
     versions of Hyper-V now support eVMCS fields associated with
     features that are enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of
     nested VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups. Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry. This fixed a
     longstanding issue where the exceptions would incorrecly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed for
     good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...
2022-10-09 09:39:55 -07:00
Sean Christopherson 0701ec903e KVM: x86: Use DR7_GD macro instead of open coding check in emulator
Use DR7_GD in the emulator instead of open coding the check, and drop a
comically wrong comment.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-10-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:03:06 -04:00
Hou Wenlong 36d546d59a KVM: x86: Return emulator error if RDMSR/WRMSR emulation failed
The return value of emulator_{get|set}_mst_with_filter() is confused,
since msr access error and emulator error are mixed. Although,
KVM_MSR_RET_* doesn't conflict with X86EMUL_IO_NEEDED at present, it is
better to convert msr access error to emulator error if error value is
needed.

So move "r < 0" handling for wrmsr emulation into the set helper function,
then only X86EMUL_* is returned in the helper functions. Also add "r < 0"
check in the get helper function, although KVM doesn't return -errno
today, but assuming that will always hold true is unnecessarily risking.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/09b2847fc3bcb8937fb11738f0ccf7be7f61d9dd.1661930557.git.houwenlong.hwl@antgroup.com
[sean: wrap changelog less aggressively]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:03:03 -04:00
Liam Ni e390f4d69d KVM:x86: Clean up ModR/M "reg" initialization in reg op decoding
Refactor decode_register_operand() to get the ModR/M register if and
only if the instruction uses a ModR/M encoding to make it more obvious
how the register operand is retrieved.

Signed-off-by: Liam Ni <zhiguangni01@gmail.com>
Link: https://lore.kernel.org/r/20220908141210.1375828-1-zhiguangni01@zhaoxin.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:34 -04:00
Sean Christopherson 50b2d49baf KVM: x86: Inject #UD on emulated XSETBV if XSAVES isn't enabled
Inject #UD when emulating XSETBV if CR4.OSXSAVE is not set.  This also
covers the "XSAVE not supported" check, as setting CR4.OSXSAVE=1 #GPs if
XSAVE is not supported (and userspace gets to keep the pieces if it
forces incoherent vCPU state).

Add a comment to kvm_emulate_xsetbv() to call out that the CPU checks
CR4.OSXSAVE before checking for intercepts.  AMD'S APM implies that #UD
has priority (says that intercepts are checked before #GP exceptions),
while Intel's SDM says nothing about interception priority.  However,
testing on hardware shows that both AMD and Intel CPUs prioritize the #UD
over interception.

Fixes: 02d4160fbd ("x86: KVM: add xsetbv to the emulator")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-22 17:04:20 -04:00
Michal Luczaj 6aa5c47c35 KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
The emulator checks the wrong variable while setting the CPU
interruptibility state, the target segment is embedded in the instruction
opcode, not the ModR/M register.  Fix the condition.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Fixes: a5457e7bcf ("KVM: emulate: POP SS triggers a MOV SS shadow too")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20220821215900.1419215-1-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-24 13:45:40 -07:00
Nick Desaulniers a0a12c3ed0 asm goto: eradicate CC_HAS_ASM_GOTO
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0.
The minimum supported versions of these tools for the build according to
Documentation/process/changes.rst are 5.1 and 11.0.0 respectively.

Remove the feature detection script, Kconfig option, and clean up some
fallback code that is no longer supported.

The removed script was also testing for a GCC specific bug that was
fixed in the 4.7 release.

Also remove workarounds for bpftrace using clang older than 9.0.0, since
other BPF backend fixes are required at this point.

Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637
Acked-by: Borislav Petkov <bp@suse.de>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-21 10:06:28 -07:00
Josh Poimboeuf 3d9606b0e0 x86/kvm: Fix "missing ENDBR" BUG for fastop functions
The following BUG was reported:

  traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm]
  ------------[ cut here ]------------
  kernel BUG at arch/x86/kernel/traps.c:253!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
   <TASK>
   asm_exc_control_protection+0x2b/0x30
  RIP: 0010:andw_ax_dx+0x0/0x10 [kvm]
  Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc
        cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00
        <66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21
        d0

   ? andb_al_dl+0x10/0x10 [kvm]
   ? fastop+0x5d/0xa0 [kvm]
   x86_emulate_insn+0x822/0x1060 [kvm]
   x86_emulate_instruction+0x46f/0x750 [kvm]
   complete_emulated_mmio+0x216/0x2c0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x604/0x650 [kvm]
   kvm_vcpu_ioctl+0x2f4/0x6b0 [kvm]
   ? wake_up_q+0xa0/0xa0

The BUG occurred because the ENDBR in the andw_ax_dx() fastop function
had been incorrectly "sealed" (converted to a NOP) by apply_ibt_endbr().

Objtool marked it to be sealed because KVM has no compile-time
references to the function.  Instead KVM calculates its address at
runtime.

Prevent objtool from annotating fastop functions as sealable by creating
throwaway dummy compile-time references to the functions.

Fixes: 6649fa876d ("x86/ibt,kvm: Add ENDBR to fastops")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Debugged-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Message-Id: <0d4116f90e9d0c1b754bb90c585e6f0415a1c508.1660837839.git.jpoimboe@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:42 -04:00
Josh Poimboeuf 22472d1260 x86/kvm: Simplify FOP_SETCC()
SETCC_ALIGN and FOP_ALIGN are both 16.  Remove the special casing for
FOP_SETCC() and just make it a normal fastop.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19 04:05:42 -04:00
Michal Luczaj 4ac5b42377 KVM: x86: emulator: Fix illegal LEA handling
The emulator mishandles LEA with register source operand. Even though such
LEA is illegal, it can be encoded and fed to CPU. In which case real
hardware throws #UD. The emulator, instead, returns address of
x86_emulate_ctxt._regs. This info leak hurts host's kASLR.

Tell the decoder that illegal LEA is not to be emulated.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20220729134801.1120-1-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-10 15:08:23 -04:00
Paolo Bonzini 63f4b21041 Merge remote-tracking branch 'kvm/next' into kvm-next-5.20
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:

* Permit guests to ignore single-bit ECC errors

* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit (detect microarchitectural hangs) for Intel

* Cleanups for MCE MSR emulation

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to use TAP interface

* enable interpretive execution of zPCI instructions (for PCI passthrough)

* First part of deferred teardown

* CPU Topology

* PV attestation

* Minor fixes

Generic:

* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:

* Use try_cmpxchg64 instead of cmpxchg64

* Bugfixes

* Ignore benign host accesses to PMU MSRs when PMU is disabled

* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

* Port eager page splitting to shadow MMU as well

* Enable CMCI capability by default and handle injected UCNA errors

* Expose pid of vcpu threads in debugfs

* x2AVIC support for AMD

* cleanup PIO emulation

* Fixes for LLDT/LTR emulation

* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:

* Use separate namespaces for guest PTEs and shadow PTEs bitmasks

* PIO emulation

* Reorganize rmap API, mostly around rmap destruction

* Do not workaround very old KVM bugs for L0 that runs with nesting enabled

* new selftests API for CPUID
2022-08-01 03:21:00 -04:00
Paolo Bonzini 7962918160 KVM: emulate: do not adjust size of fastop and setcc subroutines
Instead of doing complicated calculations to find the size of the subroutines
(which are even more complicated because they need to be stringified into
an asm statement), just hardcode to 16.

It is less dense for a few combinations of IBT/SLS/retbleed, but it has
the advantage of being really simple.

Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0bc1: x86/kvm: fix FASTOP_SIZE when return thunks are enabled
Cc: stable@vger.kernel.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-15 07:49:40 -04:00
Thadeu Lima de Souza Cascardo 84e7051c0b x86/kvm: fix FASTOP_SIZE when return thunks are enabled
The return thunk call makes the fastop functions larger, just like IBT
does. Consider a 16-byte FASTOP_SIZE when CONFIG_RETHUNK is enabled.

Otherwise, functions will be incorrectly aligned and when computing their
position for differently sized operators, they will executed in the middle
or end of a function, which may as well be an int3, leading to a crash
like:

[   36.091116] int3: 0000 [#1] SMP NOPTI
[   36.091119] CPU: 3 PID: 1371 Comm: qemu-system-x86 Not tainted 5.15.0-41-generic #44
[   36.091120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[   36.091121] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.091185] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.091186] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.091188] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.091188] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.091189] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.091190] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.091190] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.091191] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.091192] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.091192] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.091195] PKRU: 55555554
[   36.091195] Call Trace:
[   36.091197]  <TASK>
[   36.091198]  ? fastop+0x5a/0xa0 [kvm]
[   36.091222]  x86_emulate_insn+0x7b8/0xe90 [kvm]
[   36.091244]  x86_emulate_instruction+0x2f4/0x630 [kvm]
[   36.091263]  ? kvm_arch_vcpu_load+0x7c/0x230 [kvm]
[   36.091283]  ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel]
[   36.091290]  complete_emulated_mmio+0x297/0x320 [kvm]
[   36.091310]  kvm_arch_vcpu_ioctl_run+0x32f/0x550 [kvm]
[   36.091330]  kvm_vcpu_ioctl+0x29e/0x6d0 [kvm]
[   36.091344]  ? kvm_vcpu_ioctl+0x120/0x6d0 [kvm]
[   36.091357]  ? __fget_files+0x86/0xc0
[   36.091362]  ? __fget_files+0x86/0xc0
[   36.091363]  __x64_sys_ioctl+0x92/0xd0
[   36.091366]  do_syscall_64+0x59/0xc0
[   36.091369]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091370]  ? do_syscall_64+0x69/0xc0
[   36.091371]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091372]  ? __x64_sys_writev+0x1c/0x30
[   36.091374]  ? do_syscall_64+0x69/0xc0
[   36.091374]  ? exit_to_user_mode_prepare+0x37/0xb0
[   36.091378]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091379]  ? do_syscall_64+0x69/0xc0
[   36.091380]  ? do_syscall_64+0x69/0xc0
[   36.091381]  ? do_syscall_64+0x69/0xc0
[   36.091381]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[   36.091384] RIP: 0033:0x7efdfe6d1aff
[   36.091390] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[   36.091391] RSP: 002b:00007efdfce8c460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   36.091393] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007efdfe6d1aff
[   36.091393] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
[   36.091394] RBP: 0000558f1609e220 R08: 0000558f13fb8190 R09: 00000000ffffffff
[   36.091394] R10: 0000558f16b5e950 R11: 0000000000000246 R12: 0000000000000000
[   36.091394] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[   36.091396]  </TASK>
[   36.091397] Modules linked in: isofs nls_iso8859_1 kvm_intel joydev kvm input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler drm msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net net_failover crypto_simd ahci xhci_pci cryptd psmouse virtio_blk libahci xhci_pci_renesas failover
[   36.123271] ---[ end trace db3c0ab5a48fabcc ]---
[   36.123272] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm]
[   36.123319] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc
[   36.123320] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202
[   36.123321] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[   36.123321] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200
[   36.123322] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002
[   36.123322] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70
[   36.123323] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000
[   36.123323] FS:  00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000
[   36.123324] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.123325] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0
[   36.123327] PKRU: 55555554
[   36.123328] Kernel panic - not syncing: Fatal exception in interrupt
[   36.123410] Kernel Offset: 0x1400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   36.135305] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: aa3d480315 ("x86: Use return-thunk in asm code")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Message-Id: <20220713171241.184026-1-cascardo@canonical.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 07:44:38 -04:00
Sean Christopherson 2626206963 KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP
When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set
the error code to the selector.  Intel SDM's says nothing about the #GP,
but AMD's APM explicitly states that both LLDT and LTR set the error code
to the selector, not zero.

Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0),
but the KVM code in question is specific to the base from the descriptor.

Fixes: e37a75a13c ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-13 18:14:06 -07:00
Sean Christopherson ec6e4d8632 KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks
Wait to mark the TSS as busy during LTR emulation until after all fault
checks for the LTR have passed.  Specifically, don't mark the TSS busy if
the new TSS base is non-canonical.

Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the
only reason for the early check was to avoid marking a !PRESENT TSS as
busy, i.e. the common !PRESENT is now done before setting the busy bit.

Fixes: e37a75a13c ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR")
Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-13 18:14:05 -07:00
Peter Zijlstra f43b9876e8 x86/retbleed: Add fine grained Kconfig knobs
Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-29 17:43:41 +02:00
Peter Zijlstra af2e140f34 x86/kvm: Fix SETcc emulation for return thunks
Prepare the SETcc fastop stuff for when RET can be larger still.

The tricky bit here is that the expressions should not only be
constant C expressions, but also absolute GAS expressions. This means
no ?: and 'true' is ~0.

Also ensure em_setcc() has the same alignment as the actual FOP_SETCC()
ops, this ensures there cannot be an alignment hole between em_setcc()
and the first op.

Additionally, add a .skip directive to the FOP_SETCC() macro to fill
any remaining space with INT3 traps; however the primary purpose of
this directive is to generate AS warnings when the remaining space
goes negative. Which is a very good indication the alignment magic
went side-ways.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27 10:33:58 +02:00
Sean Christopherson d38ea9579c KVM: x86: Bug the VM on an out-of-bounds data read
Bug the VM and terminate emulation if an out-of-bounds read into the
emulator's data cache occurs.  Knowingly contuining on all but guarantees
that KVM will overwrite random kernel data, which is far, far worse than
killing the VM.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:34 -04:00
Sean Christopherson 49a1431d3b KVM: x86: Bug the VM if the emulator generates a bogus exception vector
Bug the VM if KVM's emulator attempts to inject a bogus exception vector.
The guest is likely doomed even if KVM continues on, and propagating a
bad vector to the rest of KVM runs the risk of breaking other assumptions
in KVM and thus triggering a more egregious bug.

All existing users of emulate_exception() have hardcoded vector numbers
(__load_segment_descriptor() uses a few different vectors, but they're
all hardcoded), and future users are likely to follow suit, i.e. the
change to emulate_exception() is a glorified nop.

As for the ctxt->exception.vector check in x86_emulate_insn(), the few
known times the WARN has been triggered in the past is when the field was
not set when synthesizing a fault, i.e. for all intents and purposes the
check protects against consumption of uninitialized data.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:33 -04:00
Sean Christopherson 1cca2f8c50 KVM: x86: Bug the VM if the emulator accesses a non-existent GPR
Bug the VM, i.e. kill it, if the emulator accesses a non-existent GPR,
i.e. generates an out-of-bounds GPR index.  Continuing on all but
gaurantees some form of data corruption in the guest, e.g. even if KVM
were to redirect to a dummy register, KVM would be incorrectly read zeros
and drop writes.

Note, bugging the VM doesn't completely prevent data corruption, e.g. the
current round of emulation will complete before the vCPU bails out to
userspace.  But, the very act of killing the guest can also cause data
corruption, e.g. due to lack of file writeback before termination, so
taking on additional complexity to cleanly bail out of the emulator isn't
justified, the goal is purely to stem the bleeding and alert userspace
that something has gone horribly wrong, i.e. to avoid _silent_ data
corruption.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:33 -04:00
Sean Christopherson b443183a25 KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM
Reduce the number of GPRs emulated by 32-bit KVM from 16 to 8.  KVM does
not support emulating 64-bit mode on 32-bit host kernels, and so should
never generate accesses to R8-15.

Opportunistically use NR_EMULATOR_GPRS in rsm_load_state_{32,64}() now
that it is precise and accurate for both flavors.

Wrap the definition with full #ifdef ugliness; sadly, IS_ENABLED()
doesn't guarantee a compile-time constant as far as BUILD_BUG_ON() is
concerned.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Message-Id: <20220526210817.3428868-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:31 -04:00
Sean Christopherson 0cbc60d44c KVM: x86: Use 16-bit fields to track dirty/valid emulator GPRs
Use a u16 instead of a u32 to track the dirty/valid status of GPRs in the
emulator.  Unlike struct kvm_vcpu_arch, x86_emulate_ctxt tracks only the
"true" GPRs, i.e. doesn't include RIP in its array, and so only needs to
track 16 registers.

Note, maxing out at 16 GPRs is a fundamental property of x86-64 and will
not change barring a massive architecture update.  Legacy x86 ModRM and
SIB encodings use 3 bits for GPRs, i.e. support 8 registers.  x86-64 uses
a single bit in the REX prefix for each possible reference type to double
the number of supported GPRs to 16 registers (4 bits).

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220526210817.3428868-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:30 -04:00
Sean Christopherson a5ba67b42f KVM: x86: Omit VCPU_REGS_RIP from emulator's _regs array
Omit RIP from the emulator's _regs array, which is used only for GPRs,
i.e. registers that can be referenced via ModRM and/or SIB bytes.  The
emulator uses the dedicated _eip field for RIP, and manually reads from
_eip to handle RIP-relative addressing.

To avoid an even bigger, slightly more dangerous change, hardcode the
number of GPRs to 16 for the time being even though 32-bit KVM's emulator
technically should only have 8 GPRs.  Add a TODO to address that in a
future commit.

See also the comments above the read_gpr() and write_gpr() declarations,
and obviously the handling in writeback_registers().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Message-Id: <20220526210817.3428868-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:30 -04:00
Sean Christopherson dfe21e6bc0 KVM: x86: Harden _regs accesses to guard against buggy input
WARN and truncate the incoming GPR number/index when reading/writing GPRs
in the emulator to guard against KVM bugs, e.g. to avoid out-of-bounds
accesses to ctxt->_regs[] if KVM generates a bogus index.  Truncate the
index instead of returning e.g. zero, as reg_write() returns a pointer
to the register, i.e. returning zero would result in a NULL pointer
dereference.  KVM could also force the index to any arbitrary GPR, but
that's no better or worse, just different.

Open code the restriction to 16 registers; RIP is handled via _eip and
should never be accessed through reg_read() or reg_write().  See the
comments above the declarations of reg_read() and reg_write(), and the
behavior of writeback_registers().  The horrific open coded mess will be
cleaned up in a future commit.

There are no such bugs known to exist in the emulator, but determining
that KVM is bug-free is not at all simple and requires a deep dive into
the emulator.  The code is so convoluted that GCC-12 with the recently
enable -Warray-bounds spits out a false-positive due to a GCC bug:

  arch/x86/kvm/emulate.c:254:27: warning: array subscript 32 is above array
                                 bounds of 'long unsigned int[17]' [-Warray-bounds]
    254 |         return ctxt->_regs[nr];
        |                ~~~~~~~~~~~^~~~
  In file included from arch/x86/kvm/emulate.c:23:
  arch/x86/kvm/kvm_emulate.h: In function 'reg_rmw':
  arch/x86/kvm/kvm_emulate.h:366:23: note: while referencing '_regs'
    366 |         unsigned long _regs[NR_VCPU_REGS];
        |                       ^~~~~

Link: https://lore.kernel.org/all/YofQlBrlx18J7h9Y@google.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216026
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105679
Reported-and-tested-by: Robert Dinse <nanook@eskimo.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220526210817.3428868-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:29 -04:00
Sean Christopherson 61d9c412d0 KVM: x86: Grab regs_dirty in local 'unsigned long'
Capture ctxt->regs_dirty in a local 'unsigned long' instead of casting it
to an 'unsigned long *' for use in for_each_set_bit().  The bitops helpers
really do read the entire 'unsigned long', even though the walking of the
read value is capped at the specified size.  I.e. 64-bit KVM is reading
memory beyond ctxt->regs_dirty, which is a u32 and thus 4 bytes, whereas
an unsigned long is 8 bytes.  Functionally it's not an issue because
regs_dirty is in the middle of x86_emulate_ctxt, i.e. KVM is just reading
its own memory, but relying on that coincidence is gross and unsafe.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220526210817.3428868-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-10 10:01:29 -04:00
Linus Torvalds 38904911e8 * Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
* Documentation improvements
 
 * Prevent module exit until all VMs are freed
 
 * PMU Virtualization fixes
 
 * Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
 
 * Other miscellaneous bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJIGV8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO5FQgAhls4+Nu+NqId/yvvyNxr3vXq0dHI
 hLlHtvzgGzZisZ7y2bNeyIpJVBDT5LCbrptPD/5eTvchVswDh0+kCVC0Uni5ugGT
 tLT/Pv9Oq9e0X7aGdHRyuHIivIFDC20zIZO2DV48Lrj/+r6DafB2Fghq2XQLlBxN
 p8KislvuqAAos543BPC1+Lk3dhOLuZ8qcFD8wGRlcCwjNwYaitrQ16rO04cLfUur
 OwIks1I6TdI2JpLBhm6oWYVG/YnRsoo4bQE8cjdQ6yNSbwWtRpV33q7X6onw8x8K
 BEeESoTnMqfaxIF/6mPl6bnDblVHFp6Xhld/vJcgeWQTdajFtuFE/K4sCA==
 =xnQ6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

 - Documentation improvements

 - Prevent module exit until all VMs are freed

 - PMU Virtualization fixes

 - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences

 - Other miscellaneous bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: fix sending PV IPI
  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
  KVM: x86: Remove redundant vm_entry_controls_clearbit() call
  KVM: x86: cleanup enter_rmode()
  KVM: x86: SVM: fix tsc scaling when the host doesn't support it
  kvm: x86: SVM: remove unused defines
  KVM: x86: SVM: move tsc ratio definitions to svm.h
  KVM: x86: SVM: fix avic spec based definitions again
  KVM: MIPS: remove reference to trap&emulate virtualization
  KVM: x86: document limitations of MSR filtering
  KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
  KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
  KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  KVM: x86: Trace all APICv inhibit changes and capture overall status
  KVM: x86: Add wrappers for setting/clearing APICv inhibits
  KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
  KVM: X86: Handle implicit supervisor access with SMAP
  KVM: X86: Rename variable smap to not_smap in permission_fault()
  ...
2022-04-02 12:09:02 -07:00
Hou Wenlong ac8d6cad3c KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
If MSR access is rejected by MSR filtering,
kvm_set_msr()/kvm_get_msr() would return KVM_MSR_RET_FILTERED,
and the return value is only handled well for rdmsr/wrmsr.
However, some instruction emulation and state transition also
use kvm_set_msr()/kvm_get_msr() to do msr access but may trigger
some unexpected results if MSR access is rejected, E.g. RDPID
emulation would inject a #UD but RDPID wouldn't cause a exit
when RDPID is supported in hardware and ENABLE_RDTSCP is set.
And it would also cause failure when load MSR at nested entry/exit.
Since msr filtering is based on MSR bitmap, it is better to only
do MSR filtering for rdmsr/wrmsr.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <2b2774154f7532c96a6f04d71c82a8bec7d9e80b.1646655860.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:47 -04:00
Hou Wenlong a836839cbf KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.

Fixes: fb6d4d340e ("KVM: x86: emulate RDPID")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:46 -04:00
Linus Torvalds 7001052160 Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
 where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
 
 Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
 limited to 2 instructions (and typically fewer) on branch targets not starting
 with ENDBR. CET-IBT also limits speculation of the next sequential instruction
 after the indirect CALL/JMP [1].
 
 CET-IBT is fundamentally incompatible with retpolines, but provides, as
 described above, speculation limits itself.
 
 [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
 bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
 urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
 BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
 y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
 go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
 ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
 x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
 dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
 EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
 beB/7u2KIUbKEkSN
 =jZfK
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
 "Add support for Intel CET-IBT, available since Tigerlake (11th gen),
  which is a coarse grained, hardware based, forward edge
  Control-Flow-Integrity mechanism where any indirect CALL/JMP must
  target an ENDBR instruction or suffer #CP.

  Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
  is limited to 2 instructions (and typically fewer) on branch targets
  not starting with ENDBR. CET-IBT also limits speculation of the next
  sequential instruction after the indirect CALL/JMP [1].

  CET-IBT is fundamentally incompatible with retpolines, but provides,
  as described above, speculation limits itself"

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  kvm/emulate: Fix SETcc emulation for ENDBR
  x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
  x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
  kbuild: Fixup the IBT kbuild changes
  x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
  x86: Remove toolchain check for X32 ABI capability
  x86/alternative: Use .ibt_endbr_seal to seal indirect calls
  objtool: Find unused ENDBR instructions
  objtool: Validate IBT assumptions
  objtool: Add IBT/ENDBR decoding
  objtool: Read the NOENDBR annotation
  x86: Annotate idtentry_df()
  x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
  x86: Annotate call_on_stack()
  objtool: Rework ASM_REACHABLE
  x86: Mark __invalid_creds() __noreturn
  exit: Mark do_group_exit() __noreturn
  x86: Mark stop_this_cpu() __noreturn
  objtool: Ignore extra-symbol code
  objtool: Rename --duplicate to --lto
  ...
2022-03-27 10:17:23 -07:00
Linus Torvalds 1ebdbeb03e ARM:
- Proper emulation of the OSLock feature of the debug architecture
 
 - Scalibility improvements for the MMU lock when dirty logging is on
 
 - New VMID allocator, which will eventually help with SVA in VMs
 
 - Better support for PMUs in heterogenous systems
 
 - PSCI 1.1 support, enabling support for SYSTEM_RESET2
 
 - Implement CONFIG_DEBUG_LIST at EL2
 
 - Make CONFIG_ARM64_ERRATUM_2077057 default y
 
 - Reduce the overhead of VM exit when no interrupt is pending
 
 - Remove traces of 32bit ARM host support from the documentation
 
 - Updated vgic selftests
 
 - Various cleanups, doc updates and spelling fixes
 
 RISC-V:
 
 - Prevent KVM_COMPAT from being selected
 
 - Optimize __kvm_riscv_switch_to() implementation
 
 - RISC-V SBI v0.3 support
 
 s390:
 
 - memop selftest
 
 - fix SCK locking
 
 - adapter interruptions virtualization for secure guests
 
 - add Claudio Imbrenda as maintainer
 
 - first step to do proper storage key checking
 
 x86:
 
 - Continue switching kvm_x86_ops to static_call(); introduce
   static_call_cond() and __static_call_ret0 when applicable.
 
 - Cleanup unused arguments in several functions
 
 - Synthesize AMD 0x80000021 leaf
 
 - Fixes and optimization for Hyper-V sparse-bank hypercalls
 
 - Implement Hyper-V's enlightened MSR bitmap for nested SVM
 
 - Remove MMU auditing
 
 - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
   page tracking is enabled
 
 - Cleanup the implementation of the guest PGD cache
 
 - Preparation for the implementation of Intel IPI virtualization
 
 - Fix some segment descriptor checks in the emulator
 
 - Allow AMD AVIC support on systems with physical APIC ID above 255
 
 - Better API to disable virtualization quirks
 
 - Fixes and optimizations for the zapping of page tables:
 
   - Zap roots in two passes, avoiding RCU read-side critical sections
     that last too long for very large guests backed by 4 KiB SPTEs.
 
   - Zap invalid and defunct roots asynchronously via concurrency-managed
     work queue.
 
   - Allowing yielding when zapping TDP MMU roots in response to the root's
     last reference being put.
 
   - Batch more TLB flushes with an RCU trick.  Whoever frees the paging
     structure now holds RCU as a proxy for all vCPUs running in the guest,
     i.e. to prolongs the grace period on their behalf.  It then kicks the
     the vCPUs out of guest mode before doing rcu_read_unlock().
 
 Generic:
 
 - Introduce __vcalloc and use it for very large allocations that
   need memcg accounting
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
 wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
 pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
 dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
 RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
 BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
 =keox
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Proper emulation of the OSLock feature of the debug architecture

   - Scalibility improvements for the MMU lock when dirty logging is on

   - New VMID allocator, which will eventually help with SVA in VMs

   - Better support for PMUs in heterogenous systems

   - PSCI 1.1 support, enabling support for SYSTEM_RESET2

   - Implement CONFIG_DEBUG_LIST at EL2

   - Make CONFIG_ARM64_ERRATUM_2077057 default y

   - Reduce the overhead of VM exit when no interrupt is pending

   - Remove traces of 32bit ARM host support from the documentation

   - Updated vgic selftests

   - Various cleanups, doc updates and spelling fixes

  RISC-V:
   - Prevent KVM_COMPAT from being selected

   - Optimize __kvm_riscv_switch_to() implementation

   - RISC-V SBI v0.3 support

  s390:
   - memop selftest

   - fix SCK locking

   - adapter interruptions virtualization for secure guests

   - add Claudio Imbrenda as maintainer

   - first step to do proper storage key checking

  x86:
   - Continue switching kvm_x86_ops to static_call(); introduce
     static_call_cond() and __static_call_ret0 when applicable.

   - Cleanup unused arguments in several functions

   - Synthesize AMD 0x80000021 leaf

   - Fixes and optimization for Hyper-V sparse-bank hypercalls

   - Implement Hyper-V's enlightened MSR bitmap for nested SVM

   - Remove MMU auditing

   - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
     page tracking is enabled

   - Cleanup the implementation of the guest PGD cache

   - Preparation for the implementation of Intel IPI virtualization

   - Fix some segment descriptor checks in the emulator

   - Allow AMD AVIC support on systems with physical APIC ID above 255

   - Better API to disable virtualization quirks

   - Fixes and optimizations for the zapping of page tables:

      - Zap roots in two passes, avoiding RCU read-side critical
        sections that last too long for very large guests backed by 4
        KiB SPTEs.

      - Zap invalid and defunct roots asynchronously via
        concurrency-managed work queue.

      - Allowing yielding when zapping TDP MMU roots in response to the
        root's last reference being put.

      - Batch more TLB flushes with an RCU trick. Whoever frees the
        paging structure now holds RCU as a proxy for all vCPUs running
        in the guest, i.e. to prolongs the grace period on their behalf.
        It then kicks the the vCPUs out of guest mode before doing
        rcu_read_unlock().

  Generic:
   - Introduce __vcalloc and use it for very large allocations that need
     memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  KVM: use kvcalloc for array allocations
  KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
  kvm: x86: Require const tsc for RT
  KVM: x86: synthesize CPUID leaf 0x80000021h if useful
  KVM: x86: add support for CPUID leaf 0x80000021
  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
  Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
  kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
  KVM: arm64: fix typos in comments
  KVM: arm64: Generalise VM features into a set of flags
  KVM: s390: selftests: Add error memop tests
  KVM: s390: selftests: Add more copy memop tests
  KVM: s390: selftests: Add named stages for memop test
  KVM: s390: selftests: Add macro as abstraction for MEM_OP
  KVM: s390: selftests: Split memop tests
  KVM: s390x: fix SCK locking
  RISC-V: KVM: Implement SBI HSM suspend call
  RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: KVM: Implement SBI v0.3 SRST extension
  ...
2022-03-24 11:58:57 -07:00
Peter Zijlstra 3986f65d4f kvm/emulate: Fix SETcc emulation for ENDBR
Companion patch for commit fe83f5eae4 ("kvm/emulate: Fix SETcc
emulation function offsets with SLS"), now extending it to cover the
additional ENDBR instruction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/YjMVpfe%2f9ldmWX8W@hirez.programming.kicks-ass.net
2022-03-22 21:12:14 +01:00
Peter Zijlstra b9067cd80f Merge branch 'kvm/kvm-sls-fix'
Sync with the last minute SLS fix to extend it for IBT.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-03-22 21:12:14 +01:00
Linus Torvalds 95ab0e8768 Changes for this cycle were:
- Fix address filtering for Intel/PT,ARM/CoreSight
  - Enable Intel/PEBS format 5
  - Allow more fixed-function counters for x86
  - Intel/PT: Enable not recording Taken-Not-Taken packets
  - Add a few branch-types
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI4WdIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jdTA/7BADTYzFCbdwPzHt2mR8osv7k+pDvYxs9
 wxNjyi1X7N8cPkhqgIg9CfdhdyDOqo7+J4fG17f2qbwjNK7b2Fb1/U6ZoZaf+f8F
 W0e2LX5KZTXUhkA+TEjrXvYD9FmJaCPM/l2RQg8U7okBs2kb0H6QT2Yn21wd1roC
 WwI5KFiWSVS1IzpVLaXjDh+FJfJHd75ReMqJeus+QoVQ9NHeuI+t4DglSB1IBi54
 d/zeVXE/Y4dFTQOrU06S2HxcOEptvXZsPmVLvKab/veeGGyWiGPxQpvu6bXm6u3x
 0sV+dn67zut2m2pQlUZUucgGTSYIZTpOe+rNukTB9hJ4XeN4/1ohOOCrOuYM+63P
 lGFbN1v+LD7Wc6C2eEhw8G5GEL0qbwzFNQ06O3EOFi7C7GKn7WS/ET6XuuMOERFk
 uxEPb4pFtbBlJ0SriCprFJSd5NL3PORZlLIhv4hGH5hilLR1TFeKDuwZaM4noQxU
 dL3rKGLi9H+P46Eni9H28+0gDISbv1xL+WivHOFQNmhBqAZO52ZcF3J+dgBaR1B5
 pBxVTycFpZMjxSZnqTE0gMsFaLIpVGc+75Chns1rajR0mEtRtJUQUbYz4tK4zb0E
 dZR1p+VF6+DYmSRhiqeaTi9uz9oE8kMa8o/EcbFIg/9BgEnUwJXU20bjnar30xQ7
 9OIn7r9hjHI=
 =XPuo
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf event updates from Ingo Molnar:

 - Fix address filtering for Intel/PT,ARM/CoreSight

 - Enable Intel/PEBS format 5

 - Allow more fixed-function counters for x86

 - Intel/PT: Enable not recording Taken-Not-Taken packets

 - Add a few branch-types

* tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT
  perf: Add irq and exception return branch types
  perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses
  perf/x86/intel/pt: Add a capability and config bit for disabling TNTs
  perf/x86/intel/pt: Add a capability and config bit for event tracing
  perf/x86/intel: Increase max number of the fixed counters
  KVM: x86: use the KVM side max supported fixed counter
  perf/x86/intel: Enable PEBS format 5
  perf/core: Allow kernel address filter when not filtering the kernel
  perf/x86/intel/pt: Fix address filter config for 32-bit kernel
  perf/core: Fix address filter parser for multiple filters
  x86: Share definition of __is_canonical_address()
  perf/x86/intel/pt: Relax address filter validation
2022-03-22 13:06:49 -07:00
Linus Torvalds 8565d64430 bounds-fixes updates for v5.18-rc1
- Various buffer and array bounds related fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmI4nPQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJi/gD/9UctJGcKAi28EVVcS11oLSxl97
 LuIOJ4lWr8WUpCUqHcN65biUoODjshkIJRTx6Vxx9diLm3u6NO58+oJJCveKvE7w
 LtFjkbXBZ2sTxUoMZiva7qW8A6pYTfpiGq2lyUWVZRLOAMnNlCVuhcIonkzkR7js
 xdMZ2AmiQ0LJqT8paw4UUtSxGXGpLkcbuEoWHVWbqd3jgUbDwA4WR4xJw3ZUyh9i
 ONHOsfl/nFCNcLU69ppGJWPlXqNr5hHjjCeRzlcMfnwD/kxA7Qgt5TmpdEeAD4zx
 csNbvXbaW2Y+5IUWKXHT2Rt0rW1u+Zi5c+mtstTJf7XqK6slvTdLugY5TCtI6oXf
 x4qOMbqDjPbTr9Gpw3289WlqZYNJs1pGdeD4zL2HiOmwXq75GCNgxe0bv1hjnhNG
 b/bggAkpN/0n9r5BCQ32FWBg6S26VPOzg7//l6M38EBtQyakBVnS/064SP3aGTx4
 8oCKmrNLQXyQz7mdskOA9hwyEkF1+hCX2kJFsoZ9iN0TDYKzzJYP8cBLzZe6bfPE
 dqsAc36W8FIHATfo7wrbTVABP61wJcHgocSLICRYmGQrSMTqREl9P+nDDEWl/wcc
 vKd1kyYhnskcz7GVdFtSDnpcHp6W/aiLwJUFCpAkgz2GBzrt1MtGxnFrXl6s8cc4
 bSK/JClIBhMvBas4Tw==
 =gm8R
 -----END PGP SIGNATURE-----

Merge tag 'bounds-fixes-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull bounds fixes from Kees Cook:
 "These are a handful of buffer and array bounds fixes that I've been
  carrying in preparation for the coming memcpy improvements and the
  enabling of '-Warray-bounds' globally.

  There are additional similar fixes in other maintainer's trees, but
  these ended up getting carried by me. :)"

* tag 'bounds-fixes-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  media: omap3isp: Use struct_group() for memcpy() region
  tpm: vtpm_proxy: Check length to avoid compiler warning
  alpha: Silence -Warray-bounds warnings
  m68k: cmpxchg: Dereference matching size
  intel_th: msu: Use memset_startat() for clearing hw header
  KVM: x86: Replace memset() "optimization" with normal per-field writes
2022-03-21 19:58:27 -07:00