Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Move a lot of state that was previously stored on a per vcpu basis
     into a per-CPU area, because it is only pertinent to the host while
     the vcpu is loaded. This results in better state tracking, and a
     smaller vcpu structure.

   - Add full handling of the ERET/ERETAA/ERETAB instructions in nested
     virtualisation. The last two instructions also require emulating
     part of the pointer authentication extension. As a result, the trap
     handling of pointer authentication has been greatly simplified.

   - Turn the global (and not very scalable) LPI translation cache into
     a per-ITS, scalable cache, making non directly injected LPIs much
     cheaper to make visible to the vcpu.

   - A batch of pKVM patches, mostly fixes and cleanups, as the
     upstreaming process seems to be resuming. Fingers crossed!

   - Allocate PPIs and SGIs outside of the vcpu structure, allowing for
     smaller EL2 mapping and some flexibility in implementing more or
     less than 32 private IRQs.

   - Purge stale mpidr_data if a vcpu is created after the MPIDR map has
     been created.

   - Preserve vcpu-specific ID registers across a vcpu reset.

   - Various minor cleanups and improvements.

  LoongArch:

   - Add ParaVirt IPI support

   - Add software breakpoint support

   - Add mmio trace events support

  RISC-V:

   - Support guest breakpoints using ebreak

   - Introduce per-VCPU mp_state_lock and reset_cntx_lock

   - Virtualize SBI PMU snapshot and counter overflow interrupts

   - New selftests for SBI PMU and Guest ebreak

  x86:

   - Some preparatory work for both TDX and SNP page fault handling.

     This also cleans up the page fault path, so that the priorities of
     various kinds of faults (private page, no memory, write to read-only
     slot, etc.) are easier to follow.

   - Minimize amount of time that shadow PTEs remain in the special
     REMOVED_SPTE state.

     This is a state where the mmu_lock is held for reading but
     concurrent accesses to the PTE have to spin; shortening its use
     allows other vCPUs to repopulate the zapped region while the zapper
     finishes tearing down the old, defunct page tables.

   - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID
     field, which is defined by hardware but left for software use.

     This lets KVM communicate its inability to map GPAs that set bits
     51:48 on hosts without 5-level nested page tables. Guest firmware
     is expected to use the information when mapping BARs; this avoids
     them ending up at a legal, but unmappable, GPA.
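
     A minimal sketch (not from this series) of how guest code might
     consume the new field, assuming the architectural CPUID.80000008H
     layout (EAX[7:0] = MAXPHYADDR, EAX[23:16] = guest MAXPHYADDR, where
     0 means "same as MAXPHYADDR"):

       #include <cpuid.h>
       #include <stdio.h>

       int main(void)
       {
               unsigned int eax, ebx, ecx, edx;
               unsigned int phys_bits, guest_phys_bits;

               /* Leaf 0x80000008: physical address size information. */
               if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
                       return 1;

               phys_bits = eax & 0xff;                 /* host MAXPHYADDR  */
               guest_phys_bits = (eax >> 16) & 0xff;   /* guest MAXPHYADDR */
               if (!guest_phys_bits)
                       guest_phys_bits = phys_bits;

               printf("MAXPHYADDR=%u, guest MAXPHYADDR=%u\n",
                      phys_bits, guest_phys_bits);
               return 0;
       }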

   - Fixed a bug where KVM would not reject accesses to MSRs that aren't
     supposed to exist given the vCPU model and/or KVM configuration.

   - As usual, a bunch of code cleanups.

  x86 (AMD):

   - Implement a new and improved API to initialize SEV and SEV-ES VMs,
     which will also be extendable to SEV-SNP.

     The new API specifies the desired encryption in KVM_CREATE_VM and
     then separately initializes the VM. The new API also allows
     customizing the desired set of VMSA features; the features affect
     the measurement of the VM's initial state, and therefore enabling
     them cannot be done tout court by the hypervisor.

     While at it, the new API includes two bugfixes that couldn't be
     applied to the old one without a flag day in userspace or without
     affecting the initial measurement. When a SEV-ES VM is created with
     the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected
     once the VMSA has been encrypted. Also, the FPU and AVX state will
     be synchronized and encrypted too.
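
     A hedged sketch of the new flow from the VMM side (error handling
     omitted; it assumes uapi headers that already define KVM_X86_SEV_ES_VM,
     KVM_SEV_INIT2 and struct kvm_sev_init, and the helper name is made up):

       #include <fcntl.h>
       #include <stdint.h>
       #include <sys/ioctl.h>
       #include <linux/kvm.h>

       int sev_es_create_and_init(void)
       {
               int kvm = open("/dev/kvm", O_RDWR);
               int sev = open("/dev/sev", O_RDWR);

               /* 1. Pick the encryption scheme at VM creation time. */
               int vm = ioctl(kvm, KVM_CREATE_VM, KVM_X86_SEV_ES_VM);

               /* 2. Initialize it, optionally tweaking VMSA features. */
               struct kvm_sev_init init = {
                       .vmsa_features = 0, /* measured, so set explicitly */
                       .ghcb_version = 2,  /* max GHCB protocol for the guest */
               };
               struct kvm_sev_cmd cmd = {
                       .id = KVM_SEV_INIT2,
                       .data = (uintptr_t)&init,
                       .sev_fd = sev,
               };

               /* Replaces the parameterless KVM_SEV_ES_INIT of the old API. */
               return ioctl(vm, KVM_MEMORY_ENCRYPT_OP, &cmd);
       }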

   - Support for GHCB version 2 as applicable to SEV-ES guests.

     This, once more, is only accessible when using the new
     KVM_SEV_INIT2 flow for initialization of SEV-ES VMs.

  x86 (Intel):

   - An initial bunch of prerequisite patches for Intel TDX were merged.

     They generally don't do anything interesting. The only somewhat
     user visible change is a new debugging mode that checks that KVM's
     MMU never triggers a #VE virtualization exception in the guest.

   - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig
     VM-Exit to L1, as per the SDM.

  Generic:

   - Use vfree() instead of kvfree() for allocations that always use
     vcalloc() or __vcalloc().
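
     A tiny illustrative sketch (hypothetical helpers, not from this
     series): vcalloc() always returns vmalloc memory, so the matching
     free is vfree(); kvfree() is only needed when the pointer may have
     come from either kmalloc() or vmalloc().

       #include <linux/types.h>
       #include <linux/vmalloc.h>

       static u64 *alloc_counters(unsigned long nr)
       {
               /* Zeroed, overflow-checked, always vmalloc-backed. */
               return vcalloc(nr, sizeof(u64));
       }

       static void free_counters(u64 *counters)
       {
               vfree(counters); /* no kvfree() needed here */
       }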

   - Remove .change_pte() MMU notifier - the changes to non-KVM code are
     small and Andrew Morton asked that I also take those through the
     KVM tree.

     The callback was only ever implemented by KVM (which was also the
     original user of MMU notifiers) but it had been nonfunctional ever
     since calls to set_pte_at_notify were wrapped with
     invalidate_range_start and invalidate_range_end... in 2012.

  Selftests:

   - Enhance the demand paging test to allow for better reporting and
     stressing of UFFD performance.

   - Convert the steal time test to generate TAP-friendly output.

   - Fix a flaky false positive in the xen_shinfo_test due to comparing
     elapsed time across two different clock domains.

   - Skip the MONITOR/MWAIT test if the host doesn't actually support
     MWAIT.

   - Avoid unnecessary use of "sudo" in the NX hugepage test wrapper
     shell script, to play nice with running in a minimal userspace
     environment.

   - Allow skipping the RSEQ test's sanity check that the vCPU was able
     to complete a reasonable number of KVM_RUNs, as the assert can fail
     on a completely valid setup.

     If the test is run on a large-ish system that is otherwise idle,
     and the test isn't affined to a low-ish number of CPUs, the vCPU
     task can be repeatedly migrated to CPUs that are in deep sleep
     states, which results in the vCPU having very little net runtime
     before the next migration due to high wakeup latencies.

   - Define _GNU_SOURCE for all selftests to fix a warning that was
     introduced by a change to kselftest_harness.h late in the 6.9
     cycle, and because forcing every test to #define _GNU_SOURCE is
     painful.

   - Provide a global pseudo-RNG instance for all tests, so that library
     code can generate random, but deterministic numbers.

   - Use the global pRNG to randomly force emulation of select writes
     from guest code on x86, e.g. to help validate KVM's emulation of
     locked accesses.

   - Allocate and initialize x86's GDT, IDT, TSS, segments, and default
     exception handlers at VM creation, instead of forcing tests to
     manually trigger the related setup.

  Documentation:

   - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits)
  selftests/kvm: remove dead file
  KVM: selftests: arm64: Test vCPU-scoped feature ID registers
  KVM: selftests: arm64: Test that feature ID regs survive a reset
  KVM: selftests: arm64: Store expected register value in set_id_regs
  KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope
  KVM: arm64: Only reset vCPU-scoped feature ID regs once
  KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()
  KVM: arm64: Rename is_id_reg() to imply VM scope
  KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
  KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
  KVM: arm64: Fix hvhe/nvhe early alias parsing
  KVM: SEV: Allow per-guest configuration of GHCB protocol version
  KVM: SEV: Add GHCB handling for termination requests
  KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
  KVM: SEV: Add support to handle AP reset MSR protocol
  KVM: x86: Explicitly zero kvm_caps during vendor module load
  KVM: x86: Fully re-initialize supported_mce_cap on vendor module load
  KVM: x86: Fully re-initialize supported_vm_types on vendor module load
  KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns
  KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
  ...
Linus Torvalds 2024-05-15 14:46:43 -07:00
commit f4b0c4b508
262 changed files with 8888 additions and 4201 deletions


@ -6316,7 +6316,7 @@ The "flags" field is reserved for future extensions and must be '0'.
:Architectures: none
:Type: vm ioctl
:Parameters: struct kvm_create_guest_memfd(in)
:Returns: 0 on success, <0 on error
:Returns: A file descriptor on success, <0 on error
KVM_CREATE_GUEST_MEMFD creates an anonymous file and returns a file descriptor
that refers to it. guest_memfd files are roughly analogous to files created
@ -6894,6 +6894,13 @@ Note that KVM does not skip the faulting instruction as it does for
KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
if it decides to decode and emulate the instruction.
This feature isn't available to protected VMs, as userspace does not
have access to the state that is required to perform the emulation.
Instead, a data abort exception is directly injected in the guest.
Note that although KVM_CAP_ARM_NISV_TO_USER will be reported if
queried outside of a protected VM context, the feature will not be
exposed if queried on a protected VM file descriptor.
::
/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
@ -8819,6 +8826,8 @@ means the VM type with value @n is supported. Possible values of @n are::
#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
#define KVM_X86_SEV_VM 2
#define KVM_X86_SEV_ES_VM 3
Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in


@ -0,0 +1,138 @@
.. SPDX-License-Identifier: GPL-2.0
=======================================
ARM firmware pseudo-registers interface
=======================================
KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.
This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.
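For example, a VMM could save and restore one of these registers with
code along these lines (a hedged sketch, not part of this document; it
assumes the standard KVM_GET/SET_ONE_REG vcpu ioctls and the
KVM_REG_ARM_PSCI_VERSION encoding from the arm64 uapi headers)::

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Read the PSCI version pseudo-register so it can be migrated. */
  static int save_psci_version(int vcpu_fd, __u64 *version)
  {
          struct kvm_one_reg reg = {
                  .id   = KVM_REG_ARM_PSCI_VERSION,
                  .addr = (__u64)(unsigned long)version,
          };

          return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  }

  /* Restore it on the destination before any vCPU has run. */
  static int restore_psci_version(int vcpu_fd, __u64 version)
  {
          struct kvm_one_reg reg = {
                  .id   = KVM_REG_ARM_PSCI_VERSION,
                  .addr = (__u64)(unsigned long)&version,
          };

          return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
  }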
The following registers are defined:
* KVM_REG_ARM_PSCI_VERSION:
KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
highest PSCI version implemented by KVM and compatible with v0.2)
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_1 in [1].
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
KVM does not offer
firmware support for the workaround. The mitigation status for the
guest is unknown.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
The workaround HVC call is
available to the guest and required for the mitigation.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
The workaround HVC call
is available to the guest, but it is not needed on this VCPU.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
Holds the state of the firmware support to mitigate CVE-2018-3639, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_2 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
A workaround is not
available. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
The workaround state is
unknown. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
The workaround is available,
and can be disabled by a vCPU. If
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
this vCPU.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
Bitmap Feature Firmware Registers
---------------------------------
Contrary to the above registers, the following registers exposes the
hypercall services in the form of a feature-bitmap to the userspace. This
bitmap is translated to the services that are available to the guest.
There is a register defined per service call owner and can be accessed via
GET/SET_ONE_REG interface.
By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. The user-space can write-back the
desired bitmap back via SET_ONE_REG. The features for the registers that
are untouched, probably because userspace isn't aware of them, will be
exposed as is to the guest.
Note that KVM will not allow the userspace to configure the registers
anymore once any of the vCPUs has run at least once. Instead, it will
return a -EBUSY.
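As a hedged illustration (again not part of this document; it assumes
the KVM_REG_ARM_VENDOR_HYP_BMAP and KVM_REG_ARM_VENDOR_HYP_BIT_PTP
definitions from the arm64 uapi headers), hiding the PTP service from a
guest would look roughly like this, and must happen before the first
KVM_RUN::

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int hide_ptp_service(int vcpu_fd)
  {
          __u64 bmap;
          struct kvm_one_reg reg = {
                  .id   = KVM_REG_ARM_VENDOR_HYP_BMAP,
                  .addr = (__u64)(unsigned long)&bmap,
          };

          /* Discover everything KVM can offer... */
          if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
                  return -1;

          /* ...then write back a restricted bitmap. */
          bmap &= ~(1ULL << KVM_REG_ARM_VENDOR_HYP_BIT_PTP);
          return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
  }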
The pseudo-firmware bitmap register are as follows:
* KVM_REG_ARM_STD_BMAP:
Controls the bitmap of the ARM Standard Secure Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
The bit represents the services offered under v1.0 of ARM True Random
Number Generator (TRNG) specification, ARM DEN0098.
* KVM_REG_ARM_STD_HYP_BMAP:
Controls the bitmap of the ARM Standard Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
The bit represents the Paravirtualized Time service as represented by
ARM DEN0057A.
* KVM_REG_ARM_VENDOR_HYP_BMAP:
Controls the bitmap of the Vendor specific Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
The bit represents the Precision Time Protocol KVM service.
Errors:
======= =============================================================
-ENOENT Unknown register accessed.
-EBUSY Attempt a 'write' to the register after the VM has started.
-EINVAL Invalid bitmap written to the register.
======= =============================================================
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf


@ -1,138 +1,46 @@
.. SPDX-License-Identifier: GPL-2.0
=======================
ARM Hypercall Interface
=======================
===============================================
KVM/arm64-specific hypercalls exposed to guests
===============================================
KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.
This file documents the KVM/arm64-specific hypercalls which may be
exposed by KVM/arm64 to guest operating systems. These hypercalls are
issued using the HVC instruction according to version 1.1 of the Arm SMC
Calling Convention (DEN0028/C):
This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.
https://developer.arm.com/docs/den0028/c
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.
All KVM/arm64-specific hypercalls are allocated within the "Vendor
Specific Hypervisor Service Call" range with a UID of
``28b46fb6-2ec5-11e9-a9ca-4b564d003a74``. This UID should be queried by the
guest using the standard "Call UID" function for the service range in
order to determine that the KVM/arm64-specific hypercalls are available.
The following registers are defined:
``ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID``
---------------------------------------------
* KVM_REG_ARM_PSCI_VERSION:
Provides a discovery mechanism for other KVM/arm64 hypercalls.
KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.
+---------------------+-------------------------------------------------------------+
| Presence: | Mandatory for the KVM/arm64 UID |
+---------------------+-------------------------------------------------------------+
| Calling convention: | HVC32 |
+---------------------+----------+--------------------------------------------------+
| Function ID: | (uint32) | 0x86000000 |
+---------------------+----------+--------------------------------------------------+
| Arguments: | None |
+---------------------+----------+----+---------------------------------------------+
| Return Values: | (uint32) | R0 | Bitmap of available function numbers 0-31 |
| +----------+----+---------------------------------------------+
| | (uint32) | R1 | Bitmap of available function numbers 32-63 |
| +----------+----+---------------------------------------------+
| | (uint32) | R2 | Bitmap of available function numbers 64-95 |
| +----------+----+---------------------------------------------+
| | (uint32) | R3 | Bitmap of available function numbers 96-127 |
+---------------------+----------+----+---------------------------------------------+
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
highest PSCI version implemented by KVM and compatible with v0.2)
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
----------------------------------------
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_1 in [1].
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
KVM does not offer
firmware support for the workaround. The mitigation status for the
guest is unknown.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
The workaround HVC call is
available to the guest and required for the mitigation.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
The workaround HVC call
is available to the guest, but it is not needed on this VCPU.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
Holds the state of the firmware support to mitigate CVE-2018-3639, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_2 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
A workaround is not
available. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
The workaround state is
unknown. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
The workaround is available,
and can be disabled by a vCPU. If
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
this vCPU.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
Bitmap Feature Firmware Registers
---------------------------------
Contrary to the above registers, the following registers exposes the
hypercall services in the form of a feature-bitmap to the userspace. This
bitmap is translated to the services that are available to the guest.
There is a register defined per service call owner and can be accessed via
GET/SET_ONE_REG interface.
By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. The user-space can write-back the
desired bitmap back via SET_ONE_REG. The features for the registers that
are untouched, probably because userspace isn't aware of them, will be
exposed as is to the guest.
Note that KVM will not allow the userspace to configure the registers
anymore once any of the vCPUs has run at least once. Instead, it will
return a -EBUSY.
The pseudo-firmware bitmap register are as follows:
* KVM_REG_ARM_STD_BMAP:
Controls the bitmap of the ARM Standard Secure Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
The bit represents the services offered under v1.0 of ARM True Random
Number Generator (TRNG) specification, ARM DEN0098.
* KVM_REG_ARM_STD_HYP_BMAP:
Controls the bitmap of the ARM Standard Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
The bit represents the Paravirtualized Time service as represented by
ARM DEN0057A.
* KVM_REG_ARM_VENDOR_HYP_BMAP:
Controls the bitmap of the Vendor specific Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
The bit represents the Precision Time Protocol KVM service.
Errors:
======= =============================================================
-ENOENT Unknown register accessed.
-EBUSY Attempt a 'write' to the register after the VM has started.
-EINVAL Invalid bitmap written to the register.
======= =============================================================
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
See ptp_kvm.rst


@ -7,6 +7,7 @@ ARM
.. toctree::
:maxdepth: 2
fw-pseudo-registers
hyp-abi
hypercalls
pvtime


@ -7,19 +7,29 @@ PTP_KVM is used for high precision time sync between host and guests.
It relies on transferring the wall clock and counter value from the
host to the guest using a KVM-specific hypercall.
* ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: 0x86000001
``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
----------------------------------------
This hypercall uses the SMC32/HVC32 calling convention:
Retrieve current time information for the specific counter. There are no
endianness restrictions.
ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID
============== ======== =====================================
Function ID: (uint32) 0x86000001
Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0)
KVM_PTP_PHYS_COUNTER(1)
Return Values: (int32) NOT_SUPPORTED(-1) on error, or
(uint32) Upper 32 bits of wall clock time (r0)
(uint32) Lower 32 bits of wall clock time (r1)
(uint32) Upper 32 bits of counter (r2)
(uint32) Lower 32 bits of counter (r3)
Endianness: No Restrictions.
============== ======== =====================================
+---------------------+-------------------------------------------------------+
| Presence: | Optional |
+---------------------+-------------------------------------------------------+
| Calling convention: | HVC32 |
+---------------------+----------+--------------------------------------------+
| Function ID: | (uint32) | 0x86000001 |
+---------------------+----------+----+---------------------------------------+
| Arguments: | (uint32) | R1 | ``KVM_PTP_VIRT_COUNTER (0)`` |
| | | +---------------------------------------+
| | | | ``KVM_PTP_PHYS_COUNTER (1)`` |
+---------------------+----------+----+---------------------------------------+
| Return Values: | (int32) | R0 | ``NOT_SUPPORTED (-1)`` on error, else |
| | | | upper 32 bits of wall clock time |
| +----------+----+---------------------------------------+
| | (uint32) | R1 | Lower 32 bits of wall clock time |
| +----------+----+---------------------------------------+
| | (uint32) | R2 | Upper 32 bits of counter |
| +----------+----+---------------------------------------+
| | (uint32) | R3 | Lower 32 bits of counter |
+---------------------+----------+----+---------------------------------------+
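A hedged guest-side sketch of issuing this call (it mirrors what the
in-tree ptp_kvm driver does rather than defining anything new, and the
helper name is made up)::

  #include <linux/arm-smccc.h>
  #include <linux/errno.h>
  #include <linux/types.h>

  static int kvm_ptp_read(u64 *wall_clock_ns, u64 *counter)
  {
          struct arm_smccc_res res;

          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
                               KVM_PTP_VIRT_COUNTER, &res);
          if ((long)res.a0 < 0)   /* NOT_SUPPORTED */
                  return -EOPNOTSUPP;

          /* R0/R1 carry the wall clock, R2/R3 the counter value. */
          *wall_clock_ns = (u64)res.a0 << 32 | res.a1;
          *counter       = (u64)res.a2 << 32 | res.a3;
          return 0;
  }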


@ -76,15 +76,56 @@ are defined in ``<linux/psp-dev.h>``.
KVM implements the following commands to support common lifecycle events of SEV
guests, such as launching, running, snapshotting, migrating and decommissioning.
1. KVM_SEV_INIT
---------------
1. KVM_SEV_INIT2
----------------
The KVM_SEV_INIT command is used by the hypervisor to initialize the SEV platform
The KVM_SEV_INIT2 command is used by the hypervisor to initialize the SEV platform
context. In a typical workflow, this command should be the first command issued.
For this command to be accepted, either KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM
must have been passed to the KVM_CREATE_VM ioctl. A virtual machine created
with those machine types in turn cannot be run until KVM_SEV_INIT2 is invoked.
Parameters: struct kvm_sev_init (in)
Returns: 0 on success, -negative on error
::
struct kvm_sev_init {
__u64 vmsa_features; /* initial value of features field in VMSA */
__u32 flags; /* must be 0 */
__u16 ghcb_version; /* maximum guest GHCB version allowed */
__u16 pad1;
__u32 pad2[8];
};
It is an error if the hypervisor does not support any of the bits that
are set in ``flags`` or ``vmsa_features``. ``vmsa_features`` must be
0 for SEV virtual machines, as they do not have a VMSA.
``ghcb_version`` must be 0 for SEV virtual machines, as they do not issue GHCB
requests. If ``ghcb_version`` is 0 for any other guest type, then the maximum
allowed guest GHCB protocol will default to version 2.
This command replaces the deprecated KVM_SEV_INIT and KVM_SEV_ES_INIT commands.
The commands did not have any parameters (the ```data``` field was unused) and
only work for the KVM_X86_DEFAULT_VM machine type (0).
They behave as if:
* the VM type is KVM_X86_SEV_VM for KVM_SEV_INIT, or KVM_X86_SEV_ES_VM for
KVM_SEV_ES_INIT
* the ``flags`` and ``vmsa_features`` fields of ``struct kvm_sev_init`` are
set to zero, and ``ghcb_version`` is set to 0 for KVM_SEV_INIT and 1 for
KVM_SEV_ES_INIT.
If the ``KVM_X86_SEV_VMSA_FEATURES`` attribute does not exist, the hypervisor only
supports KVM_SEV_INIT and KVM_SEV_ES_INIT. In that case, note that KVM_SEV_ES_INIT
might set the debug swap VMSA feature (bit 5) depending on the value of the
``debug_swap`` parameter of ``kvm-amd.ko``.
2. KVM_SEV_LAUNCH_START
-----------------------
@ -425,6 +466,18 @@ issued by the hypervisor to make the guest ready for execution.
Returns: 0 on success, -negative on error
Device attribute API
====================
Attributes of the SEV implementation can be retrieved through the
``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
device node, using group ``KVM_X86_GRP_SEV``.
Currently only one attribute is implemented:
* ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
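A hedged sketch of probing it from userspace (helper name made up; it
assumes uapi headers that define KVM_X86_GRP_SEV and
KVM_X86_SEV_VMSA_FEATURES)::

  #include <fcntl.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/kvm.h>

  static int query_vmsa_features(uint64_t *features)
  {
          int kvm = open("/dev/kvm", O_RDWR);
          struct kvm_device_attr attr = {
                  .group = KVM_X86_GRP_SEV,
                  .attr  = KVM_X86_SEV_VMSA_FEATURES,
                  .addr  = (uintptr_t)features,
          };
          int ret = -1;

          if (kvm < 0)
                  return -1;
          /* If the attribute is absent, only the legacy KVM_SEV_*_INIT exist. */
          if (!ioctl(kvm, KVM_HAS_DEVICE_ATTR, &attr))
                  ret = ioctl(kvm, KVM_GET_DEVICE_ATTR, &attr);
          close(kvm);
          return ret;
  }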
Firmware Management
===================


@ -404,6 +404,18 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_ACCESS;
}
/* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
static inline bool esr_iss_is_eretax(unsigned long esr)
{
return esr & ESR_ELx_ERET_ISS_ERET;
}
/* Indicate which key is used for ERETAx (false: A-Key, true: B-Key) */
static inline bool esr_iss_is_eretab(unsigned long esr)
{
return esr & ESR_ELx_ERET_ISS_ERETA;
}
const char *esr_get_class_string(unsigned long esr);
#endif /* __ASSEMBLY */


@ -73,10 +73,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
__KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
@ -241,8 +239,6 @@ extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
extern u64 __vgic_v3_get_gic_config(void);
extern u64 __vgic_v3_read_vmcr(void);
extern void __vgic_v3_write_vmcr(u32 vmcr);
extern void __vgic_v3_init_lrs(void);
extern u64 __kvm_get_mdcr_el2(void);


@ -125,16 +125,6 @@ static inline void vcpu_set_wfx_traps(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_TWI;
}
static inline void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
static inline void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
}
static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.vsesr_el2;
@ -587,16 +577,14 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
} else if (has_hvhe()) {
val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
if (!vcpu_has_sve(vcpu) ||
(vcpu->arch.fp_state != FP_STATE_GUEST_OWNED))
if (!vcpu_has_sve(vcpu) || !guest_owns_fp_regs())
val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN;
if (cpus_have_final_cap(ARM64_SME))
val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN;
} else {
val = CPTR_NVHE_EL2_RES1;
if (vcpu_has_sve(vcpu) &&
(vcpu->arch.fp_state == FP_STATE_GUEST_OWNED))
if (vcpu_has_sve(vcpu) && guest_owns_fp_regs())
val |= CPTR_EL2_TZ;
if (cpus_have_final_cap(ARM64_SME))
val &= ~CPTR_EL2_TSM;


@ -211,6 +211,7 @@ typedef unsigned int pkvm_handle_t;
struct kvm_protected_vm {
pkvm_handle_t handle;
struct kvm_hyp_memcache teardown_mc;
bool enabled;
};
struct kvm_mpidr_data {
@ -220,20 +221,10 @@ struct kvm_mpidr_data {
static inline u16 kvm_mpidr_index(struct kvm_mpidr_data *data, u64 mpidr)
{
unsigned long mask = data->mpidr_mask;
u64 aff = mpidr & MPIDR_HWID_BITMASK;
int nbits, bit, bit_idx = 0;
u16 index = 0;
unsigned long index = 0, mask = data->mpidr_mask;
unsigned long aff = mpidr & MPIDR_HWID_BITMASK;
/*
* If this looks like RISC-V's BEXT or x86's PEXT
* instructions, it isn't by accident.
*/
nbits = fls(mask);
for_each_set_bit(bit, &mask, nbits) {
index |= (aff & BIT(bit)) >> (bit - bit_idx);
bit_idx++;
}
bitmap_gather(&index, &aff, &mask, fls(mask));
return index;
}
@ -530,8 +521,42 @@ struct kvm_cpu_context {
u64 *vncr_array;
};
/*
* This structure is instantiated on a per-CPU basis, and contains
* data that is:
*
* - tied to a single physical CPU, and
* - either have a lifetime that does not extend past vcpu_put()
* - or is an invariant for the lifetime of the system
*
* Use host_data_ptr(field) as a way to access a pointer to such a
* field.
*/
struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
struct user_fpsimd_state *fpsimd_state; /* hyp VA */
/* Ownership of the FP regs */
enum {
FP_STATE_FREE,
FP_STATE_HOST_OWNED,
FP_STATE_GUEST_OWNED,
} fp_owner;
/*
* host_debug_state contains the host registers which are
* saved and restored during world switches.
*/
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
/* Statistical profiling extension */
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
/* Values of trap registers for the host before guest entry. */
u64 mdcr_el2;
} host_debug_state;
};
struct kvm_host_psci_config {
@ -592,19 +617,9 @@ struct kvm_vcpu_arch {
u64 mdcr_el2;
u64 cptr_el2;
/* Values of trap registers for the host before guest entry. */
u64 mdcr_el2_host;
/* Exception Information */
struct kvm_vcpu_fault_info fault;
/* Ownership of the FP regs */
enum {
FP_STATE_FREE,
FP_STATE_HOST_OWNED,
FP_STATE_GUEST_OWNED,
} fp_state;
/* Configuration flags, set once and for all before the vcpu can run */
u8 cflags;
@ -627,11 +642,10 @@ struct kvm_vcpu_arch {
* We maintain more than a single set of debug registers to support
* debugging the guest from the host and to maintain separate host and
* guest state during world switches. vcpu_debug_state are the debug
* registers of the vcpu as the guest sees them. host_debug_state are
* the host registers which are saved and restored during
* world switches. external_debug_state contains the debug
* values we want to debug the guest. This is set via the
* KVM_SET_GUEST_DEBUG ioctl.
* registers of the vcpu as the guest sees them.
*
* external_debug_state contains the debug values we want to debug the
* guest. This is set via the KVM_SET_GUEST_DEBUG ioctl.
*
* debug_ptr points to the set of debug registers that should be loaded
* onto the hardware when running the guest.
@ -640,18 +654,6 @@ struct kvm_vcpu_arch {
struct kvm_guest_debug_arch vcpu_debug_state;
struct kvm_guest_debug_arch external_debug_state;
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
struct task_struct *parent_task;
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
/* Statistical profiling extension */
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
} host_debug_state;
/* VGIC state */
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
@ -817,8 +819,6 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5))
/* Save TRBE context if active */
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
/* vcpu running in HYP context */
#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
@ -896,7 +896,7 @@ struct kvm_vcpu_arch {
* Don't bother with VNCR-based accesses in the nVHE code, it has no
* business dealing with NV.
*/
static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
static inline u64 *___ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
{
#if !defined (__KVM_NVHE_HYPERVISOR__)
if (unlikely(cpus_have_final_cap(ARM64_HAS_NESTED_VIRT) &&
@ -906,6 +906,13 @@ static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
return (u64 *)&ctxt->sys_regs[r];
}
#define __ctxt_sys_reg(c,r) \
({ \
BUILD_BUG_ON(__builtin_constant_p(r) && \
(r) >= NR_SYS_REGS); \
___ctxt_sys_reg(c, r); \
})
#define ctxt_sys_reg(c,r) (*__ctxt_sys_reg(c,r))
u64 kvm_vcpu_sanitise_vncr_reg(const struct kvm_vcpu *, enum vcpu_sysreg);
@ -1168,6 +1175,44 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
DECLARE_KVM_HYP_PER_CPU(struct kvm_host_data, kvm_host_data);
/*
* How we access per-CPU host data depends on the where we access it from,
* and the mode we're in:
*
* - VHE and nVHE hypervisor bits use their locally defined instance
*
* - the rest of the kernel use either the VHE or nVHE one, depending on
* the mode we're running in.
*
* Unless we're in protected mode, fully deprivileged, and the nVHE
* per-CPU stuff is exclusively accessible to the protected EL2 code.
* In this case, the EL1 code uses the *VHE* data as its private state
* (which makes sense in a way as there shouldn't be any shared state
* between the host and the hypervisor).
*
* Yes, this is all totally trivial. Shoot me now.
*/
#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
#define host_data_ptr(f) (&this_cpu_ptr(&kvm_host_data)->f)
#else
#define host_data_ptr(f) \
(static_branch_unlikely(&kvm_protected_mode_initialized) ? \
&this_cpu_ptr(&kvm_host_data)->f : \
&this_cpu_ptr_hyp_sym(kvm_host_data)->f)
#endif
/* Check whether the FP regs are owned by the guest */
static inline bool guest_owns_fp_regs(void)
{
return *host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED;
}
/* Check whether the FP regs are owned by the host */
static inline bool host_owns_fp_regs(void)
{
return *host_data_ptr(fp_owner) == FP_STATE_HOST_OWNED;
}
static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
{
/* The host's MPIDR is immutable, so let's set it up at boot time */
@ -1211,7 +1256,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);
static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
{
@ -1247,10 +1291,9 @@ struct kvm *kvm_arch_alloc_vm(void);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
static inline bool kvm_vm_is_protected(struct kvm *kvm)
{
return false;
}
#define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.enabled)
#define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm)
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
@ -1275,6 +1318,8 @@ static inline bool __vcpu_has_feature(const struct kvm_arch *ka, int feature)
#define vcpu_has_feature(v, f) __vcpu_has_feature(&(v)->kvm->arch, (f))
#define kvm_vcpu_initialized(v) vcpu_get_flag(vcpu, VCPU_INITIALIZED)
int kvm_trng_call(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
extern phys_addr_t hyp_mem_base;
@ -1331,4 +1376,19 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
(get_idreg_field((kvm), id, fld) >= expand_field_sign(id, fld, min) && \
get_idreg_field((kvm), id, fld) <= expand_field_sign(id, fld, max))
/* Check for a given level of PAuth support */
#define kvm_has_pauth(k, l) \
({ \
bool pa, pi, pa3; \
\
pa = kvm_has_feat((k), ID_AA64ISAR1_EL1, APA, l); \
pa &= kvm_has_feat((k), ID_AA64ISAR1_EL1, GPA, IMP); \
pi = kvm_has_feat((k), ID_AA64ISAR1_EL1, API, l); \
pi &= kvm_has_feat((k), ID_AA64ISAR1_EL1, GPI, IMP); \
pa3 = kvm_has_feat((k), ID_AA64ISAR2_EL1, APA3, l); \
pa3 &= kvm_has_feat((k), ID_AA64ISAR2_EL1, GPA3, IMP); \
\
(pa + pi + pa3) == 1; \
})
#endif /* __ARM64_KVM_HOST_H__ */


@ -80,8 +80,8 @@ void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
#ifdef __KVM_NVHE_HYPERVISOR__


@ -60,7 +60,20 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
return ttbr0 & ~GENMASK_ULL(63, 48);
}
extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
int kvm_init_nv_sysregs(struct kvm *kvm);
#ifdef CONFIG_ARM64_PTR_AUTH
bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr);
#else
static inline bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
{
/* We really should never execute this... */
WARN_ON_ONCE(1);
*elr = 0xbad9acc0debadbad;
return false;
}
#endif
#endif /* __ARM64_KVM_NESTED_H */


@ -99,5 +99,26 @@ alternative_else_nop_endif
.macro ptrauth_switch_to_hyp g_ctxt, h_ctxt, reg1, reg2, reg3
.endm
#endif /* CONFIG_ARM64_PTR_AUTH */
#else /* !__ASSEMBLY */
#define __ptrauth_save_key(ctxt, key) \
do { \
u64 __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = __val; \
} while(0)
#define ptrauth_save_keys(ctxt) \
do { \
__ptrauth_save_key(ctxt, APIA); \
__ptrauth_save_key(ctxt, APIB); \
__ptrauth_save_key(ctxt, APDA); \
__ptrauth_save_key(ctxt, APDB); \
__ptrauth_save_key(ctxt, APGA); \
} while(0)
#endif /* __ASSEMBLY__ */
#endif /* __ASM_KVM_PTRAUTH_H */


@ -297,6 +297,7 @@
#define TCR_TBI1 (UL(1) << 38)
#define TCR_HA (UL(1) << 39)
#define TCR_HD (UL(1) << 40)
#define TCR_TBID0 (UL(1) << 51)
#define TCR_TBID1 (UL(1) << 52)
#define TCR_NFD0 (UL(1) << 53)
#define TCR_NFD1 (UL(1) << 54)


@ -82,6 +82,12 @@ bool is_kvm_arm_initialised(void);
DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
static inline bool is_pkvm_initialized(void)
{
return IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized);
}
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
@ -89,8 +95,7 @@ static inline bool is_hyp_mode_available(void)
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
if (is_pkvm_initialized())
return true;
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
@ -104,8 +109,7 @@ static inline bool is_hyp_mode_mismatched(void)
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
if (is_pkvm_initialized())
return false;
return __boot_cpu_mode[0] != __boot_cpu_mode[1];


@ -210,8 +210,8 @@ static const struct {
char alias[FTR_ALIAS_NAME_LEN];
char feature[FTR_ALIAS_OPTION_LEN];
} aliases[] __initconst = {
{ "kvm_arm.mode=nvhe", "id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=protected", "id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=nvhe", "arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=protected", "arm64_sw.hvhe=1" },
{ "arm64.nosve", "id_aa64pfr0.sve=0" },
{ "arm64.nosme", "id_aa64pfr1.sme=0" },
{ "arm64.nobti", "id_aa64pfr1.bt=0" },


@ -23,6 +23,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
always-y := hyp_constants.h hyp-constants.s


@ -35,10 +35,11 @@
#include <asm/virt.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/kvm_pkvm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_ptrauth.h>
#include <asm/sections.h>
#include <kvm/arm_hypercalls.h>
@ -69,15 +70,42 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}
/*
* This functions as an allow-list of protected VM capabilities.
* Features not explicitly allowed by this function are denied.
*/
static bool pkvm_ext_allowed(struct kvm *kvm, long ext)
{
switch (ext) {
case KVM_CAP_IRQCHIP:
case KVM_CAP_ARM_PSCI:
case KVM_CAP_ARM_PSCI_0_2:
case KVM_CAP_NR_VCPUS:
case KVM_CAP_MAX_VCPUS:
case KVM_CAP_MAX_VCPU_ID:
case KVM_CAP_MSI_DEVID:
case KVM_CAP_ARM_VM_IPA_SIZE:
case KVM_CAP_ARM_PMU_V3:
case KVM_CAP_ARM_SVE:
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
return true;
default:
return false;
}
}
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
int r;
u64 new_cap;
int r = -EINVAL;
if (cap->flags)
return -EINVAL;
if (kvm_vm_is_protected(kvm) && !pkvm_ext_allowed(kvm, cap->cap))
return -EINVAL;
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
r = 0;
@ -86,9 +114,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
if (!system_supports_mte() || kvm->created_vcpus) {
r = -EINVAL;
} else {
if (system_supports_mte() && !kvm->created_vcpus) {
r = 0;
set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
}
@ -99,25 +125,22 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
break;
case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
new_cap = cap->args[0];
mutex_lock(&kvm->slots_lock);
/*
* To keep things simple, allow changing the chunk
* size only when no memory slots have been created.
*/
if (!kvm_are_all_memslots_empty(kvm)) {
r = -EINVAL;
} else if (new_cap && !kvm_is_block_size_supported(new_cap)) {
r = -EINVAL;
} else {
r = 0;
kvm->arch.mmu.split_page_chunk_size = new_cap;
if (kvm_are_all_memslots_empty(kvm)) {
u64 new_cap = cap->args[0];
if (!new_cap || kvm_is_block_size_supported(new_cap)) {
r = 0;
kvm->arch.mmu.split_page_chunk_size = new_cap;
}
}
mutex_unlock(&kvm->slots_lock);
break;
default:
r = -EINVAL;
break;
}
@ -195,6 +218,23 @@ void kvm_arch_create_vm_debugfs(struct kvm *kvm)
kvm_sys_regs_create_debugfs(kvm);
}
static void kvm_destroy_mpidr_data(struct kvm *kvm)
{
struct kvm_mpidr_data *data;
mutex_lock(&kvm->arch.config_lock);
data = rcu_dereference_protected(kvm->arch.mpidr_data,
lockdep_is_held(&kvm->arch.config_lock));
if (data) {
rcu_assign_pointer(kvm->arch.mpidr_data, NULL);
synchronize_rcu();
kfree(data);
}
mutex_unlock(&kvm->arch.config_lock);
}
/**
* kvm_arch_destroy_vm - destroy the VM data structure
* @kvm: pointer to the KVM struct
@ -209,7 +249,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
if (is_protected_kvm_enabled())
pkvm_destroy_hyp_vm(kvm);
kfree(kvm->arch.mpidr_data);
kvm_destroy_mpidr_data(kvm);
kfree(kvm->arch.sysreg_masks);
kvm_destroy_vcpus(kvm);
@ -218,9 +259,47 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_arm_teardown_hypercalls(kvm);
}
static bool kvm_has_full_ptr_auth(void)
{
bool apa, gpa, api, gpi, apa3, gpa3;
u64 isar1, isar2, val;
/*
* Check that:
*
* - both Address and Generic auth are implemented for a given
* algorithm (Q5, IMPDEF or Q3)
* - only a single algorithm is implemented.
*/
if (!system_has_full_ptr_auth())
return false;
isar1 = read_sanitised_ftr_reg(SYS_ID_AA64ISAR1_EL1);
isar2 = read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1);
apa = !!FIELD_GET(ID_AA64ISAR1_EL1_APA_MASK, isar1);
val = FIELD_GET(ID_AA64ISAR1_EL1_GPA_MASK, isar1);
gpa = (val == ID_AA64ISAR1_EL1_GPA_IMP);
api = !!FIELD_GET(ID_AA64ISAR1_EL1_API_MASK, isar1);
val = FIELD_GET(ID_AA64ISAR1_EL1_GPI_MASK, isar1);
gpi = (val == ID_AA64ISAR1_EL1_GPI_IMP);
apa3 = !!FIELD_GET(ID_AA64ISAR2_EL1_APA3_MASK, isar2);
val = FIELD_GET(ID_AA64ISAR2_EL1_GPA3_MASK, isar2);
gpa3 = (val == ID_AA64ISAR2_EL1_GPA3_IMP);
return (apa == gpa && api == gpi && apa3 == gpa3 &&
(apa + api + apa3) == 1);
}
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
int r;
if (kvm && kvm_vm_is_protected(kvm) && !pkvm_ext_allowed(kvm, ext))
return 0;
switch (ext) {
case KVM_CAP_IRQCHIP:
r = vgic_present;
@ -311,7 +390,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = system_has_full_ptr_auth();
r = kvm_has_full_ptr_auth();
break;
case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
if (kvm)
@ -378,12 +457,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
/*
* Default value for the FP state, will be overloaded at load
* time if we support FP (pretty likely)
*/
vcpu->arch.fp_state = FP_STATE_FREE;
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
@ -395,6 +468,13 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
/*
* This vCPU may have been created after mpidr_data was initialized.
* Throw out the pre-computed mappings if that is the case which forces
* KVM to fall back to iteratively searching the vCPUs.
*/
kvm_destroy_mpidr_data(vcpu->kvm);
err = kvm_vgic_vcpu_init(vcpu);
if (err)
return err;
@ -428,6 +508,44 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
}
static void vcpu_set_pauth_traps(struct kvm_vcpu *vcpu)
{
if (vcpu_has_ptrauth(vcpu)) {
/*
* Either we're running running an L2 guest, and the API/APK
* bits come from L1's HCR_EL2, or API/APK are both set.
*/
if (unlikely(vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))) {
u64 val;
val = __vcpu_sys_reg(vcpu, HCR_EL2);
val &= (HCR_API | HCR_APK);
vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
vcpu->arch.hcr_el2 |= val;
} else {
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
/*
* Save the host keys if there is any chance for the guest
* to use pauth, as the entry code will reload the guest
* keys in that case.
* Protected mode is the exception to that rule, as the
* entry into the EL2 code eagerly switch back and forth
* between host and hyp keys (and kvm_hyp_ctxt is out of
* reach anyway).
*/
if (is_protected_kvm_enabled())
return;
if (vcpu->arch.hcr_el2 & (HCR_API | HCR_APK)) {
struct kvm_cpu_context *ctxt;
ctxt = this_cpu_ptr_hyp_sym(kvm_hyp_ctxt);
ptrauth_save_keys(ctxt);
}
}
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvm_s2_mmu *mmu;
@ -466,8 +584,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
else
vcpu_set_wfx_traps(vcpu);
if (vcpu_has_ptrauth(vcpu))
vcpu_ptrauth_disable(vcpu);
vcpu_set_pauth_traps(vcpu);
kvm_arch_vcpu_load_debug_state_flags(vcpu);
if (!cpumask_test_cpu(cpu, vcpu->kvm->arch.supported_cpus))
@ -580,11 +698,6 @@ unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
}
#endif
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
{
return vcpu_get_flag(vcpu, VCPU_INITIALIZED);
}
static void kvm_init_mpidr_data(struct kvm *kvm)
{
struct kvm_mpidr_data *data = NULL;
@ -594,7 +707,8 @@ static void kvm_init_mpidr_data(struct kvm *kvm)
mutex_lock(&kvm->arch.config_lock);
if (kvm->arch.mpidr_data || atomic_read(&kvm->online_vcpus) == 1)
if (rcu_access_pointer(kvm->arch.mpidr_data) ||
atomic_read(&kvm->online_vcpus) == 1)
goto out;
kvm_for_each_vcpu(c, vcpu, kvm) {
@ -631,7 +745,7 @@ static void kvm_init_mpidr_data(struct kvm *kvm)
data->cmpidr_to_idx[index] = c;
}
kvm->arch.mpidr_data = data;
rcu_assign_pointer(kvm->arch.mpidr_data, data);
out:
mutex_unlock(&kvm->arch.config_lock);
}
@ -790,9 +904,8 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
* doorbells to be signalled, should an interrupt become pending.
*/
preempt_disable();
kvm_vgic_vmcr_sync(vcpu);
vcpu_set_flag(vcpu, IN_WFI);
vgic_v4_put(vcpu);
kvm_vgic_put(vcpu);
preempt_enable();
kvm_vcpu_halt(vcpu);
@ -800,7 +913,7 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
preempt_disable();
vcpu_clear_flag(vcpu, IN_WFI);
vgic_v4_load(vcpu);
kvm_vgic_load(vcpu);
preempt_enable();
}
@ -980,7 +1093,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (run->exit_reason == KVM_EXIT_MMIO) {
ret = kvm_handle_mmio_return(vcpu);
if (ret)
if (ret <= 0)
return ret;
}
@ -1270,7 +1383,7 @@ static unsigned long system_supported_vcpu_features(void)
if (!system_supports_sve())
clear_bit(KVM_ARM_VCPU_SVE, &features);
if (!system_has_full_ptr_auth()) {
if (!kvm_has_full_ptr_auth()) {
clear_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, &features);
clear_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, &features);
}
@ -1971,7 +2084,7 @@ static void cpu_set_hyp_vector(void)
static void cpu_hyp_init_context(void)
{
kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
kvm_init_host_cpu_context(host_data_ptr(host_ctxt));
if (!is_kernel_in_hyp_mode())
cpu_init_hyp_mode();
@ -2470,22 +2583,28 @@ out_err:
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
{
struct kvm_vcpu *vcpu;
struct kvm_vcpu *vcpu = NULL;
struct kvm_mpidr_data *data;
unsigned long i;
mpidr &= MPIDR_HWID_BITMASK;
if (kvm->arch.mpidr_data) {
u16 idx = kvm_mpidr_index(kvm->arch.mpidr_data, mpidr);
rcu_read_lock();
data = rcu_dereference(kvm->arch.mpidr_data);
vcpu = kvm_get_vcpu(kvm,
kvm->arch.mpidr_data->cmpidr_to_idx[idx]);
if (data) {
u16 idx = kvm_mpidr_index(data, mpidr);
vcpu = kvm_get_vcpu(kvm, data->cmpidr_to_idx[idx]);
if (mpidr != kvm_vcpu_get_mpidr_aff(vcpu))
vcpu = NULL;
return vcpu;
}
rcu_read_unlock();
if (vcpu)
return vcpu;
kvm_for_each_vcpu(i, vcpu, kvm) {
if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
return vcpu;


@ -2117,6 +2117,26 @@ inject:
return true;
}
static bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
{
bool control_bit_set;
if (!vcpu_has_nv(vcpu))
return false;
control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
if (!is_hyp_ctxt(vcpu) && control_bit_set) {
kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
return true;
}
return false;
}
bool forward_smc_trap(struct kvm_vcpu *vcpu)
{
return forward_traps(vcpu, HCR_TSC);
}
static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
{
u64 mode = spsr & PSR_MODE_MASK;
@ -2152,37 +2172,39 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
{
u64 spsr, elr, mode;
bool direct_eret;
u64 spsr, elr, esr;
/*
* Going through the whole put/load motions is a waste of time
* if this is a VHE guest hypervisor returning to its own
* userspace, or the hypervisor performing a local exception
* return. No need to save/restore registers, no need to
* switch S2 MMU. Just do the canonical ERET.
* Forward this trap to the virtual EL2 if the virtual
* HCR_EL2.NV bit is set and this is coming from !EL2.
*/
spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
direct_eret = (mode == PSR_MODE_EL0t &&
vcpu_el2_e2h_is_set(vcpu) &&
vcpu_el2_tge_is_set(vcpu));
direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
if (direct_eret) {
*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
*vcpu_cpsr(vcpu) = spsr;
trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
if (forward_traps(vcpu, HCR_NV))
return;
/* Check for an ERETAx */
esr = kvm_vcpu_get_esr(vcpu);
if (esr_iss_is_eretax(esr) && !kvm_auth_eretax(vcpu, &elr)) {
/*
* Oh no, ERETAx failed to authenticate. If we have
* FPACCOMBINE, deliver an exception right away. If we
* don't, then let the mangled ELR value trickle down the
* ERET handling, and the guest will have a little surprise.
*/
if (kvm_has_pauth(vcpu->kvm, FPACCOMBINE)) {
esr &= ESR_ELx_ERET_ISS_ERETA;
esr |= FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_FPAC);
kvm_inject_nested_sync(vcpu, esr);
return;
}
}
preempt_disable();
kvm_arch_vcpu_put(vcpu);
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
if (!esr_iss_is_eretax(esr))
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
trace_kvm_nested_eret(vcpu, elr, spsr);


@ -14,19 +14,6 @@
#include <asm/kvm_mmu.h>
#include <asm/sysreg.h>
void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
{
struct task_struct *p = vcpu->arch.parent_task;
struct user_fpsimd_state *fpsimd;
if (!is_protected_kvm_enabled() || !p)
return;
fpsimd = &p->thread.uw.fpsimd_state;
kvm_unshare_hyp(fpsimd, fpsimd + 1);
put_task_struct(p);
}
/*
* Called on entry to KVM_RUN unless this vcpu previously ran at least
* once and the most recent prior KVM_RUN for this vcpu was called from
@ -38,30 +25,18 @@ void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
*/
int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
{
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
int ret;
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
kvm_vcpu_unshare_task_fp(vcpu);
/* pKVM has its own tracking of the host fpsimd state. */
if (is_protected_kvm_enabled())
return 0;
/* Make sure the host task fpsimd state is visible to hyp: */
ret = kvm_share_hyp(fpsimd, fpsimd + 1);
if (ret)
return ret;
vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
/*
* We need to keep current's task_struct pinned until its data has been
* unshared with the hypervisor to make sure it is not re-used by the
* kernel and donated to someone else while already shared -- see
* kvm_vcpu_unshare_task_fp() for the matching put_task_struct().
*/
if (is_protected_kvm_enabled()) {
get_task_struct(current);
vcpu->arch.parent_task = current;
}
return 0;
}
@ -86,7 +61,8 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
* guest in kvm_arch_vcpu_ctxflush_fp() and override this to
* FP_STATE_FREE if the flag set.
*/
vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
@ -110,7 +86,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
* been saved, this is very unlikely to happen.
*/
if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
vcpu->arch.fp_state = FP_STATE_FREE;
*host_data_ptr(fp_owner) = FP_STATE_FREE;
fpsimd_save_and_flush_cpu_state();
}
}
@ -126,7 +102,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
{
if (test_thread_flag(TIF_FOREIGN_FPSTATE))
vcpu->arch.fp_state = FP_STATE_FREE;
*host_data_ptr(fp_owner) = FP_STATE_FREE;
}
/*
@ -142,8 +118,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(!irqs_disabled());
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
if (guest_owns_fp_regs()) {
/*
* Currently we do not support SME guests so SVCR is
* always 0 and we just need a variable to point to.
@ -196,16 +171,38 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
isb();
}
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu)) {
__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
/* Restore the VL that was saved when bound to the CPU */
/*
* Restore the VL that was saved when bound to the CPU,
* which is the maximum VL for the guest. Because the
* layout of the data when saving the sve state depends
* on the VL, we need to use a consistent (i.e., the
* maximum) VL.
* Note that this means that at guest exit ZCR_EL1 is
* not necessarily the same as on guest entry.
*
* Restoring the VL isn't needed in VHE mode since
* ZCR_EL2 (accessed via ZCR_EL1) would fulfill the same
* role when doing the save from EL2.
*/
if (!has_vhe())
sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1,
SYS_ZCR_EL1);
}
/*
* Flush (save and invalidate) the fpsimd/sve state so that if
* the host tries to use fpsimd/sve, it's not using stale data
* from the guest.
*
* Flushing the state sets the TIF_FOREIGN_FPSTATE bit for the
* context unconditionally, in both nVHE and VHE. This allows
* the kernel to restore the fpsimd/sve state, including ZCR_EL1
* when needed.
*/
fpsimd_save_and_flush_cpu_state();
} else if (has_vhe() && system_supports_sve()) {
/*


@ -55,6 +55,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
static int handle_smc(struct kvm_vcpu *vcpu)
{
/*
* Forward this trapped smc instruction to the virtual EL2 if
* the guest has asked for it.
*/
if (forward_smc_trap(vcpu))
return 1;
/*
* "If an SMC instruction executed at Non-secure EL1 is
* trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
@ -207,19 +214,40 @@ static int handle_sve(struct kvm_vcpu *vcpu)
}
/*
* Guest usage of a ptrauth instruction (which the guest EL1 did not turn into
* a NOP). If we get here, it is that we didn't fixup ptrauth on exit, and all
* that we can do is give the guest an UNDEF.
* Two possibilities to handle a trapping ptrauth instruction:
*
* - Guest usage of a ptrauth instruction (which the guest EL1 did not
* turn into a NOP). If we get here, it is because we didn't enable
* ptrauth for the guest. This results in an UNDEF, as it isn't
* supposed to use ptrauth without being told it could.
*
* - Running an L2 NV guest while L1 has left HCR_EL2.API==0, and for
* which we reinject the exception into L1.
*
* Anything else is an emulation bug (hence the WARN_ON + UNDEF).
*/
static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu)
{
if (!vcpu_has_ptrauth(vcpu)) {
kvm_inject_undefined(vcpu);
return 1;
}
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) {
kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
return 1;
}
/* Really shouldn't be here! */
WARN_ON_ONCE(1);
kvm_inject_undefined(vcpu);
return 1;
}
static int kvm_handle_eret(struct kvm_vcpu *vcpu)
{
if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_ERET_ISS_ERET)
if (esr_iss_is_eretax(kvm_vcpu_get_esr(vcpu)) &&
!vcpu_has_ptrauth(vcpu))
return kvm_handle_ptrauth(vcpu);
/*


@ -135,9 +135,9 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)
if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
return;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
host_dbg = host_data_ptr(host_debug_state.regs);
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(host_dbg, host_ctxt);
@ -154,9 +154,9 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)
if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
return;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
host_dbg = host_data_ptr(host_debug_state.regs);
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(guest_dbg, guest_ctxt);


@ -27,6 +27,7 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/kvm_ptrauth.h>
#include <asm/fpsimd.h>
#include <asm/debug-monitors.h>
#include <asm/processor.h>
@ -39,12 +40,6 @@ struct kvm_exception_table_entry {
extern struct kvm_exception_table_entry __start___kvm_ex_table;
extern struct kvm_exception_table_entry __stop___kvm_ex_table;
/* Check whether the FP regs are owned by the guest */
static inline bool guest_owns_fp_regs(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fp_state == FP_STATE_GUEST_OWNED;
}
/* Save the 32-bit only FPSIMD system register state */
static inline void __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
{
@ -155,7 +150,7 @@ static inline bool cpu_has_amu(void)
static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
CHECK_FGT_MASKS(HFGRTR_EL2);
@ -191,7 +186,7 @@ static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
if (!cpus_have_final_cap(ARM64_HAS_FGT))
@ -226,13 +221,13 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
}
vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
if (cpus_have_final_cap(ARM64_HAS_HCX)) {
@ -254,13 +249,13 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
{
write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
write_sysreg(*host_data_ptr(host_debug_state.mdcr_el2), mdcr_el2);
write_sysreg(0, hstr_el2);
if (kvm_arm_support_pmu_v3()) {
struct kvm_cpu_context *hctxt;
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
}
@ -271,10 +266,8 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
__deactivate_traps_hfgxtr(vcpu);
}
static inline void ___activate_traps(struct kvm_vcpu *vcpu)
static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)
{
u64 hcr = vcpu->arch.hcr_el2;
if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM))
hcr |= HCR_TVM;
@ -376,8 +369,8 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
isb();
/* Write out the host state if it's in the registers */
if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
if (host_owns_fp_regs())
__fpsimd_save_state(*host_data_ptr(fpsimd_state));
/* Restore the guest state */
if (sve_guest)
@ -389,7 +382,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
if (!(read_sysreg(hcr_el2) & HCR_RW))
write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2);
vcpu->arch.fp_state = FP_STATE_GUEST_OWNED;
*host_data_ptr(fp_owner) = FP_STATE_GUEST_OWNED;
return true;
}
@ -449,60 +442,6 @@ static inline bool handle_tx2_tvm(struct kvm_vcpu *vcpu)
return true;
}
static inline bool esr_is_ptrauth_trap(u64 esr)
{
switch (esr_sys64_to_sysreg(esr)) {
case SYS_APIAKEYLO_EL1:
case SYS_APIAKEYHI_EL1:
case SYS_APIBKEYLO_EL1:
case SYS_APIBKEYHI_EL1:
case SYS_APDAKEYLO_EL1:
case SYS_APDAKEYHI_EL1:
case SYS_APDBKEYLO_EL1:
case SYS_APDBKEYHI_EL1:
case SYS_APGAKEYLO_EL1:
case SYS_APGAKEYHI_EL1:
return true;
}
return false;
}
#define __ptrauth_save_key(ctxt, key) \
do { \
u64 __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = __val; \
} while(0)
DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
static bool kvm_hyp_handle_ptrauth(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm_cpu_context *ctxt;
u64 val;
if (!vcpu_has_ptrauth(vcpu))
return false;
ctxt = this_cpu_ptr(&kvm_hyp_ctxt);
__ptrauth_save_key(ctxt, APIA);
__ptrauth_save_key(ctxt, APIB);
__ptrauth_save_key(ctxt, APDA);
__ptrauth_save_key(ctxt, APDB);
__ptrauth_save_key(ctxt, APGA);
vcpu_ptrauth_enable(vcpu);
val = read_sysreg(hcr_el2);
val |= (HCR_API | HCR_APK);
write_sysreg(val, hcr_el2);
return true;
}
static bool kvm_hyp_handle_cntpct(struct kvm_vcpu *vcpu)
{
struct arch_timer_context *ctxt;
@ -590,9 +529,6 @@ static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
__vgic_v3_perform_cpuif_access(vcpu) == 1)
return true;
if (esr_is_ptrauth_trap(kvm_vcpu_get_esr(vcpu)))
return kvm_hyp_handle_ptrauth(vcpu, exit_code);
if (kvm_hyp_handle_cntpct(vcpu))
return true;


@ -53,7 +53,13 @@ pkvm_hyp_vcpu_to_hyp_vm(struct pkvm_hyp_vcpu *hyp_vcpu)
return container_of(hyp_vcpu->vcpu.kvm, struct pkvm_hyp_vm, kvm);
}
static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
{
return vcpu_is_protected(&hyp_vcpu->vcpu);
}
void pkvm_hyp_vm_table_init(void *tbl);
void pkvm_host_fpsimd_state_init(void);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);


@ -83,10 +83,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
/* Disable and flush Self-Hosted Trace generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
__debug_save_trace(host_data_ptr(host_debug_state.trfcr_el1));
}
void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
@ -97,9 +97,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
__debug_restore_trace(*host_data_ptr(host_debug_state.trfcr_el1));
}
void __debug_switch_to_host(struct kvm_vcpu *vcpu)


@ -600,7 +600,6 @@ static bool ffa_call_supported(u64 func_id)
case FFA_MSG_POLL:
case FFA_MSG_WAIT:
/* 32-bit variants of 64-bit calls */
case FFA_MSG_SEND_DIRECT_REQ:
case FFA_MSG_SEND_DIRECT_RESP:
case FFA_RXTX_MAP:
case FFA_MEM_DONATE:


@ -39,10 +39,8 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_vcpu->vcpu.arch.cptr_el2 = host_vcpu->arch.cptr_el2;
hyp_vcpu->vcpu.arch.iflags = host_vcpu->arch.iflags;
hyp_vcpu->vcpu.arch.fp_state = host_vcpu->arch.fp_state;
hyp_vcpu->vcpu.arch.debug_ptr = kern_hyp_va(host_vcpu->arch.debug_ptr);
hyp_vcpu->vcpu.arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
hyp_vcpu->vcpu.arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
@ -64,7 +62,6 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
host_vcpu->arch.iflags = hyp_vcpu->vcpu.arch.iflags;
host_vcpu->arch.fp_state = hyp_vcpu->vcpu.arch.fp_state;
host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
@ -178,16 +175,6 @@ static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
}
static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __vgic_v3_read_vmcr();
}
static void handle___vgic_v3_write_vmcr(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_write_vmcr(cpu_reg(host_ctxt, 1));
}
static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_init_lrs();
@ -198,18 +185,18 @@ static void handle___kvm_get_mdcr_el2(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __kvm_get_mdcr_el2();
}
static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_save_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
__vgic_v3_save_vmcr_aprs(kern_hyp_va(cpu_if));
}
static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
__vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
}
static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
@ -340,10 +327,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_read_vmcr),
HANDLE_FUNC(__vgic_v3_write_vmcr),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_aprs),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
HANDLE_FUNC(__pkvm_vcpu_init_traps),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),


@ -533,7 +533,13 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
int ret = 0;
esr = read_sysreg_el2(SYS_ESR);
BUG_ON(!__get_fault_info(esr, &fault));
if (!__get_fault_info(esr, &fault)) {
/*
* We've presumably raced with a page-table change which caused
* AT to fail, try again.
*/
return;
}
addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
ret = host_stage2_idmap(addr);


@ -200,7 +200,7 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
}
/*
* Initialize trap register values for protected VMs.
* Initialize trap register values in protected mode.
*/
void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
{
@ -247,6 +247,17 @@ void pkvm_hyp_vm_table_init(void *tbl)
vm_table = tbl;
}
void pkvm_host_fpsimd_state_init(void)
{
unsigned long i;
for (i = 0; i < hyp_nr_cpus; i++) {
struct kvm_host_data *host_data = per_cpu_ptr(&kvm_host_data, i);
host_data->fpsimd_state = &host_data->host_ctxt.fp_regs;
}
}
/*
* Return the hyp vm structure corresponding to the handle.
*/
@ -430,6 +441,7 @@ static void *map_donated_memory(unsigned long host_va, size_t size)
static void __unmap_donated_memory(void *va, size_t size)
{
kvm_flush_dcache_to_poc(va, size);
WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
PAGE_ALIGN(size) >> PAGE_SHIFT));
}


@ -205,7 +205,7 @@ asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
struct psci_boot_args *boot_args;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
if (is_cpu_on)
boot_args = this_cpu_ptr(&cpu_on_args);


@ -257,8 +257,7 @@ static int fix_hyp_pgtable_refcnt(void)
void __noreturn __pkvm_init_finalise(void)
{
struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
struct kvm_cpu_context *host_ctxt = host_data_ptr(host_ctxt);
unsigned long nr_pages, reserved_pages, pfn;
int ret;
@ -301,6 +300,7 @@ void __noreturn __pkvm_init_finalise(void)
goto out;
pkvm_hyp_vm_table_init(vm_table_base);
pkvm_host_fpsimd_state_init();
out:
/*
* We tail-called to here from handle___pkvm_init() and will not return,


@ -40,7 +40,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
___activate_traps(vcpu);
___activate_traps(vcpu, vcpu->arch.hcr_el2);
__activate_traps_common(vcpu);
val = vcpu->arch.cptr_el2;
@ -53,7 +53,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TSM;
}
if (!guest_owns_fp_regs(vcpu)) {
if (!guest_owns_fp_regs()) {
if (has_hvhe())
val &= ~(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |
CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN);
@ -191,7 +191,6 @@ static const exit_handler_fn hyp_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
@ -203,13 +202,12 @@ static const exit_handler_fn pvm_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
{
if (unlikely(kvm_vm_is_protected(kern_hyp_va(vcpu->kvm))))
if (unlikely(vcpu_is_protected(vcpu)))
return pvm_exit_handlers;
return hyp_exit_handlers;
@ -228,9 +226,7 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
*/
static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
if (kvm_vm_is_protected(kvm) && vcpu_mode_is_32bit(vcpu)) {
if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) {
/*
* As we have caught the guest red-handed, decide that it isn't
* fit for purpose anymore by making the vcpu invalid. The VMM
@ -264,7 +260,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
pmr_sync();
}
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
@ -337,7 +333,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
__sysreg_restore_state_nvhe(host_ctxt);
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
if (guest_owns_fp_regs())
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
@ -367,7 +363,7 @@ asmlinkage void __noreturn hyp_panic(void)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
if (vcpu) {


@ -11,13 +11,23 @@
#include <nvhe/mem_protect.h>
struct tlb_inv_context {
u64 tcr;
struct kvm_s2_mmu *mmu;
u64 tcr;
u64 sctlr;
};
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt,
bool nsh)
static void enter_vmid_context(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt,
bool nsh)
{
struct kvm_s2_mmu *host_s2_mmu = &host_mmu.arch.mmu;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
cxt->mmu = NULL;
/*
* We have two requirements:
*
@ -40,20 +50,55 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
else
dsb(ish);
/*
* If we're already in the desired context, then there's nothing to do.
*/
if (vcpu) {
/*
* We're in guest context. However, for this to work, this needs
* to be called from within __kvm_vcpu_run(), which ensures that
* __hyp_running_vcpu is set to the current guest vcpu.
*/
if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_s2_mmu))
return;
cxt->mmu = vcpu->arch.hw_mmu;
} else {
/* We're in host context. */
if (mmu == host_s2_mmu)
return;
cxt->mmu = host_s2_mmu;
}
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
u64 val;
/*
* For CPUs that are affected by ARM 1319367, we need to
* avoid a host Stage-1 walk while we have the guest's
* VMID set in the VTTBR in order to invalidate TLBs.
* We're guaranteed that the S1 MMU is enabled, so we can
* simply set the EPD bits to avoid any further TLB fill.
* avoid a Stage-1 walk with the old VMID while we have
* the new VMID set in the VTTBR in order to invalidate TLBs.
* We're guaranteed that the host S1 MMU is enabled, so
* we can simply set the EPD bits to avoid any further
* TLB fill. For guests, we ensure that the S1 MMU is
* temporarily enabled in the next context.
*/
val = cxt->tcr = read_sysreg_el1(SYS_TCR);
val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
write_sysreg_el1(val, SYS_TCR);
isb();
if (vcpu) {
val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
if (!(val & SCTLR_ELx_M)) {
val |= SCTLR_ELx_M;
write_sysreg_el1(val, SYS_SCTLR);
isb();
}
} else {
/* The host S1 MMU is always enabled. */
cxt->sctlr = SCTLR_ELx_M;
}
}
/*
@ -62,18 +107,40 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
* ensuring that we always have an ISB, but not two ISBs back
* to back.
*/
__load_stage2(mmu, kern_hyp_va(mmu->arch));
if (vcpu)
__load_host_stage2();
else
__load_stage2(mmu, kern_hyp_va(mmu->arch));
asm(ALTERNATIVE("isb", "nop", ARM64_WORKAROUND_SPECULATIVE_AT));
}
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
static void exit_vmid_context(struct tlb_inv_context *cxt)
{
__load_host_stage2();
struct kvm_s2_mmu *mmu = cxt->mmu;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
if (!mmu)
return;
if (vcpu)
__load_stage2(mmu, kern_hyp_va(mmu->arch));
else
__load_host_stage2();
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
/* Ensure write of the host VMID */
/* Ensure write of the old VMID */
isb();
/* Restore the host's TCR_EL1 */
if (!(cxt->sctlr & SCTLR_ELx_M)) {
write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
isb();
}
write_sysreg_el1(cxt->tcr, SYS_TCR);
}
}
@ -84,7 +151,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
/*
* We could do so much better if we had the VA as well.
@ -105,7 +172,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
@ -114,7 +181,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, true);
enter_vmid_context(mmu, &cxt, true);
/*
* We could do so much better if we had the VA as well.
@ -135,7 +202,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
@ -152,7 +219,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
start = round_down(start, stride);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride,
TLBI_TTL_UNKNOWN);
@ -162,7 +229,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
@ -170,13 +237,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
@ -184,19 +251,19 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_vm_context(void)
{
/* Same remark as in __tlb_switch_to_guest() */
/* Same remark as in enter_vmid_context() */
dsb(ish);
__tlbi(alle1is);
dsb(ish);


@ -914,12 +914,12 @@ static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
{
u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
}
static bool stage2_pte_executable(kvm_pte_t pte)
{
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}
static u64 stage2_map_walker_phys_addr(const struct kvm_pgtable_visit_ctx *ctx,
@ -979,6 +979,21 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
if (!stage2_pte_needs_update(ctx->old, new))
return -EAGAIN;
/* If we're only changing software bits, then store them and go! */
if (!kvm_pgtable_walk_shared(ctx) &&
!((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW)) {
bool old_is_counted = stage2_pte_is_counted(ctx->old);
if (old_is_counted != stage2_pte_is_counted(new)) {
if (old_is_counted)
mm_ops->put_page(ctx->ptep);
else
mm_ops->get_page(ctx->ptep);
}
WARN_ON_ONCE(!stage2_try_set_pte(ctx, new));
return 0;
}
if (!stage2_try_break_pte(ctx, data->mmu))
return -EAGAIN;
@ -1370,7 +1385,7 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
struct kvm_pgtable *pgt = ctx->arg;
struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
if (!stage2_pte_cacheable(pgt, ctx->old))
return 0;
if (mm_ops->dcache_clean_inval_poc)


@ -330,7 +330,7 @@ void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
write_gicreg(0, ICH_HCR_EL2);
}
void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
static void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
{
u64 val;
u32 nr_pre_bits;
@ -363,7 +363,7 @@ void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
}
}
void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
static void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
{
u64 val;
u32 nr_pre_bits;
@ -455,16 +455,35 @@ u64 __vgic_v3_get_gic_config(void)
return val;
}
u64 __vgic_v3_read_vmcr(void)
static u64 __vgic_v3_read_vmcr(void)
{
return read_gicreg(ICH_VMCR_EL2);
}
void __vgic_v3_write_vmcr(u32 vmcr)
static void __vgic_v3_write_vmcr(u32 vmcr)
{
write_gicreg(vmcr, ICH_VMCR_EL2);
}
void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
{
__vgic_v3_save_aprs(cpu_if);
if (cpu_if->vgic_sre)
cpu_if->vgic_vmcr = __vgic_v3_read_vmcr();
}
void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
{
/*
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
* VMCR_EL2 save/restore in the world switch.
*/
if (cpu_if->vgic_sre)
__vgic_v3_write_vmcr(cpu_if->vgic_vmcr);
__vgic_v3_restore_aprs(cpu_if);
}
static int __vgic_v3_bpr_min(void)
{
/* See Pseudocode for VPriorityGroup */


@ -33,11 +33,43 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
/*
* HCR_EL2 bits that the NV guest can freely change (no RES0/RES1
* semantics, irrespective of the configuration), but that cannot be
* applied to the actual HW as things would otherwise break badly.
*
* - TGE: we want the guest to use EL1, which is incompatible with
* this bit being set
*
* - API/APK: they are already accounted for by vcpu_load(), and can
* only take effect across a load/put cycle (such as ERET)
*/
#define NV_HCR_GUEST_EXCLUDE (HCR_TGE | HCR_API | HCR_APK)
static u64 __compute_hcr(struct kvm_vcpu *vcpu)
{
u64 hcr = vcpu->arch.hcr_el2;
if (!vcpu_has_nv(vcpu))
return hcr;
if (is_hyp_ctxt(vcpu)) {
hcr |= HCR_NV | HCR_NV2 | HCR_AT | HCR_TTLB;
if (!vcpu_el2_e2h_is_set(vcpu))
hcr |= HCR_NV1;
write_sysreg_s(vcpu->arch.ctxt.vncr_array, SYS_VNCR_EL2);
}
return hcr | (__vcpu_sys_reg(vcpu, HCR_EL2) & ~NV_HCR_GUEST_EXCLUDE);
}
static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
___activate_traps(vcpu);
___activate_traps(vcpu, __compute_hcr(vcpu));
if (has_cntpoff()) {
struct timer_map map;
@ -75,7 +107,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TAM;
if (guest_owns_fp_regs(vcpu)) {
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu))
val |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
} else {
@ -162,6 +194,8 @@ static void __vcpu_put_deactivate_traps(struct kvm_vcpu *vcpu)
void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
{
host_data_ptr(host_ctxt)->__hyp_running_vcpu = vcpu;
__vcpu_load_switch_sysregs(vcpu);
__vcpu_load_activate_traps(vcpu);
__load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
@ -171,6 +205,61 @@ void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
{
__vcpu_put_deactivate_traps(vcpu);
__vcpu_put_switch_sysregs(vcpu);
host_data_ptr(host_ctxt)->__hyp_running_vcpu = NULL;
}
static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
{
u64 esr = kvm_vcpu_get_esr(vcpu);
u64 spsr, elr, mode;
/*
* Going through the whole put/load motions is a waste of time
* if this is a VHE guest hypervisor returning to its own
* userspace, or the hypervisor performing a local exception
* return. No need to save/restore registers, no need to
* switch S2 MMU. Just do the canonical ERET.
*
* Unless the trap has to be forwarded further down the line,
* of course...
*/
if ((__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV) ||
(__vcpu_sys_reg(vcpu, HFGITR_EL2) & HFGITR_EL2_ERET))
return false;
spsr = read_sysreg_el1(SYS_SPSR);
mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
switch (mode) {
case PSR_MODE_EL0t:
if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
return false;
break;
case PSR_MODE_EL2t:
mode = PSR_MODE_EL1t;
break;
case PSR_MODE_EL2h:
mode = PSR_MODE_EL1h;
break;
default:
return false;
}
/* If ERETAx fails, take the slow path */
if (esr_iss_is_eretax(esr)) {
if (!(vcpu_has_ptrauth(vcpu) && kvm_auth_eretax(vcpu, &elr)))
return false;
} else {
elr = read_sysreg_el1(SYS_ELR);
}
spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
write_sysreg_el2(spsr, SYS_SPSR);
write_sysreg_el2(elr, SYS_ELR);
return true;
}
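
For illustration only (not part of the patch): a minimal, standalone userspace sketch of the SPSR mode rewrite that the direct-ERET fast path above performs. The constants mirror the architectural PSTATE mode encodings; the initial SPSR value is purely hypothetical and only serves to show virtual EL2h being demoted to EL1h.

#include <stdint.h>
#include <stdio.h>

/* PSTATE/SPSR mode field definitions, as in the arm64 architecture */
#define PSR_MODE_MASK   0x0000000fULL
#define PSR_MODE32_BIT  0x00000010ULL
#define PSR_MODE_EL1h   0x00000005ULL
#define PSR_MODE_EL2h   0x00000009ULL

int main(void)
{
	/* Hypothetical SPSR_EL2 value: DAIF masked, AArch64 EL2h */
	uint64_t spsr = 0x3c0 | PSR_MODE_EL2h;
	uint64_t mode = PSR_MODE_EL1h;	/* virtual EL2h becomes EL1h */

	/* Same bit surgery as the fast path: keep everything but the mode */
	spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;

	printf("rewritten SPSR: %#llx\n", (unsigned long long)spsr); /* 0x3c5 */
	return 0;
}
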
static const exit_handler_fn hyp_exit_handlers[] = {
@ -182,7 +271,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_ERET] = kvm_hyp_handle_eret,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
@ -197,7 +286,7 @@ static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
* If we were in HYP context on entry, adjust the PSTATE view
* so that the usual helpers work correctly.
*/
if (unlikely(vcpu_get_flag(vcpu, VCPU_HYP_CONTEXT))) {
if (vcpu_has_nv(vcpu) && (read_sysreg(hcr_el2) & HCR_NV)) {
u64 mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
switch (mode) {
@ -221,8 +310,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt;
u64 exit_code;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt->__hyp_running_vcpu = vcpu;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
sysreg_save_host_state_vhe(host_ctxt);
@ -240,11 +328,6 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
if (is_hyp_ctxt(vcpu))
vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
else
vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT);
do {
/* Jump in the fire! */
exit_code = __guest_enter(vcpu);
@ -258,7 +341,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_host_state_vhe(host_ctxt);
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
if (guest_owns_fp_regs())
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
@ -306,7 +389,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
__deactivate_traps(vcpu);


@ -67,7 +67,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_user_state(host_ctxt);
/*
@ -110,7 +110,7 @@ void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);


@ -17,8 +17,8 @@ struct tlb_inv_context {
u64 sctlr;
};
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
static void enter_vmid_context(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
{
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
u64 val;
@ -67,7 +67,7 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
isb();
}
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
static void exit_vmid_context(struct tlb_inv_context *cxt)
{
/*
* We're done with the TLB operation, let's restore the host's
@ -97,7 +97,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
/*
* We could do so much better if we had the VA as well.
@ -118,7 +118,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
@ -129,7 +129,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nshst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
/*
* We could do so much better if we had the VA as well.
@ -150,7 +150,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
@ -169,7 +169,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride,
TLBI_TTL_UNKNOWN);
@ -179,7 +179,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
@ -189,13 +189,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
@ -203,14 +203,14 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_vm_context(void)


@ -86,7 +86,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
/* Detect an already handled MMIO return */
if (unlikely(!vcpu->mmio_needed))
return 0;
return 1;
vcpu->mmio_needed = 0;
@ -117,7 +117,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
*/
kvm_incr_pc(vcpu);
return 0;
return 1;
}
int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
@ -133,11 +133,19 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
/*
* No valid syndrome? Ask userspace for help if it has
* volunteered to do so, and bail out otherwise.
*
* In the protected VM case, there isn't much userspace can do
* though, so directly deliver an exception to the guest.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
trace_kvm_mmio_nisv(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
kvm_vcpu_get_hfar(vcpu), fault_ipa);
if (vcpu_is_protected(vcpu)) {
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
return 1;
}
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;


@ -1522,8 +1522,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
read_lock(&kvm->mmu_lock);
pgt = vcpu->arch.hw_mmu->pgt;
if (mmu_invalidate_retry(kvm, mmu_seq))
if (mmu_invalidate_retry(kvm, mmu_seq)) {
ret = -EAGAIN;
goto out_unlock;
}
/*
* If we are not forced to use page mapping, check if we are
@ -1581,6 +1583,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
memcache,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
out_unlock:
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
@ -1588,8 +1592,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mark_page_dirty_in_slot(kvm, memslot, gfn);
}
out_unlock:
read_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
}
@ -1768,40 +1770,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
return false;
}
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
kvm_pfn_t pfn = pte_pfn(range->arg.pte);
if (!kvm->arch.mmu.pgt)
return false;
WARN_ON(range->end - range->start != 1);
/*
* If the page isn't tagged, defer to user_mem_abort() for sanitising
* the MTE tags. The S2 pte should have been unmapped by
* mmu_notifier_invalidate_range_end().
*/
if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
return false;
/*
* We've moved a page around, probably through CoW, so let's treat
* it just like a translation fault and the map handler will clean
* the cache to the PoC.
*
* The MMU notifiers will have unmapped a huge PMD before calling
* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
* therefore we never need to clear out a huge PMD through this
* calling path and a memcache is not required.
*/
kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
PAGE_SIZE, __pfn_to_phys(pfn),
KVM_PGTABLE_PROT_R, NULL, 0);
return false;
}
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;


@ -35,13 +35,9 @@ static u64 limit_nv_id_reg(u32 id, u64 val)
break;
case SYS_ID_AA64ISAR1_EL1:
/* Support everything but PtrAuth and Spec Invalidation */
/* Support everything but Spec Invalidation */
val &= ~(GENMASK_ULL(63, 56) |
NV_FTR(ISAR1, SPECRES) |
NV_FTR(ISAR1, GPI) |
NV_FTR(ISAR1, GPA) |
NV_FTR(ISAR1, API) |
NV_FTR(ISAR1, APA));
NV_FTR(ISAR1, SPECRES));
break;
case SYS_ID_AA64PFR0_EL1:

arch/arm64/kvm/pauth.c (new file)

@ -0,0 +1,206 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2024 - Google LLC
* Author: Marc Zyngier <maz@kernel.org>
*
* Primitive PAuth emulation for ERETAA/ERETAB.
*
* This code assumes that it is run from EL2, and that it is part of
* the emulation of ERETAx for a guest hypervisor. That's a lot of
* baked-in assumptions and shortcuts.
*
* Do not reuse for anything else!
*/
#include <linux/kvm_host.h>
#include <asm/gpr-num.h>
#include <asm/kvm_emulate.h>
#include <asm/pointer_auth.h>
/* PACGA Xd, Xn, Xm */
#define PACGA(d,n,m) \
asm volatile(__DEFINE_ASM_GPR_NUMS \
".inst 0x9AC03000 |" \
"(.L__gpr_num_%[Rd] << 0) |" \
"(.L__gpr_num_%[Rn] << 5) |" \
"(.L__gpr_num_%[Rm] << 16)\n" \
: [Rd] "=r" ((d)) \
: [Rn] "r" ((n)), [Rm] "r" ((m)))
static u64 compute_pac(struct kvm_vcpu *vcpu, u64 ptr,
struct ptrauth_key ikey)
{
struct ptrauth_key gkey;
u64 mod, pac = 0;
preempt_disable();
if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
mod = __vcpu_sys_reg(vcpu, SP_EL2);
else
mod = read_sysreg(sp_el1);
gkey.lo = read_sysreg_s(SYS_APGAKEYLO_EL1);
gkey.hi = read_sysreg_s(SYS_APGAKEYHI_EL1);
__ptrauth_key_install_nosync(APGA, ikey);
isb();
PACGA(pac, ptr, mod);
isb();
__ptrauth_key_install_nosync(APGA, gkey);
preempt_enable();
/* PAC in the top 32bits */
return pac;
}
static bool effective_tbi(struct kvm_vcpu *vcpu, bool bit55)
{
u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2);
bool tbi, tbid;
/*
* Since we are authenticating an instruction address, we have
* to take TBID into account. If E2H==0, ignore VA[55], as
* TCR_EL2 only has a single TBI/TBID. If VA[55] was set in
* this case, this is likely a guest bug...
*/
if (!vcpu_el2_e2h_is_set(vcpu)) {
tbi = tcr & BIT(20);
tbid = tcr & BIT(29);
} else if (bit55) {
tbi = tcr & TCR_TBI1;
tbid = tcr & TCR_TBID1;
} else {
tbi = tcr & TCR_TBI0;
tbid = tcr & TCR_TBID0;
}
return tbi && !tbid;
}
static int compute_bottom_pac(struct kvm_vcpu *vcpu, bool bit55)
{
static const int maxtxsz = 39; // Revisit these two values once
static const int mintxsz = 16; // (if) we support TTST/LVA/LVA2
u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2);
int txsz;
if (!vcpu_el2_e2h_is_set(vcpu) || !bit55)
txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
else
txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
return 64 - clamp(txsz, mintxsz, maxtxsz);
}
static u64 compute_pac_mask(struct kvm_vcpu *vcpu, bool bit55)
{
int bottom_pac;
u64 mask;
bottom_pac = compute_bottom_pac(vcpu, bit55);
mask = GENMASK(54, bottom_pac);
if (!effective_tbi(vcpu, bit55))
mask |= GENMASK(63, 56);
return mask;
}
static u64 to_canonical_addr(struct kvm_vcpu *vcpu, u64 ptr, u64 mask)
{
bool bit55 = !!(ptr & BIT(55));
if (bit55)
return ptr | mask;
return ptr & ~mask;
}
static u64 corrupt_addr(struct kvm_vcpu *vcpu, u64 ptr)
{
bool bit55 = !!(ptr & BIT(55));
u64 mask, error_code;
int shift;
if (effective_tbi(vcpu, bit55)) {
mask = GENMASK(54, 53);
shift = 53;
} else {
mask = GENMASK(62, 61);
shift = 61;
}
if (esr_iss_is_eretab(kvm_vcpu_get_esr(vcpu)))
error_code = 2 << shift;
else
error_code = 1 << shift;
ptr &= ~mask;
ptr |= error_code;
return ptr;
}
/*
* Authenticate an ERETAA/ERETAB instruction, returning true if the
* authentication succeeded and false otherwise. In all cases, *elr
* contains the VA to ERET to. Potential exception injection is left
* to the caller.
*/
bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
{
u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL2);
u64 esr = kvm_vcpu_get_esr(vcpu);
u64 ptr, cptr, pac, mask;
struct ptrauth_key ikey;
*elr = ptr = vcpu_read_sys_reg(vcpu, ELR_EL2);
/* We assume we're already in the context of an ERETAx */
if (esr_iss_is_eretab(esr)) {
if (!(sctlr & SCTLR_EL1_EnIB))
return true;
ikey.lo = __vcpu_sys_reg(vcpu, APIBKEYLO_EL1);
ikey.hi = __vcpu_sys_reg(vcpu, APIBKEYHI_EL1);
} else {
if (!(sctlr & SCTLR_EL1_EnIA))
return true;
ikey.lo = __vcpu_sys_reg(vcpu, APIAKEYLO_EL1);
ikey.hi = __vcpu_sys_reg(vcpu, APIAKEYHI_EL1);
}
mask = compute_pac_mask(vcpu, !!(ptr & BIT(55)));
cptr = to_canonical_addr(vcpu, ptr, mask);
pac = compute_pac(vcpu, cptr, ikey);
/*
* Slightly deviate from the pseudocode: if we have a PAC
* match with the signed pointer, then it must be good.
* Anything after this point is pure error handling.
*/
if ((pac & mask) == (ptr & mask)) {
*elr = cptr;
return true;
}
/*
* Authentication failed: corrupt the canonical address if
* PAuth2 isn't implemented, or XOR the pointer with the masked PAC if it is.
*/
if (!kvm_has_pauth(vcpu->kvm, PAuth2))
cptr = corrupt_addr(vcpu, cptr);
else
cptr = ptr ^ (pac & mask);
*elr = cptr;
return false;
}
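
For illustration only (not part of the patch): a standalone userspace sketch of how compute_pac_mask() and to_canonical_addr() above carve up a pointer, assuming a hypothetical TxSZ of 25 (within the clamped [16, 39] range), top-byte-ignore disabled, and a made-up pointer value. GENMASK_U64 is a local stand-in for the kernel's GENMASK_ULL.

#include <stdint.h>
#include <stdio.h>

/* Open-coded 64-bit GENMASK(h, l): bits h..l set, inclusive */
#define GENMASK_U64(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

int main(void)
{
	int txsz = 25;			/* hypothetical TxSZ */
	int bottom_pac = 64 - txsz;	/* lowest PAC bit: 39 */
	uint64_t mask = GENMASK_U64(54, bottom_pac);
	uint64_t ptr = 0xc0de0000abcd1234ULL;	/* hypothetical signed pointer */
	uint64_t cptr;

	/* No effective TBI: bits [63:56] are also part of the PAC */
	mask |= GENMASK_U64(63, 56);

	/* Canonicalise based on bit 55, as to_canonical_addr() does */
	if (ptr & (1ULL << 55))
		cptr = ptr | mask;
	else
		cptr = ptr & ~mask;

	printf("PAC mask:          %#018llx\n", (unsigned long long)mask);
	printf("canonical address: %#018llx\n", (unsigned long long)cptr);
	return 0;
}
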


@ -222,7 +222,6 @@ void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
int pkvm_init_host_vm(struct kvm *host_kvm)
{
mutex_init(&host_kvm->lock);
return 0;
}
@ -259,6 +258,7 @@ static int __init finalize_pkvm(void)
* at, which would end badly once inaccessible.
*/
kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
kmemleak_free_part(__hyp_rodata_start, __hyp_rodata_end - __hyp_rodata_start);
kmemleak_free_part_phys(hyp_mem_base, hyp_mem_size);
ret = pkvm_drop_host_privileges();


@ -232,7 +232,7 @@ bool kvm_set_pmuserenr(u64 val)
if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
return false;
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
return true;
}


@ -151,7 +151,6 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
void *sve_state = vcpu->arch.sve_state;
kvm_vcpu_unshare_task_fp(vcpu);
kvm_unshare_hyp(vcpu, vcpu + 1);
if (sve_state)
kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));


@ -1568,17 +1568,31 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
return IDREG(vcpu->kvm, reg_to_encoding(r));
}
static bool is_feature_id_reg(u32 encoding)
{
return (sys_reg_Op0(encoding) == 3 &&
(sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
sys_reg_CRn(encoding) == 0 &&
sys_reg_CRm(encoding) <= 7);
}
/*
* Return true if the register's (Op0, Op1, CRn, CRm, Op2) is
* (3, 0, 0, crm, op2), where 1<=crm<8, 0<=op2<8.
* (3, 0, 0, crm, op2), where 1<=crm<8, 0<=op2<8, which is the range of ID
* registers KVM maintains on a per-VM basis.
*/
static inline bool is_id_reg(u32 id)
static inline bool is_vm_ftr_id_reg(u32 id)
{
return (sys_reg_Op0(id) == 3 && sys_reg_Op1(id) == 0 &&
sys_reg_CRn(id) == 0 && sys_reg_CRm(id) >= 1 &&
sys_reg_CRm(id) < 8);
}
static inline bool is_vcpu_ftr_id_reg(u32 id)
{
return is_feature_id_reg(id) && !is_vm_ftr_id_reg(id);
}
static inline bool is_aa32_id_reg(u32 id)
{
return (sys_reg_Op0(id) == 3 && sys_reg_Op1(id) == 0 &&
@ -2338,7 +2352,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64MMFR0_EL1_TGRAN16_2)),
ID_WRITABLE(ID_AA64MMFR1_EL1, ~(ID_AA64MMFR1_EL1_RES0 |
ID_AA64MMFR1_EL1_HCX |
ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_TWED |
ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_VH |
@ -3069,12 +3082,14 @@ static bool check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,
for (i = 0; i < n; i++) {
if (!is_32 && table[i].reg && !table[i].reset) {
kvm_err("sys_reg table %pS entry %d lacks reset\n", &table[i], i);
kvm_err("sys_reg table %pS entry %d (%s) lacks reset\n",
&table[i], i, table[i].name);
return false;
}
if (i && cmp_sys_reg(&table[i-1], &table[i]) >= 0) {
kvm_err("sys_reg table %pS entry %d out of order\n", &table[i - 1], i - 1);
kvm_err("sys_reg table %pS entry %d (%s -> %s) out of order\n",
&table[i], i, table[i - 1].name, table[i].name);
return false;
}
}
@ -3509,26 +3524,25 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm)
&idregs_debug_fops);
}
static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
static void reset_vm_ftr_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *reg)
{
const struct sys_reg_desc *idreg = first_idreg;
u32 id = reg_to_encoding(idreg);
u32 id = reg_to_encoding(reg);
struct kvm *kvm = vcpu->kvm;
if (test_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags))
return;
lockdep_assert_held(&kvm->arch.config_lock);
IDREG(kvm, id) = reg->reset(vcpu, reg);
}
/* Initialize all idregs */
while (is_id_reg(id)) {
IDREG(kvm, id) = idreg->reset(vcpu, idreg);
static void reset_vcpu_ftr_id_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *reg)
{
if (kvm_vcpu_initialized(vcpu))
return;
idreg++;
id = reg_to_encoding(idreg);
}
set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags);
reg->reset(vcpu, reg);
}
/**
@ -3540,19 +3554,24 @@ static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
*/
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
unsigned long i;
kvm_reset_id_regs(vcpu);
for (i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) {
const struct sys_reg_desc *r = &sys_reg_descs[i];
if (is_id_reg(reg_to_encoding(r)))
if (!r->reset)
continue;
if (r->reset)
if (is_vm_ftr_id_reg(reg_to_encoding(r)))
reset_vm_ftr_id_reg(vcpu, r);
else if (is_vcpu_ftr_id_reg(reg_to_encoding(r)))
reset_vcpu_ftr_id_reg(vcpu, r);
else
r->reset(vcpu, r);
}
set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags);
}
/**
@ -3978,14 +3997,6 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
sys_reg_CRm(r), \
sys_reg_Op2(r))
static bool is_feature_id_reg(u32 encoding)
{
return (sys_reg_Op0(encoding) == 3 &&
(sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
sys_reg_CRn(encoding) == 0 &&
sys_reg_CRm(encoding) <= 7);
}
int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *range)
{
const void *zero_page = page_to_virt(ZERO_PAGE(0));
@ -4014,7 +4025,7 @@ int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *
* compliant with a given revision of the architecture, but the
* RES0/RES1 definitions allow us to do that.
*/
if (is_id_reg(encoding)) {
if (is_vm_ftr_id_reg(encoding)) {
if (!reg->val ||
(is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0()))
continue;


@ -28,27 +28,65 @@ struct vgic_state_iter {
int nr_lpis;
int dist_id;
int vcpu_id;
int intid;
unsigned long intid;
int lpi_idx;
u32 *lpi_array;
};
static void iter_next(struct vgic_state_iter *iter)
static void iter_next(struct kvm *kvm, struct vgic_state_iter *iter)
{
struct vgic_dist *dist = &kvm->arch.vgic;
if (iter->dist_id == 0) {
iter->dist_id++;
return;
}
/*
* Let the xarray drive the iterator after the last SPI, as the iterator
* has exhausted the sequentially-allocated INTID space.
*/
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS - 1)) {
if (iter->lpi_idx < iter->nr_lpis)
xa_find_after(&dist->lpi_xa, &iter->intid,
VGIC_LPI_MAX_INTID,
LPI_XA_MARK_DEBUG_ITER);
iter->lpi_idx++;
return;
}
iter->intid++;
if (iter->intid == VGIC_NR_PRIVATE_IRQS &&
++iter->vcpu_id < iter->nr_cpus)
iter->intid = 0;
}
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS)) {
if (iter->lpi_idx < iter->nr_lpis)
iter->intid = iter->lpi_array[iter->lpi_idx];
iter->lpi_idx++;
static int iter_mark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
int nr_lpis = 0;
xa_for_each(&dist->lpi_xa, intid, irq) {
if (!vgic_try_get_irq_kref(irq))
continue;
xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
nr_lpis++;
}
return nr_lpis;
}
static void iter_unmark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
xa_for_each(&dist->lpi_xa, intid, irq) {
xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
vgic_put_irq(kvm, irq);
}
}
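/*
 * Note (added for clarity): iter_mark_lpis() pins every LPI the debug
 * iterator will visit by taking a reference and setting
 * LPI_XA_MARK_DEBUG_ITER, and iter_unmark_lpis() drops those references
 * again when the seq_file walk ends, so an LPI cannot vanish mid-iteration.
 */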
@ -61,15 +99,12 @@ static void iter_init(struct kvm *kvm, struct vgic_state_iter *iter,
iter->nr_cpus = nr_cpus;
iter->nr_spis = kvm->arch.vgic.nr_spis;
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
iter->nr_lpis = vgic_copy_lpi_list(kvm, NULL, &iter->lpi_array);
if (iter->nr_lpis < 0)
iter->nr_lpis = 0;
}
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
iter->nr_lpis = iter_mark_lpis(kvm);
/* Fast forward to the right position if needed */
while (pos--)
iter_next(iter);
iter_next(kvm, iter);
}
static bool end_of_vgic(struct vgic_state_iter *iter)
@ -114,7 +149,7 @@ static void *vgic_debug_next(struct seq_file *s, void *v, loff_t *pos)
struct vgic_state_iter *iter = kvm->arch.vgic.iter;
++*pos;
iter_next(iter);
iter_next(kvm, iter);
if (end_of_vgic(iter))
iter = NULL;
return iter;
@ -134,13 +169,14 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
mutex_lock(&kvm->arch.config_lock);
iter = kvm->arch.vgic.iter;
kfree(iter->lpi_array);
iter_unmark_lpis(kvm);
kfree(iter);
kvm->arch.vgic.iter = NULL;
mutex_unlock(&kvm->arch.config_lock);
}
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist,
struct vgic_state_iter *iter)
{
bool v3 = dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3;
@ -149,7 +185,7 @@ static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
if (v3)
seq_printf(s, "nr_lpis:\t%d\n", atomic_read(&dist->lpi_count));
seq_printf(s, "nr_lpis:\t%d\n", iter->nr_lpis);
seq_printf(s, "enabled:\t%d\n", dist->enabled);
seq_printf(s, "\n");
@ -236,7 +272,7 @@ static int vgic_debug_show(struct seq_file *s, void *v)
unsigned long flags;
if (iter->dist_id == 0) {
print_dist_state(s, &kvm->arch.vgic);
print_dist_state(s, &kvm->arch.vgic, iter);
return 0;
}
@ -246,11 +282,13 @@ static int vgic_debug_show(struct seq_file *s, void *v)
if (iter->vcpu_id < iter->nr_cpus)
vcpu = kvm_get_vcpu(kvm, iter->vcpu_id);
/*
* Expect this to succeed, as iter_mark_lpis() takes a reference on
* every LPI to be visited.
*/
irq = vgic_get_irq(kvm, vcpu, iter->intid);
if (!irq) {
seq_printf(s, " LPI %4d freed\n", iter->intid);
return 0;
}
if (WARN_ON_ONCE(!irq))
return -EINVAL;
raw_spin_lock_irqsave(&irq->irq_lock, flags);
print_irq_state(s, irq, vcpu);


@ -53,8 +53,6 @@ void kvm_vgic_early_init(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
INIT_LIST_HEAD(&dist->lpi_translation_cache);
raw_spin_lock_init(&dist->lpi_list_lock);
xa_init_flags(&dist->lpi_xa, XA_FLAGS_LOCK_IRQ);
}
@ -182,27 +180,22 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
return 0;
}
/**
* kvm_vgic_vcpu_init() - Initialize static VGIC VCPU data
* structures and register VCPU-specific KVM iodevs
*
* @vcpu: pointer to the VCPU being created and initialized
*
* Only do initialization, but do not actually enable the
* VGIC CPU interface
*/
int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
int ret = 0;
int i;
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
lockdep_assert_held(&vcpu->kvm->arch.config_lock);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
raw_spin_lock_init(&vgic_cpu->ap_list_lock);
atomic_set(&vgic_cpu->vgic_v3.its_vpe.vlpi_count, 0);
if (vgic_cpu->private_irqs)
return 0;
vgic_cpu->private_irqs = kcalloc(VGIC_NR_PRIVATE_IRQS,
sizeof(struct vgic_irq),
GFP_KERNEL_ACCOUNT);
if (!vgic_cpu->private_irqs)
return -ENOMEM;
/*
* Enable and configure all SGIs to be edge-triggered and
@ -227,9 +220,48 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
}
}
return 0;
}
static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu)
{
int ret;
mutex_lock(&vcpu->kvm->arch.config_lock);
ret = vgic_allocate_private_irqs_locked(vcpu);
mutex_unlock(&vcpu->kvm->arch.config_lock);
return ret;
}
/**
* kvm_vgic_vcpu_init() - Initialize static VGIC VCPU data
* structures and register VCPU-specific KVM iodevs
*
* @vcpu: pointer to the VCPU being created and initialized
*
* Only do initialization, but do not actually enable the
* VGIC CPU interface
*/
int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
int ret = 0;
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
raw_spin_lock_init(&vgic_cpu->ap_list_lock);
atomic_set(&vgic_cpu->vgic_v3.its_vpe.vlpi_count, 0);
if (!irqchip_in_kernel(vcpu->kvm))
return 0;
ret = vgic_allocate_private_irqs(vcpu);
if (ret)
return ret;
/*
* If we are creating a VCPU with a GICv3 we must also register the
* KVM io device for the redistributor that belongs to this VCPU.
@ -285,10 +317,13 @@ int vgic_init(struct kvm *kvm)
/* Initialize groups on CPUs created before the VGIC type was known */
kvm_for_each_vcpu(idx, vcpu, kvm) {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
ret = vgic_allocate_private_irqs_locked(vcpu);
if (ret)
goto out;
for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
struct vgic_irq *irq = &vgic_cpu->private_irqs[i];
struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, i);
switch (dist->vgic_model) {
case KVM_DEV_TYPE_ARM_VGIC_V3:
irq->group = 1;
@ -300,14 +335,15 @@ int vgic_init(struct kvm *kvm)
break;
default:
ret = -EINVAL;
goto out;
}
vgic_put_irq(kvm, irq);
if (ret)
goto out;
}
}
if (vgic_has_its(kvm))
vgic_lpi_translation_cache_init(kvm);
/*
* If we have GICv4.1 enabled, unconditionally request enable the
* v4 support so that we get HW-accelerated vSGIs. Otherwise, only
@ -361,9 +397,6 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
}
if (vgic_has_its(kvm))
vgic_lpi_translation_cache_destroy(kvm);
if (vgic_supports_direct_msis(kvm))
vgic_v4_teardown(kvm);
@ -381,6 +414,9 @@ static void __kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
vgic_flush_pending_lpis(vcpu);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
kfree(vgic_cpu->private_irqs);
vgic_cpu->private_irqs = NULL;
if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
vgic_unregister_redist_iodev(vcpu);
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;


@ -23,6 +23,8 @@
#include "vgic.h"
#include "vgic-mmio.h"
static struct kvm_device_ops kvm_arm_vgic_its_ops;
static int vgic_its_save_tables_v0(struct vgic_its *its);
static int vgic_its_restore_tables_v0(struct vgic_its *its);
static int vgic_its_commit_v0(struct vgic_its *its);
@ -67,7 +69,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
irq->target_vcpu = vcpu;
irq->group = 1;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
xa_lock_irqsave(&dist->lpi_xa, flags);
/*
* There could be a race with another vgic_add_lpi(), so we need to
@ -82,17 +84,14 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
goto out_unlock;
}
ret = xa_err(xa_store(&dist->lpi_xa, intid, irq, 0));
ret = xa_err(__xa_store(&dist->lpi_xa, intid, irq, 0));
if (ret) {
xa_release(&dist->lpi_xa, intid);
kfree(irq);
goto out_unlock;
}
atomic_inc(&dist->lpi_count);
out_unlock:
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
if (ret)
return ERR_PTR(ret);
@ -150,14 +149,6 @@ struct its_ite {
u32 event_id;
};
struct vgic_translation_cache_entry {
struct list_head entry;
phys_addr_t db;
u32 devid;
u32 eventid;
struct vgic_irq *irq;
};
/**
* struct vgic_its_abi - ITS abi ops and settings
* @cte_esz: collection table entry size
@ -252,8 +243,10 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id,
#define GIC_LPI_OFFSET 8192
#define VITS_TYPER_IDBITS 16
#define VITS_TYPER_DEVBITS 16
#define VITS_TYPER_IDBITS 16
#define VITS_MAX_EVENTID (BIT(VITS_TYPER_IDBITS) - 1)
#define VITS_TYPER_DEVBITS 16
#define VITS_MAX_DEVID (BIT(VITS_TYPER_DEVBITS) - 1)
#define VITS_DTE_MAX_DEVID_OFFSET (BIT(14) - 1)
#define VITS_ITE_MAX_EVENTID_OFFSET (BIT(16) - 1)
@ -316,53 +309,6 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
return 0;
}
#define GIC_LPI_MAX_INTID ((1 << INTERRUPT_ID_BITS_ITS) - 1)
/*
* Create a snapshot of the current LPIs targeting @vcpu, so that we can
* enumerate those LPIs without holding any lock.
* Returns their number and puts the kmalloc'ed array into intid_ptr.
*/
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
{
struct vgic_dist *dist = &kvm->arch.vgic;
XA_STATE(xas, &dist->lpi_xa, GIC_LPI_OFFSET);
struct vgic_irq *irq;
unsigned long flags;
u32 *intids;
int irq_count, i = 0;
/*
* There is an obvious race between allocating the array and LPIs
* being mapped/unmapped. If we ended up here as a result of a
* command, we're safe (locks are held, preventing another
* command). If coming from another path (such as enabling LPIs),
* we must be careful not to overrun the array.
*/
irq_count = atomic_read(&dist->lpi_count);
intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL_ACCOUNT);
if (!intids)
return -ENOMEM;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
rcu_read_lock();
xas_for_each(&xas, irq, GIC_LPI_MAX_INTID) {
if (i == irq_count)
break;
/* We don't need to "get" the IRQ, as we hold the list lock. */
if (vcpu && irq->target_vcpu != vcpu)
continue;
intids[i++] = irq->intid;
}
rcu_read_unlock();
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
*intid_ptr = intids;
return i;
}
static int update_affinity(struct vgic_irq *irq, struct kvm_vcpu *vcpu)
{
int ret = 0;
@ -446,23 +392,18 @@ static u32 max_lpis_propbaser(u64 propbaser)
static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
{
gpa_t pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
unsigned long intid, flags;
struct vgic_irq *irq;
int last_byte_offset = -1;
int ret = 0;
u32 *intids;
int nr_irqs, i;
unsigned long flags;
u8 pendmask;
nr_irqs = vgic_copy_lpi_list(vcpu->kvm, vcpu, &intids);
if (nr_irqs < 0)
return nr_irqs;
for (i = 0; i < nr_irqs; i++) {
xa_for_each(&dist->lpi_xa, intid, irq) {
int byte_offset, bit_nr;
byte_offset = intids[i] / BITS_PER_BYTE;
bit_nr = intids[i] % BITS_PER_BYTE;
byte_offset = intid / BITS_PER_BYTE;
bit_nr = intid % BITS_PER_BYTE;
/*
* For contiguously allocated LPIs chances are we just read
@ -472,25 +413,23 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
ret = kvm_read_guest_lock(vcpu->kvm,
pendbase + byte_offset,
&pendmask, 1);
if (ret) {
kfree(intids);
if (ret)
return ret;
}
last_byte_offset = byte_offset;
}
irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
irq = vgic_get_irq(vcpu->kvm, NULL, intid);
if (!irq)
continue;
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = pendmask & (1U << bit_nr);
if (irq->target_vcpu == vcpu)
irq->pending_latch = pendmask & (1U << bit_nr);
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
vgic_put_irq(vcpu->kvm, irq);
}
kfree(intids);
return ret;
}
@ -566,51 +505,52 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
return 0;
}
static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
phys_addr_t db,
u32 devid, u32 eventid)
static struct vgic_its *__vgic_doorbell_to_its(struct kvm *kvm, gpa_t db)
{
struct vgic_translation_cache_entry *cte;
struct kvm_io_device *kvm_io_dev;
struct vgic_io_device *iodev;
list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
*/
if (!cte->irq)
break;
kvm_io_dev = kvm_io_bus_get_dev(kvm, KVM_MMIO_BUS, db);
if (!kvm_io_dev)
return ERR_PTR(-EINVAL);
if (cte->db != db || cte->devid != devid ||
cte->eventid != eventid)
continue;
if (kvm_io_dev->ops != &kvm_io_gic_ops)
return ERR_PTR(-EINVAL);
/*
* Move this entry to the head, as it is the most
* recently used.
*/
if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
list_move(&cte->entry, &dist->lpi_translation_cache);
iodev = container_of(kvm_io_dev, struct vgic_io_device, dev);
if (iodev->iodev_type != IODEV_ITS)
return ERR_PTR(-EINVAL);
return cte->irq;
}
return iodev->its;
}
static unsigned long vgic_its_cache_key(u32 devid, u32 eventid)
{
return (((unsigned long)devid) << VITS_TYPER_IDBITS) | eventid;
return NULL;
}
static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db,
u32 devid, u32 eventid)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned long cache_key = vgic_its_cache_key(devid, eventid);
struct vgic_its *its;
struct vgic_irq *irq;
unsigned long flags;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
if (devid > VITS_MAX_DEVID || eventid > VITS_MAX_EVENTID)
return NULL;
irq = __vgic_its_check_cache(dist, db, devid, eventid);
its = __vgic_doorbell_to_its(kvm, db);
if (IS_ERR(its))
return NULL;
rcu_read_lock();
irq = xa_load(&its->translation_cache, cache_key);
if (!vgic_try_get_irq_kref(irq))
irq = NULL;
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
rcu_read_unlock();
return irq;
}
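For reference, a minimal standalone sketch (not kernel code; the devid/eventid values are made up) of how the per-ITS translation cache indexes an LPI, assuming VITS_TYPER_IDBITS == 16 as defined above:

#include <stdio.h>

#define VITS_TYPER_IDBITS	16

/* Same packing as vgic_its_cache_key(): devid in the high bits, eventid low */
static unsigned long cache_key(unsigned int devid, unsigned int eventid)
{
	return ((unsigned long)devid << VITS_TYPER_IDBITS) | eventid;
}

int main(void)
{
	/* DevID 2, EventID 0x30 land in xarray slot 0x20030 */
	printf("key = 0x%lx\n", cache_key(2, 0x30));
	return 0;
}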
@ -619,41 +559,13 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
u32 devid, u32 eventid,
struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_translation_cache_entry *cte;
unsigned long flags;
phys_addr_t db;
unsigned long cache_key = vgic_its_cache_key(devid, eventid);
struct vgic_irq *old;
/* Do not cache a directly injected interrupt */
if (irq->hw)
return;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
if (unlikely(list_empty(&dist->lpi_translation_cache)))
goto out;
/*
* We could have raced with another CPU caching the same
* translation behind our back, so let's check it is not in
* already
*/
db = its->vgic_its_base + GITS_TRANSLATER;
if (__vgic_its_check_cache(dist, db, devid, eventid))
goto out;
/* Always reuse the last entry (LRU policy) */
cte = list_last_entry(&dist->lpi_translation_cache,
typeof(*cte), entry);
/*
* Caching the translation implies having an extra reference
* to the interrupt, so drop the potential reference on what
* was in the cache, and increment it on the new interrupt.
*/
if (cte->irq)
vgic_put_irq(kvm, cte->irq);
/*
* The irq refcount is guaranteed to be nonzero while holding the
* its_lock, as the ITE (and the reference it holds) cannot be freed.
@ -661,39 +573,44 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
lockdep_assert_held(&its->its_lock);
vgic_get_irq_kref(irq);
cte->db = db;
cte->devid = devid;
cte->eventid = eventid;
cte->irq = irq;
/* Move the new translation to the head of the list */
list_move(&cte->entry, &dist->lpi_translation_cache);
out:
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
/*
* We could have raced with another CPU caching the same
* translation behind our back; ensure we don't leak a
* reference if that is the case.
*/
old = xa_store(&its->translation_cache, cache_key, irq, GFP_KERNEL_ACCOUNT);
if (old)
vgic_put_irq(kvm, old);
}
void vgic_its_invalidate_cache(struct kvm *kvm)
static void vgic_its_invalidate_cache(struct vgic_its *its)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_translation_cache_entry *cte;
unsigned long flags;
struct kvm *kvm = its->dev->kvm;
struct vgic_irq *irq;
unsigned long idx;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
xa_for_each(&its->translation_cache, idx, irq) {
xa_erase(&its->translation_cache, idx);
vgic_put_irq(kvm, irq);
}
}
list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
*/
if (!cte->irq)
break;
void vgic_its_invalidate_all_caches(struct kvm *kvm)
{
struct kvm_device *dev;
struct vgic_its *its;
vgic_put_irq(kvm, cte->irq);
cte->irq = NULL;
rcu_read_lock();
list_for_each_entry_rcu(dev, &kvm->devices, vm_node) {
if (dev->ops != &kvm_arm_vgic_its_ops)
continue;
its = dev->private;
vgic_its_invalidate_cache(its);
}
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
rcu_read_unlock();
}
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
@ -725,8 +642,6 @@ int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi)
{
u64 address;
struct kvm_io_device *kvm_io_dev;
struct vgic_io_device *iodev;
if (!vgic_has_its(kvm))
return ERR_PTR(-ENODEV);
@ -736,18 +651,7 @@ struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi)
address = (u64)msi->address_hi << 32 | msi->address_lo;
kvm_io_dev = kvm_io_bus_get_dev(kvm, KVM_MMIO_BUS, address);
if (!kvm_io_dev)
return ERR_PTR(-EINVAL);
if (kvm_io_dev->ops != &kvm_io_gic_ops)
return ERR_PTR(-EINVAL);
iodev = container_of(kvm_io_dev, struct vgic_io_device, dev);
if (iodev->iodev_type != IODEV_ITS)
return ERR_PTR(-EINVAL);
return iodev->its;
return __vgic_doorbell_to_its(kvm, address);
}
/*
@ -883,7 +787,7 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
* don't bother here since we clear the ITTE anyway and the
* pending state is a property of the ITTE struct.
*/
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
its_free_ite(kvm, ite);
return 0;
@ -920,7 +824,7 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its,
ite->collection = collection;
vcpu = collection_to_vcpu(kvm, collection);
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
return update_affinity(ite->irq, vcpu);
}
@ -955,7 +859,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
switch (type) {
case GITS_BASER_TYPE_DEVICE:
if (id >= BIT_ULL(VITS_TYPER_DEVBITS))
if (id > VITS_MAX_DEVID)
return false;
break;
case GITS_BASER_TYPE_COLLECTION:
@ -1167,7 +1071,8 @@ static int vgic_its_cmd_handle_mapi(struct kvm *kvm, struct vgic_its *its,
}
/* Requires the its_lock to be held. */
static void vgic_its_free_device(struct kvm *kvm, struct its_device *device)
static void vgic_its_free_device(struct kvm *kvm, struct vgic_its *its,
struct its_device *device)
{
struct its_ite *ite, *temp;
@ -1179,7 +1084,7 @@ static void vgic_its_free_device(struct kvm *kvm, struct its_device *device)
list_for_each_entry_safe(ite, temp, &device->itt_head, ite_list)
its_free_ite(kvm, ite);
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
list_del(&device->dev_list);
kfree(device);
@ -1191,7 +1096,7 @@ static void vgic_its_free_device_list(struct kvm *kvm, struct vgic_its *its)
struct its_device *cur, *temp;
list_for_each_entry_safe(cur, temp, &its->device_list, dev_list)
vgic_its_free_device(kvm, cur);
vgic_its_free_device(kvm, its, cur);
}
/* its lock must be held */
@ -1250,7 +1155,7 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
* by removing the mapping and re-establishing it.
*/
if (device)
vgic_its_free_device(kvm, device);
vgic_its_free_device(kvm, its, device);
/*
* The spec does not say whether unmapping a not-mapped device
@ -1281,7 +1186,7 @@ static int vgic_its_cmd_handle_mapc(struct kvm *kvm, struct vgic_its *its,
if (!valid) {
vgic_its_free_collection(its, coll_id);
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
} else {
struct kvm_vcpu *vcpu;
@ -1372,23 +1277,19 @@ static int vgic_its_cmd_handle_inv(struct kvm *kvm, struct vgic_its *its,
int vgic_its_invall(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
int irq_count, i = 0;
u32 *intids;
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
irq_count = vgic_copy_lpi_list(kvm, vcpu, &intids);
if (irq_count < 0)
return irq_count;
for (i = 0; i < irq_count; i++) {
struct vgic_irq *irq = vgic_get_irq(kvm, NULL, intids[i]);
xa_for_each(&dist->lpi_xa, intid, irq) {
irq = vgic_get_irq(kvm, NULL, intid);
if (!irq)
continue;
update_lpi_config(kvm, irq, vcpu, false);
vgic_put_irq(kvm, irq);
}
kfree(intids);
if (vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm)
its_invall_vpe(&vcpu->arch.vgic_cpu.vgic_v3.its_vpe);
@ -1431,10 +1332,10 @@ static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
u64 *its_cmd)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct kvm_vcpu *vcpu1, *vcpu2;
struct vgic_irq *irq;
u32 *intids;
int irq_count, i;
unsigned long intid;
/* We advertise GITS_TYPER.PTA==0, making the address the vcpu ID */
vcpu1 = kvm_get_vcpu_by_id(kvm, its_cmd_get_target_addr(its_cmd));
@ -1446,12 +1347,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
if (vcpu1 == vcpu2)
return 0;
irq_count = vgic_copy_lpi_list(kvm, vcpu1, &intids);
if (irq_count < 0)
return irq_count;
for (i = 0; i < irq_count; i++) {
irq = vgic_get_irq(kvm, NULL, intids[i]);
xa_for_each(&dist->lpi_xa, intid, irq) {
irq = vgic_get_irq(kvm, NULL, intid);
if (!irq)
continue;
@ -1460,9 +1357,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
vgic_put_irq(kvm, irq);
}
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
kfree(intids);
return 0;
}
@ -1813,7 +1709,7 @@ static void vgic_mmio_write_its_ctlr(struct kvm *kvm, struct vgic_its *its,
its->enabled = !!(val & GITS_CTLR_ENABLE);
if (!its->enabled)
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
/*
* Try to process any pending commands. This function bails out early
@ -1914,47 +1810,6 @@ out:
return ret;
}
/* Default is 16 cached LPIs per vcpu */
#define LPI_DEFAULT_PCPU_CACHE_SIZE 16
void vgic_lpi_translation_cache_init(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned int sz;
int i;
if (!list_empty(&dist->lpi_translation_cache))
return;
sz = atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
for (i = 0; i < sz; i++) {
struct vgic_translation_cache_entry *cte;
/* An allocation failure is not fatal */
cte = kzalloc(sizeof(*cte), GFP_KERNEL_ACCOUNT);
if (WARN_ON(!cte))
break;
INIT_LIST_HEAD(&cte->entry);
list_add(&cte->entry, &dist->lpi_translation_cache);
}
}
void vgic_lpi_translation_cache_destroy(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_translation_cache_entry *cte, *tmp;
vgic_its_invalidate_cache(kvm);
list_for_each_entry_safe(cte, tmp,
&dist->lpi_translation_cache, entry) {
list_del(&cte->entry);
kfree(cte);
}
}
#define INITIAL_BASER_VALUE \
(GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb) | \
GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner) | \
@ -1987,8 +1842,6 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
kfree(its);
return ret;
}
vgic_lpi_translation_cache_init(dev->kvm);
}
mutex_init(&its->its_lock);
@ -2006,6 +1859,7 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
INIT_LIST_HEAD(&its->device_list);
INIT_LIST_HEAD(&its->collection_list);
xa_init(&its->translation_cache);
dev->kvm->arch.vgic.msis_require_devid = true;
dev->kvm->arch.vgic.has_its = true;
@ -2036,6 +1890,8 @@ static void vgic_its_destroy(struct kvm_device *kvm_dev)
vgic_its_free_device_list(kvm, its);
vgic_its_free_collection_list(kvm, its);
vgic_its_invalidate_cache(its);
xa_destroy(&its->translation_cache);
mutex_unlock(&its->its_lock);
kfree(its);
@ -2438,7 +2294,7 @@ static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
ret = vgic_its_restore_itt(its, dev);
if (ret) {
vgic_its_free_device(its->dev->kvm, dev);
vgic_its_free_device(its->dev->kvm, its, dev);
return ret;
}


@ -277,7 +277,7 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
return;
vgic_flush_pending_lpis(vcpu);
vgic_its_invalidate_cache(vcpu->kvm);
vgic_its_invalidate_all_caches(vcpu->kvm);
atomic_set_release(&vgic_cpu->ctlr, 0);
} else {
ctlr = atomic_cmpxchg_acquire(&vgic_cpu->ctlr, 0,


@ -464,17 +464,10 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
kvm_vgic_global_state.vctrl_base + GICH_APR);
}
void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
}
void vgic_v2_put(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
vgic_v2_vmcr_sync(vcpu);
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
}


@ -722,15 +722,7 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
/*
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
* VMCR_EL2 save/restore in the world switch.
*/
if (likely(cpu_if->vgic_sre))
kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
kvm_call_hyp(__vgic_v3_restore_aprs, cpu_if);
kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
if (has_vhe())
__vgic_v3_activate_traps(cpu_if);
@ -738,24 +730,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
WARN_ON(vgic_v4_load(vcpu));
}
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
if (likely(cpu_if->vgic_sre))
cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
}
void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
WARN_ON(vgic_v4_put(vcpu));
vgic_v3_vmcr_sync(vcpu);
kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
if (has_vhe())
__vgic_v3_deactivate_traps(cpu_if);
}


@ -29,9 +29,8 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* its->cmd_lock (mutex)
* its->its_lock (mutex)
* vgic_cpu->ap_list_lock must be taken with IRQs disabled
* kvm->lpi_list_lock must be taken with IRQs disabled
* vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
* vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
*
* As the ap_list_lock might be taken from the timer interrupt handler,
* we have to disable IRQs before taking this lock and everything lower
@ -126,7 +125,6 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
__xa_erase(&dist->lpi_xa, irq->intid);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
atomic_dec(&dist->lpi_count);
kfree_rcu(irq, rcu);
}
@ -939,17 +937,6 @@ void kvm_vgic_put(struct kvm_vcpu *vcpu)
vgic_v3_put(vcpu);
}
void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu)
{
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
return;
if (kvm_vgic_global_state.type == VGIC_V2)
vgic_v2_vmcr_sync(vcpu);
else
vgic_v3_vmcr_sync(vcpu);
}
int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;


@ -16,6 +16,7 @@
#define INTERRUPT_ID_BITS_SPIS 10
#define INTERRUPT_ID_BITS_ITS 16
#define VGIC_LPI_MAX_INTID ((1 << INTERRUPT_ID_BITS_ITS) - 1)
#define VGIC_PRI_BITS 5
#define vgic_irq_is_sgi(intid) ((intid) < VGIC_NR_SGIS)
@ -214,7 +215,6 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
void vgic_v2_init_lrs(void);
void vgic_v2_load(struct kvm_vcpu *vcpu);
void vgic_v2_put(struct kvm_vcpu *vcpu);
void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu);
void vgic_v2_save_state(struct kvm_vcpu *vcpu);
void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
@ -253,7 +253,6 @@ bool vgic_v3_check_base(struct kvm *kvm);
void vgic_v3_load(struct kvm_vcpu *vcpu);
void vgic_v3_put(struct kvm_vcpu *vcpu);
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
bool vgic_has_its(struct kvm *kvm);
int kvm_vgic_register_its_device(void);
@ -330,14 +329,11 @@ static inline bool vgic_dist_overlap(struct kvm *kvm, gpa_t base, size_t size)
}
bool vgic_lpis_enabled(struct kvm_vcpu *vcpu);
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr);
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
u32 devid, u32 eventid, struct vgic_irq **irq);
struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi);
int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi);
void vgic_lpi_translation_cache_init(struct kvm *kvm);
void vgic_lpi_translation_cache_destroy(struct kvm *kvm);
void vgic_its_invalidate_cache(struct kvm *kvm);
void vgic_its_invalidate_all_caches(struct kvm *kvm);
/* GICv4.1 MMIO interface */
int vgic_its_inv_lpi(struct kvm *kvm, struct vgic_irq *irq);


@ -632,6 +632,15 @@ config RANDOMIZE_BASE_MAX_OFFSET
source "kernel/livepatch/Kconfig"
config PARAVIRT
bool "Enable paravirtualization code"
depends on AS_HAS_LVZ_EXTENSION
help
This changes the kernel so it can modify itself when it is run
under a hypervisor, potentially improving performance significantly
over full virtualization. However, when run without a hypervisor
the kernel is theoretically slower and slightly larger.
endmenu
config ARCH_SELECT_MEMORY_MODEL


@ -26,4 +26,3 @@ generic-y += poll.h
generic-y += param.h
generic-y += posix_types.h
generic-y += resource.h
generic-y += kvm_para.h


@ -14,9 +14,15 @@ extern void ack_bad_irq(unsigned int irq);
#define NR_IPI 2
enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNCTION,
};
typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
atomic_t message ____cacheline_aligned_in_smp;
} ____cacheline_aligned irq_cpustat_t;
DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);


@ -12,6 +12,7 @@
#define INSN_NOP 0x03400000
#define INSN_BREAK 0x002a0000
#define INSN_HVCL 0x002b8000
#define ADDR_IMMMASK_LU52ID 0xFFF0000000000000
#define ADDR_IMMMASK_LU32ID 0x000FFFFF00000000
@ -67,6 +68,7 @@ enum reg2_op {
revhd_op = 0x11,
extwh_op = 0x16,
extwb_op = 0x17,
cpucfg_op = 0x1b,
iocsrrdb_op = 0x19200,
iocsrrdh_op = 0x19201,
iocsrrdw_op = 0x19202,


@ -117,7 +117,16 @@ extern struct fwnode_handle *liointc_handle;
extern struct fwnode_handle *pch_lpc_handle;
extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];
extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
static inline int get_percpu_irq(int vector)
{
struct irq_domain *d;
d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
if (d)
return irq_create_mapping(d, vector);
return -EINVAL;
}
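/*
 * Illustrative usage (see the callers converted later in this series):
 * the timer, PMU and IPI paths map their fixed per-CPU vectors through
 * this helper, e.g. get_percpu_irq(INT_TI), get_percpu_irq(INT_PCOV)
 * and get_percpu_irq(INT_SWI0).
 */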
#include <asm-generic/irq.h>


@ -31,6 +31,11 @@
#define KVM_HALT_POLL_NS_DEFAULT 500000
#define KVM_GUESTDBG_SW_BP_MASK \
(KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)
#define KVM_GUESTDBG_VALID_MASK \
(KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP | KVM_GUESTDBG_SINGLESTEP)
struct kvm_vm_stat {
struct kvm_vm_stat_generic generic;
u64 pages;
@ -43,6 +48,7 @@ struct kvm_vcpu_stat {
u64 idle_exits;
u64 cpucfg_exits;
u64 signal_exits;
u64 hypercall_exits;
};
#define KVM_MEM_HUGEPAGE_CAPABLE (1UL << 0)
@ -64,6 +70,31 @@ struct kvm_world_switch {
#define MAX_PGTABLE_LEVELS 4
/*
* Physical CPUID is used for interrupt routing; there are different
* definitions of physical CPUID on different hardware:
*
* For the LOONGARCH_CSR_CPUID register, the max CPUID size is 512
* For IPI hardware, the max destination CPUID size is 1024
* For the extioi interrupt controller, the max destination CPUID size is 256
* For the msgint interrupt controller, the max supported CPUID size is 65536
*
* Currently the max CPUID is defined as 256 for the KVM hypervisor; in the
* future it will be expanded to 4096, covering at most 16 packages, with
* every package supporting at most 256 vcpus.
*/
#define KVM_MAX_PHYID 256
struct kvm_phyid_info {
struct kvm_vcpu *vcpu;
bool enabled;
};
struct kvm_phyid_map {
int max_phyid;
struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
};
struct kvm_arch {
/* Guest physical mm */
kvm_pte_t *pgd;
@ -71,6 +102,8 @@ struct kvm_arch {
unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
unsigned int pte_shifts[MAX_PGTABLE_LEVELS];
unsigned int root_level;
spinlock_t phyid_map_lock;
struct kvm_phyid_map *phyid_map;
s64 time_offset;
struct kvm_context __percpu *vmcs;
@ -203,7 +236,6 @@ void kvm_flush_tlb_all(void);
void kvm_flush_tlb_gpa(struct kvm_vcpu *vcpu, unsigned long gpa);
int kvm_handle_mm_fault(struct kvm_vcpu *vcpu, unsigned long badv, bool write);
void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end, bool blockable);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);


@ -0,0 +1,161 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_LOONGARCH_KVM_PARA_H
#define _ASM_LOONGARCH_KVM_PARA_H
/*
* Hypercall code field
*/
#define HYPERVISOR_KVM 1
#define HYPERVISOR_VENDOR_SHIFT 8
#define HYPERCALL_ENCODE(vendor, code) ((vendor << HYPERVISOR_VENDOR_SHIFT) + code)
#define KVM_HCALL_CODE_SERVICE 0
#define KVM_HCALL_CODE_SWDBG 1
#define KVM_HCALL_SERVICE HYPERCALL_ENCODE(HYPERVISOR_KVM, KVM_HCALL_CODE_SERVICE)
#define KVM_HCALL_FUNC_IPI 1
#define KVM_HCALL_SWDBG HYPERCALL_ENCODE(HYPERVISOR_KVM, KVM_HCALL_CODE_SWDBG)
/*
* LoongArch hypercall return code
*/
#define KVM_HCALL_SUCCESS 0
#define KVM_HCALL_INVALID_CODE -1UL
#define KVM_HCALL_INVALID_PARAMETER -2UL
/*
* Hypercall interface for KVM hypervisor
*
* a0: function identifier
* a1-a6: up to six arguments
* The return value is placed in a0.
*/
static __always_inline long kvm_hypercall0(u64 fid)
{
register long ret asm("a0");
register unsigned long fun asm("a0") = fid;
__asm__ __volatile__(
"hvcl "__stringify(KVM_HCALL_SERVICE)
: "=r" (ret)
: "r" (fun)
: "memory"
);
return ret;
}
static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
{
register long ret asm("a0");
register unsigned long fun asm("a0") = fid;
register unsigned long a1 asm("a1") = arg0;
__asm__ __volatile__(
"hvcl "__stringify(KVM_HCALL_SERVICE)
: "=r" (ret)
: "r" (fun), "r" (a1)
: "memory"
);
return ret;
}
static __always_inline long kvm_hypercall2(u64 fid,
unsigned long arg0, unsigned long arg1)
{
register long ret asm("a0");
register unsigned long fun asm("a0") = fid;
register unsigned long a1 asm("a1") = arg0;
register unsigned long a2 asm("a2") = arg1;
__asm__ __volatile__(
"hvcl "__stringify(KVM_HCALL_SERVICE)
: "=r" (ret)
: "r" (fun), "r" (a1), "r" (a2)
: "memory"
);
return ret;
}
static __always_inline long kvm_hypercall3(u64 fid,
unsigned long arg0, unsigned long arg1, unsigned long arg2)
{
register long ret asm("a0");
register unsigned long fun asm("a0") = fid;
register unsigned long a1 asm("a1") = arg0;
register unsigned long a2 asm("a2") = arg1;
register unsigned long a3 asm("a3") = arg2;
__asm__ __volatile__(
"hvcl "__stringify(KVM_HCALL_SERVICE)
: "=r" (ret)
: "r" (fun), "r" (a1), "r" (a2), "r" (a3)
: "memory"
);
return ret;
}
static __always_inline long kvm_hypercall4(u64 fid,
unsigned long arg0, unsigned long arg1,
unsigned long arg2, unsigned long arg3)
{
register long ret asm("a0");
register unsigned long fun asm("a0") = fid;
register unsigned long a1 asm("a1") = arg0;
register unsigned long a2 asm("a2") = arg1;
register unsigned long a3 asm("a3") = arg2;
register unsigned long a4 asm("a4") = arg3;
__asm__ __volatile__(
"hvcl "__stringify(KVM_HCALL_SERVICE)
: "=r" (ret)
: "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4)
: "memory"
);
return ret;
}
static __always_inline long kvm_hypercall5(u64 fid,
unsigned long arg0, unsigned long arg1,
unsigned long arg2, unsigned long arg3, unsigned long arg4)
{
register long ret asm("a0");
register unsigned long fun asm("a0") = fid;
register unsigned long a1 asm("a1") = arg0;
register unsigned long a2 asm("a2") = arg1;
register unsigned long a3 asm("a3") = arg2;
register unsigned long a4 asm("a4") = arg3;
register unsigned long a5 asm("a5") = arg4;
__asm__ __volatile__(
"hvcl "__stringify(KVM_HCALL_SERVICE)
: "=r" (ret)
: "r"(fun), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5)
: "memory"
);
return ret;
}
static inline unsigned int kvm_arch_para_features(void)
{
return 0;
}
static inline unsigned int kvm_arch_para_hints(void)
{
return 0;
}
static inline bool kvm_check_and_clear_guest_paused(void)
{
return false;
}
#endif /* _ASM_LOONGARCH_KVM_PARA_H */
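As a usage sketch of the wrappers above (the helper name below is made up for illustration; the calling convention mirrors pv_send_ipi_single() in the paravirt code later in this series), a guest kicks a single destination CPU like so:

/* Illustrative only: send an SWI0 IPI to the vcpu whose physical id is @cpu */
static inline void example_kick_one_cpu(int cpu)
{
	/*
	 * a1/a2 carry a 128-bit destination bitmap relative to the base id
	 * in a3; bit 0 of a1 means "the base id itself".
	 */
	kvm_hypercall3(KVM_HCALL_FUNC_IPI, 1, 0, cpu);
}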


@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
void kvm_restore_timer(struct kvm_vcpu *vcpu);
int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
/*
* Loongarch KVM guest interrupt handling
@ -109,4 +110,14 @@ static inline int kvm_queue_exception(struct kvm_vcpu *vcpu,
return -1;
}
static inline unsigned long kvm_read_reg(struct kvm_vcpu *vcpu, int num)
{
return vcpu->arch.gprs[num];
}
static inline void kvm_write_reg(struct kvm_vcpu *vcpu, int num, unsigned long val)
{
vcpu->arch.gprs[num] = val;
}
#endif /* __ASM_LOONGARCH_KVM_VCPU_H__ */


@ -158,6 +158,18 @@
#define CPUCFG48_VFPU_CG BIT(2)
#define CPUCFG48_RAM_CG BIT(3)
/*
* CPUCFG index area: 0x40000000 -- 0x400000ff
* SW emulation for the KVM hypervisor
*/
#define CPUCFG_KVM_BASE 0x40000000
#define CPUCFG_KVM_SIZE 0x100
#define CPUCFG_KVM_SIG (CPUCFG_KVM_BASE + 0)
#define KVM_SIGNATURE "KVM\0"
#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
#define KVM_FEATURE_IPI BIT(1)
#ifndef __ASSEMBLY__
/* CSR */


@ -0,0 +1,30 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_LOONGARCH_PARAVIRT_H
#define _ASM_LOONGARCH_PARAVIRT_H
#ifdef CONFIG_PARAVIRT
#include <linux/static_call_types.h>
struct static_key;
extern struct static_key paravirt_steal_enabled;
extern struct static_key paravirt_steal_rq_enabled;
u64 dummy_steal_clock(int cpu);
DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
static inline u64 paravirt_steal_clock(int cpu)
{
return static_call(pv_steal_clock)(cpu);
}
int __init pv_ipi_init(void);
#else
static inline int pv_ipi_init(void)
{
return 0;
}
#endif // CONFIG_PARAVIRT
#endif


@ -0,0 +1 @@
#include <asm/paravirt.h>


@ -12,6 +12,13 @@
#include <linux/threads.h>
#include <linux/cpumask.h>
struct smp_ops {
void (*init_ipi)(void);
void (*send_ipi_single)(int cpu, unsigned int action);
void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
};
extern struct smp_ops mp_ops;
extern int smp_num_siblings;
extern int num_processors;
extern int disabled_cpus;
@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
void loongson_boot_secondary(int cpu, struct task_struct *idle);
void loongson_init_secondary(void);
void loongson_smp_finish(void);
void loongson_send_ipi_single(int cpu, unsigned int action);
void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
#ifdef CONFIG_HOTPLUG_CPU
int loongson_cpu_disable(void);
void loongson_cpu_die(unsigned int cpu);
@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];
#define cpu_physical_id(cpu) cpu_logical_map(cpu)
#define SMP_BOOT_CPU 0x1
#define SMP_RESCHEDULE 0x2
#define SMP_CALL_FUNCTION 0x4
#define ACTION_BOOT_CPU 0
#define ACTION_RESCHEDULE 1
#define ACTION_CALL_FUNCTION 2
#define SMP_BOOT_CPU BIT(ACTION_BOOT_CPU)
#define SMP_RESCHEDULE BIT(ACTION_RESCHEDULE)
#define SMP_CALL_FUNCTION BIT(ACTION_CALL_FUNCTION)
struct secondary_data {
unsigned long stack;
@ -81,12 +89,12 @@ extern void show_ipi_list(struct seq_file *p, int prec);
static inline void arch_send_call_function_single_ipi(int cpu)
{
loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
mp_ops.send_ipi_single(cpu, ACTION_CALL_FUNCTION);
}
static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
{
loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
mp_ops.send_ipi_mask(mask, ACTION_CALL_FUNCTION);
}
#ifdef CONFIG_HOTPLUG_CPU


@ -17,6 +17,8 @@
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
#define KVM_DIRTY_LOG_PAGE_OFFSET 64
#define KVM_GUESTDBG_USE_SW_BP 0x00010000
/*
* for KVM_GET_REGS and KVM_SET_REGS
*/
@ -72,6 +74,8 @@ struct kvm_fpu {
#define KVM_REG_LOONGARCH_COUNTER (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 1)
#define KVM_REG_LOONGARCH_VCPU_RESET (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 2)
/* Debugging: Special instruction for software breakpoint */
#define KVM_REG_LOONGARCH_DEBUG_INST (KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 3)
#define LOONGARCH_REG_SHIFT 3
#define LOONGARCH_REG_64(TYPE, REG) (TYPE | KVM_REG_SIZE_U64 | (REG << LOONGARCH_REG_SHIFT))


@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_PARAVIRT) += paravirt.o
obj-$(CONFIG_SMP) += smp.o


@ -87,23 +87,9 @@ static void __init init_vec_parent_group(void)
acpi_table_parse(ACPI_SIG_MCFG, early_pci_mcfg_parse);
}
static int __init get_ipi_irq(void)
{
struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
if (d)
return irq_create_mapping(d, INT_IPI);
return -EINVAL;
}
void __init init_IRQ(void)
{
int i;
#ifdef CONFIG_SMP
int r, ipi_irq;
static int ipi_dummy_dev;
#endif
unsigned int order = get_order(IRQ_STACK_SIZE);
struct page *page;
@ -113,13 +99,7 @@ void __init init_IRQ(void)
init_vec_parent_group();
irqchip_init();
#ifdef CONFIG_SMP
ipi_irq = get_ipi_irq();
if (ipi_irq < 0)
panic("IPI IRQ mapping failed\n");
irq_set_percpu_devid(ipi_irq);
r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &ipi_dummy_dev);
if (r < 0)
panic("IPI IRQ request failed\n");
mp_ops.init_ipi();
#endif
for (i = 0; i < NR_IRQS; i++)
@ -133,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE);
}
set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
}


@ -0,0 +1,151 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/export.h>
#include <linux/types.h>
#include <linux/interrupt.h>
#include <linux/jump_label.h>
#include <linux/kvm_para.h>
#include <linux/static_call.h>
#include <asm/paravirt.h>
struct static_key paravirt_steal_enabled;
struct static_key paravirt_steal_rq_enabled;
static u64 native_steal_clock(int cpu)
{
return 0;
}
DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
#ifdef CONFIG_SMP
static void pv_send_ipi_single(int cpu, unsigned int action)
{
int min, old;
irq_cpustat_t *info = &per_cpu(irq_stat, cpu);
old = atomic_fetch_or(BIT(action), &info->message);
if (old)
return;
min = cpu_logical_map(cpu);
kvm_hypercall3(KVM_HCALL_FUNC_IPI, 1, 0, min);
}
#define KVM_IPI_CLUSTER_SIZE (2 * BITS_PER_LONG)
static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
{
int i, cpu, min = 0, max = 0, old;
__uint128_t bitmap = 0;
irq_cpustat_t *info;
if (cpumask_empty(mask))
return;
action = BIT(action);
for_each_cpu(i, mask) {
info = &per_cpu(irq_stat, i);
old = atomic_fetch_or(action, &info->message);
if (old)
continue;
cpu = cpu_logical_map(i);
if (!bitmap) {
min = max = cpu;
} else if (cpu < min && cpu > (max - KVM_IPI_CLUSTER_SIZE)) {
/* cpu < min, and bitmap still enough */
bitmap <<= min - cpu;
min = cpu;
} else if (cpu > min && cpu < (min + KVM_IPI_CLUSTER_SIZE)) {
/* cpu > min, and bitmap still enough */
max = cpu > max ? cpu : max;
} else {
/*
* Including this cpu would make the bitmap exceed
* KVM_IPI_CLUSTER_SIZE, so send the IPI for the current
* cluster now and start a new one with this cpu.
*/
kvm_hypercall3(KVM_HCALL_FUNC_IPI, (unsigned long)bitmap,
(unsigned long)(bitmap >> BITS_PER_LONG), min);
min = max = cpu;
bitmap = 0;
}
__set_bit(cpu - min, (unsigned long *)&bitmap);
}
if (bitmap)
kvm_hypercall3(KVM_HCALL_FUNC_IPI, (unsigned long)bitmap,
(unsigned long)(bitmap >> BITS_PER_LONG), min);
}
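/*
 * Worked example for the clustering above (illustrative physical ids):
 * a mask mapping to physical cpus {16, 17, 40} starts with min = max = 16
 * and bit 0 set; cpu 17 sets bit 1; cpu 40 still fits in the 128-bit
 * window (40 < 16 + KVM_IPI_CLUSTER_SIZE) and sets bit 24, so a single
 * hypercall covers all three destinations. A cpu outside that window
 * flushes the current bitmap with a hypercall and starts a new cluster.
 */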
static irqreturn_t pv_ipi_interrupt(int irq, void *dev)
{
u32 action;
irq_cpustat_t *info;
/* Clear SWI interrupt */
clear_csr_estat(1 << INT_SWI0);
info = this_cpu_ptr(&irq_stat);
action = atomic_xchg(&info->message, 0);
if (action & SMP_RESCHEDULE) {
scheduler_ipi();
info->ipi_irqs[IPI_RESCHEDULE]++;
}
if (action & SMP_CALL_FUNCTION) {
generic_smp_call_function_interrupt();
info->ipi_irqs[IPI_CALL_FUNCTION]++;
}
return IRQ_HANDLED;
}
static void pv_init_ipi(void)
{
int r, swi;
swi = get_percpu_irq(INT_SWI0);
if (swi < 0)
panic("SWI0 IRQ mapping failed\n");
irq_set_percpu_devid(swi);
r = request_percpu_irq(swi, pv_ipi_interrupt, "SWI0-IPI", &irq_stat);
if (r < 0)
panic("SWI0 IRQ request failed\n");
}
#endif
static bool kvm_para_available(void)
{
int config;
static int hypervisor_type;
if (!hypervisor_type) {
config = read_cpucfg(CPUCFG_KVM_SIG);
if (!memcmp(&config, KVM_SIGNATURE, 4))
hypervisor_type = HYPERVISOR_KVM;
}
return hypervisor_type == HYPERVISOR_KVM;
}
int __init pv_ipi_init(void)
{
int feature;
if (!cpu_has_hypervisor)
return 0;
if (!kvm_para_available())
return 0;
feature = read_cpucfg(CPUCFG_KVM_FEATURE);
if (!(feature & KVM_FEATURE_IPI))
return 0;
#ifdef CONFIG_SMP
mp_ops.init_ipi = pv_init_ipi;
mp_ops.send_ipi_single = pv_send_ipi_single;
mp_ops.send_ipi_mask = pv_send_ipi_mask;
#endif
return 0;
}


@ -456,16 +456,6 @@ static void loongarch_pmu_disable(struct pmu *pmu)
static DEFINE_MUTEX(pmu_reserve_mutex);
static atomic_t active_events = ATOMIC_INIT(0);
static int get_pmc_irq(void)
{
struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
if (d)
return irq_create_mapping(d, INT_PCOV);
return -EINVAL;
}
static void reset_counters(void *arg);
static int __hw_perf_event_init(struct perf_event *event);
@ -473,7 +463,7 @@ static void hw_perf_event_destroy(struct perf_event *event)
{
if (atomic_dec_and_mutex_lock(&active_events, &pmu_reserve_mutex)) {
on_each_cpu(reset_counters, NULL, 1);
free_irq(get_pmc_irq(), &loongarch_pmu);
free_irq(get_percpu_irq(INT_PCOV), &loongarch_pmu);
mutex_unlock(&pmu_reserve_mutex);
}
}
@ -562,7 +552,7 @@ static int loongarch_pmu_event_init(struct perf_event *event)
if (event->cpu >= 0 && !cpu_online(event->cpu))
return -ENODEV;
irq = get_pmc_irq();
irq = get_percpu_irq(INT_PCOV);
flags = IRQF_PERCPU | IRQF_NOBALANCING | IRQF_NO_THREAD | IRQF_NO_SUSPEND | IRQF_SHARED;
if (!atomic_inc_not_zero(&active_events)) {
mutex_lock(&pmu_reserve_mutex);


@ -29,6 +29,7 @@
#include <asm/loongson.h>
#include <asm/mmu_context.h>
#include <asm/numa.h>
#include <asm/paravirt.h>
#include <asm/processor.h>
#include <asm/setup.h>
#include <asm/time.h>
@ -66,11 +67,6 @@ static cpumask_t cpu_core_setup_map;
struct secondary_data cpuboot_data;
static DEFINE_PER_CPU(int, cpu_state);
enum ipi_msg_type {
IPI_RESCHEDULE,
IPI_CALL_FUNCTION,
};
static const char *ipi_types[NR_IPI] __tracepoint_string = {
[IPI_RESCHEDULE] = "Rescheduling interrupts",
[IPI_CALL_FUNCTION] = "Function call interrupts",
@ -190,24 +186,19 @@ static u32 ipi_read_clear(int cpu)
static void ipi_write_action(int cpu, u32 action)
{
unsigned int irq = 0;
uint32_t val;
while ((irq = ffs(action))) {
uint32_t val = IOCSR_IPI_SEND_BLOCKING;
val |= (irq - 1);
val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
action &= ~BIT(irq - 1);
}
val = IOCSR_IPI_SEND_BLOCKING | action;
val |= (cpu << IOCSR_IPI_SEND_CPU_SHIFT);
iocsr_write32(val, LOONGARCH_IOCSR_IPI_SEND);
}
void loongson_send_ipi_single(int cpu, unsigned int action)
static void loongson_send_ipi_single(int cpu, unsigned int action)
{
ipi_write_action(cpu_logical_map(cpu), (u32)action);
}
void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
static void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
{
unsigned int i;
@ -222,11 +213,11 @@ void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action)
*/
void arch_smp_send_reschedule(int cpu)
{
loongson_send_ipi_single(cpu, SMP_RESCHEDULE);
mp_ops.send_ipi_single(cpu, ACTION_RESCHEDULE);
}
EXPORT_SYMBOL_GPL(arch_smp_send_reschedule);
irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
static irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
{
unsigned int action;
unsigned int cpu = smp_processor_id();
@ -246,6 +237,26 @@ irqreturn_t loongson_ipi_interrupt(int irq, void *dev)
return IRQ_HANDLED;
}
static void loongson_init_ipi(void)
{
int r, ipi_irq;
ipi_irq = get_percpu_irq(INT_IPI);
if (ipi_irq < 0)
panic("IPI IRQ mapping failed\n");
irq_set_percpu_devid(ipi_irq);
r = request_percpu_irq(ipi_irq, loongson_ipi_interrupt, "IPI", &irq_stat);
if (r < 0)
panic("IPI IRQ request failed\n");
}
struct smp_ops mp_ops = {
.init_ipi = loongson_init_ipi,
.send_ipi_single = loongson_send_ipi_single,
.send_ipi_mask = loongson_send_ipi_mask,
};
static void __init fdt_smp_setup(void)
{
#ifdef CONFIG_OF
@ -289,6 +300,7 @@ void __init loongson_smp_setup(void)
cpu_data[0].core = cpu_logical_map(0) % loongson_sysconf.cores_per_package;
cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;
pv_ipi_init();
iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_EN);
pr_info("Detected %i available CPU(s)\n", loongson_sysconf.nr_cpus);
}
@ -323,7 +335,7 @@ void loongson_boot_secondary(int cpu, struct task_struct *idle)
csr_mail_send(entry, cpu_logical_map(cpu), 0);
loongson_send_ipi_single(cpu, SMP_BOOT_CPU);
loongson_send_ipi_single(cpu, ACTION_BOOT_CPU);
}
/*
@ -333,7 +345,7 @@ void loongson_init_secondary(void)
{
unsigned int cpu = smp_processor_id();
unsigned int imask = ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 |
ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER;
ECFGF_IPI | ECFGF_PMC | ECFGF_TIMER | ECFGF_SIP0;
change_csr_ecfg(ECFG0_IM, imask);


@ -123,16 +123,6 @@ void sync_counter(void)
csr_write64(init_offset, LOONGARCH_CSR_CNTC);
}
static int get_timer_irq(void)
{
struct irq_domain *d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
if (d)
return irq_create_mapping(d, INT_TI);
return -EINVAL;
}
int constant_clockevent_init(void)
{
unsigned int cpu = smp_processor_id();
@ -142,7 +132,7 @@ int constant_clockevent_init(void)
static int irq = 0, timer_irq_installed = 0;
if (!timer_irq_installed) {
irq = get_timer_irq();
irq = get_percpu_irq(INT_TI);
if (irq < 0)
pr_err("Failed to map irq %d (timer)\n", irq);
}


@ -9,6 +9,7 @@
#include <linux/module.h>
#include <linux/preempt.h>
#include <linux/vmalloc.h>
#include <trace/events/kvm.h>
#include <asm/fpu.h>
#include <asm/inst.h>
#include <asm/loongarch.h>
@ -20,6 +21,46 @@
#include <asm/kvm_vcpu.h>
#include "trace.h"
static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
{
int rd, rj;
unsigned int index;
if (inst.reg2_format.opcode != cpucfg_op)
return EMULATE_FAIL;
rd = inst.reg2_format.rd;
rj = inst.reg2_format.rj;
++vcpu->stat.cpucfg_exits;
index = vcpu->arch.gprs[rj];
/*
* Per the LoongArch Reference Manual, section 2.2.10.5, the return
* value is 0 for an undefined CPUCFG index.
*
* Disable preemption since the hardware GCSR is accessed.
*/
preempt_disable();
switch (index) {
case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
break;
case CPUCFG_KVM_SIG:
/* CPUCFG emulation between 0x40000000 -- 0x400000ff */
vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
break;
case CPUCFG_KVM_FEATURE:
vcpu->arch.gprs[rd] = KVM_FEATURE_IPI;
break;
default:
vcpu->arch.gprs[rd] = 0;
break;
}
preempt_enable();
return EMULATE_DONE;
}
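/*
 * Note (added for clarity): CPUCFG_KVM_SIG returning "KVM\0" is what the
 * guest-side kvm_para_available() checks, and CPUCFG_KVM_FEATURE
 * advertising KVM_FEATURE_IPI is what lets pv_ipi_init() switch the
 * guest to the hypercall-based IPI path.
 */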
static unsigned long kvm_emu_read_csr(struct kvm_vcpu *vcpu, int csrid)
{
unsigned long val = 0;
@ -208,8 +249,6 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
{
int rd, rj;
unsigned int index;
unsigned long curr_pc;
larch_inst inst;
enum emulation_result er = EMULATE_DONE;
@ -224,21 +263,7 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
er = EMULATE_FAIL;
switch (((inst.word >> 24) & 0xff)) {
case 0x0: /* CPUCFG GSPR */
if (inst.reg2_format.opcode == 0x1B) {
rd = inst.reg2_format.rd;
rj = inst.reg2_format.rj;
++vcpu->stat.cpucfg_exits;
index = vcpu->arch.gprs[rj];
er = EMULATE_DONE;
/*
* By LoongArch Reference Manual 2.2.10.5
* return value is 0 for undefined cpucfg index
*/
if (index < KVM_MAX_CPUCFG_REGS)
vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
else
vcpu->arch.gprs[rd] = 0;
}
er = kvm_emu_cpucfg(vcpu, inst);
break;
case 0x4: /* CSR{RD,WR,XCHG} GSPR */
er = kvm_handle_csr(vcpu, inst);
@ -417,6 +442,8 @@ int kvm_emu_mmio_read(struct kvm_vcpu *vcpu, larch_inst inst)
vcpu->arch.io_gpr = rd;
run->mmio.is_write = 0;
vcpu->mmio_is_write = 0;
trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, run->mmio.len,
run->mmio.phys_addr, NULL);
} else {
kvm_err("Read not supported Inst=0x%08x @%lx BadVaddr:%#lx\n",
inst.word, vcpu->arch.pc, vcpu->arch.badv);
@ -463,6 +490,9 @@ int kvm_complete_mmio_read(struct kvm_vcpu *vcpu, struct kvm_run *run)
break;
}
trace_kvm_mmio(KVM_TRACE_MMIO_READ, run->mmio.len,
run->mmio.phys_addr, run->mmio.data);
return er;
}
@ -564,6 +594,8 @@ int kvm_emu_mmio_write(struct kvm_vcpu *vcpu, larch_inst inst)
run->mmio.is_write = 1;
vcpu->mmio_needed = 1;
vcpu->mmio_is_write = 1;
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, run->mmio.len,
run->mmio.phys_addr, data);
} else {
vcpu->arch.pc = curr_pc;
kvm_err("Write not supported Inst=0x%08x @%lx BadVaddr:%#lx\n",
@ -685,6 +717,90 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
}
static int kvm_send_pv_ipi(struct kvm_vcpu *vcpu)
{
unsigned int min, cpu, i;
unsigned long ipi_bitmap;
struct kvm_vcpu *dest;
min = kvm_read_reg(vcpu, LOONGARCH_GPR_A3);
for (i = 0; i < 2; i++, min += BITS_PER_LONG) {
ipi_bitmap = kvm_read_reg(vcpu, LOONGARCH_GPR_A1 + i);
if (!ipi_bitmap)
continue;
cpu = find_first_bit((void *)&ipi_bitmap, BITS_PER_LONG);
while (cpu < BITS_PER_LONG) {
dest = kvm_get_vcpu_by_cpuid(vcpu->kvm, cpu + min);
cpu = find_next_bit((void *)&ipi_bitmap, BITS_PER_LONG, cpu + 1);
if (!dest)
continue;
/* Send SWI0 to dest vcpu to emulate IPI interrupt */
kvm_queue_irq(dest, INT_SWI0);
kvm_vcpu_kick(dest);
}
}
return 0;
}
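/*
 * Decoding example (illustrative values): with a3 (min) = 16 and bits 0
 * and 24 set in a1, the loop above kicks the vcpus whose physical ids
 * are 16 and 40, matching the encoding produced by the guest-side
 * pv_send_ipi_mask().
 */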
/*
* Hypercall emulation always returns to the guest; the caller should check the return value.
*/
static void kvm_handle_service(struct kvm_vcpu *vcpu)
{
unsigned long func = kvm_read_reg(vcpu, LOONGARCH_GPR_A0);
long ret;
switch (func) {
case KVM_HCALL_FUNC_IPI:
kvm_send_pv_ipi(vcpu);
ret = KVM_HCALL_SUCCESS;
break;
default:
ret = KVM_HCALL_INVALID_CODE;
break;
};
kvm_write_reg(vcpu, LOONGARCH_GPR_A0, ret);
}
static int kvm_handle_hypercall(struct kvm_vcpu *vcpu)
{
int ret;
larch_inst inst;
unsigned int code;
inst.word = vcpu->arch.badi;
code = inst.reg0i15_format.immediate;
ret = RESUME_GUEST;
switch (code) {
case KVM_HCALL_SERVICE:
vcpu->stat.hypercall_exits++;
kvm_handle_service(vcpu);
break;
case KVM_HCALL_SWDBG:
/* KVM_HCALL_SWDBG only takes effect when SW_BP is enabled */
if (vcpu->guest_debug & KVM_GUESTDBG_SW_BP_MASK) {
vcpu->run->exit_reason = KVM_EXIT_DEBUG;
ret = RESUME_HOST;
break;
}
fallthrough;
default:
/* Treat it as a noop instruction; only set the return value */
kvm_write_reg(vcpu, LOONGARCH_GPR_A0, KVM_HCALL_INVALID_CODE);
break;
}
if (ret == RESUME_GUEST)
update_pc(&vcpu->arch);
return ret;
}
/*
* LoongArch KVM callback handling for unimplemented guest exiting
*/
@ -716,6 +832,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = {
[EXCCODE_LSXDIS] = kvm_handle_lsx_disabled,
[EXCCODE_LASXDIS] = kvm_handle_lasx_disabled,
[EXCCODE_GSPR] = kvm_handle_gspr,
[EXCCODE_HVC] = kvm_handle_hypercall,
};
int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)


@ -494,38 +494,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
range->end << PAGE_SHIFT, &ctx);
}
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
unsigned long prot_bits;
kvm_pte_t *ptep;
kvm_pfn_t pfn = pte_pfn(range->arg.pte);
gpa_t gpa = range->start << PAGE_SHIFT;
ptep = kvm_populate_gpa(kvm, NULL, gpa, 0);
if (!ptep)
return false;
/* Replacing an absent or old page doesn't need flushes */
if (!kvm_pte_present(NULL, ptep) || !kvm_pte_young(*ptep)) {
kvm_set_pte(ptep, 0);
return false;
}
/* Fill new pte if write protected or page migrated */
prot_bits = _PAGE_PRESENT | __READABLE;
prot_bits |= _CACHE_MASK & pte_val(range->arg.pte);
/*
* Set _PAGE_WRITE or _PAGE_DIRTY iff old and new pte both support
* _PAGE_WRITE for map_page_fast if next page write fault
* _PAGE_DIRTY since gpa has already recorded as dirty page
*/
prot_bits |= __WRITEABLE & *ptep & pte_val(range->arg.pte);
kvm_set_pte(ptep, kvm_pfn_pte(pfn, __pgprot(prot_bits)));
return true;
}
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
kvm_ptw_ctx ctx;


@ -19,14 +19,16 @@ DECLARE_EVENT_CLASS(kvm_transition,
TP_PROTO(struct kvm_vcpu *vcpu),
TP_ARGS(vcpu),
TP_STRUCT__entry(
__field(unsigned int, vcpu_id)
__field(unsigned long, pc)
),
TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
__entry->pc = vcpu->arch.pc;
),
TP_printk("PC: 0x%08lx", __entry->pc)
TP_printk("vcpu %u PC: 0x%08lx", __entry->vcpu_id, __entry->pc)
);
DEFINE_EVENT(kvm_transition, kvm_enter,
@ -54,19 +56,22 @@ DECLARE_EVENT_CLASS(kvm_exit,
TP_PROTO(struct kvm_vcpu *vcpu, unsigned int reason),
TP_ARGS(vcpu, reason),
TP_STRUCT__entry(
__field(unsigned int, vcpu_id)
__field(unsigned long, pc)
__field(unsigned int, reason)
),
TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
__entry->pc = vcpu->arch.pc;
__entry->reason = reason;
),
TP_printk("[%s]PC: 0x%08lx",
__print_symbolic(__entry->reason,
kvm_trace_symbol_exit_types),
__entry->pc)
TP_printk("vcpu %u [%s] PC: 0x%08lx",
__entry->vcpu_id,
__print_symbolic(__entry->reason,
kvm_trace_symbol_exit_types),
__entry->pc)
);
DEFINE_EVENT(kvm_exit, kvm_exit_idle,
@ -85,14 +90,17 @@ TRACE_EVENT(kvm_exit_gspr,
TP_PROTO(struct kvm_vcpu *vcpu, unsigned int inst_word),
TP_ARGS(vcpu, inst_word),
TP_STRUCT__entry(
__field(unsigned int, vcpu_id)
__field(unsigned int, inst_word)
),
TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
__entry->inst_word = inst_word;
),
TP_printk("Inst word: 0x%08x", __entry->inst_word)
TP_printk("vcpu %u Inst word: 0x%08x", __entry->vcpu_id,
__entry->inst_word)
);
#define KVM_TRACE_AUX_SAVE 0


@ -19,6 +19,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_COUNTER(VCPU, idle_exits),
STATS_DESC_COUNTER(VCPU, cpucfg_exits),
STATS_DESC_COUNTER(VCPU, signal_exits),
STATS_DESC_COUNTER(VCPU, hypercall_exits)
};
const struct kvm_stats_header kvm_vcpu_stats_header = {
@ -247,7 +248,101 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
{
return -EINVAL;
if (dbg->control & ~KVM_GUESTDBG_VALID_MASK)
return -EINVAL;
if (dbg->control & KVM_GUESTDBG_ENABLE)
vcpu->guest_debug = dbg->control;
else
vcpu->guest_debug = 0;
return 0;
}
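
Userspace reaches this handler through the generic KVM_SET_GUEST_DEBUG vcpu ioctl. A minimal sketch, assuming vcpu_fd is a file descriptor obtained from KVM_CREATE_VCPU; the control bits mirror the KVM_GUESTDBG_SW_BP_MASK check in the hypercall path above.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Enable guest software breakpoints on one vcpu; returns the ioctl result. */
static int enable_sw_breakpoints(int vcpu_fd)
{
	struct kvm_guest_debug dbg;

	memset(&dbg, 0, sizeof(dbg));
	dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;

	return ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);
}

With this set, a guest "hvcl KVM_HCALL_SWDBG" (or an ebreak on RISC-V, see further below) makes KVM_RUN return with exit_reason == KVM_EXIT_DEBUG instead of being treated as a no-op.
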
static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
{
int cpuid;
struct kvm_phyid_map *map;
struct loongarch_csrs *csr = vcpu->arch.csr;
if (val >= KVM_MAX_PHYID)
return -EINVAL;
map = vcpu->kvm->arch.phyid_map;
cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
spin_lock(&vcpu->kvm->arch.phyid_map_lock);
if ((cpuid < KVM_MAX_PHYID) && map->phys_map[cpuid].enabled) {
/* Discard duplicated CPUID set operation */
if (cpuid == val) {
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
return 0;
}
/*
* CPUID has already been set.
* Forbid changing to a different CPUID at runtime.
*/
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
return -EINVAL;
}
if (map->phys_map[val].enabled) {
/* Discard duplicated CPUID set operation */
if (vcpu == map->phys_map[val].vcpu) {
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
return 0;
}
/*
* The new CPUID is already in use by another vcpu.
* Forbid sharing the same CPUID between different vcpus.
*/
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
return -EINVAL;
}
kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
map->phys_map[val].enabled = true;
map->phys_map[val].vcpu = vcpu;
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
return 0;
}
static inline void kvm_drop_cpuid(struct kvm_vcpu *vcpu)
{
int cpuid;
struct kvm_phyid_map *map;
struct loongarch_csrs *csr = vcpu->arch.csr;
map = vcpu->kvm->arch.phyid_map;
cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
if (cpuid >= KVM_MAX_PHYID)
return;
spin_lock(&vcpu->kvm->arch.phyid_map_lock);
if (map->phys_map[cpuid].enabled) {
map->phys_map[cpuid].vcpu = NULL;
map->phys_map[cpuid].enabled = false;
kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
}
spin_unlock(&vcpu->kvm->arch.phyid_map_lock);
}
struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid)
{
struct kvm_phyid_map *map;
if (cpuid >= KVM_MAX_PHYID)
return NULL;
map = kvm->arch.phyid_map;
if (!map->phys_map[cpuid].enabled)
return NULL;
return map->phys_map[cpuid].vcpu;
}
static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 *val)
@ -282,6 +377,9 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
if (get_gcsr_flag(id) & INVALID_GCSR)
return -EINVAL;
if (id == LOONGARCH_CSR_CPUID)
return kvm_set_cpuid(vcpu, val);
if (id == LOONGARCH_CSR_ESTAT) {
/* ESTAT IP0~IP7 inject through GINTC */
gintc = (val >> 2) & 0xff;
@ -409,6 +507,9 @@ static int kvm_get_one_reg(struct kvm_vcpu *vcpu,
case KVM_REG_LOONGARCH_COUNTER:
*v = drdtime() + vcpu->kvm->arch.time_offset;
break;
case KVM_REG_LOONGARCH_DEBUG_INST:
*v = INSN_HVCL | KVM_HCALL_SWDBG;
break;
default:
ret = -EINVAL;
break;
@ -924,6 +1025,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
/* Set cpuid */
kvm_write_sw_gcsr(csr, LOONGARCH_CSR_TMID, vcpu->vcpu_id);
kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, KVM_MAX_PHYID);
/* Start with no pending virtual guest interrupts */
csr->csrs[LOONGARCH_CSR_GINTC] = 0;
@ -942,6 +1044,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
hrtimer_cancel(&vcpu->arch.swtimer);
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
kvm_drop_cpuid(vcpu);
kfree(vcpu->arch.csr);
/*


@ -30,6 +30,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (!kvm->arch.pgd)
return -ENOMEM;
kvm->arch.phyid_map = kvzalloc(sizeof(struct kvm_phyid_map), GFP_KERNEL_ACCOUNT);
if (!kvm->arch.phyid_map) {
free_page((unsigned long)kvm->arch.pgd);
kvm->arch.pgd = NULL;
return -ENOMEM;
}
spin_lock_init(&kvm->arch.phyid_map_lock);
kvm_init_vmcs(kvm);
kvm->arch.gpa_size = BIT(cpu_vabits - 1);
kvm->arch.root_level = CONFIG_PGTABLE_LEVELS - 1;
@ -52,6 +60,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_destroy_vcpus(kvm);
free_page((unsigned long)kvm->arch.pgd);
kvm->arch.pgd = NULL;
kvfree(kvm->arch.phyid_map);
kvm->arch.phyid_map = NULL;
}
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@ -66,6 +76,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_IOEVENTFD:
case KVM_CAP_MP_STATE:
case KVM_CAP_SET_GUEST_DEBUG:
r = 1;
break;
case KVM_CAP_NR_VCPUS:


@ -444,36 +444,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
return true;
}
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
gpa_t gpa = range->start << PAGE_SHIFT;
pte_t hva_pte = range->arg.pte;
pte_t *gpa_pte = kvm_mips_pte_for_gpa(kvm, NULL, gpa);
pte_t old_pte;
if (!gpa_pte)
return false;
/* Mapping may need adjusting depending on memslot flags */
old_pte = *gpa_pte;
if (range->slot->flags & KVM_MEM_LOG_DIRTY_PAGES && !pte_dirty(old_pte))
hva_pte = pte_mkclean(hva_pte);
else if (range->slot->flags & KVM_MEM_READONLY)
hva_pte = pte_wrprotect(hva_pte);
set_pte(gpa_pte, hva_pte);
/* Replacing an absent or old page doesn't need flushes */
if (!pte_present(old_pte) || !pte_young(old_pte))
return false;
/* Pages swapped, aged, moved, or cleaned require flushes */
return !pte_present(hva_pte) ||
!pte_young(hva_pte) ||
pte_pfn(old_pte) != pte_pfn(hva_pte) ||
(pte_dirty(old_pte) && !pte_dirty(hva_pte));
}
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
return kvm_mips_mkold_gpa_pt(kvm, range->start, range->end);


@ -287,7 +287,6 @@ struct kvmppc_ops {
bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
void (*free_memslot)(struct kvm_memory_slot *slot);
int (*init_vm)(struct kvm *kvm);
void (*destroy_vm)(struct kvm *kvm);


@ -899,11 +899,6 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
return kvm->arch.kvm_ops->test_age_gfn(kvm, range);
}
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
return kvm->arch.kvm_ops->set_spte_gfn(kvm, range);
}
int kvmppc_core_init_vm(struct kvm *kvm)
{


@ -12,7 +12,6 @@ extern void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
extern bool kvm_unmap_gfn_range_hv(struct kvm *kvm, struct kvm_gfn_range *range);
extern bool kvm_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
extern bool kvm_test_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
extern bool kvm_set_spte_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
extern int kvmppc_mmu_init_pr(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_destroy_pr(struct kvm_vcpu *vcpu);


@ -1010,18 +1010,6 @@ bool kvm_test_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range)
return kvm_test_age_rmapp(kvm, range->slot, range->start);
}
bool kvm_set_spte_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range)
{
WARN_ON(range->start + 1 != range->end);
if (kvm_is_radix(kvm))
kvm_unmap_radix(kvm, range->slot, range->start);
else
kvm_unmap_rmapp(kvm, range->slot, range->start);
return false;
}
static int vcpus_running(struct kvm *kvm)
{
return atomic_read(&kvm->arch.vcpus_running) != 0;


@ -6364,7 +6364,6 @@ static struct kvmppc_ops kvm_ops_hv = {
.unmap_gfn_range = kvm_unmap_gfn_range_hv,
.age_gfn = kvm_age_gfn_hv,
.test_age_gfn = kvm_test_age_gfn_hv,
.set_spte_gfn = kvm_set_spte_gfn_hv,
.free_memslot = kvmppc_core_free_memslot_hv,
.init_vm = kvmppc_core_init_vm_hv,
.destroy_vm = kvmppc_core_destroy_vm_hv,


@ -461,12 +461,6 @@ static bool kvm_test_age_gfn_pr(struct kvm *kvm, struct kvm_gfn_range *range)
return false;
}
static bool kvm_set_spte_gfn_pr(struct kvm *kvm, struct kvm_gfn_range *range)
{
/* The page will get remapped properly on its next fault */
return do_kvm_unmap_gfn(kvm, range);
}
/*****************************************/
static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
@ -2071,7 +2065,6 @@ static struct kvmppc_ops kvm_ops_pr = {
.unmap_gfn_range = kvm_unmap_gfn_range_pr,
.age_gfn = kvm_age_gfn_pr,
.test_age_gfn = kvm_test_age_gfn_pr,
.set_spte_gfn = kvm_set_spte_gfn_pr,
.free_memslot = kvmppc_core_free_memslot_pr,
.init_vm = kvmppc_core_init_vm_pr,
.destroy_vm = kvmppc_core_destroy_vm_pr,


@ -747,12 +747,6 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
return false;
}
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
/* The page will get remapped properly on its next fault */
return kvm_e500_mmu_unmap_gfn(kvm, range);
}
/*****************************************/
int e500_mmu_host_init(struct kvmppc_vcpu_e500 *vcpu_e500)


@ -168,7 +168,8 @@
#define VSIP_TO_HVIP_SHIFT (IRQ_VS_SOFT - IRQ_S_SOFT)
#define VSIP_VALID_MASK ((_AC(1, UL) << IRQ_S_SOFT) | \
(_AC(1, UL) << IRQ_S_TIMER) | \
(_AC(1, UL) << IRQ_S_EXT))
(_AC(1, UL) << IRQ_S_EXT) | \
(_AC(1, UL) << IRQ_PMU_OVF))
/* AIA CSR bits */
#define TOPI_IID_SHIFT 16
@ -281,7 +282,7 @@
#define CSR_HPMCOUNTER30H 0xc9e
#define CSR_HPMCOUNTER31H 0xc9f
#define CSR_SSCOUNTOVF 0xda0
#define CSR_SCOUNTOVF 0xda0
#define CSR_SSTATUS 0x100
#define CSR_SIE 0x104


@ -43,6 +43,17 @@
KVM_ARCH_REQ_FLAGS(5, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_STEAL_UPDATE KVM_ARCH_REQ(6)
#define KVM_HEDELEG_DEFAULT (BIT(EXC_INST_MISALIGNED) | \
BIT(EXC_BREAKPOINT) | \
BIT(EXC_SYSCALL) | \
BIT(EXC_INST_PAGE_FAULT) | \
BIT(EXC_LOAD_PAGE_FAULT) | \
BIT(EXC_STORE_PAGE_FAULT))
#define KVM_HIDELEG_DEFAULT (BIT(IRQ_VS_SOFT) | \
BIT(IRQ_VS_TIMER) | \
BIT(IRQ_VS_EXT))
enum kvm_riscv_hfence_type {
KVM_RISCV_HFENCE_UNKNOWN = 0,
KVM_RISCV_HFENCE_GVMA_VMID_GPA,
@ -169,6 +180,7 @@ struct kvm_vcpu_csr {
struct kvm_vcpu_config {
u64 henvcfg;
u64 hstateen0;
unsigned long hedeleg;
};
struct kvm_vcpu_smstateen_csr {
@ -211,6 +223,7 @@ struct kvm_vcpu_arch {
/* CPU context upon Guest VCPU reset */
struct kvm_cpu_context guest_reset_context;
spinlock_t reset_cntx_lock;
/* CPU CSR context upon Guest VCPU reset */
struct kvm_vcpu_csr guest_reset_csr;
@ -252,8 +265,9 @@ struct kvm_vcpu_arch {
/* Cache pages needed to program page tables with spinlock held */
struct kvm_mmu_memory_cache mmu_page_cache;
/* VCPU power-off state */
bool power_off;
/* VCPU power state */
struct kvm_mp_state mp_state;
spinlock_t mp_state_lock;
/* Don't run the VCPU (blocked) */
bool pause;
@ -374,8 +388,11 @@ int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu);
bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, u64 mask);
void __kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
void __kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
bool kvm_riscv_vcpu_stopped(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_sbi_sta_reset(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_record_steal_time(struct kvm_vcpu *vcpu);


@ -20,7 +20,7 @@ static_assert(RISCV_KVM_MAX_COUNTERS <= 64);
struct kvm_fw_event {
/* Current value of the event */
unsigned long value;
u64 value;
/* Event monitoring status */
bool started;
@ -36,6 +36,7 @@ struct kvm_pmc {
bool started;
/* Monitoring event ID */
unsigned long event_idx;
struct kvm_vcpu *vcpu;
};
/* PMU data structure per vcpu */
@ -50,6 +51,12 @@ struct kvm_pmu {
bool init_done;
/* Bit map of all the virtual counter used */
DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
/* Bit map of all the virtual counter overflown */
DECLARE_BITMAP(pmc_overflown, RISCV_KVM_MAX_COUNTERS);
/* The address of the counter snapshot area (guest physical address) */
gpa_t snapshot_addr;
/* The actual data of the snapshot */
struct riscv_pmu_snapshot_data *sdata;
};
#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu_context)
@ -82,9 +89,14 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
unsigned long ctr_mask, unsigned long flags,
unsigned long eidx, u64 evtdata,
struct kvm_vcpu_sbi_return *retdata);
int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
int kvm_riscv_vcpu_pmu_fw_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata);
int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata);
void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low,
unsigned long saddr_high, unsigned long flags,
struct kvm_vcpu_sbi_return *retdata);
void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);


@ -131,6 +131,8 @@ enum sbi_ext_pmu_fid {
SBI_EXT_PMU_COUNTER_START,
SBI_EXT_PMU_COUNTER_STOP,
SBI_EXT_PMU_COUNTER_FW_READ,
SBI_EXT_PMU_COUNTER_FW_READ_HI,
SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
};
union sbi_pmu_ctr_info {
@ -147,6 +149,13 @@ union sbi_pmu_ctr_info {
};
};
/* Data structure to contain the pmu snapshot data */
struct riscv_pmu_snapshot_data {
u64 ctr_overflow_mask;
u64 ctr_values[64];
u64 reserved[447];
};
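
The reserved tail appears to pad the snapshot area to exactly one 4 KiB page (8 + 64*8 + 447*8 = 4096 bytes), matching the single shared page the KVM PMU code below reads and writes. A standalone sanity check of that assumption:

#include <stdint.h>

/* Local mirror of the layout above, used only to verify its size. */
struct pmu_snapshot_layout {
	uint64_t ctr_overflow_mask;	/*    8 bytes */
	uint64_t ctr_values[64];	/*  512 bytes */
	uint64_t reserved[447];		/* 3576 bytes */
};

_Static_assert(sizeof(struct pmu_snapshot_layout) == 4096,
	       "SBI PMU snapshot data should span one 4 KiB page");
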
#define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0)
#define RISCV_PMU_RAW_EVENT_IDX 0x20000
@ -232,20 +241,22 @@ enum sbi_pmu_ctr_type {
#define SBI_PMU_EVENT_IDX_INVALID 0xFFFFFFFF
/* Flags defined for config matching function */
#define SBI_PMU_CFG_FLAG_SKIP_MATCH (1 << 0)
#define SBI_PMU_CFG_FLAG_CLEAR_VALUE (1 << 1)
#define SBI_PMU_CFG_FLAG_AUTO_START (1 << 2)
#define SBI_PMU_CFG_FLAG_SET_VUINH (1 << 3)
#define SBI_PMU_CFG_FLAG_SET_VSINH (1 << 4)
#define SBI_PMU_CFG_FLAG_SET_UINH (1 << 5)
#define SBI_PMU_CFG_FLAG_SET_SINH (1 << 6)
#define SBI_PMU_CFG_FLAG_SET_MINH (1 << 7)
#define SBI_PMU_CFG_FLAG_SKIP_MATCH BIT(0)
#define SBI_PMU_CFG_FLAG_CLEAR_VALUE BIT(1)
#define SBI_PMU_CFG_FLAG_AUTO_START BIT(2)
#define SBI_PMU_CFG_FLAG_SET_VUINH BIT(3)
#define SBI_PMU_CFG_FLAG_SET_VSINH BIT(4)
#define SBI_PMU_CFG_FLAG_SET_UINH BIT(5)
#define SBI_PMU_CFG_FLAG_SET_SINH BIT(6)
#define SBI_PMU_CFG_FLAG_SET_MINH BIT(7)
/* Flags defined for counter start function */
#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
#define SBI_PMU_START_FLAG_SET_INIT_VALUE BIT(0)
#define SBI_PMU_START_FLAG_INIT_SNAPSHOT BIT(1)
/* Flags defined for counter stop function */
#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
#define SBI_PMU_STOP_FLAG_RESET BIT(0)
#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
enum sbi_ext_dbcn_fid {
SBI_EXT_DBCN_CONSOLE_WRITE = 0,
@ -266,7 +277,7 @@ struct sbi_sta_struct {
u8 pad[47];
} __packed;
#define SBI_STA_SHMEM_DISABLE -1
#define SBI_SHMEM_DISABLE -1
/* SBI spec version fields */
#define SBI_SPEC_VERSION_DEFAULT 0x1
@ -284,6 +295,7 @@ struct sbi_sta_struct {
#define SBI_ERR_ALREADY_AVAILABLE -6
#define SBI_ERR_ALREADY_STARTED -7
#define SBI_ERR_ALREADY_STOPPED -8
#define SBI_ERR_NO_SHMEM -9
extern unsigned long sbi_spec_version;
struct sbiret {
@ -355,8 +367,8 @@ static inline unsigned long sbi_minor_version(void)
static inline unsigned long sbi_mk_version(unsigned long major,
unsigned long minor)
{
return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
return ((major & SBI_SPEC_VERSION_MAJOR_MASK) << SBI_SPEC_VERSION_MAJOR_SHIFT)
| (minor & SBI_SPEC_VERSION_MINOR_MASK);
}
int sbi_err_map_linux_errno(int err);


@ -167,6 +167,7 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZFA,
KVM_RISCV_ISA_EXT_ZTSO,
KVM_RISCV_ISA_EXT_ZACAS,
KVM_RISCV_ISA_EXT_SSCOFPMF,
KVM_RISCV_ISA_EXT_MAX,
};


@ -62,7 +62,7 @@ static int sbi_sta_steal_time_set_shmem(unsigned long lo, unsigned long hi,
ret = sbi_ecall(SBI_EXT_STA, SBI_EXT_STA_STEAL_TIME_SET_SHMEM,
lo, hi, flags, 0, 0, 0);
if (ret.error) {
if (lo == SBI_STA_SHMEM_DISABLE && hi == SBI_STA_SHMEM_DISABLE)
if (lo == SBI_SHMEM_DISABLE && hi == SBI_SHMEM_DISABLE)
pr_warn("Failed to disable steal-time shmem");
else
pr_warn("Failed to set steal-time shmem");
@ -84,8 +84,8 @@ static int pv_time_cpu_online(unsigned int cpu)
static int pv_time_cpu_down_prepare(unsigned int cpu)
{
return sbi_sta_steal_time_set_shmem(SBI_STA_SHMEM_DISABLE,
SBI_STA_SHMEM_DISABLE, 0);
return sbi_sta_steal_time_set_shmem(SBI_SHMEM_DISABLE,
SBI_SHMEM_DISABLE, 0);
}
static u64 pv_time_steal_clock(int cpu)


@ -545,6 +545,9 @@ void kvm_riscv_aia_enable(void)
enable_percpu_irq(hgei_parent_irq,
irq_get_trigger_type(hgei_parent_irq));
csr_set(CSR_HIE, BIT(IRQ_S_GEXT));
/* Enable IRQ filtering for overflow interrupt only if sscofpmf is present */
if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
csr_write(CSR_HVIEN, BIT(IRQ_PMU_OVF));
}
void kvm_riscv_aia_disable(void)
@ -558,6 +561,8 @@ void kvm_riscv_aia_disable(void)
return;
hgctrl = get_cpu_ptr(&aia_hgei);
if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
csr_clear(CSR_HVIEN, BIT(IRQ_PMU_OVF));
/* Disable per-CPU SGEI interrupt */
csr_clear(CSR_HIE, BIT(IRQ_S_GEXT));
disable_percpu_irq(hgei_parent_irq);


@ -22,22 +22,8 @@ long kvm_arch_dev_ioctl(struct file *filp,
int kvm_arch_hardware_enable(void)
{
unsigned long hideleg, hedeleg;
hedeleg = 0;
hedeleg |= (1UL << EXC_INST_MISALIGNED);
hedeleg |= (1UL << EXC_BREAKPOINT);
hedeleg |= (1UL << EXC_SYSCALL);
hedeleg |= (1UL << EXC_INST_PAGE_FAULT);
hedeleg |= (1UL << EXC_LOAD_PAGE_FAULT);
hedeleg |= (1UL << EXC_STORE_PAGE_FAULT);
csr_write(CSR_HEDELEG, hedeleg);
hideleg = 0;
hideleg |= (1UL << IRQ_VS_SOFT);
hideleg |= (1UL << IRQ_VS_TIMER);
hideleg |= (1UL << IRQ_VS_EXT);
csr_write(CSR_HIDELEG, hideleg);
csr_write(CSR_HEDELEG, KVM_HEDELEG_DEFAULT);
csr_write(CSR_HIDELEG, KVM_HIDELEG_DEFAULT);
/* VS should access only the time counter directly. Everything else should trap */
csr_write(CSR_HCOUNTEREN, 0x02);


@ -550,26 +550,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
return false;
}
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
int ret;
kvm_pfn_t pfn = pte_pfn(range->arg.pte);
if (!kvm->arch.pgd)
return false;
WARN_ON(range->end - range->start != 1);
ret = gstage_map_page(kvm, NULL, range->start << PAGE_SHIFT,
__pfn_to_phys(pfn), PAGE_SIZE, true, true);
if (ret) {
kvm_debug("Failed to map G-stage page (error %d)\n", ret);
return true;
}
return false;
}
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
pte_t *ptep;


@ -64,7 +64,9 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
memcpy(csr, reset_csr, sizeof(*csr));
spin_lock(&vcpu->arch.reset_cntx_lock);
memcpy(cntx, reset_cntx, sizeof(*cntx));
spin_unlock(&vcpu->arch.reset_cntx_lock);
kvm_riscv_vcpu_fp_reset(vcpu);
@ -102,6 +104,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *cntx;
struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
spin_lock_init(&vcpu->arch.mp_state_lock);
/* Mark this VCPU never ran */
vcpu->arch.ran_atleast_once = false;
vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
@ -119,12 +123,16 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
spin_lock_init(&vcpu->arch.hfence_lock);
/* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
spin_lock_init(&vcpu->arch.reset_cntx_lock);
spin_lock(&vcpu->arch.reset_cntx_lock);
cntx = &vcpu->arch.guest_reset_context;
cntx->sstatus = SR_SPP | SR_SPIE;
cntx->hstatus = 0;
cntx->hstatus |= HSTATUS_VTW;
cntx->hstatus |= HSTATUS_SPVP;
cntx->hstatus |= HSTATUS_SPV;
spin_unlock(&vcpu->arch.reset_cntx_lock);
if (kvm_riscv_vcpu_alloc_vector_context(vcpu, cntx))
return -ENOMEM;
@ -201,7 +209,7 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) &&
!vcpu->arch.power_off && !vcpu->arch.pause);
!kvm_riscv_vcpu_stopped(vcpu) && !vcpu->arch.pause);
}
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
@ -365,6 +373,13 @@ void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
}
}
/* Sync up the HVIP.LCOFIP bit changes (only clear) by the guest */
if ((csr->hvip ^ hvip) & (1UL << IRQ_PMU_OVF)) {
if (!(hvip & (1UL << IRQ_PMU_OVF)) &&
!test_and_set_bit(IRQ_PMU_OVF, v->irqs_pending_mask))
clear_bit(IRQ_PMU_OVF, v->irqs_pending);
}
/* Sync-up AIA high interrupts */
kvm_riscv_vcpu_aia_sync_interrupts(vcpu);
@ -382,7 +397,8 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
if (irq < IRQ_LOCAL_MAX &&
irq != IRQ_VS_SOFT &&
irq != IRQ_VS_TIMER &&
irq != IRQ_VS_EXT)
irq != IRQ_VS_EXT &&
irq != IRQ_PMU_OVF)
return -EINVAL;
set_bit(irq, vcpu->arch.irqs_pending);
@ -397,14 +413,15 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
/*
* We only allow VS-mode software, timer, and external
* We only allow VS-mode software, timer, counter overflow and external
* interrupts when irq is one of the local interrupts
* defined by RISC-V privilege specification.
*/
if (irq < IRQ_LOCAL_MAX &&
irq != IRQ_VS_SOFT &&
irq != IRQ_VS_TIMER &&
irq != IRQ_VS_EXT)
irq != IRQ_VS_EXT &&
irq != IRQ_PMU_OVF)
return -EINVAL;
clear_bit(irq, vcpu->arch.irqs_pending);
@ -429,26 +446,42 @@ bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, u64 mask)
return kvm_riscv_vcpu_aia_has_interrupts(vcpu, mask);
}
void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
void __kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
{
vcpu->arch.power_off = true;
WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
kvm_make_request(KVM_REQ_SLEEP, vcpu);
kvm_vcpu_kick(vcpu);
}
void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
{
spin_lock(&vcpu->arch.mp_state_lock);
__kvm_riscv_vcpu_power_off(vcpu);
spin_unlock(&vcpu->arch.mp_state_lock);
}
void __kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
{
WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_RUNNABLE);
kvm_vcpu_wake_up(vcpu);
}
void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
{
vcpu->arch.power_off = false;
kvm_vcpu_wake_up(vcpu);
spin_lock(&vcpu->arch.mp_state_lock);
__kvm_riscv_vcpu_power_on(vcpu);
spin_unlock(&vcpu->arch.mp_state_lock);
}
bool kvm_riscv_vcpu_stopped(struct kvm_vcpu *vcpu)
{
return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_STOPPED;
}
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
if (vcpu->arch.power_off)
mp_state->mp_state = KVM_MP_STATE_STOPPED;
else
mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
*mp_state = READ_ONCE(vcpu->arch.mp_state);
return 0;
}
@ -458,25 +491,36 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
{
int ret = 0;
spin_lock(&vcpu->arch.mp_state_lock);
switch (mp_state->mp_state) {
case KVM_MP_STATE_RUNNABLE:
vcpu->arch.power_off = false;
WRITE_ONCE(vcpu->arch.mp_state, *mp_state);
break;
case KVM_MP_STATE_STOPPED:
kvm_riscv_vcpu_power_off(vcpu);
__kvm_riscv_vcpu_power_off(vcpu);
break;
default:
ret = -EINVAL;
}
spin_unlock(&vcpu->arch.mp_state_lock);
return ret;
}
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
{
/* TODO; To be implemented later. */
return -EINVAL;
if (dbg->control & KVM_GUESTDBG_ENABLE) {
vcpu->guest_debug = dbg->control;
vcpu->arch.cfg.hedeleg &= ~BIT(EXC_BREAKPOINT);
} else {
vcpu->guest_debug = 0;
vcpu->arch.cfg.hedeleg |= BIT(EXC_BREAKPOINT);
}
return 0;
}
static void kvm_riscv_vcpu_setup_config(struct kvm_vcpu *vcpu)
@ -505,6 +549,10 @@ static void kvm_riscv_vcpu_setup_config(struct kvm_vcpu *vcpu)
if (riscv_isa_extension_available(isa, SMSTATEEN))
cfg->hstateen0 |= SMSTATEEN0_SSTATEEN0;
}
cfg->hedeleg = KVM_HEDELEG_DEFAULT;
if (vcpu->guest_debug)
cfg->hedeleg &= ~BIT(EXC_BREAKPOINT);
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@ -519,6 +567,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
csr_write(CSR_VSEPC, csr->vsepc);
csr_write(CSR_VSCAUSE, csr->vscause);
csr_write(CSR_VSTVAL, csr->vstval);
csr_write(CSR_HEDELEG, cfg->hedeleg);
csr_write(CSR_HVIP, csr->hvip);
csr_write(CSR_VSATP, csr->vsatp);
csr_write(CSR_HENVCFG, cfg->henvcfg);
@ -584,11 +633,11 @@ static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
kvm_vcpu_srcu_read_unlock(vcpu);
rcuwait_wait_event(wait,
(!vcpu->arch.power_off) && (!vcpu->arch.pause),
(!kvm_riscv_vcpu_stopped(vcpu)) && (!vcpu->arch.pause),
TASK_INTERRUPTIBLE);
kvm_vcpu_srcu_read_lock(vcpu);
if (vcpu->arch.power_off || vcpu->arch.pause) {
if (kvm_riscv_vcpu_stopped(vcpu) || vcpu->arch.pause) {
/*
* Awaken to handle a signal, request to
* sleep again later.


@ -204,6 +204,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
ret = kvm_riscv_vcpu_sbi_ecall(vcpu, run);
break;
case EXC_BREAKPOINT:
run->exit_reason = KVM_EXIT_DEBUG;
ret = 0;
break;
default:
break;
}


@ -36,6 +36,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
/* Multi letter extensions (alphabetically sorted) */
KVM_ISA_EXT_ARR(SMSTATEEN),
KVM_ISA_EXT_ARR(SSAIA),
KVM_ISA_EXT_ARR(SSCOFPMF),
KVM_ISA_EXT_ARR(SSTC),
KVM_ISA_EXT_ARR(SVINVAL),
KVM_ISA_EXT_ARR(SVNAPOT),
@ -99,6 +100,9 @@ static bool kvm_riscv_vcpu_isa_enable_allowed(unsigned long ext)
switch (ext) {
case KVM_RISCV_ISA_EXT_H:
return false;
case KVM_RISCV_ISA_EXT_SSCOFPMF:
/* Sscofpmf depends on interrupt filtering defined in ssaia */
return __riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSAIA);
case KVM_RISCV_ISA_EXT_V:
return riscv_v_vstate_ctrl_user_allowed();
default:
@ -116,6 +120,8 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_C:
case KVM_RISCV_ISA_EXT_I:
case KVM_RISCV_ISA_EXT_M:
/* There is no architectural config bit to disable sscofpmf completely */
case KVM_RISCV_ISA_EXT_SSCOFPMF:
case KVM_RISCV_ISA_EXT_SSTC:
case KVM_RISCV_ISA_EXT_SVINVAL:
case KVM_RISCV_ISA_EXT_SVNAPOT:


@ -14,6 +14,7 @@
#include <asm/csr.h>
#include <asm/kvm_vcpu_sbi.h>
#include <asm/kvm_vcpu_pmu.h>
#include <asm/sbi.h>
#include <linux/bitops.h>
#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
@ -39,7 +40,7 @@ static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
u64 sample_period;
if (!pmc->counter_val)
sample_period = counter_val_mask + 1;
sample_period = counter_val_mask;
else
sample_period = (-pmc->counter_val) & counter_val_mask;
@ -196,6 +197,36 @@ static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
}
static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
unsigned long *out_val)
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
int fevent_code;
if (!IS_ENABLED(CONFIG_32BIT)) {
pr_warn("%s: should be invoked for only RV32\n", __func__);
return -EINVAL;
}
if (cidx >= kvm_pmu_num_counters(kvpmu) || cidx == 1) {
pr_warn("Invalid counter id [%ld]during read\n", cidx);
return -EINVAL;
}
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
return -EINVAL;
fevent_code = get_event_code(pmc->event_idx);
pmc->counter_val = kvpmu->fw_event[fevent_code].value;
*out_val = pmc->counter_val >> 32;
return 0;
}
static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
unsigned long *out_val)
{
@ -204,6 +235,11 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
u64 enabled, running;
int fevent_code;
if (cidx >= kvm_pmu_num_counters(kvpmu) || cidx == 1) {
pr_warn("Invalid counter id [%ld] during read\n", cidx);
return -EINVAL;
}
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
@ -229,8 +265,50 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
return 0;
}
static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
unsigned long flags, unsigned long eidx, unsigned long evtdata)
static void kvm_riscv_pmu_overflow(struct perf_event *perf_event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
struct kvm_pmc *pmc = perf_event->overflow_handler_context;
struct kvm_vcpu *vcpu = pmc->vcpu;
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
struct riscv_pmu *rpmu = to_riscv_pmu(perf_event->pmu);
u64 period;
/*
* Stop the event counting by directly accessing the perf_event.
* Otherwise, this would need to be deferred via a workqueue.
* That will introduce skew in the counter value because the actual
* physical counter would start after returning from this function.
* It will be stopped again once the workqueue is scheduled
*/
rpmu->pmu.stop(perf_event, PERF_EF_UPDATE);
/*
* The hw counter would start automatically when this function returns.
* Thus, the host may continue to interrupt and inject it to the guest
* even without the guest configuring the next event. Depending on the hardware
* the host may have some sluggishness only if privilege mode filtering is not
* available. In an ideal world, where qemu is not the only capable hardware,
* this can be removed.
* FYI: ARM64 does it this way while x86 doesn't do anything like this.
* TODO: Should we keep it for RISC-V ?
*/
period = -(local64_read(&perf_event->count));
local64_set(&perf_event->hw.period_left, 0);
perf_event->attr.sample_period = period;
perf_event->hw.sample_period = period;
set_bit(pmc->idx, kvpmu->pmc_overflown);
kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_PMU_OVF);
rpmu->pmu.start(perf_event, PERF_EF_RELOAD);
}
static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
unsigned long flags, unsigned long eidx,
unsigned long evtdata)
{
struct perf_event *event;
@ -247,7 +325,7 @@ static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr
*/
attr->sample_period = kvm_pmu_get_sample_period(pmc);
event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
event = perf_event_create_kernel_counter(attr, -1, current, kvm_riscv_pmu_overflow, pmc);
if (IS_ERR(event)) {
pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
return PTR_ERR(event);
@ -310,6 +388,80 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
return ret;
}
static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
if (kvpmu->sdata) {
if (kvpmu->snapshot_addr != INVALID_GPA) {
memset(kvpmu->sdata, 0, snapshot_area_size);
kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
kvpmu->sdata, snapshot_area_size);
} else {
pr_warn("snapshot address invalid\n");
}
kfree(kvpmu->sdata);
kvpmu->sdata = NULL;
}
kvpmu->snapshot_addr = INVALID_GPA;
}
int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low,
unsigned long saddr_high, unsigned long flags,
struct kvm_vcpu_sbi_return *retdata)
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
int sbiret = 0;
gpa_t saddr;
unsigned long hva;
bool writable;
if (!kvpmu || flags) {
sbiret = SBI_ERR_INVALID_PARAM;
goto out;
}
if (saddr_low == SBI_SHMEM_DISABLE && saddr_high == SBI_SHMEM_DISABLE) {
kvm_pmu_clear_snapshot_area(vcpu);
return 0;
}
saddr = saddr_low;
if (saddr_high != 0) {
if (IS_ENABLED(CONFIG_32BIT))
saddr |= ((gpa_t)saddr_high << 32);
else
sbiret = SBI_ERR_INVALID_ADDRESS;
goto out;
}
hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
if (kvm_is_error_hva(hva) || !writable) {
sbiret = SBI_ERR_INVALID_ADDRESS;
goto out;
}
kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
if (!kvpmu->sdata)
return -ENOMEM;
if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
kfree(kvpmu->sdata);
sbiret = SBI_ERR_FAILURE;
goto out;
}
kvpmu->snapshot_addr = saddr;
out:
retdata->err_val = sbiret;
return 0;
}
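
From the guest's point of view, this handler sits behind SBI_EXT_PMU_SNAPSHOT_SET_SHMEM. A minimal sketch of how a Linux guest could register a per-hart snapshot page, assuming the kernel's sbi_ecall() helper and a single zeroed page; the function name and the trimmed error handling are illustrative, not the actual riscv_pmu_sbi driver code.

#include <asm/sbi.h>
#include <linux/gfp.h>
#include <linux/kernel.h>

/* Hand one zeroed page to the hypervisor as the PMU snapshot area. */
static int pmu_snapshot_setup(void)
{
	struct sbiret ret;
	unsigned long pa;
	unsigned long vaddr = get_zeroed_page(GFP_KERNEL);

	if (!vaddr)
		return -ENOMEM;

	pa = __pa(vaddr);
	if (IS_ENABLED(CONFIG_32BIT))
		/* 32-bit callers split the physical address across both args */
		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
				lower_32_bits(pa), upper_32_bits(pa), 0, 0, 0, 0);
	else
		/* 64-bit callers pass the full address in the low arg, high must be 0 */
		ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
				pa, 0, 0, 0, 0, 0);

	return ret.error ? sbi_err_map_linux_errno(ret.error) : 0;
}

Disabling later mirrors the handler above: pass SBI_SHMEM_DISABLE in both address arguments.
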
int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
struct kvm_vcpu_sbi_return *retdata)
{
@ -343,20 +495,40 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
int i, pmc_index, sbiret = 0;
struct kvm_pmc *pmc;
int fevent_code;
bool snap_flag_set = flags & SBI_PMU_START_FLAG_INIT_SNAPSHOT;
if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
sbiret = SBI_ERR_INVALID_PARAM;
goto out;
}
if (snap_flag_set) {
if (kvpmu->snapshot_addr == INVALID_GPA) {
sbiret = SBI_ERR_NO_SHMEM;
goto out;
}
if (kvm_vcpu_read_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
sizeof(struct riscv_pmu_snapshot_data))) {
pr_warn("Unable to read snapshot shared memory while starting counters\n");
sbiret = SBI_ERR_FAILURE;
goto out;
}
}
/* Start the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
pmc_index = i + ctr_base;
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
/* The guest started the counter again. Reset the overflow status */
clear_bit(pmc_index, kvpmu->pmc_overflown);
pmc = &kvpmu->pmc[pmc_index];
if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE)
if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
pmc->counter_val = ival;
} else if (snap_flag_set) {
/* The counter indices in the snapshot are relative to the counter base */
pmc->counter_val = kvpmu->sdata->ctr_values[i];
}
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
fevent_code = get_event_code(pmc->event_idx);
if (fevent_code >= SBI_PMU_FW_MAX) {
@ -400,12 +572,19 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
u64 enabled, running;
struct kvm_pmc *pmc;
int fevent_code;
bool snap_flag_set = flags & SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
bool shmem_needs_update = false;
if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
sbiret = SBI_ERR_INVALID_PARAM;
goto out;
}
if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
sbiret = SBI_ERR_NO_SHMEM;
goto out;
}
/* Stop the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
pmc_index = i + ctr_base;
@ -432,21 +611,49 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
sbiret = SBI_ERR_ALREADY_STOPPED;
}
if (flags & SBI_PMU_STOP_FLAG_RESET) {
/* Relase the counter if this is a reset request */
pmc->counter_val += perf_event_read_value(pmc->perf_event,
&enabled, &running);
if (flags & SBI_PMU_STOP_FLAG_RESET)
/* Release the counter if this is a reset request */
kvm_pmu_release_perf_event(pmc);
}
} else {
sbiret = SBI_ERR_INVALID_PARAM;
}
if (snap_flag_set && !sbiret) {
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW)
pmc->counter_val = kvpmu->fw_event[fevent_code].value;
else if (pmc->perf_event)
pmc->counter_val += perf_event_read_value(pmc->perf_event,
&enabled, &running);
/*
* The counter and overflow indices in the snapshot region are relative to
* cbase. Modify the set bit in the counter mask instead of the pmc_index
* which indicates the absolute counter index.
*/
if (test_bit(pmc_index, kvpmu->pmc_overflown))
kvpmu->sdata->ctr_overflow_mask |= BIT(i);
kvpmu->sdata->ctr_values[i] = pmc->counter_val;
shmem_needs_update = true;
}
if (flags & SBI_PMU_STOP_FLAG_RESET) {
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
clear_bit(pmc_index, kvpmu->pmc_in_use);
clear_bit(pmc_index, kvpmu->pmc_overflown);
if (snap_flag_set) {
/*
* Only clear the given counter, as the caller is responsible for
* validating both the overflow mask and the configured counters.
*/
kvpmu->sdata->ctr_overflow_mask &= ~BIT(i);
shmem_needs_update = true;
}
}
}
if (shmem_needs_update)
kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
sizeof(struct riscv_pmu_snapshot_data));
out:
retdata->err_val = sbiret;
@ -458,7 +665,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
unsigned long eidx, u64 evtdata,
struct kvm_vcpu_sbi_return *retdata)
{
int ctr_idx, ret, sbiret = 0;
int ctr_idx, sbiret = 0;
long ret;
bool is_fevent;
unsigned long event_code;
u32 etype = kvm_pmu_get_perf_event_type(eidx);
@ -517,8 +725,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
kvpmu->fw_event[event_code].started = true;
} else {
ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
if (ret)
return ret;
if (ret) {
sbiret = SBI_ERR_NOT_SUPPORTED;
goto out;
}
}
set_bit(ctr_idx, kvpmu->pmc_in_use);
@ -530,7 +740,19 @@ out:
return 0;
}
int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata)
{
int ret;
ret = pmu_fw_ctr_read_hi(vcpu, cidx, &retdata->out_val);
if (ret == -EINVAL)
retdata->err_val = SBI_ERR_INVALID_PARAM;
return 0;
}
int kvm_riscv_vcpu_pmu_fw_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata)
{
int ret;
@ -566,6 +788,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
kvpmu->num_hw_ctrs = num_hw_ctrs + 1;
kvpmu->num_fw_ctrs = SBI_PMU_FW_MAX;
memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
kvpmu->snapshot_addr = INVALID_GPA;
if (kvpmu->num_hw_ctrs > RISCV_KVM_MAX_HW_CTRS) {
pr_warn_once("Limiting the hardware counters to 32 as specified by the ISA");
@ -585,6 +808,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
pmc = &kvpmu->pmc[i];
pmc->idx = i;
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
pmc->vcpu = vcpu;
if (i < kvpmu->num_hw_ctrs) {
pmc->cinfo.type = SBI_PMU_CTR_TYPE_HW;
if (i < 3)
@ -601,7 +825,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
pmc->cinfo.csr = CSR_CYCLE + i;
} else {
pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
pmc->cinfo.width = BITS_PER_LONG - 1;
pmc->cinfo.width = 63;
}
}
@ -617,14 +841,16 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
if (!kvpmu)
return;
for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS) {
pmc = &kvpmu->pmc[i];
pmc->counter_val = 0;
kvm_pmu_release_perf_event(pmc);
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
}
bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
bitmap_zero(kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS);
bitmap_zero(kvpmu->pmc_overflown, RISCV_KVM_MAX_COUNTERS);
memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
kvm_pmu_clear_snapshot_area(vcpu);
}
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)


@ -138,8 +138,11 @@ void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
unsigned long i;
struct kvm_vcpu *tmp;
kvm_for_each_vcpu(i, tmp, vcpu->kvm)
tmp->arch.power_off = true;
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
spin_lock(&vcpu->arch.mp_state_lock);
WRITE_ONCE(tmp->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
spin_unlock(&vcpu->arch.mp_state_lock);
}
kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
memset(&run->system_event, 0, sizeof(run->system_event));
