Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 6.10

- Move a lot of state that was previously stored on a per-vcpu
  basis into a per-CPU area, because it is only pertinent to the
  host while the vcpu is loaded. This results in better state
  tracking and a smaller vcpu structure.

- Add full handling of the ERET/ERETAA/ERETAB instructions in
  nested virtualisation. The last two instructions also require
  emulating part of the pointer authentication extension.
  As a result, the trap handling of pointer authentication has
  been greatly simplified.

- Turn the global (and not very scalable) LPI translation cache
  into a per-ITS, scalable cache, making LPIs that are not directly
  injected much cheaper to make visible to the vcpu.

- A batch of pKVM patches, mostly fixes and cleanups, as the
  upstreaming process seems to be resuming. Fingers crossed!

- Allocate PPIs and SGIs outside of the vcpu structure, allowing
  for smaller EL2 mapping and some flexibility in implementing
  more or less than 32 private IRQs.

- Purge stale mpidr_data if a vcpu is created after the MPIDR
  map has been created.

- Preserve vcpu-specific ID registers across a vcpu reset.

- Various minor cleanups and improvements.
Paolo Bonzini 2024-05-12 03:15:53 -04:00
commit e5f62e27b1
74 changed files with 2967 additions and 1052 deletions

@@ -6894,6 +6894,13 @@ Note that KVM does not skip the faulting instruction as it does for
KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
if it decides to decode and emulate the instruction.
This feature isn't available to protected VMs, as userspace does not
have access to the state that is required to perform the emulation.
Instead, a data abort exception is directly injected in the guest.
Note that although KVM_CAP_ARM_NISV_TO_USER will be reported if
queried outside of a protected VM context, the feature will not be
exposed if queried on a protected VM file descriptor.
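
As an editorial illustration (not part of this series), the asymmetry described above could be observed from userspace with the standard KVM_CHECK_EXTENSION ioctl; the file descriptors and the protected-VM setup are assumed, not shown:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical sketch: sys_fd is /dev/kvm, vm_fd is a protected VM. */
static void probe_nisv(int sys_fd, int vm_fd)
{
    /* Reported as available on the system fd... */
    printf("system: %d\n",
           ioctl(sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_NISV_TO_USER));

    /* ...but not exposed when queried on a protected VM fd. */
    printf("pVM:    %d\n",
           ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_NISV_TO_USER));
}
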
::
/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */

@@ -0,0 +1,138 @@
.. SPDX-License-Identifier: GPL-2.0
=======================================
ARM firmware pseudo-registers interface
=======================================
KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.
This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.
The following registers are defined:
* KVM_REG_ARM_PSCI_VERSION:
KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
highest PSCI version implemented by KVM and compatible with v0.2)
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
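
As an editorial illustration (not part of this series), userspace could read this register through the GET_ONE_REG interface as sketched below; the vcpu fd and its feature initialisation are assumed:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: returns 0 on error. */
static uint64_t get_psci_version(int vcpu_fd)
{
    uint64_t version = 0;
    struct kvm_one_reg reg = {
        .id   = KVM_REG_ARM_PSCI_VERSION,
        .addr = (uint64_t)&version,
    };

    /* Fails unless the vcpu has KVM_ARM_VCPU_PSCI_0_2 set */
    if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
        return 0;

    return version; /* e.g. 0x10001 for PSCI v1.1 */
}
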
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_1 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
KVM does not offer
firmware support for the workaround. The mitigation status for the
guest is unknown.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
The workaround HVC call is
available to the guest and required for the mitigation.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
The workaround HVC call
is available to the guest, but it is not needed on this VCPU.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
Holds the state of the firmware support to mitigate CVE-2018-3639, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_2 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
A workaround is not
available. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
The workaround state is
unknown. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
The workaround is available,
and can be disabled by a vCPU. If
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
this vCPU.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
Bitmap Feature Firmware Registers
---------------------------------
Contrary to the above registers, the following registers expose the
hypercall services in the form of a feature bitmap to userspace. This
bitmap is translated to the services that are available to the guest.
There is one register defined per service call owner, and it can be
accessed via the GET/SET_ONE_REG interface.
By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. Userspace can then write the
desired bitmap back via SET_ONE_REG. The features for the registers that
are left untouched, probably because userspace isn't aware of them, will
be exposed as-is to the guest.
Note that KVM will not allow userspace to configure these registers
once any of the vCPUs has run at least once. Instead, it will
return -EBUSY.
The pseudo-firmware bitmap registers are as follows:
* KVM_REG_ARM_STD_BMAP:
Controls the bitmap of the ARM Standard Secure Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
The bit represents the services offered under v1.0 of ARM True Random
Number Generator (TRNG) specification, ARM DEN0098.
* KVM_REG_ARM_STD_HYP_BMAP:
Controls the bitmap of the ARM Standard Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
The bit represents the Paravirtualized Time service as represented by
ARM DEN0057A.
* KVM_REG_ARM_VENDOR_HYP_BMAP:
Controls the bitmap of the Vendor specific Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
The bit represents the Precision Time Protocol KVM service.
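
As an editorial illustration (not part of this series) of the read-modify-write flow just described, here is a sketch that hides the TRNG service from the guest; the register and bit names are the real UAPI definitions, the helper itself is hypothetical:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Must run before any vCPU has run, otherwise SET fails with -EBUSY. */
static int hide_trng(int vcpu_fd)
{
    uint64_t bmap;
    struct kvm_one_reg reg = {
        .id   = KVM_REG_ARM_STD_BMAP,
        .addr = (uint64_t)&bmap,
    };

    /* GET returns the upper limit of the supported features */
    if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
        return -1;

    bmap &= ~(1ULL << KVM_REG_ARM_STD_BIT_TRNG_V1_0);

    return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}
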
Errors:
======= =============================================================
-ENOENT Unknown register accessed.
-EBUSY Attempt a 'write' to the register after the VM has started.
-EINVAL Invalid bitmap written to the register.
======= =============================================================
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf

@@ -1,138 +1,46 @@
.. SPDX-License-Identifier: GPL-2.0
=======================
ARM Hypercall Interface
=======================
===============================================
KVM/arm64-specific hypercalls exposed to guests
===============================================
KVM handles the hypercall services as requested by the guests. New hypercall
services are regularly made available by the ARM specification or by KVM (as
vendor services) if they make sense from a virtualization point of view.
This file documents the KVM/arm64-specific hypercalls which may be
exposed by KVM/arm64 to guest operating systems. These hypercalls are
issued using the HVC instruction according to version 1.1 of the Arm SMC
Calling Convention (DEN0028/C):
This means that a guest booted on two different versions of KVM can observe
two different "firmware" revisions. This could cause issues if a given guest
is tied to a particular version of a hypercall service, or if a migration
causes a different version to be exposed out of the blue to an unsuspecting
guest.
https://developer.arm.com/docs/den0028/c
In order to remedy this situation, KVM exposes a set of "firmware
pseudo-registers" that can be manipulated using the GET/SET_ONE_REG
interface. These registers can be saved/restored by userspace, and set
to a convenient value as required.
All KVM/arm64-specific hypercalls are allocated within the "Vendor
Specific Hypervisor Service Call" range with a UID of
``28b46fb6-2ec5-11e9-a9ca-4b564d003a74``. This UID should be queried by the
guest using the standard "Call UID" function for the service range in
order to determine that the KVM/arm64-specific hypercalls are available.
The following registers are defined:
``ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID``
---------------------------------------------
* KVM_REG_ARM_PSCI_VERSION:
Provides a discovery mechanism for other KVM/arm64 hypercalls.
KVM implements the PSCI (Power State Coordination Interface)
specification in order to provide services such as CPU on/off, reset
and power-off to the guest.
+---------------------+-------------------------------------------------------------+
| Presence: | Mandatory for the KVM/arm64 UID |
+---------------------+-------------------------------------------------------------+
| Calling convention: | HVC32 |
+---------------------+----------+--------------------------------------------------+
| Function ID: | (uint32) | 0x86000000 |
+---------------------+----------+--------------------------------------------------+
| Arguments: | None |
+---------------------+----------+----+---------------------------------------------+
| Return Values: | (uint32) | R0 | Bitmap of available function numbers 0-31 |
| +----------+----+---------------------------------------------+
| | (uint32) | R1 | Bitmap of available function numbers 32-63 |
| +----------+----+---------------------------------------------+
| | (uint32) | R2 | Bitmap of available function numbers 64-95 |
| +----------+----+---------------------------------------------+
| | (uint32) | R3 | Bitmap of available function numbers 96-127 |
+---------------------+----------+----+---------------------------------------------+
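
As an editorial illustration (not part of this series), a guest kernel could consume this bitmap roughly as follows, using the arm_smccc_1_1_invoke() helper from include/linux/arm-smccc.h; a real implementation (see drivers/firmware/smccc/kvm_guest.c) would first match the vendor UID above via ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:

#include <linux/arm-smccc.h>
#include <linux/bits.h>
#include <linux/types.h>

/* Hypothetical helper: is vendor-hyp function number 'func' advertised? */
static bool kvm_hyp_func_available(unsigned int func)
{
    struct arm_smccc_res res;

    arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID, &res);

    /* R0..R3 together form a 128-bit bitmap of function numbers 0-127 */
    switch (func / 32) {
    case 0: return res.a0 & BIT(func % 32);
    case 1: return res.a1 & BIT(func % 32);
    case 2: return res.a2 & BIT(func % 32);
    case 3: return res.a3 & BIT(func % 32);
    default: return false;
    }
}
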
- Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set
(and thus has already been initialized)
- Returns the current PSCI version on GET_ONE_REG (defaulting to the
highest PSCI version implemented by KVM and compatible with v0.2)
- Allows any PSCI version implemented by KVM and compatible with
v0.2 to be set with SET_ONE_REG
- Affects the whole VM (even if the register view is per-vcpu)
``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
----------------------------------------
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
Holds the state of the firmware support to mitigate CVE-2017-5715, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_1 in [1].
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL:
KVM does not offer
firmware support for the workaround. The mitigation status for the
guest is unknown.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL:
The workaround HVC call is
available to the guest and required for the mitigation.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED:
The workaround HVC call
is available to the guest, but it is not needed on this VCPU.
* KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
Holds the state of the firmware support to mitigate CVE-2018-3639, as
offered by KVM to the guest via a HVC call. The workaround is described
under SMCCC_ARCH_WORKAROUND_2 in [1]_.
Accepted values are:
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL:
A workaround is not
available. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN:
The workaround state is
unknown. KVM does not offer firmware support for the workaround.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL:
The workaround is available,
and can be disabled by a vCPU. If
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED is set, it is active for
this vCPU.
KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED:
The workaround is always active on this vCPU or it is not needed.
Bitmap Feature Firmware Registers
---------------------------------
Contrary to the above registers, the following registers exposes the
hypercall services in the form of a feature-bitmap to the userspace. This
bitmap is translated to the services that are available to the guest.
There is a register defined per service call owner and can be accessed via
GET/SET_ONE_REG interface.
By default, these registers are set with the upper limit of the features
that are supported. This way userspace can discover all the usable
hypercall services via GET_ONE_REG. The user-space can write-back the
desired bitmap back via SET_ONE_REG. The features for the registers that
are untouched, probably because userspace isn't aware of them, will be
exposed as is to the guest.
Note that KVM will not allow the userspace to configure the registers
anymore once any of the vCPUs has run at least once. Instead, it will
return a -EBUSY.
The pseudo-firmware bitmap register are as follows:
* KVM_REG_ARM_STD_BMAP:
Controls the bitmap of the ARM Standard Secure Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_BIT_TRNG_V1_0:
The bit represents the services offered under v1.0 of ARM True Random
Number Generator (TRNG) specification, ARM DEN0098.
* KVM_REG_ARM_STD_HYP_BMAP:
Controls the bitmap of the ARM Standard Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_STD_HYP_BIT_PV_TIME:
The bit represents the Paravirtualized Time service as represented by
ARM DEN0057A.
* KVM_REG_ARM_VENDOR_HYP_BMAP:
Controls the bitmap of the Vendor specific Hypervisor Service Calls.
The following bits are accepted:
Bit-0: KVM_REG_ARM_VENDOR_HYP_BIT_FUNC_FEAT
The bit represents the ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID
and ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID function-ids.
Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
The bit represents the Precision Time Protocol KVM service.
Errors:
======= =============================================================
-ENOENT Unknown register accessed.
-EBUSY Attempt a 'write' to the register after the VM has started.
-EINVAL Invalid bitmap written to the register.
======= =============================================================
.. [1] https://developer.arm.com/-/media/developer/pdf/ARM_DEN_0070A_Firmware_interfaces_for_mitigating_CVE-2017-5715.pdf
See ptp_kvm.rst

@@ -7,6 +7,7 @@ ARM
.. toctree::
:maxdepth: 2
fw-pseudo-registers
hyp-abi
hypercalls
pvtime

@@ -7,19 +7,29 @@ PTP_KVM is used for high precision time sync between host and guests.
It relies on transferring the wall clock and counter value from the
host to the guest using a KVM-specific hypercall.
* ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: 0x86000001
``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
----------------------------------------
This hypercall uses the SMC32/HVC32 calling convention:
Retrieve current time information for the specific counter. There are no
endianness restrictions.
ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID
============== ======== =====================================
Function ID: (uint32) 0x86000001
Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0)
KVM_PTP_PHYS_COUNTER(1)
Return Values: (int32) NOT_SUPPORTED(-1) on error, or
(uint32) Upper 32 bits of wall clock time (r0)
(uint32) Lower 32 bits of wall clock time (r1)
(uint32) Upper 32 bits of counter (r2)
(uint32) Lower 32 bits of counter (r3)
Endianness: No Restrictions.
============== ======== =====================================
+---------------------+-------------------------------------------------------+
| Presence: | Optional |
+---------------------+-------------------------------------------------------+
| Calling convention: | HVC32 |
+---------------------+----------+--------------------------------------------+
| Function ID: | (uint32) | 0x86000001 |
+---------------------+----------+----+---------------------------------------+
| Arguments: | (uint32) | R1 | ``KVM_PTP_VIRT_COUNTER (0)`` |
| | | +---------------------------------------+
| | | | ``KVM_PTP_PHYS_COUNTER (1)`` |
+---------------------+----------+----+---------------------------------------+
| Return Values: | (int32) | R0 | ``NOT_SUPPORTED (-1)`` on error, else |
| | | | upper 32 bits of wall clock time |
| +----------+----+---------------------------------------+
| | (uint32) | R1 | Lower 32 bits of wall clock time |
| +----------+----+---------------------------------------+
| | (uint32) | R2 | Upper 32 bits of counter |
| +----------+----+---------------------------------------+
| | (uint32) | R3 | Lower 32 bits of counter |
+---------------------+----------+----+---------------------------------------+
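
As an editorial illustration (not part of this series), a guest could issue this hypercall roughly as in the sketch below, which mirrors what drivers/ptp/ptp_kvm* does; the function ID and arm_smccc_1_1_invoke() are real, the helper itself is an assumption:

#include <linux/arm-smccc.h>
#include <linux/bits.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Hypothetical helper: read host wall clock time and the virt counter. */
static int kvm_ptp_get_time(u64 *wall_ns, u64 *cycles)
{
    struct arm_smccc_res res;

    arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
                         0 /* KVM_PTP_VIRT_COUNTER */, &res);
    if ((s64)res.a0 < 0)
        return -EOPNOTSUPP; /* NOT_SUPPORTED (-1) */

    *wall_ns = (res.a0 << 32) | (res.a1 & GENMASK(31, 0));
    *cycles  = (res.a2 << 32) | (res.a3 & GENMASK(31, 0));
    return 0;
}
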

@@ -404,6 +404,18 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_ACCESS;
}
/* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
static inline bool esr_iss_is_eretax(unsigned long esr)
{
return esr & ESR_ELx_ERET_ISS_ERET;
}
/* Indicate which key is used for ERETAx (false: A-Key, true: B-Key) */
static inline bool esr_iss_is_eretab(unsigned long esr)
{
return esr & ESR_ELx_ERET_ISS_ERETA;
}
const char *esr_get_class_string(unsigned long esr);
#endif /* __ASSEMBLY */

@@ -73,10 +73,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
__KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
@@ -241,8 +239,6 @@ extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
extern u64 __vgic_v3_get_gic_config(void);
extern u64 __vgic_v3_read_vmcr(void);
extern void __vgic_v3_write_vmcr(u32 vmcr);
extern void __vgic_v3_init_lrs(void);
extern u64 __kvm_get_mdcr_el2(void);

@@ -125,16 +125,6 @@ static inline void vcpu_set_wfx_traps(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_TWI;
}
static inline void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
static inline void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
}
static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
{
return vcpu->arch.vsesr_el2;
@@ -587,16 +577,14 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
} else if (has_hvhe()) {
val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
if (!vcpu_has_sve(vcpu) ||
(vcpu->arch.fp_state != FP_STATE_GUEST_OWNED))
if (!vcpu_has_sve(vcpu) || !guest_owns_fp_regs())
val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN;
if (cpus_have_final_cap(ARM64_SME))
val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN;
} else {
val = CPTR_NVHE_EL2_RES1;
if (vcpu_has_sve(vcpu) &&
(vcpu->arch.fp_state == FP_STATE_GUEST_OWNED))
if (vcpu_has_sve(vcpu) && guest_owns_fp_regs())
val |= CPTR_EL2_TZ;
if (cpus_have_final_cap(ARM64_SME))
val &= ~CPTR_EL2_TSM;

@@ -211,6 +211,7 @@ typedef unsigned int pkvm_handle_t;
struct kvm_protected_vm {
pkvm_handle_t handle;
struct kvm_hyp_memcache teardown_mc;
bool enabled;
};
struct kvm_mpidr_data {
@@ -220,20 +221,10 @@ struct kvm_mpidr_data {
static inline u16 kvm_mpidr_index(struct kvm_mpidr_data *data, u64 mpidr)
{
unsigned long mask = data->mpidr_mask;
u64 aff = mpidr & MPIDR_HWID_BITMASK;
int nbits, bit, bit_idx = 0;
u16 index = 0;
unsigned long index = 0, mask = data->mpidr_mask;
unsigned long aff = mpidr & MPIDR_HWID_BITMASK;
/*
* If this looks like RISC-V's BEXT or x86's PEXT
* instructions, it isn't by accident.
*/
nbits = fls(mask);
for_each_set_bit(bit, &mask, nbits) {
index |= (aff & BIT(bit)) >> (bit - bit_idx);
bit_idx++;
}
bitmap_gather(&index, &aff, &mask, fls(mask));
return index;
}
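
For reference (an editorial illustration, not kernel code), the open-coded loop removed above and bitmap_gather() compute the same BEXT/PEXT-style packing; a minimal single-word equivalent is:

#include <linux/bitops.h>

/*
 * Pack the bits of @val selected by @mask into the low-order bits of
 * the result, preserving their order.
 * E.g. val = 0b101000, mask = 0b111000 -> 0b101.
 */
static unsigned long gather_bits(unsigned long val, unsigned long mask)
{
    unsigned long out = 0;
    unsigned int out_bit = 0, bit;

    for_each_set_bit(bit, &mask, BITS_PER_LONG)
        out |= ((val >> bit) & 1UL) << out_bit++;

    return out;
}
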
@@ -530,8 +521,42 @@ struct kvm_cpu_context {
u64 *vncr_array;
};
/*
* This structure is instantiated on a per-CPU basis, and contains
* data that is:
*
* - tied to a single physical CPU, and
* - either have a lifetime that does not extend past vcpu_put()
* - or is an invariant for the lifetime of the system
*
* Use host_data_ptr(field) as a way to access a pointer to such a
* field.
*/
struct kvm_host_data {
struct kvm_cpu_context host_ctxt;
struct user_fpsimd_state *fpsimd_state; /* hyp VA */
/* Ownership of the FP regs */
enum {
FP_STATE_FREE,
FP_STATE_HOST_OWNED,
FP_STATE_GUEST_OWNED,
} fp_owner;
/*
* host_debug_state contains the host registers which are
* saved and restored during world switches.
*/
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
/* Statistical profiling extension */
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
/* Values of trap registers for the host before guest entry. */
u64 mdcr_el2;
} host_debug_state;
};
struct kvm_host_psci_config {
@@ -592,19 +617,9 @@ struct kvm_vcpu_arch {
u64 mdcr_el2;
u64 cptr_el2;
/* Values of trap registers for the host before guest entry. */
u64 mdcr_el2_host;
/* Exception Information */
struct kvm_vcpu_fault_info fault;
/* Ownership of the FP regs */
enum {
FP_STATE_FREE,
FP_STATE_HOST_OWNED,
FP_STATE_GUEST_OWNED,
} fp_state;
/* Configuration flags, set once and for all before the vcpu can run */
u8 cflags;
@@ -627,11 +642,10 @@ struct kvm_vcpu_arch {
* We maintain more than a single set of debug registers to support
* debugging the guest from the host and to maintain separate host and
* guest state during world switches. vcpu_debug_state are the debug
* registers of the vcpu as the guest sees them. host_debug_state are
* the host registers which are saved and restored during
* world switches. external_debug_state contains the debug
* values we want to debug the guest. This is set via the
* KVM_SET_GUEST_DEBUG ioctl.
* registers of the vcpu as the guest sees them.
*
* external_debug_state contains the debug values we want to debug the
* guest. This is set via the KVM_SET_GUEST_DEBUG ioctl.
*
* debug_ptr points to the set of debug registers that should be loaded
* onto the hardware when running the guest.
@@ -640,18 +654,6 @@ struct kvm_vcpu_arch {
struct kvm_guest_debug_arch vcpu_debug_state;
struct kvm_guest_debug_arch external_debug_state;
struct user_fpsimd_state *host_fpsimd_state; /* hyp VA */
struct task_struct *parent_task;
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
/* Statistical profiling extension */
u64 pmscr_el1;
/* Self-hosted trace */
u64 trfcr_el1;
} host_debug_state;
/* VGIC state */
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
@@ -817,8 +819,6 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5))
/* Save TRBE context if active */
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
/* vcpu running in HYP context */
#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
@@ -896,7 +896,7 @@ struct kvm_vcpu_arch {
* Don't bother with VNCR-based accesses in the nVHE code, it has no
* business dealing with NV.
*/
static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
static inline u64 *___ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
{
#if !defined (__KVM_NVHE_HYPERVISOR__)
if (unlikely(cpus_have_final_cap(ARM64_HAS_NESTED_VIRT) &&
@@ -906,6 +906,13 @@ static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
return (u64 *)&ctxt->sys_regs[r];
}
#define __ctxt_sys_reg(c,r) \
({ \
BUILD_BUG_ON(__builtin_constant_p(r) && \
(r) >= NR_SYS_REGS); \
___ctxt_sys_reg(c, r); \
})
#define ctxt_sys_reg(c,r) (*__ctxt_sys_reg(c,r))
u64 kvm_vcpu_sanitise_vncr_reg(const struct kvm_vcpu *, enum vcpu_sysreg);
@@ -1168,6 +1175,44 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
DECLARE_KVM_HYP_PER_CPU(struct kvm_host_data, kvm_host_data);
/*
* How we access per-CPU host data depends on where we access it from,
* and the mode we're in:
*
* - VHE and nVHE hypervisor bits use their locally defined instance
*
* - the rest of the kernel uses either the VHE or nVHE one, depending on
* the mode we're running in.
*
* Unless we're in protected mode, fully deprivileged, and the nVHE
* per-CPU stuff is exclusively accessible to the protected EL2 code.
* In this case, the EL1 code uses the *VHE* data as its private state
* (which makes sense in a way as there shouldn't be any shared state
* between the host and the hypervisor).
*
* Yes, this is all totally trivial. Shoot me now.
*/
#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
#define host_data_ptr(f) (&this_cpu_ptr(&kvm_host_data)->f)
#else
#define host_data_ptr(f) \
(static_branch_unlikely(&kvm_protected_mode_initialized) ? \
&this_cpu_ptr(&kvm_host_data)->f : \
&this_cpu_ptr_hyp_sym(kvm_host_data)->f)
#endif
/* Check whether the FP regs are owned by the guest */
static inline bool guest_owns_fp_regs(void)
{
return *host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED;
}
/* Check whether the FP regs are owned by the host */
static inline bool host_owns_fp_regs(void)
{
return *host_data_ptr(fp_owner) == FP_STATE_HOST_OWNED;
}
static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
{
/* The host's MPIDR is immutable, so let's set it up at boot time */
@@ -1211,7 +1256,6 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);
static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
{
@@ -1247,10 +1291,9 @@ struct kvm *kvm_arch_alloc_vm(void);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
static inline bool kvm_vm_is_protected(struct kvm *kvm)
{
return false;
}
#define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.enabled)
#define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm)
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
@@ -1275,6 +1318,8 @@ static inline bool __vcpu_has_feature(const struct kvm_arch *ka, int feature)
#define vcpu_has_feature(v, f) __vcpu_has_feature(&(v)->kvm->arch, (f))
#define kvm_vcpu_initialized(v) vcpu_get_flag(vcpu, VCPU_INITIALIZED)
int kvm_trng_call(struct kvm_vcpu *vcpu);
#ifdef CONFIG_KVM
extern phys_addr_t hyp_mem_base;
@@ -1331,4 +1376,19 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
(get_idreg_field((kvm), id, fld) >= expand_field_sign(id, fld, min) && \
get_idreg_field((kvm), id, fld) <= expand_field_sign(id, fld, max))
/* Check for a given level of PAuth support */
#define kvm_has_pauth(k, l) \
({ \
bool pa, pi, pa3; \
\
pa = kvm_has_feat((k), ID_AA64ISAR1_EL1, APA, l); \
pa &= kvm_has_feat((k), ID_AA64ISAR1_EL1, GPA, IMP); \
pi = kvm_has_feat((k), ID_AA64ISAR1_EL1, API, l); \
pi &= kvm_has_feat((k), ID_AA64ISAR1_EL1, GPI, IMP); \
pa3 = kvm_has_feat((k), ID_AA64ISAR2_EL1, APA3, l); \
pa3 &= kvm_has_feat((k), ID_AA64ISAR2_EL1, GPA3, IMP); \
\
(pa + pi + pa3) == 1; \
})
#endif /* __ARM64_KVM_HOST_H__ */

@@ -80,8 +80,8 @@ void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
#ifdef __KVM_NVHE_HYPERVISOR__

@@ -60,7 +60,20 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
return ttbr0 & ~GENMASK_ULL(63, 48);
}
extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
int kvm_init_nv_sysregs(struct kvm *kvm);
#ifdef CONFIG_ARM64_PTR_AUTH
bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr);
#else
static inline bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
{
/* We really should never execute this... */
WARN_ON_ONCE(1);
*elr = 0xbad9acc0debadbad;
return false;
}
#endif
#endif /* __ARM64_KVM_NESTED_H */

@@ -99,5 +99,26 @@ alternative_else_nop_endif
.macro ptrauth_switch_to_hyp g_ctxt, h_ctxt, reg1, reg2, reg3
.endm
#endif /* CONFIG_ARM64_PTR_AUTH */
#else /* !__ASSEMBLY */
#define __ptrauth_save_key(ctxt, key) \
do { \
u64 __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = __val; \
} while(0)
#define ptrauth_save_keys(ctxt) \
do { \
__ptrauth_save_key(ctxt, APIA); \
__ptrauth_save_key(ctxt, APIB); \
__ptrauth_save_key(ctxt, APDA); \
__ptrauth_save_key(ctxt, APDB); \
__ptrauth_save_key(ctxt, APGA); \
} while(0)
#endif /* __ASSEMBLY__ */
#endif /* __ASM_KVM_PTRAUTH_H */

@@ -297,6 +297,7 @@
#define TCR_TBI1 (UL(1) << 38)
#define TCR_HA (UL(1) << 39)
#define TCR_HD (UL(1) << 40)
#define TCR_TBID0 (UL(1) << 51)
#define TCR_TBID1 (UL(1) << 52)
#define TCR_NFD0 (UL(1) << 53)
#define TCR_NFD1 (UL(1) << 54)

@@ -82,6 +82,12 @@ bool is_kvm_arm_initialised(void);
DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
static inline bool is_pkvm_initialized(void)
{
return IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized);
}
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
@@ -89,8 +95,7 @@ static inline bool is_hyp_mode_available(void)
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
if (is_pkvm_initialized())
return true;
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
@@ -104,8 +109,7 @@ static inline bool is_hyp_mode_mismatched(void)
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
if (is_pkvm_initialized())
return false;
return __boot_cpu_mode[0] != __boot_cpu_mode[1];

@@ -209,8 +209,8 @@ static const struct {
char alias[FTR_ALIAS_NAME_LEN];
char feature[FTR_ALIAS_OPTION_LEN];
} aliases[] __initconst = {
{ "kvm_arm.mode=nvhe", "id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=protected", "id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=nvhe", "arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" },
{ "kvm_arm.mode=protected", "arm64_sw.hvhe=1" },
{ "arm64.nosve", "id_aa64pfr0.sve=0" },
{ "arm64.nosme", "id_aa64pfr1.sme=0" },
{ "arm64.nobti", "id_aa64pfr1.bt=0" },

@@ -23,6 +23,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
always-y := hyp_constants.h hyp-constants.s

@@ -35,10 +35,11 @@
#include <asm/virt.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/kvm_pkvm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_ptrauth.h>
#include <asm/sections.h>
#include <kvm/arm_hypercalls.h>
@@ -69,15 +70,42 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}
/*
* This functions as an allow-list of protected VM capabilities.
* Features not explicitly allowed by this function are denied.
*/
static bool pkvm_ext_allowed(struct kvm *kvm, long ext)
{
switch (ext) {
case KVM_CAP_IRQCHIP:
case KVM_CAP_ARM_PSCI:
case KVM_CAP_ARM_PSCI_0_2:
case KVM_CAP_NR_VCPUS:
case KVM_CAP_MAX_VCPUS:
case KVM_CAP_MAX_VCPU_ID:
case KVM_CAP_MSI_DEVID:
case KVM_CAP_ARM_VM_IPA_SIZE:
case KVM_CAP_ARM_PMU_V3:
case KVM_CAP_ARM_SVE:
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
return true;
default:
return false;
}
}
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
int r;
u64 new_cap;
int r = -EINVAL;
if (cap->flags)
return -EINVAL;
if (kvm_vm_is_protected(kvm) && !pkvm_ext_allowed(kvm, cap->cap))
return -EINVAL;
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
r = 0;
@@ -86,9 +114,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
if (!system_supports_mte() || kvm->created_vcpus) {
r = -EINVAL;
} else {
if (system_supports_mte() && !kvm->created_vcpus) {
r = 0;
set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
}
@@ -99,25 +125,22 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
break;
case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
new_cap = cap->args[0];
mutex_lock(&kvm->slots_lock);
/*
* To keep things simple, allow changing the chunk
* size only when no memory slots have been created.
*/
if (!kvm_are_all_memslots_empty(kvm)) {
r = -EINVAL;
} else if (new_cap && !kvm_is_block_size_supported(new_cap)) {
r = -EINVAL;
} else {
r = 0;
kvm->arch.mmu.split_page_chunk_size = new_cap;
if (kvm_are_all_memslots_empty(kvm)) {
u64 new_cap = cap->args[0];
if (!new_cap || kvm_is_block_size_supported(new_cap)) {
r = 0;
kvm->arch.mmu.split_page_chunk_size = new_cap;
}
}
mutex_unlock(&kvm->slots_lock);
break;
default:
r = -EINVAL;
break;
}
@@ -195,6 +218,23 @@ void kvm_arch_create_vm_debugfs(struct kvm *kvm)
kvm_sys_regs_create_debugfs(kvm);
}
static void kvm_destroy_mpidr_data(struct kvm *kvm)
{
struct kvm_mpidr_data *data;
mutex_lock(&kvm->arch.config_lock);
data = rcu_dereference_protected(kvm->arch.mpidr_data,
lockdep_is_held(&kvm->arch.config_lock));
if (data) {
rcu_assign_pointer(kvm->arch.mpidr_data, NULL);
synchronize_rcu();
kfree(data);
}
mutex_unlock(&kvm->arch.config_lock);
}
/**
* kvm_arch_destroy_vm - destroy the VM data structure
* @kvm: pointer to the KVM struct
@@ -209,7 +249,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
if (is_protected_kvm_enabled())
pkvm_destroy_hyp_vm(kvm);
kfree(kvm->arch.mpidr_data);
kvm_destroy_mpidr_data(kvm);
kfree(kvm->arch.sysreg_masks);
kvm_destroy_vcpus(kvm);
@@ -218,9 +259,47 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_arm_teardown_hypercalls(kvm);
}
static bool kvm_has_full_ptr_auth(void)
{
bool apa, gpa, api, gpi, apa3, gpa3;
u64 isar1, isar2, val;
/*
* Check that:
*
* - both Address and Generic auth are implemented for a given
* algorithm (Q5, IMPDEF or Q3)
* - only a single algorithm is implemented.
*/
if (!system_has_full_ptr_auth())
return false;
isar1 = read_sanitised_ftr_reg(SYS_ID_AA64ISAR1_EL1);
isar2 = read_sanitised_ftr_reg(SYS_ID_AA64ISAR2_EL1);
apa = !!FIELD_GET(ID_AA64ISAR1_EL1_APA_MASK, isar1);
val = FIELD_GET(ID_AA64ISAR1_EL1_GPA_MASK, isar1);
gpa = (val == ID_AA64ISAR1_EL1_GPA_IMP);
api = !!FIELD_GET(ID_AA64ISAR1_EL1_API_MASK, isar1);
val = FIELD_GET(ID_AA64ISAR1_EL1_GPI_MASK, isar1);
gpi = (val == ID_AA64ISAR1_EL1_GPI_IMP);
apa3 = !!FIELD_GET(ID_AA64ISAR2_EL1_APA3_MASK, isar2);
val = FIELD_GET(ID_AA64ISAR2_EL1_GPA3_MASK, isar2);
gpa3 = (val == ID_AA64ISAR2_EL1_GPA3_IMP);
return (apa == gpa && api == gpi && apa3 == gpa3 &&
(apa + api + apa3) == 1);
}
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
int r;
if (kvm && kvm_vm_is_protected(kvm) && !pkvm_ext_allowed(kvm, ext))
return 0;
switch (ext) {
case KVM_CAP_IRQCHIP:
r = vgic_present;
@@ -311,7 +390,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = system_has_full_ptr_auth();
r = kvm_has_full_ptr_auth();
break;
case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
if (kvm)
@@ -378,12 +457,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
/*
* Default value for the FP state, will be overloaded at load
* time if we support FP (pretty likely)
*/
vcpu->arch.fp_state = FP_STATE_FREE;
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
@@ -395,6 +468,13 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
/*
* This vCPU may have been created after mpidr_data was initialized.
* Throw out the pre-computed mappings if that is the case which forces
* KVM to fall back to iteratively searching the vCPUs.
*/
kvm_destroy_mpidr_data(vcpu->kvm);
err = kvm_vgic_vcpu_init(vcpu);
if (err)
return err;
@@ -428,6 +508,44 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
}
static void vcpu_set_pauth_traps(struct kvm_vcpu *vcpu)
{
if (vcpu_has_ptrauth(vcpu)) {
/*
* Either we're running an L2 guest, and the API/APK
* bits come from L1's HCR_EL2, or API/APK are both set.
*/
if (unlikely(vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu))) {
u64 val;
val = __vcpu_sys_reg(vcpu, HCR_EL2);
val &= (HCR_API | HCR_APK);
vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
vcpu->arch.hcr_el2 |= val;
} else {
vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
}
/*
* Save the host keys if there is any chance for the guest
* to use pauth, as the entry code will reload the guest
* keys in that case.
* Protected mode is the exception to that rule, as the
* entry into the EL2 code eagerly switches back and forth
* between host and hyp keys (and kvm_hyp_ctxt is out of
* reach anyway).
*/
if (is_protected_kvm_enabled())
return;
if (vcpu->arch.hcr_el2 & (HCR_API | HCR_APK)) {
struct kvm_cpu_context *ctxt;
ctxt = this_cpu_ptr_hyp_sym(kvm_hyp_ctxt);
ptrauth_save_keys(ctxt);
}
}
}
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvm_s2_mmu *mmu;
@@ -466,8 +584,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
else
vcpu_set_wfx_traps(vcpu);
if (vcpu_has_ptrauth(vcpu))
vcpu_ptrauth_disable(vcpu);
vcpu_set_pauth_traps(vcpu);
kvm_arch_vcpu_load_debug_state_flags(vcpu);
if (!cpumask_test_cpu(cpu, vcpu->kvm->arch.supported_cpus))
@@ -580,11 +698,6 @@ unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
}
#endif
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
{
return vcpu_get_flag(vcpu, VCPU_INITIALIZED);
}
static void kvm_init_mpidr_data(struct kvm *kvm)
{
struct kvm_mpidr_data *data = NULL;
@@ -594,7 +707,8 @@ static void kvm_init_mpidr_data(struct kvm *kvm)
mutex_lock(&kvm->arch.config_lock);
if (kvm->arch.mpidr_data || atomic_read(&kvm->online_vcpus) == 1)
if (rcu_access_pointer(kvm->arch.mpidr_data) ||
atomic_read(&kvm->online_vcpus) == 1)
goto out;
kvm_for_each_vcpu(c, vcpu, kvm) {
@@ -631,7 +745,7 @@ static void kvm_init_mpidr_data(struct kvm *kvm)
data->cmpidr_to_idx[index] = c;
}
kvm->arch.mpidr_data = data;
rcu_assign_pointer(kvm->arch.mpidr_data, data);
out:
mutex_unlock(&kvm->arch.config_lock);
}
@@ -790,9 +904,8 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
* doorbells to be signalled, should an interrupt become pending.
*/
preempt_disable();
kvm_vgic_vmcr_sync(vcpu);
vcpu_set_flag(vcpu, IN_WFI);
vgic_v4_put(vcpu);
kvm_vgic_put(vcpu);
preempt_enable();
kvm_vcpu_halt(vcpu);
@@ -800,7 +913,7 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
preempt_disable();
vcpu_clear_flag(vcpu, IN_WFI);
vgic_v4_load(vcpu);
kvm_vgic_load(vcpu);
preempt_enable();
}
@@ -980,7 +1093,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (run->exit_reason == KVM_EXIT_MMIO) {
ret = kvm_handle_mmio_return(vcpu);
if (ret)
if (ret <= 0)
return ret;
}
@@ -1270,7 +1383,7 @@ static unsigned long system_supported_vcpu_features(void)
if (!system_supports_sve())
clear_bit(KVM_ARM_VCPU_SVE, &features);
if (!system_has_full_ptr_auth()) {
if (!kvm_has_full_ptr_auth()) {
clear_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, &features);
clear_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, &features);
}
@@ -1971,7 +2084,7 @@ static void cpu_set_hyp_vector(void)
static void cpu_hyp_init_context(void)
{
kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
kvm_init_host_cpu_context(host_data_ptr(host_ctxt));
if (!is_kernel_in_hyp_mode())
cpu_init_hyp_mode();
@@ -2470,22 +2583,28 @@ out_err:
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
{
struct kvm_vcpu *vcpu;
struct kvm_vcpu *vcpu = NULL;
struct kvm_mpidr_data *data;
unsigned long i;
mpidr &= MPIDR_HWID_BITMASK;
if (kvm->arch.mpidr_data) {
u16 idx = kvm_mpidr_index(kvm->arch.mpidr_data, mpidr);
rcu_read_lock();
data = rcu_dereference(kvm->arch.mpidr_data);
vcpu = kvm_get_vcpu(kvm,
kvm->arch.mpidr_data->cmpidr_to_idx[idx]);
if (data) {
u16 idx = kvm_mpidr_index(data, mpidr);
vcpu = kvm_get_vcpu(kvm, data->cmpidr_to_idx[idx]);
if (mpidr != kvm_vcpu_get_mpidr_aff(vcpu))
vcpu = NULL;
return vcpu;
}
rcu_read_unlock();
if (vcpu)
return vcpu;
kvm_for_each_vcpu(i, vcpu, kvm) {
if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
return vcpu;

@@ -2117,6 +2117,26 @@ inject:
return true;
}
static bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
{
bool control_bit_set;
if (!vcpu_has_nv(vcpu))
return false;
control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
if (!is_hyp_ctxt(vcpu) && control_bit_set) {
kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
return true;
}
return false;
}
bool forward_smc_trap(struct kvm_vcpu *vcpu)
{
return forward_traps(vcpu, HCR_TSC);
}
static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
{
u64 mode = spsr & PSR_MODE_MASK;
@@ -2152,37 +2172,39 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
{
u64 spsr, elr, mode;
bool direct_eret;
u64 spsr, elr, esr;
/*
* Going through the whole put/load motions is a waste of time
* if this is a VHE guest hypervisor returning to its own
* userspace, or the hypervisor performing a local exception
* return. No need to save/restore registers, no need to
* switch S2 MMU. Just do the canonical ERET.
* Forward this trap to the virtual EL2 if the virtual
* HCR_EL2.NV bit is set and this is coming from !EL2.
*/
spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
direct_eret = (mode == PSR_MODE_EL0t &&
vcpu_el2_e2h_is_set(vcpu) &&
vcpu_el2_tge_is_set(vcpu));
direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
if (direct_eret) {
*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
*vcpu_cpsr(vcpu) = spsr;
trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
if (forward_traps(vcpu, HCR_NV))
return;
/* Check for an ERETAx */
esr = kvm_vcpu_get_esr(vcpu);
if (esr_iss_is_eretax(esr) && !kvm_auth_eretax(vcpu, &elr)) {
/*
* Oh no, ERETAx failed to authenticate. If we have
* FPACCOMBINE, deliver an exception right away. If we
* don't, then let the mangled ELR value trickle down the
* ERET handling, and the guest will have a little surprise.
*/
if (kvm_has_pauth(vcpu->kvm, FPACCOMBINE)) {
esr &= ESR_ELx_ERET_ISS_ERETA;
esr |= FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_FPAC);
kvm_inject_nested_sync(vcpu, esr);
return;
}
}
preempt_disable();
kvm_arch_vcpu_put(vcpu);
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
spsr = kvm_check_illegal_exception_return(vcpu, spsr);
if (!esr_iss_is_eretax(esr))
elr = __vcpu_sys_reg(vcpu, ELR_EL2);
trace_kvm_nested_eret(vcpu, elr, spsr);

@@ -14,19 +14,6 @@
#include <asm/kvm_mmu.h>
#include <asm/sysreg.h>
void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
{
struct task_struct *p = vcpu->arch.parent_task;
struct user_fpsimd_state *fpsimd;
if (!is_protected_kvm_enabled() || !p)
return;
fpsimd = &p->thread.uw.fpsimd_state;
kvm_unshare_hyp(fpsimd, fpsimd + 1);
put_task_struct(p);
}
/*
* Called on entry to KVM_RUN unless this vcpu previously ran at least
* once and the most recent prior KVM_RUN for this vcpu was called from
@@ -38,30 +25,18 @@ void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
*/
int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
{
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
int ret;
struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
kvm_vcpu_unshare_task_fp(vcpu);
/* pKVM has its own tracking of the host fpsimd state. */
if (is_protected_kvm_enabled())
return 0;
/* Make sure the host task fpsimd state is visible to hyp: */
ret = kvm_share_hyp(fpsimd, fpsimd + 1);
if (ret)
return ret;
vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
/*
* We need to keep current's task_struct pinned until its data has been
* unshared with the hypervisor to make sure it is not re-used by the
* kernel and donated to someone else while already shared -- see
* kvm_vcpu_unshare_task_fp() for the matching put_task_struct().
*/
if (is_protected_kvm_enabled()) {
get_task_struct(current);
vcpu->arch.parent_task = current;
}
return 0;
}
@@ -86,7 +61,8 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
* guest in kvm_arch_vcpu_ctxflush_fp() and override this to
* FP_STATE_FREE if the flag set.
*/
vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
@@ -110,7 +86,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
* been saved, this is very unlikely to happen.
*/
if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
vcpu->arch.fp_state = FP_STATE_FREE;
*host_data_ptr(fp_owner) = FP_STATE_FREE;
fpsimd_save_and_flush_cpu_state();
}
}
@@ -126,7 +102,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
{
if (test_thread_flag(TIF_FOREIGN_FPSTATE))
vcpu->arch.fp_state = FP_STATE_FREE;
*host_data_ptr(fp_owner) = FP_STATE_FREE;
}
/*
@@ -142,8 +118,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(!irqs_disabled());
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
if (guest_owns_fp_regs()) {
/*
* Currently we do not support SME guests so SVCR is
* always 0 and we just need a variable to point to.
@@ -196,16 +171,38 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
isb();
}
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu)) {
__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
/* Restore the VL that was saved when bound to the CPU */
/*
* Restore the VL that was saved when bound to the CPU,
* which is the maximum VL for the guest. Because the
* layout of the data when saving the sve state depends
* on the VL, we need to use a consistent (i.e., the
* maximum) VL.
* Note that this means that at guest exit ZCR_EL1 is
* not necessarily the same as on guest entry.
*
* Restoring the VL isn't needed in VHE mode since
* ZCR_EL2 (accessed via ZCR_EL1) would fulfill the same
* role when doing the save from EL2.
*/
if (!has_vhe())
sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1,
SYS_ZCR_EL1);
}
/*
* Flush (save and invalidate) the fpsimd/sve state so that if
* the host tries to use fpsimd/sve, it's not using stale data
* from the guest.
*
* Flushing the state sets the TIF_FOREIGN_FPSTATE bit for the
* context unconditionally, in both nVHE and VHE. This allows
* the kernel to restore the fpsimd/sve state, including ZCR_EL1
* when needed.
*/
fpsimd_save_and_flush_cpu_state();
} else if (has_vhe() && system_supports_sve()) {
/*

@@ -55,6 +55,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
static int handle_smc(struct kvm_vcpu *vcpu)
{
/*
* Forward this trapped smc instruction to the virtual EL2 if
* the guest has asked for it.
*/
if (forward_smc_trap(vcpu))
return 1;
/*
* "If an SMC instruction executed at Non-secure EL1 is
* trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
@@ -207,19 +214,40 @@ static int handle_sve(struct kvm_vcpu *vcpu)
}
/*
* Guest usage of a ptrauth instruction (which the guest EL1 did not turn into
* a NOP). If we get here, it is that we didn't fixup ptrauth on exit, and all
* that we can do is give the guest an UNDEF.
* Two possibilities to handle a trapping ptrauth instruction:
*
* - Guest usage of a ptrauth instruction (which the guest EL1 did not
* turn into a NOP). If we get here, it is because we didn't enable
* ptrauth for the guest. This results in an UNDEF, as it isn't
* supposed to use ptrauth without being told it could.
*
* - Running an L2 NV guest while L1 has left HCR_EL2.API==0, and for
* which we reinject the exception into L1.
*
* Anything else is an emulation bug (hence the WARN_ON + UNDEF).
*/
static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu)
{
if (!vcpu_has_ptrauth(vcpu)) {
kvm_inject_undefined(vcpu);
return 1;
}
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) {
kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
return 1;
}
/* Really shouldn't be here! */
WARN_ON_ONCE(1);
kvm_inject_undefined(vcpu);
return 1;
}
static int kvm_handle_eret(struct kvm_vcpu *vcpu)
{
if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_ERET_ISS_ERET)
if (esr_iss_is_eretax(kvm_vcpu_get_esr(vcpu)) &&
!vcpu_has_ptrauth(vcpu))
return kvm_handle_ptrauth(vcpu);
/*

@@ -135,9 +135,9 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)
if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
return;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
host_dbg = host_data_ptr(host_debug_state.regs);
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(host_dbg, host_ctxt);
@@ -154,9 +154,9 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)
if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
return;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
host_dbg = &vcpu->arch.host_debug_state.regs;
host_dbg = host_data_ptr(host_debug_state.regs);
guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
__debug_save_state(guest_dbg, guest_ctxt);

@@ -27,6 +27,7 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/kvm_ptrauth.h>
#include <asm/fpsimd.h>
#include <asm/debug-monitors.h>
#include <asm/processor.h>
@@ -39,12 +40,6 @@ struct kvm_exception_table_entry {
extern struct kvm_exception_table_entry __start___kvm_ex_table;
extern struct kvm_exception_table_entry __stop___kvm_ex_table;
/* Check whether the FP regs are owned by the guest */
static inline bool guest_owns_fp_regs(struct kvm_vcpu *vcpu)
{
return vcpu->arch.fp_state == FP_STATE_GUEST_OWNED;
}
/* Save the 32-bit only FPSIMD system register state */
static inline void __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
{
@@ -155,7 +150,7 @@ static inline bool cpu_has_amu(void)
static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
CHECK_FGT_MASKS(HFGRTR_EL2);
@@ -191,7 +186,7 @@ static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
if (!cpus_have_final_cap(ARM64_HAS_FGT))
@@ -226,13 +221,13 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
}
vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
if (cpus_have_final_cap(ARM64_HAS_HCX)) {
@@ -254,13 +249,13 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
{
write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
write_sysreg(*host_data_ptr(host_debug_state.mdcr_el2), mdcr_el2);
write_sysreg(0, hstr_el2);
if (kvm_arm_support_pmu_v3()) {
struct kvm_cpu_context *hctxt;
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
}
@@ -271,10 +266,8 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
__deactivate_traps_hfgxtr(vcpu);
}
static inline void ___activate_traps(struct kvm_vcpu *vcpu)
static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)
{
u64 hcr = vcpu->arch.hcr_el2;
if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM))
hcr |= HCR_TVM;
@@ -376,8 +369,8 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
isb();
/* Write out the host state if it's in the registers */
if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
if (host_owns_fp_regs())
__fpsimd_save_state(*host_data_ptr(fpsimd_state));
/* Restore the guest state */
if (sve_guest)
@@ -389,7 +382,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
if (!(read_sysreg(hcr_el2) & HCR_RW))
write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2);
vcpu->arch.fp_state = FP_STATE_GUEST_OWNED;
*host_data_ptr(fp_owner) = FP_STATE_GUEST_OWNED;
return true;
}
@@ -449,60 +442,6 @@ static inline bool handle_tx2_tvm(struct kvm_vcpu *vcpu)
return true;
}
static inline bool esr_is_ptrauth_trap(u64 esr)
{
switch (esr_sys64_to_sysreg(esr)) {
case SYS_APIAKEYLO_EL1:
case SYS_APIAKEYHI_EL1:
case SYS_APIBKEYLO_EL1:
case SYS_APIBKEYHI_EL1:
case SYS_APDAKEYLO_EL1:
case SYS_APDAKEYHI_EL1:
case SYS_APDBKEYLO_EL1:
case SYS_APDBKEYHI_EL1:
case SYS_APGAKEYLO_EL1:
case SYS_APGAKEYHI_EL1:
return true;
}
return false;
}
#define __ptrauth_save_key(ctxt, key) \
do { \
u64 __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = __val; \
__val = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = __val; \
} while(0)
DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
static bool kvm_hyp_handle_ptrauth(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm_cpu_context *ctxt;
u64 val;
if (!vcpu_has_ptrauth(vcpu))
return false;
ctxt = this_cpu_ptr(&kvm_hyp_ctxt);
__ptrauth_save_key(ctxt, APIA);
__ptrauth_save_key(ctxt, APIB);
__ptrauth_save_key(ctxt, APDA);
__ptrauth_save_key(ctxt, APDB);
__ptrauth_save_key(ctxt, APGA);
vcpu_ptrauth_enable(vcpu);
val = read_sysreg(hcr_el2);
val |= (HCR_API | HCR_APK);
write_sysreg(val, hcr_el2);
return true;
}
static bool kvm_hyp_handle_cntpct(struct kvm_vcpu *vcpu)
{
struct arch_timer_context *ctxt;
@ -590,9 +529,6 @@ static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
__vgic_v3_perform_cpuif_access(vcpu) == 1)
return true;
if (esr_is_ptrauth_trap(kvm_vcpu_get_esr(vcpu)))
return kvm_hyp_handle_ptrauth(vcpu, exit_code);
if (kvm_hyp_handle_cntpct(vcpu))
return true;


@ -53,7 +53,13 @@ pkvm_hyp_vcpu_to_hyp_vm(struct pkvm_hyp_vcpu *hyp_vcpu)
return container_of(hyp_vcpu->vcpu.kvm, struct pkvm_hyp_vm, kvm);
}
static inline bool pkvm_hyp_vcpu_is_protected(struct pkvm_hyp_vcpu *hyp_vcpu)
{
return vcpu_is_protected(&hyp_vcpu->vcpu);
}
void pkvm_hyp_vm_table_init(void *tbl);
void pkvm_host_fpsimd_state_init(void);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);


@ -83,10 +83,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
/* Disable and flush Self-Hosted Trace generation */
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
__debug_save_trace(host_data_ptr(host_debug_state.trfcr_el1));
}
void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
@ -97,9 +97,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
__debug_restore_trace(*host_data_ptr(host_debug_state.trfcr_el1));
}
void __debug_switch_to_host(struct kvm_vcpu *vcpu)


@ -600,7 +600,6 @@ static bool ffa_call_supported(u64 func_id)
case FFA_MSG_POLL:
case FFA_MSG_WAIT:
/* 32-bit variants of 64-bit calls */
case FFA_MSG_SEND_DIRECT_REQ:
case FFA_MSG_SEND_DIRECT_RESP:
case FFA_RXTX_MAP:
case FFA_MEM_DONATE:


@ -39,10 +39,8 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
hyp_vcpu->vcpu.arch.cptr_el2 = host_vcpu->arch.cptr_el2;
hyp_vcpu->vcpu.arch.iflags = host_vcpu->arch.iflags;
hyp_vcpu->vcpu.arch.fp_state = host_vcpu->arch.fp_state;
hyp_vcpu->vcpu.arch.debug_ptr = kern_hyp_va(host_vcpu->arch.debug_ptr);
hyp_vcpu->vcpu.arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
hyp_vcpu->vcpu.arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
@ -64,7 +62,6 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
host_vcpu->arch.fault = hyp_vcpu->vcpu.arch.fault;
host_vcpu->arch.iflags = hyp_vcpu->vcpu.arch.iflags;
host_vcpu->arch.fp_state = hyp_vcpu->vcpu.arch.fp_state;
host_cpu_if->vgic_hcr = hyp_cpu_if->vgic_hcr;
for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
@ -178,16 +175,6 @@ static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
}
static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __vgic_v3_read_vmcr();
}
static void handle___vgic_v3_write_vmcr(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_write_vmcr(cpu_reg(host_ctxt, 1));
}
static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_init_lrs();
@ -198,18 +185,18 @@ static void handle___kvm_get_mdcr_el2(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __kvm_get_mdcr_el2();
}
static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_save_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
__vgic_v3_save_vmcr_aprs(kern_hyp_va(cpu_if));
}
static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
__vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
}
static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
@ -340,10 +327,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_read_vmcr),
HANDLE_FUNC(__vgic_v3_write_vmcr),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_aprs),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
HANDLE_FUNC(__pkvm_vcpu_init_traps),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),


@ -533,7 +533,13 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
int ret = 0;
esr = read_sysreg_el2(SYS_ESR);
BUG_ON(!__get_fault_info(esr, &fault));
if (!__get_fault_info(esr, &fault)) {
/*
* We've presumably raced with a page-table change which caused
* AT to fail; try again.
*/
return;
}
addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
ret = host_stage2_idmap(addr);


@ -200,7 +200,7 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
}
/*
* Initialize trap register values for protected VMs.
* Initialize trap register values in protected mode.
*/
void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
{
@ -247,6 +247,17 @@ void pkvm_hyp_vm_table_init(void *tbl)
vm_table = tbl;
}
void pkvm_host_fpsimd_state_init(void)
{
unsigned long i;
for (i = 0; i < hyp_nr_cpus; i++) {
struct kvm_host_data *host_data = per_cpu_ptr(&kvm_host_data, i);
host_data->fpsimd_state = &host_data->host_ctxt.fp_regs;
}
}
/*
* Return the hyp vm structure corresponding to the handle.
*/
@ -430,6 +441,7 @@ static void *map_donated_memory(unsigned long host_va, size_t size)
static void __unmap_donated_memory(void *va, size_t size)
{
kvm_flush_dcache_to_poc(va, size);
WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
PAGE_ALIGN(size) >> PAGE_SHIFT));
}


@ -205,7 +205,7 @@ asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
struct psci_boot_args *boot_args;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
if (is_cpu_on)
boot_args = this_cpu_ptr(&cpu_on_args);


@ -257,8 +257,7 @@ static int fix_hyp_pgtable_refcnt(void)
void __noreturn __pkvm_init_finalise(void)
{
struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
struct kvm_cpu_context *host_ctxt = host_data_ptr(host_ctxt);
unsigned long nr_pages, reserved_pages, pfn;
int ret;
@ -301,6 +300,7 @@ void __noreturn __pkvm_init_finalise(void)
goto out;
pkvm_hyp_vm_table_init(vm_table_base);
pkvm_host_fpsimd_state_init();
out:
/*
* We tail-called to here from handle___pkvm_init() and will not return,


@ -40,7 +40,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
___activate_traps(vcpu);
___activate_traps(vcpu, vcpu->arch.hcr_el2);
__activate_traps_common(vcpu);
val = vcpu->arch.cptr_el2;
@ -53,7 +53,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TSM;
}
if (!guest_owns_fp_regs(vcpu)) {
if (!guest_owns_fp_regs()) {
if (has_hvhe())
val &= ~(CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |
CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN);
@ -191,7 +191,6 @@ static const exit_handler_fn hyp_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
@ -203,13 +202,12 @@ static const exit_handler_fn pvm_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
{
if (unlikely(kvm_vm_is_protected(kern_hyp_va(vcpu->kvm))))
if (unlikely(vcpu_is_protected(vcpu)))
return pvm_exit_handlers;
return hyp_exit_handlers;
@ -228,9 +226,7 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
*/
static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
{
struct kvm *kvm = kern_hyp_va(vcpu->kvm);
if (kvm_vm_is_protected(kvm) && vcpu_mode_is_32bit(vcpu)) {
if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) {
/*
* As we have caught the guest red-handed, decide that it isn't
* fit for purpose anymore by making the vcpu invalid. The VMM
@ -264,7 +260,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
pmr_sync();
}
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
@ -337,7 +333,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
__sysreg_restore_state_nvhe(host_ctxt);
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
if (guest_owns_fp_regs())
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
@ -367,7 +363,7 @@ asmlinkage void __noreturn hyp_panic(void)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
if (vcpu) {


@ -11,13 +11,23 @@
#include <nvhe/mem_protect.h>
struct tlb_inv_context {
u64 tcr;
struct kvm_s2_mmu *mmu;
u64 tcr;
u64 sctlr;
};
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt,
bool nsh)
static void enter_vmid_context(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt,
bool nsh)
{
struct kvm_s2_mmu *host_s2_mmu = &host_mmu.arch.mmu;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
cxt->mmu = NULL;
/*
* We have two requirements:
*
@ -40,20 +50,55 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
else
dsb(ish);
/*
* If we're already in the desired context, then there's nothing to do.
*/
if (vcpu) {
/*
* We're in guest context. However, for this to work, this needs
* to be called from within __kvm_vcpu_run(), which ensures that
* __hyp_running_vcpu is set to the current guest vcpu.
*/
if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_s2_mmu))
return;
cxt->mmu = vcpu->arch.hw_mmu;
} else {
/* We're in host context. */
if (mmu == host_s2_mmu)
return;
cxt->mmu = host_s2_mmu;
}
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
u64 val;
/*
* For CPUs that are affected by ARM 1319367, we need to
* avoid a host Stage-1 walk while we have the guest's
* VMID set in the VTTBR in order to invalidate TLBs.
* We're guaranteed that the S1 MMU is enabled, so we can
* simply set the EPD bits to avoid any further TLB fill.
* avoid a Stage-1 walk with the old VMID while we have
* the new VMID set in the VTTBR in order to invalidate TLBs.
* We're guaranteed that the host S1 MMU is enabled, so
* we can simply set the EPD bits to avoid any further
* TLB fill. For guests, we ensure that the S1 MMU is
* temporarily enabled in the next context.
*/
val = cxt->tcr = read_sysreg_el1(SYS_TCR);
val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
write_sysreg_el1(val, SYS_TCR);
isb();
if (vcpu) {
val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
if (!(val & SCTLR_ELx_M)) {
val |= SCTLR_ELx_M;
write_sysreg_el1(val, SYS_SCTLR);
isb();
}
} else {
/* The host S1 MMU is always enabled. */
cxt->sctlr = SCTLR_ELx_M;
}
}
/*
@ -62,18 +107,40 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
* ensuring that we always have an ISB, but not two ISBs back
* to back.
*/
__load_stage2(mmu, kern_hyp_va(mmu->arch));
if (vcpu)
__load_host_stage2();
else
__load_stage2(mmu, kern_hyp_va(mmu->arch));
asm(ALTERNATIVE("isb", "nop", ARM64_WORKAROUND_SPECULATIVE_AT));
}
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
static void exit_vmid_context(struct tlb_inv_context *cxt)
{
__load_host_stage2();
struct kvm_s2_mmu *mmu = cxt->mmu;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
if (!mmu)
return;
if (vcpu)
__load_stage2(mmu, kern_hyp_va(mmu->arch));
else
__load_host_stage2();
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
/* Ensure write of the host VMID */
/* Ensure write of the old VMID */
isb();
/* Restore the host's TCR_EL1 */
if (!(cxt->sctlr & SCTLR_ELx_M)) {
write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
isb();
}
write_sysreg_el1(cxt->tcr, SYS_TCR);
}
}
@ -84,7 +151,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
/*
* We could do so much better if we had the VA as well.
@ -105,7 +172,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
@ -114,7 +181,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, true);
enter_vmid_context(mmu, &cxt, true);
/*
* We could do so much better if we had the VA as well.
@ -135,7 +202,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
@ -152,7 +219,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
start = round_down(start, stride);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride,
TLBI_TTL_UNKNOWN);
@ -162,7 +229,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
@ -170,13 +237,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
@ -184,19 +251,19 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt, false);
enter_vmid_context(mmu, &cxt, false);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_vm_context(void)
{
/* Same remark as in __tlb_switch_to_guest() */
/* Same remark as in enter_vmid_context() */
dsb(ish);
__tlbi(alle1is);
dsb(ish);


@ -914,12 +914,12 @@ static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
{
u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
}
static bool stage2_pte_executable(kvm_pte_t pte)
{
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}
static u64 stage2_map_walker_phys_addr(const struct kvm_pgtable_visit_ctx *ctx,
@ -979,6 +979,21 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
if (!stage2_pte_needs_update(ctx->old, new))
return -EAGAIN;
/* If we're only changing software bits, then store them and go! */
if (!kvm_pgtable_walk_shared(ctx) &&
!((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW)) {
bool old_is_counted = stage2_pte_is_counted(ctx->old);
if (old_is_counted != stage2_pte_is_counted(new)) {
if (old_is_counted)
mm_ops->put_page(ctx->ptep);
else
mm_ops->get_page(ctx->ptep);
}
WARN_ON_ONCE(!stage2_try_set_pte(ctx, new));
return 0;
}
if (!stage2_try_break_pte(ctx, data->mmu))
return -EAGAIN;
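The fast path above fires only when the old and new PTEs differ in nothing but
the software-defined attribute bits, sparing a break-before-make cycle. A
standalone check of that condition (illustrative only; the software bits are
assumed to be the architectural [58:55] range that KVM_PTE_LEAF_ATTR_HI_SW
covers):

	#include <assert.h>
	#include <stdint.h>

	#define GENMASK64(h, l)	((~0ULL << (l)) & (~0ULL >> (63 - (h))))
	/* Assumption: the software-defined PTE attribute bits are [58:55]. */
	#define PTE_ATTR_HI_SW	GENMASK64(58, 55)

	int main(void)
	{
		uint64_t old = 0x0040000000000fc5ULL;	/* arbitrary valid-looking PTE */
		uint64_t new = old | (1ULL << 55);	/* flip a software bit only */

		/* Same test as above: nothing but SW bits changed -> fast path. */
		assert(((old ^ new) & ~PTE_ATTR_HI_SW) == 0);
		return 0;
	}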
@ -1370,7 +1385,7 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
struct kvm_pgtable *pgt = ctx->arg;
struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
if (!stage2_pte_cacheable(pgt, ctx->old))
return 0;
if (mm_ops->dcache_clean_inval_poc)


@ -330,7 +330,7 @@ void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
write_gicreg(0, ICH_HCR_EL2);
}
void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
static void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
{
u64 val;
u32 nr_pre_bits;
@ -363,7 +363,7 @@ void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
}
}
void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
static void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
{
u64 val;
u32 nr_pre_bits;
@ -455,16 +455,35 @@ u64 __vgic_v3_get_gic_config(void)
return val;
}
u64 __vgic_v3_read_vmcr(void)
static u64 __vgic_v3_read_vmcr(void)
{
return read_gicreg(ICH_VMCR_EL2);
}
void __vgic_v3_write_vmcr(u32 vmcr)
static void __vgic_v3_write_vmcr(u32 vmcr)
{
write_gicreg(vmcr, ICH_VMCR_EL2);
}
void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
{
__vgic_v3_save_aprs(cpu_if);
if (cpu_if->vgic_sre)
cpu_if->vgic_vmcr = __vgic_v3_read_vmcr();
}
void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if)
{
/*
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
* VMCR_EL2 save/restore in the world switch.
*/
if (cpu_if->vgic_sre)
__vgic_v3_write_vmcr(cpu_if->vgic_vmcr);
__vgic_v3_restore_aprs(cpu_if);
}
static int __vgic_v3_bpr_min(void)
{
/* See Pseudocode for VPriorityGroup */


@ -33,11 +33,43 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
/*
* HCR_EL2 bits that the NV guest can freely change (no RES0/RES1
* semantics, irrespective of the configuration), but that cannot be
* applied to the actual HW as things would otherwise break badly.
*
* - TGE: we want the guest to use EL1, which is incompatible with
* this bit being set
*
* - API/APK: they are already accounted for by vcpu_load(), and can
* only take effect across a load/put cycle (such as ERET)
*/
#define NV_HCR_GUEST_EXCLUDE (HCR_TGE | HCR_API | HCR_APK)
static u64 __compute_hcr(struct kvm_vcpu *vcpu)
{
u64 hcr = vcpu->arch.hcr_el2;
if (!vcpu_has_nv(vcpu))
return hcr;
if (is_hyp_ctxt(vcpu)) {
hcr |= HCR_NV | HCR_NV2 | HCR_AT | HCR_TTLB;
if (!vcpu_el2_e2h_is_set(vcpu))
hcr |= HCR_NV1;
write_sysreg_s(vcpu->arch.ctxt.vncr_array, SYS_VNCR_EL2);
}
return hcr | (__vcpu_sys_reg(vcpu, HCR_EL2) & ~NV_HCR_GUEST_EXCLUDE);
}
static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
___activate_traps(vcpu);
___activate_traps(vcpu, __compute_hcr(vcpu));
if (has_cntpoff()) {
struct timer_map map;
@ -75,7 +107,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TAM;
if (guest_owns_fp_regs(vcpu)) {
if (guest_owns_fp_regs()) {
if (vcpu_has_sve(vcpu))
val |= CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN;
} else {
@ -162,6 +194,8 @@ static void __vcpu_put_deactivate_traps(struct kvm_vcpu *vcpu)
void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
{
host_data_ptr(host_ctxt)->__hyp_running_vcpu = vcpu;
__vcpu_load_switch_sysregs(vcpu);
__vcpu_load_activate_traps(vcpu);
__load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
@ -171,6 +205,61 @@ void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
{
__vcpu_put_deactivate_traps(vcpu);
__vcpu_put_switch_sysregs(vcpu);
host_data_ptr(host_ctxt)->__hyp_running_vcpu = NULL;
}
static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
{
u64 esr = kvm_vcpu_get_esr(vcpu);
u64 spsr, elr, mode;
/*
* Going through the whole put/load motions is a waste of time
* if this is a VHE guest hypervisor returning to its own
* userspace, or the hypervisor performing a local exception
* return. No need to save/restore registers, no need to
* switch S2 MMU. Just do the canonical ERET.
*
* Unless the trap has to be forwarded further down the line,
* of course...
*/
if ((__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV) ||
(__vcpu_sys_reg(vcpu, HFGITR_EL2) & HFGITR_EL2_ERET))
return false;
spsr = read_sysreg_el1(SYS_SPSR);
mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
switch (mode) {
case PSR_MODE_EL0t:
if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
return false;
break;
case PSR_MODE_EL2t:
mode = PSR_MODE_EL1t;
break;
case PSR_MODE_EL2h:
mode = PSR_MODE_EL1h;
break;
default:
return false;
}
/* If ERETAx fails, take the slow path */
if (esr_iss_is_eretax(esr)) {
if (!(vcpu_has_ptrauth(vcpu) && kvm_auth_eretax(vcpu, &elr)))
return false;
} else {
elr = read_sysreg_el1(SYS_ELR);
}
spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
write_sysreg_el2(spsr, SYS_SPSR);
write_sysreg_el2(elr, SYS_ELR);
return true;
}
static const exit_handler_fn hyp_exit_handlers[] = {
@ -182,7 +271,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
[ESR_ELx_EC_WATCHPT_LOW] = kvm_hyp_handle_watchpt_low,
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
[ESR_ELx_EC_ERET] = kvm_hyp_handle_eret,
[ESR_ELx_EC_MOPS] = kvm_hyp_handle_mops,
};
@ -197,7 +286,7 @@ static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
* If we were in HYP context on entry, adjust the PSTATE view
* so that the usual helpers work correctly.
*/
if (unlikely(vcpu_get_flag(vcpu, VCPU_HYP_CONTEXT))) {
if (vcpu_has_nv(vcpu) && (read_sysreg(hcr_el2) & HCR_NV)) {
u64 mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
switch (mode) {
@ -221,8 +310,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt;
u64 exit_code;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt->__hyp_running_vcpu = vcpu;
host_ctxt = host_data_ptr(host_ctxt);
guest_ctxt = &vcpu->arch.ctxt;
sysreg_save_host_state_vhe(host_ctxt);
@ -240,11 +328,6 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
if (is_hyp_ctxt(vcpu))
vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
else
vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT);
do {
/* Jump in the fire! */
exit_code = __guest_enter(vcpu);
@ -258,7 +341,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_host_state_vhe(host_ctxt);
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
if (guest_owns_fp_regs())
__fpsimd_save_fpexc32(vcpu);
__debug_switch_to_host(vcpu);
@ -306,7 +389,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
__deactivate_traps(vcpu);


@ -67,7 +67,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_user_state(host_ctxt);
/*
@ -110,7 +110,7 @@ void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
host_ctxt = host_data_ptr(host_ctxt);
__sysreg_save_el1_state(guest_ctxt);
__sysreg_save_user_state(guest_ctxt);


@ -17,8 +17,8 @@ struct tlb_inv_context {
u64 sctlr;
};
static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
static void enter_vmid_context(struct kvm_s2_mmu *mmu,
struct tlb_inv_context *cxt)
{
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
u64 val;
@ -67,7 +67,7 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
isb();
}
static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
static void exit_vmid_context(struct tlb_inv_context *cxt)
{
/*
* We're done with the TLB operation, let's restore the host's
@ -97,7 +97,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
/*
* We could do so much better if we had the VA as well.
@ -118,7 +118,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
@ -129,7 +129,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nshst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
/*
* We could do so much better if we had the VA as well.
@ -150,7 +150,7 @@ void __kvm_tlb_flush_vmid_ipa_nsh(struct kvm_s2_mmu *mmu,
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
@ -169,7 +169,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride,
TLBI_TTL_UNKNOWN);
@ -179,7 +179,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
@ -189,13 +189,13 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
dsb(ishst);
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__tlbi(vmalls12e1is);
dsb(ish);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
@ -203,14 +203,14 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
struct tlb_inv_context cxt;
/* Switch to requested VMID */
__tlb_switch_to_guest(mmu, &cxt);
enter_vmid_context(mmu, &cxt);
__tlbi(vmalle1);
asm volatile("ic iallu");
dsb(nsh);
isb();
__tlb_switch_to_host(&cxt);
exit_vmid_context(&cxt);
}
void __kvm_flush_vm_context(void)


@ -86,7 +86,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
/* Detect an already handled MMIO return */
if (unlikely(!vcpu->mmio_needed))
return 0;
return 1;
vcpu->mmio_needed = 0;
@ -117,7 +117,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
*/
kvm_incr_pc(vcpu);
return 0;
return 1;
}
int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
@ -133,11 +133,19 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
/*
* No valid syndrome? Ask userspace for help if it has
* volunteered to do so, and bail out otherwise.
*
* In the protected VM case, there isn't much userspace can do
* though, so directly deliver an exception to the guest.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
trace_kvm_mmio_nisv(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
kvm_vcpu_get_hfar(vcpu), fault_ipa);
if (vcpu_is_protected(vcpu)) {
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
return 1;
}
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;


@ -1522,8 +1522,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
read_lock(&kvm->mmu_lock);
pgt = vcpu->arch.hw_mmu->pgt;
if (mmu_invalidate_retry(kvm, mmu_seq))
if (mmu_invalidate_retry(kvm, mmu_seq)) {
ret = -EAGAIN;
goto out_unlock;
}
/*
* If we are not forced to use page mapping, check if we are
@ -1581,6 +1583,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
memcache,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
out_unlock:
read_unlock(&kvm->mmu_lock);
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
@ -1588,8 +1592,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mark_page_dirty_in_slot(kvm, memslot, gfn);
}
out_unlock:
read_unlock(&kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return ret != -EAGAIN ? ret : 0;
}


@ -35,13 +35,9 @@ static u64 limit_nv_id_reg(u32 id, u64 val)
break;
case SYS_ID_AA64ISAR1_EL1:
/* Support everything but PtrAuth and Spec Invalidation */
/* Support everything but Spec Invalidation */
val &= ~(GENMASK_ULL(63, 56) |
NV_FTR(ISAR1, SPECRES) |
NV_FTR(ISAR1, GPI) |
NV_FTR(ISAR1, GPA) |
NV_FTR(ISAR1, API) |
NV_FTR(ISAR1, APA));
NV_FTR(ISAR1, SPECRES));
break;
case SYS_ID_AA64PFR0_EL1:

arch/arm64/kvm/pauth.c (new file)

@ -0,0 +1,206 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2024 - Google LLC
* Author: Marc Zyngier <maz@kernel.org>
*
* Primitive PAuth emulation for ERETAA/ERETAB.
*
* This code assumes that it is run from EL2, and that it is part of
* the emulation of ERETAx for a guest hypervisor. That's a lot of
* baked-in assumptions and shortcuts.
*
* Do not reuse for anything else!
*/
#include <linux/kvm_host.h>
#include <asm/gpr-num.h>
#include <asm/kvm_emulate.h>
#include <asm/pointer_auth.h>
/* PACGA Xd, Xn, Xm */
#define PACGA(d,n,m) \
asm volatile(__DEFINE_ASM_GPR_NUMS \
".inst 0x9AC03000 |" \
"(.L__gpr_num_%[Rd] << 0) |" \
"(.L__gpr_num_%[Rn] << 5) |" \
"(.L__gpr_num_%[Rm] << 16)\n" \
: [Rd] "=r" ((d)) \
: [Rn] "r" ((n)), [Rm] "r" ((m)))
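0x9AC03000 is the base encoding of the PACGA instruction; the macro ORs the
Xd/Xn/Xm register numbers into bits [4:0], [9:5] and [20:16]. A standalone
sketch checking the same arithmetic (illustrative only, not kernel code):

	#include <assert.h>
	#include <stdint.h>

	/* Illustrative reconstruction of the PACGA encoding used above. */
	static uint32_t pacga_insn(unsigned int rd, unsigned int rn, unsigned int rm)
	{
		return 0x9AC03000u | (rd << 0) | (rn << 5) | (rm << 16);
	}

	int main(void)
	{
		/* "pacga x0, x1, x2" assembles to 0x9AC23020 */
		assert(pacga_insn(0, 1, 2) == 0x9AC23020u);
		return 0;
	}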
static u64 compute_pac(struct kvm_vcpu *vcpu, u64 ptr,
struct ptrauth_key ikey)
{
struct ptrauth_key gkey;
u64 mod, pac = 0;
preempt_disable();
if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
mod = __vcpu_sys_reg(vcpu, SP_EL2);
else
mod = read_sysreg(sp_el1);
gkey.lo = read_sysreg_s(SYS_APGAKEYLO_EL1);
gkey.hi = read_sysreg_s(SYS_APGAKEYHI_EL1);
__ptrauth_key_install_nosync(APGA, ikey);
isb();
PACGA(pac, ptr, mod);
isb();
__ptrauth_key_install_nosync(APGA, gkey);
preempt_enable();
/* PAC in the top 32bits */
return pac;
}
static bool effective_tbi(struct kvm_vcpu *vcpu, bool bit55)
{
u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2);
bool tbi, tbid;
/*
* Since we are authenticating an instruction address, we have
* to take TBID into account. If E2H==0, ignore VA[55], as
* TCR_EL2 only has a single TBI/TBID. If VA[55] was set in
* this case, this is likely a guest bug...
*/
if (!vcpu_el2_e2h_is_set(vcpu)) {
tbi = tcr & BIT(20);
tbid = tcr & BIT(29);
} else if (bit55) {
tbi = tcr & TCR_TBI1;
tbid = tcr & TCR_TBID1;
} else {
tbi = tcr & TCR_TBI0;
tbid = tcr & TCR_TBID0;
}
return tbi && !tbid;
}
static int compute_bottom_pac(struct kvm_vcpu *vcpu, bool bit55)
{
static const int maxtxsz = 39; // Revisit these two values once
static const int mintxsz = 16; // (if) we support TTST/LVA/LVA2
u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2);
int txsz;
if (!vcpu_el2_e2h_is_set(vcpu) || !bit55)
txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
else
txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
return 64 - clamp(txsz, mintxsz, maxtxsz);
}
static u64 compute_pac_mask(struct kvm_vcpu *vcpu, bool bit55)
{
int bottom_pac;
u64 mask;
bottom_pac = compute_bottom_pac(vcpu, bit55);
mask = GENMASK(54, bottom_pac);
if (!effective_tbi(vcpu, bit55))
mask |= GENMASK(63, 56);
return mask;
}
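As a worked example (illustrative userspace C, with a stand-in for the
kernel's GENMASK_ULL()): assuming T0SZ = 25, i.e. a 39-bit VA space with TBI
off, bottom_pac = 64 - 25 = 39 and the PAC occupies bits [54:39] plus [63:56]:

	#include <stdint.h>
	#include <stdio.h>

	/* Userspace stand-in for the kernel's GENMASK_ULL(). */
	#define GENMASK64(h, l)	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

	int main(void)
	{
		int txsz = 25;			/* TCR_ELx.TxSZ: 39-bit VA space */
		int bottom_pac = 64 - txsz;	/* lowest PAC bit: 39 */
		uint64_t mask = GENMASK64(54, bottom_pac);

		mask |= GENMASK64(63, 56);	/* TBI off: PAC reaches the top byte */

		/* Prints 0xff7fff8000000000 */
		printf("PAC mask: %#llx\n", (unsigned long long)mask);
		return 0;
	}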
static u64 to_canonical_addr(struct kvm_vcpu *vcpu, u64 ptr, u64 mask)
{
bool bit55 = !!(ptr & BIT(55));
if (bit55)
return ptr | mask;
return ptr & ~mask;
}
static u64 corrupt_addr(struct kvm_vcpu *vcpu, u64 ptr)
{
bool bit55 = !!(ptr & BIT(55));
u64 mask, error_code;
int shift;
if (effective_tbi(vcpu, bit55)) {
mask = GENMASK(54, 53);
shift = 53;
} else {
mask = GENMASK(62, 61);
shift = 61;
}
if (esr_iss_is_eretab(kvm_vcpu_get_esr(vcpu)))
error_code = 2 << shift;
else
error_code = 1 << shift;
ptr &= ~mask;
ptr |= error_code;
return ptr;
}
/*
* Authenticate an ERETAA/ERETAB instruction, returning true if the
* authentication succeeded and false otherwise. In all cases, *elr
* contains the VA to ERET to. Potential exception injection is left
* to the caller.
*/
bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
{
u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL2);
u64 esr = kvm_vcpu_get_esr(vcpu);
u64 ptr, cptr, pac, mask;
struct ptrauth_key ikey;
*elr = ptr = vcpu_read_sys_reg(vcpu, ELR_EL2);
/* We assume we're already in the context of an ERETAx */
if (esr_iss_is_eretab(esr)) {
if (!(sctlr & SCTLR_EL1_EnIB))
return true;
ikey.lo = __vcpu_sys_reg(vcpu, APIBKEYLO_EL1);
ikey.hi = __vcpu_sys_reg(vcpu, APIBKEYHI_EL1);
} else {
if (!(sctlr & SCTLR_EL1_EnIA))
return true;
ikey.lo = __vcpu_sys_reg(vcpu, APIAKEYLO_EL1);
ikey.hi = __vcpu_sys_reg(vcpu, APIAKEYHI_EL1);
}
mask = compute_pac_mask(vcpu, !!(ptr & BIT(55)));
cptr = to_canonical_addr(vcpu, ptr, mask);
pac = compute_pac(vcpu, cptr, ikey);
/*
* Slightly deviate from the pseudocode: if we have a PAC
* match with the signed pointer, then it must be good.
* Anything after this point is pure error handling.
*/
if ((pac & mask) == (ptr & mask)) {
*elr = cptr;
return true;
}
/*
* Authentication failed: corrupt the canonical address if
* PAuth2 isn't implemented, or XOR in the computed PAC if it is.
*/
if (!kvm_has_pauth(vcpu->kvm, PAuth2))
cptr = corrupt_addr(vcpu, cptr);
else
cptr = ptr ^ (pac & mask);
*elr = cptr;
return false;
}


@ -222,7 +222,6 @@ void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
int pkvm_init_host_vm(struct kvm *host_kvm)
{
mutex_init(&host_kvm->lock);
return 0;
}
@ -259,6 +258,7 @@ static int __init finalize_pkvm(void)
* at, which would end badly once inaccessible.
*/
kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
kmemleak_free_part(__hyp_rodata_start, __hyp_rodata_end - __hyp_rodata_start);
kmemleak_free_part_phys(hyp_mem_base, hyp_mem_size);
ret = pkvm_drop_host_privileges();


@ -232,7 +232,7 @@ bool kvm_set_pmuserenr(u64 val)
if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
return false;
hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
hctxt = host_data_ptr(host_ctxt);
ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
return true;
}


@ -151,7 +151,6 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
{
void *sve_state = vcpu->arch.sve_state;
kvm_vcpu_unshare_task_fp(vcpu);
kvm_unshare_hyp(vcpu, vcpu + 1);
if (sve_state)
kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));


@ -1568,17 +1568,31 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
return IDREG(vcpu->kvm, reg_to_encoding(r));
}
static bool is_feature_id_reg(u32 encoding)
{
return (sys_reg_Op0(encoding) == 3 &&
(sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
sys_reg_CRn(encoding) == 0 &&
sys_reg_CRm(encoding) <= 7);
}
/*
* Return true if the register's (Op0, Op1, CRn, CRm, Op2) is
* (3, 0, 0, crm, op2), where 1<=crm<8, 0<=op2<8.
* (3, 0, 0, crm, op2), where 1<=crm<8, 0<=op2<8, which is the range of ID
* registers KVM maintains on a per-VM basis.
*/
static inline bool is_id_reg(u32 id)
static inline bool is_vm_ftr_id_reg(u32 id)
{
return (sys_reg_Op0(id) == 3 && sys_reg_Op1(id) == 0 &&
sys_reg_CRn(id) == 0 && sys_reg_CRm(id) >= 1 &&
sys_reg_CRm(id) < 8);
}
static inline bool is_vcpu_ftr_id_reg(u32 id)
{
return is_feature_id_reg(id) && !is_vm_ftr_id_reg(id);
}
static inline bool is_aa32_id_reg(u32 id)
{
return (sys_reg_Op0(id) == 3 && sys_reg_Op1(id) == 0 &&
@ -2338,7 +2352,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_AA64MMFR0_EL1_TGRAN16_2)),
ID_WRITABLE(ID_AA64MMFR1_EL1, ~(ID_AA64MMFR1_EL1_RES0 |
ID_AA64MMFR1_EL1_HCX |
ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_TWED |
ID_AA64MMFR1_EL1_XNX |
ID_AA64MMFR1_EL1_VH |
@ -3069,12 +3082,14 @@ static bool check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,
for (i = 0; i < n; i++) {
if (!is_32 && table[i].reg && !table[i].reset) {
kvm_err("sys_reg table %pS entry %d lacks reset\n", &table[i], i);
kvm_err("sys_reg table %pS entry %d (%s) lacks reset\n",
&table[i], i, table[i].name);
return false;
}
if (i && cmp_sys_reg(&table[i-1], &table[i]) >= 0) {
kvm_err("sys_reg table %pS entry %d out of order\n", &table[i - 1], i - 1);
kvm_err("sys_reg table %pS entry %d (%s -> %s) out of order\n",
&table[i], i, table[i - 1].name, table[i].name);
return false;
}
}
@ -3509,26 +3524,25 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm)
&idregs_debug_fops);
}
static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
static void reset_vm_ftr_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *reg)
{
const struct sys_reg_desc *idreg = first_idreg;
u32 id = reg_to_encoding(idreg);
u32 id = reg_to_encoding(reg);
struct kvm *kvm = vcpu->kvm;
if (test_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags))
return;
lockdep_assert_held(&kvm->arch.config_lock);
IDREG(kvm, id) = reg->reset(vcpu, reg);
}
/* Initialize all idregs */
while (is_id_reg(id)) {
IDREG(kvm, id) = idreg->reset(vcpu, idreg);
static void reset_vcpu_ftr_id_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *reg)
{
if (kvm_vcpu_initialized(vcpu))
return;
idreg++;
id = reg_to_encoding(idreg);
}
set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags);
reg->reset(vcpu, reg);
}
/**
@ -3540,19 +3554,24 @@ static void kvm_reset_id_regs(struct kvm_vcpu *vcpu)
*/
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
unsigned long i;
kvm_reset_id_regs(vcpu);
for (i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) {
const struct sys_reg_desc *r = &sys_reg_descs[i];
if (is_id_reg(reg_to_encoding(r)))
if (!r->reset)
continue;
if (r->reset)
if (is_vm_ftr_id_reg(reg_to_encoding(r)))
reset_vm_ftr_id_reg(vcpu, r);
else if (is_vcpu_ftr_id_reg(reg_to_encoding(r)))
reset_vcpu_ftr_id_reg(vcpu, r);
else
r->reset(vcpu, r);
}
set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags);
}
/**
@ -3978,14 +3997,6 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
sys_reg_CRm(r), \
sys_reg_Op2(r))
static bool is_feature_id_reg(u32 encoding)
{
return (sys_reg_Op0(encoding) == 3 &&
(sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) &&
sys_reg_CRn(encoding) == 0 &&
sys_reg_CRm(encoding) <= 7);
}
int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *range)
{
const void *zero_page = page_to_virt(ZERO_PAGE(0));
@ -4014,7 +4025,7 @@ int kvm_vm_ioctl_get_reg_writable_masks(struct kvm *kvm, struct reg_mask_range *
* compliant with a given revision of the architecture, but the
* RES0/RES1 definitions allow us to do that.
*/
if (is_id_reg(encoding)) {
if (is_vm_ftr_id_reg(encoding)) {
if (!reg->val ||
(is_aa32_id_reg(encoding) && !kvm_supports_32bit_el0()))
continue;


@ -28,27 +28,65 @@ struct vgic_state_iter {
int nr_lpis;
int dist_id;
int vcpu_id;
int intid;
unsigned long intid;
int lpi_idx;
u32 *lpi_array;
};
static void iter_next(struct vgic_state_iter *iter)
static void iter_next(struct kvm *kvm, struct vgic_state_iter *iter)
{
struct vgic_dist *dist = &kvm->arch.vgic;
if (iter->dist_id == 0) {
iter->dist_id++;
return;
}
/*
* Let the xarray drive the iterator after the last SPI, as the iterator
* has exhausted the sequentially-allocated INTID space.
*/
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS - 1)) {
if (iter->lpi_idx < iter->nr_lpis)
xa_find_after(&dist->lpi_xa, &iter->intid,
VGIC_LPI_MAX_INTID,
LPI_XA_MARK_DEBUG_ITER);
iter->lpi_idx++;
return;
}
iter->intid++;
if (iter->intid == VGIC_NR_PRIVATE_IRQS &&
++iter->vcpu_id < iter->nr_cpus)
iter->intid = 0;
}
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS)) {
if (iter->lpi_idx < iter->nr_lpis)
iter->intid = iter->lpi_array[iter->lpi_idx];
iter->lpi_idx++;
static int iter_mark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
int nr_lpis = 0;
xa_for_each(&dist->lpi_xa, intid, irq) {
if (!vgic_try_get_irq_kref(irq))
continue;
xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
nr_lpis++;
}
return nr_lpis;
}
static void iter_unmark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
xa_for_each(&dist->lpi_xa, intid, irq) {
xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
vgic_put_irq(kvm, irq);
}
}
@ -61,15 +99,12 @@ static void iter_init(struct kvm *kvm, struct vgic_state_iter *iter,
iter->nr_cpus = nr_cpus;
iter->nr_spis = kvm->arch.vgic.nr_spis;
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
iter->nr_lpis = vgic_copy_lpi_list(kvm, NULL, &iter->lpi_array);
if (iter->nr_lpis < 0)
iter->nr_lpis = 0;
}
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
iter->nr_lpis = iter_mark_lpis(kvm);
/* Fast forward to the right position if needed */
while (pos--)
iter_next(iter);
iter_next(kvm, iter);
}
static bool end_of_vgic(struct vgic_state_iter *iter)
@ -114,7 +149,7 @@ static void *vgic_debug_next(struct seq_file *s, void *v, loff_t *pos)
struct vgic_state_iter *iter = kvm->arch.vgic.iter;
++*pos;
iter_next(iter);
iter_next(kvm, iter);
if (end_of_vgic(iter))
iter = NULL;
return iter;
@ -134,13 +169,14 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
mutex_lock(&kvm->arch.config_lock);
iter = kvm->arch.vgic.iter;
kfree(iter->lpi_array);
iter_unmark_lpis(kvm);
kfree(iter);
kvm->arch.vgic.iter = NULL;
mutex_unlock(&kvm->arch.config_lock);
}
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist,
struct vgic_state_iter *iter)
{
bool v3 = dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3;
@ -149,7 +185,7 @@ static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
if (v3)
seq_printf(s, "nr_lpis:\t%d\n", atomic_read(&dist->lpi_count));
seq_printf(s, "nr_lpis:\t%d\n", iter->nr_lpis);
seq_printf(s, "enabled:\t%d\n", dist->enabled);
seq_printf(s, "\n");
@ -236,7 +272,7 @@ static int vgic_debug_show(struct seq_file *s, void *v)
unsigned long flags;
if (iter->dist_id == 0) {
print_dist_state(s, &kvm->arch.vgic);
print_dist_state(s, &kvm->arch.vgic, iter);
return 0;
}
@ -246,11 +282,13 @@ static int vgic_debug_show(struct seq_file *s, void *v)
if (iter->vcpu_id < iter->nr_cpus)
vcpu = kvm_get_vcpu(kvm, iter->vcpu_id);
/*
* Expect this to succeed, as iter_mark_lpis() takes a reference on
* every LPI to be visited.
*/
irq = vgic_get_irq(kvm, vcpu, iter->intid);
if (!irq) {
seq_printf(s, " LPI %4d freed\n", iter->intid);
return 0;
}
if (WARN_ON_ONCE(!irq))
return -EINVAL;
raw_spin_lock_irqsave(&irq->irq_lock, flags);
print_irq_state(s, irq, vcpu);


@ -53,8 +53,6 @@ void kvm_vgic_early_init(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
INIT_LIST_HEAD(&dist->lpi_translation_cache);
raw_spin_lock_init(&dist->lpi_list_lock);
xa_init_flags(&dist->lpi_xa, XA_FLAGS_LOCK_IRQ);
}
@ -182,27 +180,22 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
return 0;
}
/**
* kvm_vgic_vcpu_init() - Initialize static VGIC VCPU data
* structures and register VCPU-specific KVM iodevs
*
* @vcpu: pointer to the VCPU being created and initialized
*
* Only do initialization, but do not actually enable the
* VGIC CPU interface
*/
int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
int ret = 0;
int i;
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
lockdep_assert_held(&vcpu->kvm->arch.config_lock);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
raw_spin_lock_init(&vgic_cpu->ap_list_lock);
atomic_set(&vgic_cpu->vgic_v3.its_vpe.vlpi_count, 0);
if (vgic_cpu->private_irqs)
return 0;
vgic_cpu->private_irqs = kcalloc(VGIC_NR_PRIVATE_IRQS,
sizeof(struct vgic_irq),
GFP_KERNEL_ACCOUNT);
if (!vgic_cpu->private_irqs)
return -ENOMEM;
/*
* Enable and configure all SGIs to be edge-triggered and
@ -227,9 +220,48 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
}
}
return 0;
}
static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu)
{
int ret;
mutex_lock(&vcpu->kvm->arch.config_lock);
ret = vgic_allocate_private_irqs_locked(vcpu);
mutex_unlock(&vcpu->kvm->arch.config_lock);
return ret;
}
/**
* kvm_vgic_vcpu_init() - Initialize static VGIC VCPU data
* structures and register VCPU-specific KVM iodevs
*
* @vcpu: pointer to the VCPU being created and initialized
*
* Only do initialization, but do not actually enable the
* VGIC CPU interface
*/
int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
int ret = 0;
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
raw_spin_lock_init(&vgic_cpu->ap_list_lock);
atomic_set(&vgic_cpu->vgic_v3.its_vpe.vlpi_count, 0);
if (!irqchip_in_kernel(vcpu->kvm))
return 0;
ret = vgic_allocate_private_irqs(vcpu);
if (ret)
return ret;
/*
* If we are creating a VCPU with a GICv3 we must also register the
* KVM io device for the redistributor that belongs to this VCPU.
@ -285,10 +317,13 @@ int vgic_init(struct kvm *kvm)
/* Initialize groups on CPUs created before the VGIC type was known */
kvm_for_each_vcpu(idx, vcpu, kvm) {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
ret = vgic_allocate_private_irqs_locked(vcpu);
if (ret)
goto out;
for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
struct vgic_irq *irq = &vgic_cpu->private_irqs[i];
struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, i);
switch (dist->vgic_model) {
case KVM_DEV_TYPE_ARM_VGIC_V3:
irq->group = 1;
@ -300,14 +335,15 @@ int vgic_init(struct kvm *kvm)
break;
default:
ret = -EINVAL;
goto out;
}
vgic_put_irq(kvm, irq);
if (ret)
goto out;
}
}
if (vgic_has_its(kvm))
vgic_lpi_translation_cache_init(kvm);
/*
* If we have GICv4.1 enabled, unconditionally request enable the
* v4 support so that we get HW-accelerated vSGIs. Otherwise, only
@ -361,9 +397,6 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
}
if (vgic_has_its(kvm))
vgic_lpi_translation_cache_destroy(kvm);
if (vgic_supports_direct_msis(kvm))
vgic_v4_teardown(kvm);
@ -381,6 +414,9 @@ static void __kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
vgic_flush_pending_lpis(vcpu);
INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
kfree(vgic_cpu->private_irqs);
vgic_cpu->private_irqs = NULL;
if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
vgic_unregister_redist_iodev(vcpu);
vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;


@ -23,6 +23,8 @@
#include "vgic.h"
#include "vgic-mmio.h"
static struct kvm_device_ops kvm_arm_vgic_its_ops;
static int vgic_its_save_tables_v0(struct vgic_its *its);
static int vgic_its_restore_tables_v0(struct vgic_its *its);
static int vgic_its_commit_v0(struct vgic_its *its);
@ -67,7 +69,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
irq->target_vcpu = vcpu;
irq->group = 1;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
xa_lock_irqsave(&dist->lpi_xa, flags);
/*
* There could be a race with another vgic_add_lpi(), so we need to
@ -82,17 +84,14 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
goto out_unlock;
}
ret = xa_err(xa_store(&dist->lpi_xa, intid, irq, 0));
ret = xa_err(__xa_store(&dist->lpi_xa, intid, irq, 0));
if (ret) {
xa_release(&dist->lpi_xa, intid);
kfree(irq);
goto out_unlock;
}
atomic_inc(&dist->lpi_count);
out_unlock:
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
if (ret)
return ERR_PTR(ret);
@ -150,14 +149,6 @@ struct its_ite {
u32 event_id;
};
struct vgic_translation_cache_entry {
struct list_head entry;
phys_addr_t db;
u32 devid;
u32 eventid;
struct vgic_irq *irq;
};
/**
* struct vgic_its_abi - ITS abi ops and settings
* @cte_esz: collection table entry size
@ -252,8 +243,10 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id,
#define GIC_LPI_OFFSET 8192
#define VITS_TYPER_IDBITS 16
#define VITS_TYPER_DEVBITS 16
#define VITS_TYPER_IDBITS 16
#define VITS_MAX_EVENTID (BIT(VITS_TYPER_IDBITS) - 1)
#define VITS_TYPER_DEVBITS 16
#define VITS_MAX_DEVID (BIT(VITS_TYPER_DEVBITS) - 1)
#define VITS_DTE_MAX_DEVID_OFFSET (BIT(14) - 1)
#define VITS_ITE_MAX_EVENTID_OFFSET (BIT(16) - 1)
@ -316,53 +309,6 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
return 0;
}
#define GIC_LPI_MAX_INTID ((1 << INTERRUPT_ID_BITS_ITS) - 1)
/*
* Create a snapshot of the current LPIs targeting @vcpu, so that we can
* enumerate those LPIs without holding any lock.
* Returns their number and puts the kmalloc'ed array into intid_ptr.
*/
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr)
{
struct vgic_dist *dist = &kvm->arch.vgic;
XA_STATE(xas, &dist->lpi_xa, GIC_LPI_OFFSET);
struct vgic_irq *irq;
unsigned long flags;
u32 *intids;
int irq_count, i = 0;
/*
* There is an obvious race between allocating the array and LPIs
* being mapped/unmapped. If we ended up here as a result of a
* command, we're safe (locks are held, preventing another
* command). If coming from another path (such as enabling LPIs),
* we must be careful not to overrun the array.
*/
irq_count = atomic_read(&dist->lpi_count);
intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL_ACCOUNT);
if (!intids)
return -ENOMEM;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
rcu_read_lock();
xas_for_each(&xas, irq, GIC_LPI_MAX_INTID) {
if (i == irq_count)
break;
/* We don't need to "get" the IRQ, as we hold the list lock. */
if (vcpu && irq->target_vcpu != vcpu)
continue;
intids[i++] = irq->intid;
}
rcu_read_unlock();
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
*intid_ptr = intids;
return i;
}
static int update_affinity(struct vgic_irq *irq, struct kvm_vcpu *vcpu)
{
int ret = 0;
@ -446,23 +392,18 @@ static u32 max_lpis_propbaser(u64 propbaser)
static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
{
gpa_t pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
unsigned long intid, flags;
struct vgic_irq *irq;
int last_byte_offset = -1;
int ret = 0;
u32 *intids;
int nr_irqs, i;
unsigned long flags;
u8 pendmask;
nr_irqs = vgic_copy_lpi_list(vcpu->kvm, vcpu, &intids);
if (nr_irqs < 0)
return nr_irqs;
for (i = 0; i < nr_irqs; i++) {
xa_for_each(&dist->lpi_xa, intid, irq) {
int byte_offset, bit_nr;
byte_offset = intids[i] / BITS_PER_BYTE;
bit_nr = intids[i] % BITS_PER_BYTE;
byte_offset = intid / BITS_PER_BYTE;
bit_nr = intid % BITS_PER_BYTE;
/*
* For contiguously allocated LPIs chances are we just read
@ -472,25 +413,23 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
ret = kvm_read_guest_lock(vcpu->kvm,
pendbase + byte_offset,
&pendmask, 1);
if (ret) {
kfree(intids);
if (ret)
return ret;
}
last_byte_offset = byte_offset;
}
irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
irq = vgic_get_irq(vcpu->kvm, NULL, intid);
if (!irq)
continue;
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = pendmask & (1U << bit_nr);
if (irq->target_vcpu == vcpu)
irq->pending_latch = pendmask & (1U << bit_nr);
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
vgic_put_irq(vcpu->kvm, irq);
}
kfree(intids);
return ret;
}
@ -566,51 +505,52 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
return 0;
}
static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
phys_addr_t db,
u32 devid, u32 eventid)
static struct vgic_its *__vgic_doorbell_to_its(struct kvm *kvm, gpa_t db)
{
struct vgic_translation_cache_entry *cte;
struct kvm_io_device *kvm_io_dev;
struct vgic_io_device *iodev;
list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
*/
if (!cte->irq)
break;
kvm_io_dev = kvm_io_bus_get_dev(kvm, KVM_MMIO_BUS, db);
if (!kvm_io_dev)
return ERR_PTR(-EINVAL);
if (cte->db != db || cte->devid != devid ||
cte->eventid != eventid)
continue;
if (kvm_io_dev->ops != &kvm_io_gic_ops)
return ERR_PTR(-EINVAL);
/*
* Move this entry to the head, as it is the most
* recently used.
*/
if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
list_move(&cte->entry, &dist->lpi_translation_cache);
iodev = container_of(kvm_io_dev, struct vgic_io_device, dev);
if (iodev->iodev_type != IODEV_ITS)
return ERR_PTR(-EINVAL);
return cte->irq;
}
return iodev->its;
}
static unsigned long vgic_its_cache_key(u32 devid, u32 eventid)
{
return (((unsigned long)devid) << VITS_TYPER_IDBITS) | eventid;
return NULL;
}
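With VITS_TYPER_IDBITS == 16, this flattens a (devid, eventid) pair into a
single xarray index, so a translation-cache lookup below is a single
xa_load(). A quick standalone check of the arithmetic (illustrative only):

	#include <assert.h>

	#define VITS_TYPER_IDBITS 16	/* matches the definition above */

	static unsigned long cache_key(unsigned int devid, unsigned int eventid)
	{
		return ((unsigned long)devid << VITS_TYPER_IDBITS) | eventid;
	}

	int main(void)
	{
		/* DevID 0x12, EventID 0x34 -> flat index 0x120034 */
		assert(cache_key(0x12, 0x34) == 0x120034UL);
		return 0;
	}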
static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db,
u32 devid, u32 eventid)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned long cache_key = vgic_its_cache_key(devid, eventid);
struct vgic_its *its;
struct vgic_irq *irq;
unsigned long flags;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
if (devid > VITS_MAX_DEVID || eventid > VITS_MAX_EVENTID)
return NULL;
irq = __vgic_its_check_cache(dist, db, devid, eventid);
its = __vgic_doorbell_to_its(kvm, db);
if (IS_ERR(its))
return NULL;
rcu_read_lock();
irq = xa_load(&its->translation_cache, cache_key);
if (!vgic_try_get_irq_kref(irq))
irq = NULL;
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
rcu_read_unlock();
return irq;
}
@ -619,41 +559,13 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
u32 devid, u32 eventid,
struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_translation_cache_entry *cte;
unsigned long flags;
phys_addr_t db;
unsigned long cache_key = vgic_its_cache_key(devid, eventid);
struct vgic_irq *old;
/* Do not cache a directly injected interrupt */
if (irq->hw)
return;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
if (unlikely(list_empty(&dist->lpi_translation_cache)))
goto out;
/*
* We could have raced with another CPU caching the same
* translation behind our back, so let's check it is not in
* already
*/
db = its->vgic_its_base + GITS_TRANSLATER;
if (__vgic_its_check_cache(dist, db, devid, eventid))
goto out;
/* Always reuse the last entry (LRU policy) */
cte = list_last_entry(&dist->lpi_translation_cache,
typeof(*cte), entry);
/*
* Caching the translation implies having an extra reference
* to the interrupt, so drop the potential reference on what
* was in the cache, and increment it on the new interrupt.
*/
if (cte->irq)
vgic_put_irq(kvm, cte->irq);
/*
* The irq refcount is guaranteed to be nonzero while holding the
* its_lock, as the ITE (and the reference it holds) cannot be freed.
@ -661,39 +573,44 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
lockdep_assert_held(&its->its_lock);
vgic_get_irq_kref(irq);
cte->db = db;
cte->devid = devid;
cte->eventid = eventid;
cte->irq = irq;
/* Move the new translation to the head of the list */
list_move(&cte->entry, &dist->lpi_translation_cache);
out:
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
/*
* We could have raced with another CPU caching the same
* translation behind our back, ensure we don't leak a
* reference if that is the case.
*/
old = xa_store(&its->translation_cache, cache_key, irq, GFP_KERNEL_ACCOUNT);
if (old)
vgic_put_irq(kvm, old);
}
void vgic_its_invalidate_cache(struct kvm *kvm)
static void vgic_its_invalidate_cache(struct vgic_its *its)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_translation_cache_entry *cte;
unsigned long flags;
struct kvm *kvm = its->dev->kvm;
struct vgic_irq *irq;
unsigned long idx;
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
xa_for_each(&its->translation_cache, idx, irq) {
xa_erase(&its->translation_cache, idx);
vgic_put_irq(kvm, irq);
}
}
list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
/*
* If we hit a NULL entry, there is nothing after this
* point.
*/
if (!cte->irq)
break;
void vgic_its_invalidate_all_caches(struct kvm *kvm)
{
struct kvm_device *dev;
struct vgic_its *its;
vgic_put_irq(kvm, cte->irq);
cte->irq = NULL;
rcu_read_lock();
list_for_each_entry_rcu(dev, &kvm->devices, vm_node) {
if (dev->ops != &kvm_arm_vgic_its_ops)
continue;
its = dev->private;
vgic_its_invalidate_cache(its);
}
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
rcu_read_unlock();
}
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
@ -725,8 +642,6 @@ int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi)
{
u64 address;
struct kvm_io_device *kvm_io_dev;
struct vgic_io_device *iodev;
if (!vgic_has_its(kvm))
return ERR_PTR(-ENODEV);
@ -736,18 +651,7 @@ struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi)
address = (u64)msi->address_hi << 32 | msi->address_lo;
kvm_io_dev = kvm_io_bus_get_dev(kvm, KVM_MMIO_BUS, address);
if (!kvm_io_dev)
return ERR_PTR(-EINVAL);
if (kvm_io_dev->ops != &kvm_io_gic_ops)
return ERR_PTR(-EINVAL);
iodev = container_of(kvm_io_dev, struct vgic_io_device, dev);
if (iodev->iodev_type != IODEV_ITS)
return ERR_PTR(-EINVAL);
return iodev->its;
return __vgic_doorbell_to_its(kvm, address);
}
/*
@ -883,7 +787,7 @@ static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
* don't bother here since we clear the ITTE anyway and the
* pending state is a property of the ITTE struct.
*/
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
its_free_ite(kvm, ite);
return 0;
@ -920,7 +824,7 @@ static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its,
ite->collection = collection;
vcpu = collection_to_vcpu(kvm, collection);
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
return update_affinity(ite->irq, vcpu);
}
@ -955,7 +859,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
switch (type) {
case GITS_BASER_TYPE_DEVICE:
if (id >= BIT_ULL(VITS_TYPER_DEVBITS))
if (id > VITS_MAX_DEVID)
return false;
break;
case GITS_BASER_TYPE_COLLECTION:
@ -1167,7 +1071,8 @@ static int vgic_its_cmd_handle_mapi(struct kvm *kvm, struct vgic_its *its,
}
/* Requires the its_lock to be held. */
static void vgic_its_free_device(struct kvm *kvm, struct its_device *device)
static void vgic_its_free_device(struct kvm *kvm, struct vgic_its *its,
struct its_device *device)
{
struct its_ite *ite, *temp;
@ -1179,7 +1084,7 @@ static void vgic_its_free_device(struct kvm *kvm, struct its_device *device)
list_for_each_entry_safe(ite, temp, &device->itt_head, ite_list)
its_free_ite(kvm, ite);
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
list_del(&device->dev_list);
kfree(device);
@ -1191,7 +1096,7 @@ static void vgic_its_free_device_list(struct kvm *kvm, struct vgic_its *its)
struct its_device *cur, *temp;
list_for_each_entry_safe(cur, temp, &its->device_list, dev_list)
vgic_its_free_device(kvm, cur);
vgic_its_free_device(kvm, its, cur);
}
/* its lock must be held */
@ -1250,7 +1155,7 @@ static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
* by removing the mapping and re-establishing it.
*/
if (device)
vgic_its_free_device(kvm, device);
vgic_its_free_device(kvm, its, device);
/*
* The spec does not say whether unmapping a not-mapped device
@ -1281,7 +1186,7 @@ static int vgic_its_cmd_handle_mapc(struct kvm *kvm, struct vgic_its *its,
if (!valid) {
vgic_its_free_collection(its, coll_id);
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
} else {
struct kvm_vcpu *vcpu;
@ -1372,23 +1277,19 @@ static int vgic_its_cmd_handle_inv(struct kvm *kvm, struct vgic_its *its,
int vgic_its_invall(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
int irq_count, i = 0;
u32 *intids;
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq;
unsigned long intid;
irq_count = vgic_copy_lpi_list(kvm, vcpu, &intids);
if (irq_count < 0)
return irq_count;
for (i = 0; i < irq_count; i++) {
struct vgic_irq *irq = vgic_get_irq(kvm, NULL, intids[i]);
xa_for_each(&dist->lpi_xa, intid, irq) {
irq = vgic_get_irq(kvm, NULL, intid);
if (!irq)
continue;
update_lpi_config(kvm, irq, vcpu, false);
vgic_put_irq(kvm, irq);
}
kfree(intids);
if (vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm)
its_invall_vpe(&vcpu->arch.vgic_cpu.vgic_v3.its_vpe);
@ -1431,10 +1332,10 @@ static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
u64 *its_cmd)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct kvm_vcpu *vcpu1, *vcpu2;
struct vgic_irq *irq;
u32 *intids;
int irq_count, i;
unsigned long intid;
/* We advertise GITS_TYPER.PTA==0, making the address the vcpu ID */
vcpu1 = kvm_get_vcpu_by_id(kvm, its_cmd_get_target_addr(its_cmd));
@ -1446,12 +1347,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
if (vcpu1 == vcpu2)
return 0;
irq_count = vgic_copy_lpi_list(kvm, vcpu1, &intids);
if (irq_count < 0)
return irq_count;
for (i = 0; i < irq_count; i++) {
irq = vgic_get_irq(kvm, NULL, intids[i]);
xa_for_each(&dist->lpi_xa, intid, irq) {
irq = vgic_get_irq(kvm, NULL, intid);
if (!irq)
continue;
@ -1460,9 +1357,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
vgic_put_irq(kvm, irq);
}
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
kfree(intids);
return 0;
}
@ -1813,7 +1709,7 @@ static void vgic_mmio_write_its_ctlr(struct kvm *kvm, struct vgic_its *its,
its->enabled = !!(val & GITS_CTLR_ENABLE);
if (!its->enabled)
vgic_its_invalidate_cache(kvm);
vgic_its_invalidate_cache(its);
/*
* Try to process any pending commands. This function bails out early
@ -1914,47 +1810,6 @@ out:
return ret;
}
/* Default is 16 cached LPIs per vcpu */
#define LPI_DEFAULT_PCPU_CACHE_SIZE 16
void vgic_lpi_translation_cache_init(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned int sz;
int i;
if (!list_empty(&dist->lpi_translation_cache))
return;
sz = atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
for (i = 0; i < sz; i++) {
struct vgic_translation_cache_entry *cte;
/* An allocation failure is not fatal */
cte = kzalloc(sizeof(*cte), GFP_KERNEL_ACCOUNT);
if (WARN_ON(!cte))
break;
INIT_LIST_HEAD(&cte->entry);
list_add(&cte->entry, &dist->lpi_translation_cache);
}
}
void vgic_lpi_translation_cache_destroy(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_translation_cache_entry *cte, *tmp;
vgic_its_invalidate_cache(kvm);
list_for_each_entry_safe(cte, tmp,
&dist->lpi_translation_cache, entry) {
list_del(&cte->entry);
kfree(cte);
}
}
#define INITIAL_BASER_VALUE \
(GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb) | \
GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner) | \
@ -1987,8 +1842,6 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
kfree(its);
return ret;
}
vgic_lpi_translation_cache_init(dev->kvm);
}
mutex_init(&its->its_lock);
@ -2006,6 +1859,7 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
INIT_LIST_HEAD(&its->device_list);
INIT_LIST_HEAD(&its->collection_list);
xa_init(&its->translation_cache);
dev->kvm->arch.vgic.msis_require_devid = true;
dev->kvm->arch.vgic.has_its = true;
@ -2036,6 +1890,8 @@ static void vgic_its_destroy(struct kvm_device *kvm_dev)
vgic_its_free_device_list(kvm, its);
vgic_its_free_collection_list(kvm, its);
vgic_its_invalidate_cache(its);
xa_destroy(&its->translation_cache);
mutex_unlock(&its->its_lock);
kfree(its);
@ -2438,7 +2294,7 @@ static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
ret = vgic_its_restore_itt(its, dev);
if (ret) {
vgic_its_free_device(its->dev->kvm, dev);
vgic_its_free_device(its->dev->kvm, its, dev);
return ret;
}


@ -277,7 +277,7 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
return;
vgic_flush_pending_lpis(vcpu);
vgic_its_invalidate_cache(vcpu->kvm);
vgic_its_invalidate_all_caches(vcpu->kvm);
atomic_set_release(&vgic_cpu->ctlr, 0);
} else {
ctlr = atomic_cmpxchg_acquire(&vgic_cpu->ctlr, 0,


@ -464,17 +464,10 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
kvm_vgic_global_state.vctrl_base + GICH_APR);
}
void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
}
void vgic_v2_put(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
vgic_v2_vmcr_sync(vcpu);
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
}


@ -722,15 +722,7 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
/*
* If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
* is dependent on ICC_SRE_EL1.SRE, and we have to perform the
* VMCR_EL2 save/restore in the world switch.
*/
if (likely(cpu_if->vgic_sre))
kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
kvm_call_hyp(__vgic_v3_restore_aprs, cpu_if);
kvm_call_hyp(__vgic_v3_restore_vmcr_aprs, cpu_if);
if (has_vhe())
__vgic_v3_activate_traps(cpu_if);
@ -738,24 +730,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
WARN_ON(vgic_v4_load(vcpu));
}
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
if (likely(cpu_if->vgic_sre))
cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
}
void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
kvm_call_hyp(__vgic_v3_save_vmcr_aprs, cpu_if);
WARN_ON(vgic_v4_put(vcpu));
vgic_v3_vmcr_sync(vcpu);
kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
if (has_vhe())
__vgic_v3_deactivate_traps(cpu_if);
}


@ -29,9 +29,8 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* its->cmd_lock (mutex)
* its->its_lock (mutex)
* vgic_cpu->ap_list_lock must be taken with IRQs disabled
* kvm->lpi_list_lock must be taken with IRQs disabled
* vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
*
* As the ap_list_lock might be taken from the timer interrupt handler,
* we have to disable IRQs before taking this lock and everything lower
@ -126,7 +125,6 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
__xa_erase(&dist->lpi_xa, irq->intid);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
atomic_dec(&dist->lpi_count);
kfree_rcu(irq, rcu);
}
@ -939,17 +937,6 @@ void kvm_vgic_put(struct kvm_vcpu *vcpu)
vgic_v3_put(vcpu);
}
void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu)
{
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
return;
if (kvm_vgic_global_state.type == VGIC_V2)
vgic_v2_vmcr_sync(vcpu);
else
vgic_v3_vmcr_sync(vcpu);
}
int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;


@ -16,6 +16,7 @@
#define INTERRUPT_ID_BITS_SPIS 10
#define INTERRUPT_ID_BITS_ITS 16
#define VGIC_LPI_MAX_INTID ((1 << INTERRUPT_ID_BITS_ITS) - 1)
#define VGIC_PRI_BITS 5
#define vgic_irq_is_sgi(intid) ((intid) < VGIC_NR_SGIS)
@ -214,7 +215,6 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
void vgic_v2_init_lrs(void);
void vgic_v2_load(struct kvm_vcpu *vcpu);
void vgic_v2_put(struct kvm_vcpu *vcpu);
void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu);
void vgic_v2_save_state(struct kvm_vcpu *vcpu);
void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
@ -253,7 +253,6 @@ bool vgic_v3_check_base(struct kvm *kvm);
void vgic_v3_load(struct kvm_vcpu *vcpu);
void vgic_v3_put(struct kvm_vcpu *vcpu);
void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
bool vgic_has_its(struct kvm *kvm);
int kvm_vgic_register_its_device(void);
@ -330,14 +329,11 @@ static inline bool vgic_dist_overlap(struct kvm *kvm, gpa_t base, size_t size)
}
bool vgic_lpis_enabled(struct kvm_vcpu *vcpu);
int vgic_copy_lpi_list(struct kvm *kvm, struct kvm_vcpu *vcpu, u32 **intid_ptr);
int vgic_its_resolve_lpi(struct kvm *kvm, struct vgic_its *its,
u32 devid, u32 eventid, struct vgic_irq **irq);
struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi);
int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi);
void vgic_lpi_translation_cache_init(struct kvm *kvm);
void vgic_lpi_translation_cache_destroy(struct kvm *kvm);
void vgic_its_invalidate_cache(struct kvm *kvm);
void vgic_its_invalidate_all_caches(struct kvm *kvm);
/* GICv4.1 MMIO interface */
int vgic_its_inv_lpi(struct kvm *kvm, struct vgic_irq *irq);


@ -210,6 +210,12 @@ struct vgic_its {
struct mutex its_lock;
struct list_head device_list;
struct list_head collection_list;
/*
* Caches the (device_id, event_id) -> vgic_irq translation for
* LPIs that are mapped and enabled.
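* Keyed by vgic_its_cache_key(); lookups are RCU-protected xa_load()s
* that take a reference on the returned vgic_irq, and stores use
* GFP_KERNEL_ACCOUNT so the cache memory is charged to the VM.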
*/
struct xarray translation_cache;
};
struct vgic_state_iter;
@ -274,13 +280,8 @@ struct vgic_dist {
*/
u64 propbaser;
/* Protects the lpi_list. */
raw_spinlock_t lpi_list_lock;
#define LPI_XA_MARK_DEBUG_ITER XA_MARK_0
struct xarray lpi_xa;
atomic_t lpi_count;
/* LPI translation cache */
struct list_head lpi_translation_cache;
/* used by vgic-debug */
struct vgic_state_iter *iter;
@ -330,7 +331,7 @@ struct vgic_cpu {
struct vgic_v3_cpu_if vgic_v3;
};
struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
struct vgic_irq *private_irqs;
raw_spinlock_t ap_list_lock; /* Protects the ap_list */
@ -388,7 +389,6 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
void kvm_vgic_load(struct kvm_vcpu *vcpu);
void kvm_vgic_put(struct kvm_vcpu *vcpu);
void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
#define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
#define vgic_initialized(k) ((k)->arch.vgic.initialized)


@ -45,6 +45,7 @@ LIBKVM_x86_64 += lib/x86_64/vmx.c
LIBKVM_aarch64 += lib/aarch64/gic.c
LIBKVM_aarch64 += lib/aarch64/gic_v3.c
LIBKVM_aarch64 += lib/aarch64/gic_v3_its.c
LIBKVM_aarch64 += lib/aarch64/handlers.S
LIBKVM_aarch64 += lib/aarch64/processor.c
LIBKVM_aarch64 += lib/aarch64/spinlock.c
@ -158,6 +159,7 @@ TEST_GEN_PROGS_aarch64 += aarch64/smccc_filter
TEST_GEN_PROGS_aarch64 += aarch64/vcpu_width_config
TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq
TEST_GEN_PROGS_aarch64 += aarch64/vgic_lpi_stress
TEST_GEN_PROGS_aarch64 += aarch64/vpmu_counter_access
TEST_GEN_PROGS_aarch64 += access_tracking_perf_test
TEST_GEN_PROGS_aarch64 += arch_timer


@ -14,9 +14,6 @@
#include "timer_test.h"
#include "vgic.h"
#define GICD_BASE_GPA 0x8000000ULL
#define GICR_BASE_GPA 0x80A0000ULL
enum guest_stage {
GUEST_STAGE_VTIMER_CVAL = 1,
GUEST_STAGE_VTIMER_TVAL,
@ -149,8 +146,7 @@ static void guest_code(void)
local_irq_disable();
gic_init(GIC_V3, test_args.nr_vcpus,
(void *)GICD_BASE_GPA, (void *)GICR_BASE_GPA);
gic_init(GIC_V3, test_args.nr_vcpus);
timer_set_ctl(VIRTUAL, CTL_IMASK);
timer_set_ctl(PHYSICAL, CTL_IMASK);
@ -209,7 +205,7 @@ struct kvm_vm *test_vm_create(void)
vcpu_init_descriptor_tables(vcpus[i]);
test_init_timer_irq(vm);
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA);
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3");
/* Make all the test's cmdline args visible to the guest */


@ -13,7 +13,9 @@
#define _GNU_SOURCE
#include <linux/kernel.h>
#include <linux/psci.h>
#include <asm/cputype.h>
#include "kvm_util.h"
#include "processor.h"


@ -327,8 +327,8 @@ uint64_t get_invalid_value(const struct reg_ftr_bits *ftr_bits, uint64_t ftr)
return ftr;
}
static void test_reg_set_success(struct kvm_vcpu *vcpu, uint64_t reg,
const struct reg_ftr_bits *ftr_bits)
static uint64_t test_reg_set_success(struct kvm_vcpu *vcpu, uint64_t reg,
const struct reg_ftr_bits *ftr_bits)
{
uint8_t shift = ftr_bits->shift;
uint64_t mask = ftr_bits->mask;
@ -346,6 +346,8 @@ static void test_reg_set_success(struct kvm_vcpu *vcpu, uint64_t reg,
vcpu_set_reg(vcpu, reg, val);
vcpu_get_reg(vcpu, reg, &new_val);
TEST_ASSERT_EQ(new_val, val);
return new_val;
}
static void test_reg_set_fail(struct kvm_vcpu *vcpu, uint64_t reg,
@ -374,7 +376,15 @@ static void test_reg_set_fail(struct kvm_vcpu *vcpu, uint64_t reg,
TEST_ASSERT_EQ(val, old_val);
}
static void test_user_set_reg(struct kvm_vcpu *vcpu, bool aarch64_only)
static uint64_t test_reg_vals[KVM_ARM_FEATURE_ID_RANGE_SIZE];
#define encoding_to_range_idx(encoding) \
KVM_ARM_FEATURE_ID_RANGE_IDX(sys_reg_Op0(encoding), sys_reg_Op1(encoding), \
sys_reg_CRn(encoding), sys_reg_CRm(encoding), \
sys_reg_Op2(encoding))
static void test_vm_ftr_id_regs(struct kvm_vcpu *vcpu, bool aarch64_only)
{
uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE];
struct reg_mask_range range = {
@ -398,9 +408,7 @@ static void test_user_set_reg(struct kvm_vcpu *vcpu, bool aarch64_only)
int idx;
/* Get the index to masks array for the idreg */
idx = KVM_ARM_FEATURE_ID_RANGE_IDX(sys_reg_Op0(reg_id), sys_reg_Op1(reg_id),
sys_reg_CRn(reg_id), sys_reg_CRm(reg_id),
sys_reg_Op2(reg_id));
idx = encoding_to_range_idx(reg_id);
for (int j = 0; ftr_bits[j].type != FTR_END; j++) {
/* Skip aarch32 reg on aarch64 only system, since they are RAZ/WI. */
@ -414,7 +422,9 @@ static void test_user_set_reg(struct kvm_vcpu *vcpu, bool aarch64_only)
TEST_ASSERT_EQ(masks[idx] & ftr_bits[j].mask, ftr_bits[j].mask);
test_reg_set_fail(vcpu, reg, &ftr_bits[j]);
test_reg_set_success(vcpu, reg, &ftr_bits[j]);
test_reg_vals[idx] = test_reg_set_success(vcpu, reg,
&ftr_bits[j]);
ksft_test_result_pass("%s\n", ftr_bits[j].name);
}
@ -425,7 +435,6 @@ static void test_guest_reg_read(struct kvm_vcpu *vcpu)
{
bool done = false;
struct ucall uc;
uint64_t val;
while (!done) {
vcpu_run(vcpu);
@ -436,8 +445,8 @@ static void test_guest_reg_read(struct kvm_vcpu *vcpu)
break;
case UCALL_SYNC:
/* Make sure the written values are seen by guest */
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(uc.args[2]), &val);
TEST_ASSERT_EQ(val, uc.args[3]);
TEST_ASSERT_EQ(test_reg_vals[encoding_to_range_idx(uc.args[2])],
uc.args[3]);
break;
case UCALL_DONE:
done = true;
@ -448,13 +457,85 @@ static void test_guest_reg_read(struct kvm_vcpu *vcpu)
}
}
/* Politely lifted from arch/arm64/include/asm/cache.h */
/* Ctypen, bits[3(n - 1) + 2 : 3(n - 1)], for n = 1 to 7 */
#define CLIDR_CTYPE_SHIFT(level) (3 * (level - 1))
#define CLIDR_CTYPE_MASK(level) (7 << CLIDR_CTYPE_SHIFT(level))
#define CLIDR_CTYPE(clidr, level) \
(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
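/*
 * Worked example: CLIDR_CTYPE_SHIFT(2) == 3, so CLIDR_CTYPE(clidr, 2)
 * extracts bits [5:3]; a value of 0b100 there denotes a unified L2 cache.
 */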
static void test_clidr(struct kvm_vcpu *vcpu)
{
uint64_t clidr;
int level;
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_CLIDR_EL1), &clidr);
/* find the first empty level in the cache hierarchy */
for (level = 1; level < 7; level++) {
if (!CLIDR_CTYPE(clidr, level))
break;
}
/*
* If you have a mind-boggling 7 levels of cache, congratulations, you
* get to fix this.
*/
TEST_ASSERT(level <= 7, "can't find an empty level in cache hierarchy");
/* stick in a unified cache level */
clidr |= BIT(2) << CLIDR_CTYPE_SHIFT(level);
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_CLIDR_EL1), clidr);
test_reg_vals[encoding_to_range_idx(SYS_CLIDR_EL1)] = clidr;
}
static void test_vcpu_ftr_id_regs(struct kvm_vcpu *vcpu)
{
u64 val;
test_clidr(vcpu);
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &val);
val++;
vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), val);
test_reg_vals[encoding_to_range_idx(SYS_MPIDR_EL1)] = val;
ksft_test_result_pass("%s\n", __func__);
}
static void test_assert_id_reg_unchanged(struct kvm_vcpu *vcpu, uint32_t encoding)
{
size_t idx = encoding_to_range_idx(encoding);
uint64_t observed;
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(encoding), &observed);
TEST_ASSERT_EQ(test_reg_vals[idx], observed);
}
static void test_reset_preserves_id_regs(struct kvm_vcpu *vcpu)
{
/*
* Calls KVM_ARM_VCPU_INIT behind the scenes, which will do an
* architectural reset of the vCPU.
*/
aarch64_vcpu_setup(vcpu, NULL);
for (int i = 0; i < ARRAY_SIZE(test_regs); i++)
test_assert_id_reg_unchanged(vcpu, test_regs[i].reg);
test_assert_id_reg_unchanged(vcpu, SYS_CLIDR_EL1);
ksft_test_result_pass("%s\n", __func__);
}
int main(void)
{
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
bool aarch64_only;
uint64_t val, el0;
int ftr_cnt;
int test_cnt;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES));
@ -467,18 +548,22 @@ int main(void)
ksft_print_header();
ftr_cnt = ARRAY_SIZE(ftr_id_aa64dfr0_el1) + ARRAY_SIZE(ftr_id_dfr0_el1) +
ARRAY_SIZE(ftr_id_aa64isar0_el1) + ARRAY_SIZE(ftr_id_aa64isar1_el1) +
ARRAY_SIZE(ftr_id_aa64isar2_el1) + ARRAY_SIZE(ftr_id_aa64pfr0_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr0_el1) + ARRAY_SIZE(ftr_id_aa64mmfr1_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr2_el1) + ARRAY_SIZE(ftr_id_aa64zfr0_el1) -
ARRAY_SIZE(test_regs);
test_cnt = ARRAY_SIZE(ftr_id_aa64dfr0_el1) + ARRAY_SIZE(ftr_id_dfr0_el1) +
ARRAY_SIZE(ftr_id_aa64isar0_el1) + ARRAY_SIZE(ftr_id_aa64isar1_el1) +
ARRAY_SIZE(ftr_id_aa64isar2_el1) + ARRAY_SIZE(ftr_id_aa64pfr0_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr0_el1) + ARRAY_SIZE(ftr_id_aa64mmfr1_el1) +
ARRAY_SIZE(ftr_id_aa64mmfr2_el1) + ARRAY_SIZE(ftr_id_aa64zfr0_el1) -
ARRAY_SIZE(test_regs) + 2;
ksft_set_plan(ftr_cnt);
ksft_set_plan(test_cnt);
test_vm_ftr_id_regs(vcpu, aarch64_only);
test_vcpu_ftr_id_regs(vcpu);
test_user_set_reg(vcpu, aarch64_only);
test_guest_reg_read(vcpu);
test_reset_preserves_id_regs(vcpu);
kvm_vm_free(vm);
ksft_finished();


@ -19,9 +19,6 @@
#include "gic_v3.h"
#include "vgic.h"
#define GICD_BASE_GPA 0x08000000ULL
#define GICR_BASE_GPA 0x080A0000ULL
/*
* Stores the user specified args; it's passed to the guest and to every test
* function.
@ -49,9 +46,6 @@ struct test_args {
#define IRQ_DEFAULT_PRIO (LOWEST_PRIO - 1)
#define IRQ_DEFAULT_PRIO_REG (IRQ_DEFAULT_PRIO << KVM_PRIO_SHIFT) /* 0xf0 */
static void *dist = (void *)GICD_BASE_GPA;
static void *redist = (void *)GICR_BASE_GPA;
/*
* The kvm_inject_* utilities are used by the guest to ask the host to inject
* interrupts (e.g., using the KVM_IRQ_LINE ioctl).
@ -152,7 +146,7 @@ static void reset_stats(void)
static uint64_t gic_read_ap1r0(void)
{
uint64_t reg = read_sysreg_s(SYS_ICV_AP1R0_EL1);
uint64_t reg = read_sysreg_s(SYS_ICC_AP1R0_EL1);
dsb(sy);
return reg;
@ -160,7 +154,7 @@ static uint64_t gic_read_ap1r0(void)
static void gic_write_ap1r0(uint64_t val)
{
write_sysreg_s(val, SYS_ICV_AP1R0_EL1);
write_sysreg_s(val, SYS_ICC_AP1R0_EL1);
isb();
}
@ -478,7 +472,7 @@ static void guest_code(struct test_args *args)
bool level_sensitive = args->level_sensitive;
struct kvm_inject_desc *f, *inject_fns;
gic_init(GIC_V3, 1, dist, redist);
gic_init(GIC_V3, 1);
for (i = 0; i < nr_irqs; i++)
gic_irq_enable(i);
@ -764,8 +758,7 @@ static void test_vgic(uint32_t nr_irqs, bool level_sensitive, bool eoi_split)
memcpy(addr_gva2hva(vm, args_gva), &args, sizeof(args));
vcpu_args_set(vcpu, 1, args_gva);
gic_fd = vgic_v3_setup(vm, 1, nr_irqs,
GICD_BASE_GPA, GICR_BASE_GPA);
gic_fd = vgic_v3_setup(vm, 1, nr_irqs);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create vgic-v3, skipping");
vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT,


@ -0,0 +1,410 @@
// SPDX-License-Identifier: GPL-2.0
/*
* vgic_lpi_stress - Stress test for KVM's ITS emulation
*
* Copyright (c) 2024 Google LLC
*/
#include <linux/sizes.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/sysinfo.h>
#include "kvm_util.h"
#include "gic.h"
#include "gic_v3.h"
#include "gic_v3_its.h"
#include "processor.h"
#include "ucall.h"
#include "vgic.h"
#define TEST_MEMSLOT_INDEX 1
#define GIC_LPI_OFFSET 8192
static size_t nr_iterations = 1000;
static vm_paddr_t gpa_base;
static struct kvm_vm *vm;
static struct kvm_vcpu **vcpus;
static int gic_fd, its_fd;
static struct test_data {
bool request_vcpus_stop;
u32 nr_cpus;
u32 nr_devices;
u32 nr_event_ids;
vm_paddr_t device_table;
vm_paddr_t collection_table;
vm_paddr_t cmdq_base;
void *cmdq_base_va;
vm_paddr_t itt_tables;
vm_paddr_t lpi_prop_table;
vm_paddr_t lpi_pend_tables;
} test_data = {
.nr_cpus = 1,
.nr_devices = 1,
.nr_event_ids = 16,
};
static void guest_irq_handler(struct ex_regs *regs)
{
u32 intid = gic_get_and_ack_irq();
if (intid == IAR_SPURIOUS)
return;
GUEST_ASSERT(intid >= GIC_LPI_OFFSET);
gic_set_eoi(intid);
}
static void guest_setup_its_mappings(void)
{
u32 coll_id, device_id, event_id, intid = GIC_LPI_OFFSET;
u32 nr_events = test_data.nr_event_ids;
u32 nr_devices = test_data.nr_devices;
u32 nr_cpus = test_data.nr_cpus;
for (coll_id = 0; coll_id < nr_cpus; coll_id++)
its_send_mapc_cmd(test_data.cmdq_base_va, coll_id, coll_id, true);
/* Round-robin the LPIs to all of the vCPUs in the VM */
coll_id = 0;
for (device_id = 0; device_id < nr_devices; device_id++) {
vm_paddr_t itt_base = test_data.itt_tables + (device_id * SZ_64K);
its_send_mapd_cmd(test_data.cmdq_base_va, device_id,
itt_base, SZ_64K, true);
for (event_id = 0; event_id < nr_events; event_id++) {
its_send_mapti_cmd(test_data.cmdq_base_va, device_id,
event_id, coll_id, intid++);
coll_id = (coll_id + 1) % test_data.nr_cpus;
}
}
}
static void guest_invalidate_all_rdists(void)
{
int i;
for (i = 0; i < test_data.nr_cpus; i++)
its_send_invall_cmd(test_data.cmdq_base_va, i);
}
static void guest_setup_gic(void)
{
static atomic_int nr_cpus_ready = 0;
u32 cpuid = guest_get_vcpuid();
gic_init(GIC_V3, test_data.nr_cpus);
gic_rdist_enable_lpis(test_data.lpi_prop_table, SZ_64K,
test_data.lpi_pend_tables + (cpuid * SZ_64K));
atomic_fetch_add(&nr_cpus_ready, 1);
if (cpuid > 0)
return;
while (atomic_load(&nr_cpus_ready) < test_data.nr_cpus)
cpu_relax();
its_init(test_data.collection_table, SZ_64K,
test_data.device_table, SZ_64K,
test_data.cmdq_base, SZ_64K);
guest_setup_its_mappings();
guest_invalidate_all_rdists();
}
static void guest_code(size_t nr_lpis)
{
guest_setup_gic();
GUEST_SYNC(0);
/*
* Don't use WFI here to avoid blocking the vCPU thread indefinitely and
* never getting the stop signal.
*/
while (!READ_ONCE(test_data.request_vcpus_stop))
cpu_relax();
GUEST_DONE();
}
static void setup_memslot(void)
{
size_t pages;
size_t sz;
/*
* For the ITS:
* - A single level device table
* - A single level collection table
* - The command queue
* - An ITT for each device
*/
sz = (3 + test_data.nr_devices) * SZ_64K;
/*
* For the redistributors:
* - A shared LPI configuration table
* - An LPI pending table for each vCPU
*/
sz += (1 + test_data.nr_cpus) * SZ_64K;
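/* With the defaults (1 device, 1 vCPU): (3 + 1) * 64K + (1 + 1) * 64K = 384K. */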
pages = sz / vm->page_size;
gpa_base = ((vm_compute_max_gfn(vm) + 1) * vm->page_size) - sz;
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, gpa_base,
TEST_MEMSLOT_INDEX, pages, 0);
}
#define LPI_PROP_DEFAULT_PRIO 0xa0
static void configure_lpis(void)
{
size_t nr_lpis = test_data.nr_devices * test_data.nr_event_ids;
u8 *tbl = addr_gpa2hva(vm, test_data.lpi_prop_table);
size_t i;
for (i = 0; i < nr_lpis; i++) {
tbl[i] = LPI_PROP_DEFAULT_PRIO |
LPI_PROP_GROUP1 |
LPI_PROP_ENABLED;
}
}
static void setup_test_data(void)
{
size_t pages_per_64k = vm_calc_num_guest_pages(vm->mode, SZ_64K);
u32 nr_devices = test_data.nr_devices;
u32 nr_cpus = test_data.nr_cpus;
vm_paddr_t cmdq_base;
test_data.device_table = vm_phy_pages_alloc(vm, pages_per_64k,
gpa_base,
TEST_MEMSLOT_INDEX);
test_data.collection_table = vm_phy_pages_alloc(vm, pages_per_64k,
gpa_base,
TEST_MEMSLOT_INDEX);
cmdq_base = vm_phy_pages_alloc(vm, pages_per_64k, gpa_base,
TEST_MEMSLOT_INDEX);
virt_map(vm, cmdq_base, cmdq_base, pages_per_64k);
test_data.cmdq_base = cmdq_base;
test_data.cmdq_base_va = (void *)cmdq_base;
test_data.itt_tables = vm_phy_pages_alloc(vm, pages_per_64k * nr_devices,
gpa_base, TEST_MEMSLOT_INDEX);
test_data.lpi_prop_table = vm_phy_pages_alloc(vm, pages_per_64k,
gpa_base, TEST_MEMSLOT_INDEX);
configure_lpis();
test_data.lpi_pend_tables = vm_phy_pages_alloc(vm, pages_per_64k * nr_cpus,
gpa_base, TEST_MEMSLOT_INDEX);
sync_global_to_guest(vm, test_data);
}
static void setup_gic(void)
{
gic_fd = vgic_v3_setup(vm, test_data.nr_cpus, 64);
__TEST_REQUIRE(gic_fd >= 0, "Failed to create GICv3");
its_fd = vgic_its_setup(vm);
}
static void signal_lpi(u32 device_id, u32 event_id)
{
vm_paddr_t db_addr = GITS_BASE_GPA + GITS_TRANSLATER;
struct kvm_msi msi = {
.address_lo = db_addr,
.address_hi = db_addr >> 32,
.data = event_id,
.devid = device_id,
.flags = KVM_MSI_VALID_DEVID,
};
/*
* KVM_SIGNAL_MSI returns 1 if the MSI wasn't 'blocked' by the VM,
* which for arm64 implies having a valid translation in the ITS.
*/
TEST_ASSERT(__vm_ioctl(vm, KVM_SIGNAL_MSI, &msi) == 1,
"KVM_SIGNAL_MSI ioctl failed");
}
static pthread_barrier_t test_setup_barrier;
static void *lpi_worker_thread(void *data)
{
u32 device_id = (size_t)data;
u32 event_id;
size_t i;
pthread_barrier_wait(&test_setup_barrier);
for (i = 0; i < nr_iterations; i++)
for (event_id = 0; event_id < test_data.nr_event_ids; event_id++)
signal_lpi(device_id, event_id);
return NULL;
}
static void *vcpu_worker_thread(void *data)
{
struct kvm_vcpu *vcpu = data;
struct ucall uc;
while (true) {
vcpu_run(vcpu);
switch (get_ucall(vcpu, &uc)) {
case UCALL_SYNC:
pthread_barrier_wait(&test_setup_barrier);
continue;
case UCALL_DONE:
return NULL;
case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
break;
default:
TEST_FAIL("Unknown ucall: %lu", uc.cmd);
}
}
return NULL;
}
static void report_stats(struct timespec delta)
{
double nr_lpis;
double time;
nr_lpis = test_data.nr_devices * test_data.nr_event_ids * nr_iterations;
time = delta.tv_sec;
time += ((double)delta.tv_nsec) / NSEC_PER_SEC;
pr_info("Rate: %.2f LPIs/sec\n", nr_lpis / time);
}
static void run_test(void)
{
u32 nr_devices = test_data.nr_devices;
u32 nr_vcpus = test_data.nr_cpus;
pthread_t *lpi_threads = malloc(nr_devices * sizeof(pthread_t));
pthread_t *vcpu_threads = malloc(nr_vcpus * sizeof(pthread_t));
struct timespec start, delta;
size_t i;
TEST_ASSERT(lpi_threads && vcpu_threads, "Failed to allocate pthread arrays");
pthread_barrier_init(&test_setup_barrier, NULL, nr_vcpus + nr_devices + 1);
for (i = 0; i < nr_vcpus; i++)
pthread_create(&vcpu_threads[i], NULL, vcpu_worker_thread, vcpus[i]);
for (i = 0; i < nr_devices; i++)
pthread_create(&lpi_threads[i], NULL, lpi_worker_thread, (void *)i);
pthread_barrier_wait(&test_setup_barrier);
clock_gettime(CLOCK_MONOTONIC, &start);
for (i = 0; i < nr_devices; i++)
pthread_join(lpi_threads[i], NULL);
delta = timespec_elapsed(start);
write_guest_global(vm, test_data.request_vcpus_stop, true);
for (i = 0; i < nr_vcpus; i++)
pthread_join(vcpu_threads[i], NULL);
report_stats(delta);
}
static void setup_vm(void)
{
int i;
vcpus = malloc(test_data.nr_cpus * sizeof(struct kvm_vcpu));
TEST_ASSERT(vcpus, "Failed to allocate vCPU array");
vm = vm_create_with_vcpus(test_data.nr_cpus, guest_code, vcpus);
vm_init_descriptor_tables(vm);
for (i = 0; i < test_data.nr_cpus; i++)
vcpu_init_descriptor_tables(vcpus[i]);
vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler);
setup_memslot();
setup_gic();
setup_test_data();
}
static void destroy_vm(void)
{
close(its_fd);
close(gic_fd);
kvm_vm_free(vm);
free(vcpus);
}
static void pr_usage(const char *name)
{
pr_info("%s [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] -h\n", name);
pr_info(" -v:\tnumber of vCPUs (default: %u)\n", test_data.nr_cpus);
pr_info(" -d:\tnumber of devices (default: %u)\n", test_data.nr_devices);
pr_info(" -e:\tnumber of event IDs per device (default: %u)\n", test_data.nr_event_ids);
pr_info(" -i:\tnumber of iterations (default: %lu)\n", nr_iterations);
}
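/*
 * Hypothetical invocation (argument values are illustrative only):
 *   ./vgic_lpi_stress -v 4 -d 16 -e 32 -i 1000
 * i.e. 4 vCPUs and 16 devices with 32 event IDs each, so 512 distinct
 * LPIs, each signalled 1000 times.
 */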
int main(int argc, char **argv)
{
u32 nr_threads;
int c;
while ((c = getopt(argc, argv, "hv:d:e:i:")) != -1) {
switch (c) {
case 'v':
test_data.nr_cpus = atoi(optarg);
break;
case 'd':
test_data.nr_devices = atoi(optarg);
break;
case 'e':
test_data.nr_event_ids = atoi(optarg);
break;
case 'i':
nr_iterations = strtoul(optarg, NULL, 0);
break;
case 'h':
default:
pr_usage(argv[0]);
return 1;
}
}
nr_threads = test_data.nr_cpus + test_data.nr_devices;
if (nr_threads > get_nprocs())
pr_info("WARNING: running %u threads on %d CPUs; performance is degraded.\n",
nr_threads, get_nprocs());
setup_vm();
run_test();
destroy_vm();
return 0;
}


@ -404,9 +404,6 @@ static void guest_code(uint64_t expected_pmcr_n)
GUEST_DONE();
}
#define GICD_BASE_GPA 0x8000000ULL
#define GICR_BASE_GPA 0x80A0000ULL
/* Create a VM that has one vCPU with PMUv3 configured. */
static void create_vpmu_vm(void *guest_code)
{
@ -438,8 +435,7 @@ static void create_vpmu_vm(void *guest_code)
init.features[0] |= (1 << KVM_ARM_VCPU_PMU_V3);
vpmu_vm.vcpu = aarch64_vcpu_add(vpmu_vm.vm, 0, &init, guest_code);
vcpu_init_descriptor_tables(vpmu_vm.vcpu);
vpmu_vm.gic_fd = vgic_v3_setup(vpmu_vm.vm, 1, 64,
GICD_BASE_GPA, GICR_BASE_GPA);
vpmu_vm.gic_fd = vgic_v3_setup(vpmu_vm.vm, 1, 64);
__TEST_REQUIRE(vpmu_vm.gic_fd >= 0,
"Failed to create vgic-v3, skipping");


@ -22,9 +22,6 @@
#ifdef __aarch64__
#include "aarch64/vgic.h"
#define GICD_BASE_GPA 0x8000000ULL
#define GICR_BASE_GPA 0x80A0000ULL
static int gic_fd;
static void arch_setup_vm(struct kvm_vm *vm, unsigned int nr_vcpus)
@ -33,7 +30,7 @@ static void arch_setup_vm(struct kvm_vm *vm, unsigned int nr_vcpus)
* The test can still run even if hardware does not support GICv3, as it
* is only an optimization to reduce guest exits.
*/
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA);
gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);
}
static void arch_cleanup_vm(struct kvm_vm *vm)


@ -6,11 +6,26 @@
#ifndef SELFTEST_KVM_GIC_H
#define SELFTEST_KVM_GIC_H
#include <asm/kvm.h>
enum gic_type {
GIC_V3,
GIC_TYPE_MAX,
};
/*
* Note that the redistributor frames are at the end, as the range scales
* with the number of vCPUs in the VM.
*/
#define GITS_BASE_GPA 0x8000000ULL
#define GICD_BASE_GPA (GITS_BASE_GPA + KVM_VGIC_V3_ITS_SIZE)
#define GICR_BASE_GPA (GICD_BASE_GPA + KVM_VGIC_V3_DIST_SIZE)
/* The GIC is identity-mapped into the guest at the time of setup. */
#define GITS_BASE_GVA ((volatile void *)GITS_BASE_GPA)
#define GICD_BASE_GVA ((volatile void *)GICD_BASE_GPA)
#define GICR_BASE_GVA ((volatile void *)GICR_BASE_GPA)
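/*
 * Worked example, assuming KVM_VGIC_V3_ITS_SIZE == 2 * SZ_64K and
 * KVM_VGIC_V3_DIST_SIZE == SZ_64K (values taken from the arm64 UAPI
 * headers, not from this diff): GITS at 0x8000000, GICD at 0x8020000,
 * and the GICR frames from 0x8030000 upwards.
 */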
#define MIN_SGI 0
#define MIN_PPI 16
#define MIN_SPI 32
@ -21,8 +36,7 @@ enum gic_type {
#define INTID_IS_PPI(intid) (MIN_PPI <= (intid) && (intid) < MIN_SPI)
#define INTID_IS_SPI(intid) (MIN_SPI <= (intid) && (intid) <= MAX_SPI)
void gic_init(enum gic_type type, unsigned int nr_cpus,
void *dist_base, void *redist_base);
void gic_init(enum gic_type type, unsigned int nr_cpus);
void gic_irq_enable(unsigned int intid);
void gic_irq_disable(unsigned int intid);
unsigned int gic_get_and_ack_irq(void);
@ -44,4 +58,7 @@ void gic_irq_clear_pending(unsigned int intid);
bool gic_irq_get_pending(unsigned int intid);
void gic_irq_set_config(unsigned int intid, bool is_edge);
void gic_rdist_enable_lpis(vm_paddr_t cfg_table, size_t cfg_table_size,
vm_paddr_t pend_table);
#endif /* SELFTEST_KVM_GIC_H */


@ -1,82 +1,604 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* ARM Generic Interrupt Controller (GIC) v3 specific defines
* Copyright (C) 2013, 2014 ARM Limited, All Rights Reserved.
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#ifndef SELFTEST_KVM_GICV3_H
#define SELFTEST_KVM_GICV3_H
#include <asm/sysreg.h>
#ifndef __SELFTESTS_GIC_V3_H
#define __SELFTESTS_GIC_V3_H
/*
* Distributor registers
* Distributor registers. We assume we're running non-secure, with ARE
* being set. Secure-only and non-ARE registers are not described.
*/
#define GICD_CTLR 0x0000
#define GICD_TYPER 0x0004
#define GICD_IIDR 0x0008
#define GICD_TYPER2 0x000C
#define GICD_STATUSR 0x0010
#define GICD_SETSPI_NSR 0x0040
#define GICD_CLRSPI_NSR 0x0048
#define GICD_SETSPI_SR 0x0050
#define GICD_CLRSPI_SR 0x0058
#define GICD_IGROUPR 0x0080
#define GICD_ISENABLER 0x0100
#define GICD_ICENABLER 0x0180
#define GICD_ISPENDR 0x0200
#define GICD_ICPENDR 0x0280
#define GICD_ISACTIVER 0x0300
#define GICD_ICACTIVER 0x0380
#define GICD_IPRIORITYR 0x0400
#define GICD_ICFGR 0x0C00
#define GICD_IGRPMODR 0x0D00
#define GICD_NSACR 0x0E00
#define GICD_IGROUPRnE 0x1000
#define GICD_ISENABLERnE 0x1200
#define GICD_ICENABLERnE 0x1400
#define GICD_ISPENDRnE 0x1600
#define GICD_ICPENDRnE 0x1800
#define GICD_ISACTIVERnE 0x1A00
#define GICD_ICACTIVERnE 0x1C00
#define GICD_IPRIORITYRnE 0x2000
#define GICD_ICFGRnE 0x3000
#define GICD_IROUTER 0x6000
#define GICD_IROUTERnE 0x8000
#define GICD_IDREGS 0xFFD0
#define GICD_PIDR2 0xFFE8
#define ESPI_BASE_INTID 4096
/*
* The assumption is that the guest runs in a non-secure mode.
* The following bits of GICD_CTLR are defined accordingly.
* Those registers are actually from GICv2, but the spec demands that they
* are implemented as RES0 if ARE is 1 (which we do in KVM's emulated GICv3).
*/
#define GICD_ITARGETSR 0x0800
#define GICD_SGIR 0x0F00
#define GICD_CPENDSGIR 0x0F10
#define GICD_SPENDSGIR 0x0F20
#define GICD_CTLR_RWP (1U << 31)
#define GICD_CTLR_nASSGIreq (1U << 8)
#define GICD_CTLR_DS (1U << 6)
#define GICD_CTLR_ARE_NS (1U << 4)
#define GICD_CTLR_ENABLE_G1A (1U << 1)
#define GICD_CTLR_ENABLE_G1 (1U << 0)
#define GICD_TYPER_SPIS(typer) ((((typer) & 0x1f) + 1) * 32)
#define GICD_INT_DEF_PRI_X4 0xa0a0a0a0
#define GICD_IIDR_IMPLEMENTER_SHIFT 0
#define GICD_IIDR_IMPLEMENTER_MASK (0xfff << GICD_IIDR_IMPLEMENTER_SHIFT)
#define GICD_IIDR_REVISION_SHIFT 12
#define GICD_IIDR_REVISION_MASK (0xf << GICD_IIDR_REVISION_SHIFT)
#define GICD_IIDR_VARIANT_SHIFT 16
#define GICD_IIDR_VARIANT_MASK (0xf << GICD_IIDR_VARIANT_SHIFT)
#define GICD_IIDR_PRODUCT_ID_SHIFT 24
#define GICD_IIDR_PRODUCT_ID_MASK (0xff << GICD_IIDR_PRODUCT_ID_SHIFT)
/*
* Redistributor registers
* In systems with a single security state (what we emulate in KVM)
* the meaning of the interrupt group enable bits is slightly different
*/
#define GICR_CTLR 0x000
#define GICR_WAKER 0x014
#define GICD_CTLR_ENABLE_SS_G1 (1U << 1)
#define GICD_CTLR_ENABLE_SS_G0 (1U << 0)
#define GICR_CTLR_RWP (1U << 3)
#define GICD_TYPER_RSS (1U << 26)
#define GICD_TYPER_LPIS (1U << 17)
#define GICD_TYPER_MBIS (1U << 16)
#define GICD_TYPER_ESPI (1U << 8)
#define GICD_TYPER_ID_BITS(typer) ((((typer) >> 19) & 0x1f) + 1)
#define GICD_TYPER_NUM_LPIS(typer) ((((typer) >> 11) & 0x1f) + 1)
#define GICD_TYPER_SPIS(typer) ((((typer) & 0x1f) + 1) * 32)
#define GICD_TYPER_ESPIS(typer) \
(((typer) & GICD_TYPER_ESPI) ? GICD_TYPER_SPIS((typer) >> 27) : 0)
#define GICD_TYPER2_nASSGIcap (1U << 8)
#define GICD_TYPER2_VIL (1U << 7)
#define GICD_TYPER2_VID GENMASK(4, 0)
#define GICD_IROUTER_SPI_MODE_ONE (0U << 31)
#define GICD_IROUTER_SPI_MODE_ANY (1U << 31)
#define GIC_PIDR2_ARCH_MASK 0xf0
#define GIC_PIDR2_ARCH_GICv3 0x30
#define GIC_PIDR2_ARCH_GICv4 0x40
#define GIC_V3_DIST_SIZE 0x10000
#define GIC_PAGE_SIZE_4K 0ULL
#define GIC_PAGE_SIZE_16K 1ULL
#define GIC_PAGE_SIZE_64K 2ULL
#define GIC_PAGE_SIZE_MASK 3ULL
/*
* Re-Distributor registers, offsets from RD_base
*/
#define GICR_CTLR GICD_CTLR
#define GICR_IIDR 0x0004
#define GICR_TYPER 0x0008
#define GICR_STATUSR GICD_STATUSR
#define GICR_WAKER 0x0014
#define GICR_SETLPIR 0x0040
#define GICR_CLRLPIR 0x0048
#define GICR_PROPBASER 0x0070
#define GICR_PENDBASER 0x0078
#define GICR_INVLPIR 0x00A0
#define GICR_INVALLR 0x00B0
#define GICR_SYNCR 0x00C0
#define GICR_IDREGS GICD_IDREGS
#define GICR_PIDR2 GICD_PIDR2
#define GICR_CTLR_ENABLE_LPIS (1UL << 0)
#define GICR_CTLR_CES (1UL << 1)
#define GICR_CTLR_IR (1UL << 2)
#define GICR_CTLR_RWP (1UL << 3)
#define GICR_TYPER_CPU_NUMBER(r) (((r) >> 8) & 0xffff)
#define EPPI_BASE_INTID 1056
#define GICR_TYPER_NR_PPIS(r) \
({ \
unsigned int __ppinum = ((r) >> 27) & 0x1f; \
unsigned int __nr_ppis = 16; \
if (__ppinum == 1 || __ppinum == 2) \
__nr_ppis += __ppinum * 32; \
\
__nr_ppis; \
})
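/* e.g. GICR_TYPER[31:27] == 2 gives 16 + 2 * 32 = 80 PPIs (extended PPI range) */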
#define GICR_WAKER_ProcessorSleep (1U << 1)
#define GICR_WAKER_ChildrenAsleep (1U << 2)
#define GIC_BASER_CACHE_nCnB 0ULL
#define GIC_BASER_CACHE_SameAsInner 0ULL
#define GIC_BASER_CACHE_nC 1ULL
#define GIC_BASER_CACHE_RaWt 2ULL
#define GIC_BASER_CACHE_RaWb 3ULL
#define GIC_BASER_CACHE_WaWt 4ULL
#define GIC_BASER_CACHE_WaWb 5ULL
#define GIC_BASER_CACHE_RaWaWt 6ULL
#define GIC_BASER_CACHE_RaWaWb 7ULL
#define GIC_BASER_CACHE_MASK 7ULL
#define GIC_BASER_NonShareable 0ULL
#define GIC_BASER_InnerShareable 1ULL
#define GIC_BASER_OuterShareable 2ULL
#define GIC_BASER_SHAREABILITY_MASK 3ULL
#define GIC_BASER_CACHEABILITY(reg, inner_outer, type) \
(GIC_BASER_CACHE_##type << reg##_##inner_outer##_CACHEABILITY_SHIFT)
#define GIC_BASER_SHAREABILITY(reg, type) \
(GIC_BASER_##type << reg##_SHAREABILITY_SHIFT)
/* encode a size field of width @w containing @n - 1 units */
#define GIC_ENCODE_SZ(n, w) (((unsigned long)(n) - 1) & GENMASK_ULL(((w) - 1), 0))
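/* e.g. GIC_ENCODE_SZ(16, 8) == 15: sixteen units stored as (n - 1) in an 8-bit field */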
#define GICR_PROPBASER_SHAREABILITY_SHIFT (10)
#define GICR_PROPBASER_INNER_CACHEABILITY_SHIFT (7)
#define GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT (56)
#define GICR_PROPBASER_SHAREABILITY_MASK \
GIC_BASER_SHAREABILITY(GICR_PROPBASER, SHAREABILITY_MASK)
#define GICR_PROPBASER_INNER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, MASK)
#define GICR_PROPBASER_OUTER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, MASK)
#define GICR_PROPBASER_CACHEABILITY_MASK GICR_PROPBASER_INNER_CACHEABILITY_MASK
#define GICR_PROPBASER_InnerShareable \
GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable)
#define GICR_PROPBASER_nCnB GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nCnB)
#define GICR_PROPBASER_nC GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nC)
#define GICR_PROPBASER_RaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWt)
#define GICR_PROPBASER_RaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)
#define GICR_PROPBASER_WaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWt)
#define GICR_PROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWb)
#define GICR_PROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWt)
#define GICR_PROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWb)
#define GICR_PROPBASER_IDBITS_MASK (0x1f)
#define GICR_PROPBASER_ADDRESS(x) ((x) & GENMASK_ULL(51, 12))
#define GICR_PENDBASER_ADDRESS(x) ((x) & GENMASK_ULL(51, 16))
#define GICR_PENDBASER_SHAREABILITY_SHIFT (10)
#define GICR_PENDBASER_INNER_CACHEABILITY_SHIFT (7)
#define GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT (56)
#define GICR_PENDBASER_SHAREABILITY_MASK \
GIC_BASER_SHAREABILITY(GICR_PENDBASER, SHAREABILITY_MASK)
#define GICR_PENDBASER_INNER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, MASK)
#define GICR_PENDBASER_OUTER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, MASK)
#define GICR_PENDBASER_CACHEABILITY_MASK GICR_PENDBASER_INNER_CACHEABILITY_MASK
#define GICR_PENDBASER_InnerShareable \
GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable)
#define GICR_PENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nCnB)
#define GICR_PENDBASER_nC GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nC)
#define GICR_PENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWt)
#define GICR_PENDBASER_RaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)
#define GICR_PENDBASER_WaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWt)
#define GICR_PENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWb)
#define GICR_PENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWt)
#define GICR_PENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWaWb)
#define GICR_PENDBASER_PTZ BIT_ULL(62)
/*
* Redistributor registers, offsets from SGI base
* Re-Distributor registers, offsets from SGI_base
*/
#define GICR_IGROUPR0 GICD_IGROUPR
#define GICR_ISENABLER0 GICD_ISENABLER
#define GICR_ICENABLER0 GICD_ICENABLER
#define GICR_ISPENDR0 GICD_ISPENDR
#define GICR_ICPENDR0 GICD_ICPENDR
#define GICR_ISACTIVER0 GICD_ISACTIVER
#define GICR_ICACTIVER0 GICD_ICACTIVER
#define GICR_ICENABLER GICD_ICENABLER
#define GICR_ICACTIVER GICD_ICACTIVER
#define GICR_IPRIORITYR0 GICD_IPRIORITYR
#define GICR_ICFGR0 GICD_ICFGR
#define GICR_IGRPMODR0 GICD_IGRPMODR
#define GICR_NSACR GICD_NSACR
/* CPU interface registers */
#define SYS_ICC_PMR_EL1 sys_reg(3, 0, 4, 6, 0)
#define SYS_ICC_IAR1_EL1 sys_reg(3, 0, 12, 12, 0)
#define SYS_ICC_EOIR1_EL1 sys_reg(3, 0, 12, 12, 1)
#define SYS_ICC_DIR_EL1 sys_reg(3, 0, 12, 11, 1)
#define SYS_ICC_CTLR_EL1 sys_reg(3, 0, 12, 12, 4)
#define SYS_ICC_SRE_EL1 sys_reg(3, 0, 12, 12, 5)
#define SYS_ICC_GRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)
#define GICR_TYPER_PLPIS (1U << 0)
#define GICR_TYPER_VLPIS (1U << 1)
#define GICR_TYPER_DIRTY (1U << 2)
#define GICR_TYPER_DirectLPIS (1U << 3)
#define GICR_TYPER_LAST (1U << 4)
#define GICR_TYPER_RVPEID (1U << 7)
#define GICR_TYPER_COMMON_LPI_AFF GENMASK_ULL(25, 24)
#define GICR_TYPER_AFFINITY GENMASK_ULL(63, 32)
#define SYS_ICV_AP1R0_EL1 sys_reg(3, 0, 12, 9, 0)
#define GICR_INVLPIR_INTID GENMASK_ULL(31, 0)
#define GICR_INVLPIR_VPEID GENMASK_ULL(47, 32)
#define GICR_INVLPIR_V GENMASK_ULL(63, 63)
#define ICC_PMR_DEF_PRIO 0xf0
#define GICR_INVALLR_VPEID GICR_INVLPIR_VPEID
#define GICR_INVALLR_V GICR_INVLPIR_V
#define GIC_V3_REDIST_SIZE 0x20000
#define LPI_PROP_GROUP1 (1 << 1)
#define LPI_PROP_ENABLED (1 << 0)
/*
* Re-Distributor registers, offsets from VLPI_base
*/
#define GICR_VPROPBASER 0x0070
#define GICR_VPROPBASER_IDBITS_MASK 0x1f
#define GICR_VPROPBASER_SHAREABILITY_SHIFT (10)
#define GICR_VPROPBASER_INNER_CACHEABILITY_SHIFT (7)
#define GICR_VPROPBASER_OUTER_CACHEABILITY_SHIFT (56)
#define GICR_VPROPBASER_SHAREABILITY_MASK \
GIC_BASER_SHAREABILITY(GICR_VPROPBASER, SHAREABILITY_MASK)
#define GICR_VPROPBASER_INNER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, MASK)
#define GICR_VPROPBASER_OUTER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_VPROPBASER, OUTER, MASK)
#define GICR_VPROPBASER_CACHEABILITY_MASK \
GICR_VPROPBASER_INNER_CACHEABILITY_MASK
#define GICR_VPROPBASER_InnerShareable \
GIC_BASER_SHAREABILITY(GICR_VPROPBASER, InnerShareable)
#define GICR_VPROPBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nCnB)
#define GICR_VPROPBASER_nC GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, nC)
#define GICR_VPROPBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWt)
#define GICR_VPROPBASER_RaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWb)
#define GICR_VPROPBASER_WaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWt)
#define GICR_VPROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, WaWb)
#define GICR_VPROPBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWt)
#define GICR_VPROPBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPROPBASER, INNER, RaWaWb)
/*
* GICv4.1 VPROPBASER reinvention. A subtle mix between the old
* VPROPBASER and ITS_BASER. Just not quite any of the two.
*/
#define GICR_VPROPBASER_4_1_VALID (1ULL << 63)
#define GICR_VPROPBASER_4_1_ENTRY_SIZE GENMASK_ULL(61, 59)
#define GICR_VPROPBASER_4_1_INDIRECT (1ULL << 55)
#define GICR_VPROPBASER_4_1_PAGE_SIZE GENMASK_ULL(54, 53)
#define GICR_VPROPBASER_4_1_Z (1ULL << 52)
#define GICR_VPROPBASER_4_1_ADDR GENMASK_ULL(51, 12)
#define GICR_VPROPBASER_4_1_SIZE GENMASK_ULL(6, 0)
#define GICR_VPENDBASER 0x0078
#define GICR_VPENDBASER_SHAREABILITY_SHIFT (10)
#define GICR_VPENDBASER_INNER_CACHEABILITY_SHIFT (7)
#define GICR_VPENDBASER_OUTER_CACHEABILITY_SHIFT (56)
#define GICR_VPENDBASER_SHAREABILITY_MASK \
GIC_BASER_SHAREABILITY(GICR_VPENDBASER, SHAREABILITY_MASK)
#define GICR_VPENDBASER_INNER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, MASK)
#define GICR_VPENDBASER_OUTER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GICR_VPENDBASER, OUTER, MASK)
#define GICR_VPENDBASER_CACHEABILITY_MASK \
GICR_VPENDBASER_INNER_CACHEABILITY_MASK
#define GICR_VPENDBASER_NonShareable \
GIC_BASER_SHAREABILITY(GICR_VPENDBASER, NonShareable)
#define GICR_VPENDBASER_InnerShareable \
GIC_BASER_SHAREABILITY(GICR_VPENDBASER, InnerShareable)
#define GICR_VPENDBASER_nCnB GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nCnB)
#define GICR_VPENDBASER_nC GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, nC)
#define GICR_VPENDBASER_RaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWt)
#define GICR_VPENDBASER_RaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWb)
#define GICR_VPENDBASER_WaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWt)
#define GICR_VPENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, WaWb)
#define GICR_VPENDBASER_RaWaWt GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWt)
#define GICR_VPENDBASER_RaWaWb GIC_BASER_CACHEABILITY(GICR_VPENDBASER, INNER, RaWaWb)
#define GICR_VPENDBASER_Dirty (1ULL << 60)
#define GICR_VPENDBASER_PendingLast (1ULL << 61)
#define GICR_VPENDBASER_IDAI (1ULL << 62)
#define GICR_VPENDBASER_Valid (1ULL << 63)
/*
* GICv4.1 VPENDBASER, used for VPE residency. On top of these fields,
* also use the above Valid, PendingLast and Dirty.
*/
#define GICR_VPENDBASER_4_1_DB (1ULL << 62)
#define GICR_VPENDBASER_4_1_VGRP0EN (1ULL << 59)
#define GICR_VPENDBASER_4_1_VGRP1EN (1ULL << 58)
#define GICR_VPENDBASER_4_1_VPEID GENMASK_ULL(15, 0)
#define GICR_VSGIR 0x0080
#define GICR_VSGIR_VPEID GENMASK(15, 0)
#define GICR_VSGIPENDR 0x0088
#define GICR_VSGIPENDR_BUSY (1U << 31)
#define GICR_VSGIPENDR_PENDING GENMASK(15, 0)
/*
* ITS registers, offsets from ITS_base
*/
#define GITS_CTLR 0x0000
#define GITS_IIDR 0x0004
#define GITS_TYPER 0x0008
#define GITS_MPIDR 0x0018
#define GITS_CBASER 0x0080
#define GITS_CWRITER 0x0088
#define GITS_CREADR 0x0090
#define GITS_BASER 0x0100
#define GITS_IDREGS_BASE 0xffd0
#define GITS_PIDR0 0xffe0
#define GITS_PIDR1 0xffe4
#define GITS_PIDR2 GICR_PIDR2
#define GITS_PIDR4 0xffd0
#define GITS_CIDR0 0xfff0
#define GITS_CIDR1 0xfff4
#define GITS_CIDR2 0xfff8
#define GITS_CIDR3 0xfffc
#define GITS_TRANSLATER 0x10040
#define GITS_SGIR 0x20020
#define GITS_SGIR_VPEID GENMASK_ULL(47, 32)
#define GITS_SGIR_VINTID GENMASK_ULL(3, 0)
#define GITS_CTLR_ENABLE (1U << 0)
#define GITS_CTLR_ImDe (1U << 1)
#define GITS_CTLR_ITS_NUMBER_SHIFT 4
#define GITS_CTLR_ITS_NUMBER (0xFU << GITS_CTLR_ITS_NUMBER_SHIFT)
#define GITS_CTLR_QUIESCENT (1U << 31)
#define GITS_TYPER_PLPIS (1UL << 0)
#define GITS_TYPER_VLPIS (1UL << 1)
#define GITS_TYPER_ITT_ENTRY_SIZE_SHIFT 4
#define GITS_TYPER_ITT_ENTRY_SIZE GENMASK_ULL(7, 4)
#define GITS_TYPER_IDBITS_SHIFT 8
#define GITS_TYPER_DEVBITS_SHIFT 13
#define GITS_TYPER_DEVBITS GENMASK_ULL(17, 13)
#define GITS_TYPER_PTA (1UL << 19)
#define GITS_TYPER_HCC_SHIFT 24
#define GITS_TYPER_HCC(r) (((r) >> GITS_TYPER_HCC_SHIFT) & 0xff)
#define GITS_TYPER_VMOVP (1ULL << 37)
#define GITS_TYPER_VMAPP (1ULL << 40)
#define GITS_TYPER_SVPET GENMASK_ULL(42, 41)
#define GITS_IIDR_REV_SHIFT 12
#define GITS_IIDR_REV_MASK (0xf << GITS_IIDR_REV_SHIFT)
#define GITS_IIDR_REV(r) (((r) >> GITS_IIDR_REV_SHIFT) & 0xf)
#define GITS_IIDR_PRODUCTID_SHIFT 24
#define GITS_CBASER_VALID (1ULL << 63)
#define GITS_CBASER_SHAREABILITY_SHIFT (10)
#define GITS_CBASER_INNER_CACHEABILITY_SHIFT (59)
#define GITS_CBASER_OUTER_CACHEABILITY_SHIFT (53)
#define GITS_CBASER_SHAREABILITY_MASK \
GIC_BASER_SHAREABILITY(GITS_CBASER, SHAREABILITY_MASK)
#define GITS_CBASER_INNER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, MASK)
#define GITS_CBASER_OUTER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GITS_CBASER, OUTER, MASK)
#define GITS_CBASER_CACHEABILITY_MASK GITS_CBASER_INNER_CACHEABILITY_MASK
#define GITS_CBASER_InnerShareable \
GIC_BASER_SHAREABILITY(GITS_CBASER, InnerShareable)
#define GITS_CBASER_nCnB GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nCnB)
#define GITS_CBASER_nC GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nC)
#define GITS_CBASER_RaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)
#define GITS_CBASER_RaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWb)
#define GITS_CBASER_WaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWt)
#define GITS_CBASER_WaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWb)
#define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
#define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
#define GITS_CBASER_ADDRESS(cbaser) ((cbaser) & GENMASK_ULL(51, 12))
#define GITS_BASER_NR_REGS 8
#define GITS_BASER_VALID (1ULL << 63)
#define GITS_BASER_INDIRECT (1ULL << 62)
#define GITS_BASER_INNER_CACHEABILITY_SHIFT (59)
#define GITS_BASER_OUTER_CACHEABILITY_SHIFT (53)
#define GITS_BASER_INNER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GITS_BASER, INNER, MASK)
#define GITS_BASER_CACHEABILITY_MASK GITS_BASER_INNER_CACHEABILITY_MASK
#define GITS_BASER_OUTER_CACHEABILITY_MASK \
GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, MASK)
#define GITS_BASER_SHAREABILITY_MASK \
GIC_BASER_SHAREABILITY(GITS_BASER, SHAREABILITY_MASK)
#define GITS_BASER_nCnB GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nCnB)
#define GITS_BASER_nC GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nC)
#define GITS_BASER_RaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWt)
#define GITS_BASER_RaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)
#define GITS_BASER_WaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWt)
#define GITS_BASER_WaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWb)
#define GITS_BASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWaWt)
#define GITS_BASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWaWb)
#define GITS_BASER_TYPE_SHIFT (56)
#define GITS_BASER_TYPE(r) (((r) >> GITS_BASER_TYPE_SHIFT) & 7)
#define GITS_BASER_ENTRY_SIZE_SHIFT (48)
#define GITS_BASER_ENTRY_SIZE(r) ((((r) >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)
#define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48)
#define GITS_BASER_PHYS_52_to_48(phys) \
(((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12)
#define GITS_BASER_ADDR_48_to_52(baser) \
(((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48)
#define GITS_BASER_SHAREABILITY_SHIFT (10)
#define GITS_BASER_InnerShareable \
GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
#define GITS_BASER_PAGE_SIZE_SHIFT (8)
#define __GITS_BASER_PSZ(sz) (GIC_PAGE_SIZE_ ## sz << GITS_BASER_PAGE_SIZE_SHIFT)
#define GITS_BASER_PAGE_SIZE_4K __GITS_BASER_PSZ(4K)
#define GITS_BASER_PAGE_SIZE_16K __GITS_BASER_PSZ(16K)
#define GITS_BASER_PAGE_SIZE_64K __GITS_BASER_PSZ(64K)
#define GITS_BASER_PAGE_SIZE_MASK __GITS_BASER_PSZ(MASK)
#define GITS_BASER_PAGES_MAX 256
#define GITS_BASER_PAGES_SHIFT (0)
#define GITS_BASER_NR_PAGES(r) (((r) & 0xff) + 1)
#define GITS_BASER_TYPE_NONE 0
#define GITS_BASER_TYPE_DEVICE 1
#define GITS_BASER_TYPE_VCPU 2
#define GITS_BASER_TYPE_RESERVED3 3
#define GITS_BASER_TYPE_COLLECTION 4
#define GITS_BASER_TYPE_RESERVED5 5
#define GITS_BASER_TYPE_RESERVED6 6
#define GITS_BASER_TYPE_RESERVED7 7
#define GITS_LVL1_ENTRY_SIZE (8UL)
/*
* ITS commands
*/
#define GITS_CMD_MAPD 0x08
#define GITS_CMD_MAPC 0x09
#define GITS_CMD_MAPTI 0x0a
#define GITS_CMD_MAPI 0x0b
#define GITS_CMD_MOVI 0x01
#define GITS_CMD_DISCARD 0x0f
#define GITS_CMD_INV 0x0c
#define GITS_CMD_MOVALL 0x0e
#define GITS_CMD_INVALL 0x0d
#define GITS_CMD_INT 0x03
#define GITS_CMD_CLEAR 0x04
#define GITS_CMD_SYNC 0x05
/*
* GICv4 ITS specific commands
*/
#define GITS_CMD_GICv4(x) ((x) | 0x20)
#define GITS_CMD_VINVALL GITS_CMD_GICv4(GITS_CMD_INVALL)
#define GITS_CMD_VMAPP GITS_CMD_GICv4(GITS_CMD_MAPC)
#define GITS_CMD_VMAPTI GITS_CMD_GICv4(GITS_CMD_MAPTI)
#define GITS_CMD_VMOVI GITS_CMD_GICv4(GITS_CMD_MOVI)
#define GITS_CMD_VSYNC GITS_CMD_GICv4(GITS_CMD_SYNC)
/* VMOVP, VSGI and INVDB are the odd ones, as they don't have a physical counterpart */
#define GITS_CMD_VMOVP GITS_CMD_GICv4(2)
#define GITS_CMD_VSGI GITS_CMD_GICv4(3)
#define GITS_CMD_INVDB GITS_CMD_GICv4(0xe)
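/* e.g. GITS_CMD_VMAPP == GITS_CMD_GICv4(GITS_CMD_MAPC) == 0x29 */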
/*
* ITS error numbers
*/
#define E_ITS_MOVI_UNMAPPED_INTERRUPT 0x010107
#define E_ITS_MOVI_UNMAPPED_COLLECTION 0x010109
#define E_ITS_INT_UNMAPPED_INTERRUPT 0x010307
#define E_ITS_CLEAR_UNMAPPED_INTERRUPT 0x010507
#define E_ITS_MAPD_DEVICE_OOR 0x010801
#define E_ITS_MAPD_ITTSIZE_OOR 0x010802
#define E_ITS_MAPC_PROCNUM_OOR 0x010902
#define E_ITS_MAPC_COLLECTION_OOR 0x010903
#define E_ITS_MAPTI_UNMAPPED_DEVICE 0x010a04
#define E_ITS_MAPTI_ID_OOR 0x010a05
#define E_ITS_MAPTI_PHYSICALID_OOR 0x010a06
#define E_ITS_INV_UNMAPPED_INTERRUPT 0x010c07
#define E_ITS_INVALL_UNMAPPED_COLLECTION 0x010d09
#define E_ITS_MOVALL_PROCNUM_OOR 0x010e01
#define E_ITS_DISCARD_UNMAPPED_INTERRUPT 0x010f07
/*
* CPU interface registers
*/
#define ICC_CTLR_EL1_EOImode_SHIFT (1)
#define ICC_CTLR_EL1_EOImode_drop_dir (0U << ICC_CTLR_EL1_EOImode_SHIFT)
#define ICC_CTLR_EL1_EOImode_drop (1U << ICC_CTLR_EL1_EOImode_SHIFT)
#define ICC_CTLR_EL1_EOImode_MASK (1 << ICC_CTLR_EL1_EOImode_SHIFT)
#define ICC_CTLR_EL1_CBPR_SHIFT 0
#define ICC_CTLR_EL1_CBPR_MASK (1 << ICC_CTLR_EL1_CBPR_SHIFT)
#define ICC_CTLR_EL1_PMHE_SHIFT 6
#define ICC_CTLR_EL1_PMHE_MASK (1 << ICC_CTLR_EL1_PMHE_SHIFT)
#define ICC_CTLR_EL1_PRI_BITS_SHIFT 8
#define ICC_CTLR_EL1_PRI_BITS_MASK (0x7 << ICC_CTLR_EL1_PRI_BITS_SHIFT)
#define ICC_CTLR_EL1_ID_BITS_SHIFT 11
#define ICC_CTLR_EL1_ID_BITS_MASK (0x7 << ICC_CTLR_EL1_ID_BITS_SHIFT)
#define ICC_CTLR_EL1_SEIS_SHIFT 14
#define ICC_CTLR_EL1_SEIS_MASK (0x1 << ICC_CTLR_EL1_SEIS_SHIFT)
#define ICC_CTLR_EL1_A3V_SHIFT 15
#define ICC_CTLR_EL1_A3V_MASK (0x1 << ICC_CTLR_EL1_A3V_SHIFT)
#define ICC_CTLR_EL1_RSS (0x1 << 18)
#define ICC_CTLR_EL1_ExtRange (0x1 << 19)
#define ICC_PMR_EL1_SHIFT 0
#define ICC_PMR_EL1_MASK (0xff << ICC_PMR_EL1_SHIFT)
#define ICC_BPR0_EL1_SHIFT 0
#define ICC_BPR0_EL1_MASK (0x7 << ICC_BPR0_EL1_SHIFT)
#define ICC_BPR1_EL1_SHIFT 0
#define ICC_BPR1_EL1_MASK (0x7 << ICC_BPR1_EL1_SHIFT)
#define ICC_IGRPEN0_EL1_SHIFT 0
#define ICC_IGRPEN0_EL1_MASK (1 << ICC_IGRPEN0_EL1_SHIFT)
#define ICC_IGRPEN1_EL1_SHIFT 0
#define ICC_IGRPEN1_EL1_MASK (1 << ICC_IGRPEN1_EL1_SHIFT)
#define ICC_SRE_EL1_DIB (1U << 2)
#define ICC_SRE_EL1_DFB (1U << 1)
#define ICC_SRE_EL1_SRE (1U << 0)
#define ICC_IGRPEN1_EL1_ENABLE (1U << 0)
/* These are for GICv2 emulation only */
#define GICH_LR_VIRTUALID (0x3ffUL << 0)
#define GICH_LR_PHYSID_CPUID_SHIFT (10)
#define GICH_LR_PHYSID_CPUID (7UL << GICH_LR_PHYSID_CPUID_SHIFT)
#define GICV3_MAX_CPUS 512
#define ICC_IAR1_EL1_SPURIOUS 0x3ff
#endif /* SELFTEST_KVM_GICV3_H */
#define ICC_SRE_EL2_SRE (1 << 0)
#define ICC_SRE_EL2_ENABLE (1 << 3)
#define ICC_SGI1R_TARGET_LIST_SHIFT 0
#define ICC_SGI1R_TARGET_LIST_MASK (0xffff << ICC_SGI1R_TARGET_LIST_SHIFT)
#define ICC_SGI1R_AFFINITY_1_SHIFT 16
#define ICC_SGI1R_AFFINITY_1_MASK (0xff << ICC_SGI1R_AFFINITY_1_SHIFT)
#define ICC_SGI1R_SGI_ID_SHIFT 24
#define ICC_SGI1R_SGI_ID_MASK (0xfULL << ICC_SGI1R_SGI_ID_SHIFT)
#define ICC_SGI1R_AFFINITY_2_SHIFT 32
#define ICC_SGI1R_AFFINITY_2_MASK (0xffULL << ICC_SGI1R_AFFINITY_2_SHIFT)
#define ICC_SGI1R_IRQ_ROUTING_MODE_BIT 40
#define ICC_SGI1R_RS_SHIFT 44
#define ICC_SGI1R_RS_MASK (0xfULL << ICC_SGI1R_RS_SHIFT)
#define ICC_SGI1R_AFFINITY_3_SHIFT 48
#define ICC_SGI1R_AFFINITY_3_MASK (0xffULL << ICC_SGI1R_AFFINITY_3_SHIFT)
#endif
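
For orientation, the ICC_SGI1R fields above combine into a single 64-bit system-register value. The helper below is a hypothetical sketch (not part of this diff) of how a guest could compose one for an affinity-routed SGI:

static inline u64 compose_sgi1r(u8 aff3, u8 aff2, u8 aff1, u8 sgi_id,
				u16 target_list)
{
	/* IRM (bit 40) left clear: route by affinity, not broadcast */
	return ((u64)aff3 << ICC_SGI1R_AFFINITY_3_SHIFT) |
	       ((u64)aff2 << ICC_SGI1R_AFFINITY_2_SHIFT) |
	       ((u64)aff1 << ICC_SGI1R_AFFINITY_1_SHIFT) |
	       ((u64)sgi_id << ICC_SGI1R_SGI_ID_SHIFT) |
	       ((u64)target_list << ICC_SGI1R_TARGET_LIST_SHIFT);
}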


@ -0,0 +1,19 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __SELFTESTS_GIC_V3_ITS_H__
#define __SELFTESTS_GIC_V3_ITS_H__
#include <linux/sizes.h>
void its_init(vm_paddr_t coll_tbl, size_t coll_tbl_sz,
vm_paddr_t device_tbl, size_t device_tbl_sz,
vm_paddr_t cmdq, size_t cmdq_size);
void its_send_mapd_cmd(void *cmdq_base, u32 device_id, vm_paddr_t itt_base,
size_t itt_size, bool valid);
void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool valid);
void its_send_mapti_cmd(void *cmdq_base, u32 device_id, u32 event_id,
u32 collection_id, u32 intid);
void its_send_invall_cmd(void *cmdq_base, u32 collection_id);
#endif // __SELFTESTS_GIC_V3_ITS_H__


@ -58,8 +58,6 @@
MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) | \
MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))
#define MPIDR_HWID_BITMASK (0xff00fffffful)
void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
struct kvm_vcpu *aarch64_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
struct kvm_vcpu_init *init, void *guest_code);
@ -177,11 +175,28 @@ static __always_inline u32 __raw_readl(const volatile void *addr)
return val;
}
static __always_inline void __raw_writeq(u64 val, volatile void *addr)
{
asm volatile("str %0, [%1]" : : "rZ" (val), "r" (addr));
}
static __always_inline u64 __raw_readq(const volatile void *addr)
{
u64 val;
asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr));
return val;
}
#define writel_relaxed(v,c) ((void)__raw_writel((__force u32)cpu_to_le32(v),(c)))
#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
#define writeq_relaxed(v,c) ((void)__raw_writeq((__force u64)cpu_to_le64(v),(c)))
#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64)__raw_readq(c)); __r; })
#define writel(v,c) ({ __iowmb(); writel_relaxed((v),(c));})
#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(__v); __v; })
#define writeq(v,c) ({ __iowmb(); writeq_relaxed((v),(c));})
#define readq(c) ({ u64 __v = readq_relaxed(c); __iormb(__v); __v; })
static inline void local_irq_enable(void)
{

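A note on the accessor pairs added above: the *_relaxed forms carry no ordering barriers, while readq()/writeq() wrap the access in __iormb()/__iowmb(). A minimal sketch of the distinction (GITS_CBASER used purely as an example offset):

static void mmio_ordering_demo(volatile void *base, u64 val)
{
	u64 v;

	writeq_relaxed(val, base + GITS_CBASER);	/* no barrier; caller must order */
	writeq(val, base + GITS_CBASER);		/* __iowmb() before the store */
	v = readq(base + GITS_CBASER);			/* __iormb() after the load */
	(void)v;
}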

@ -16,8 +16,7 @@
((uint64_t)(flags) << 12) | \
index)
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
uint64_t gicd_base_gpa, uint64_t gicr_base_gpa);
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs);
#define VGIC_MAX_RESERVED 1023
@ -33,4 +32,6 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu);
#define KVM_IRQCHIP_NUM_PINS (1020 - 32)
int vgic_its_setup(struct kvm_vm *vm);
#endif // SELFTEST_KVM_VGIC_H


@ -17,13 +17,12 @@
static const struct gic_common_ops *gic_common_ops;
static struct spinlock gic_lock;
static void gic_cpu_init(unsigned int cpu, void *redist_base)
static void gic_cpu_init(unsigned int cpu)
{
gic_common_ops->gic_cpu_init(cpu, redist_base);
gic_common_ops->gic_cpu_init(cpu);
}
static void
gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
static void gic_dist_init(enum gic_type type, unsigned int nr_cpus)
{
const struct gic_common_ops *gic_ops = NULL;
@ -40,7 +39,7 @@ gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
GUEST_ASSERT(gic_ops);
gic_ops->gic_init(nr_cpus, dist_base);
gic_ops->gic_init(nr_cpus);
gic_common_ops = gic_ops;
/* Make sure that the initialized data is visible to all the vCPUs */
@ -49,18 +48,15 @@ gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base)
spin_unlock(&gic_lock);
}
void gic_init(enum gic_type type, unsigned int nr_cpus,
void *dist_base, void *redist_base)
void gic_init(enum gic_type type, unsigned int nr_cpus)
{
uint32_t cpu = guest_get_vcpuid();
GUEST_ASSERT(type < GIC_TYPE_MAX);
GUEST_ASSERT(dist_base);
GUEST_ASSERT(redist_base);
GUEST_ASSERT(nr_cpus);
gic_dist_init(type, nr_cpus, dist_base);
gic_cpu_init(cpu, redist_base);
gic_dist_init(type, nr_cpus);
gic_cpu_init(cpu);
}
void gic_irq_enable(unsigned int intid)

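With the MMIO bases now compile-time constants (GICD_BASE_GVA and friends), guest call sites shrink to the GIC type and vCPU count. A sketch of the new shape (NR_VCPUS is a placeholder; GIC_V3 and gic_irq_enable() come from the selftest GIC library):

static void guest_code(void)
{
	gic_init(GIC_V3, NR_VCPUS);	/* per-vCPU init, no base pointers */
	gic_irq_enable(32);		/* enable a hypothetical SPI */
}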

@ -8,8 +8,8 @@
#define SELFTEST_KVM_GIC_PRIVATE_H
struct gic_common_ops {
void (*gic_init)(unsigned int nr_cpus, void *dist_base);
void (*gic_cpu_init)(unsigned int cpu, void *redist_base);
void (*gic_init)(unsigned int nr_cpus);
void (*gic_cpu_init)(unsigned int cpu);
void (*gic_irq_enable)(unsigned int intid);
void (*gic_irq_disable)(unsigned int intid);
uint64_t (*gic_read_iar)(void);


@ -9,12 +9,21 @@
#include "processor.h"
#include "delay.h"
#include "gic.h"
#include "gic_v3.h"
#include "gic_private.h"
#define GICV3_MAX_CPUS 512
#define GICD_INT_DEF_PRI 0xa0
#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
(GICD_INT_DEF_PRI << 16) |\
(GICD_INT_DEF_PRI << 8) |\
GICD_INT_DEF_PRI)
#define ICC_PMR_DEF_PRIO 0xf0
struct gicv3_data {
void *dist_base;
void *redist_base[GICV3_MAX_CPUS];
unsigned int nr_cpus;
unsigned int nr_spis;
};
@ -35,17 +44,23 @@ static void gicv3_gicd_wait_for_rwp(void)
{
unsigned int count = 100000; /* 1s */
while (readl(gicv3_data.dist_base + GICD_CTLR) & GICD_CTLR_RWP) {
while (readl(GICD_BASE_GVA + GICD_CTLR) & GICD_CTLR_RWP) {
GUEST_ASSERT(count--);
udelay(10);
}
}
static void gicv3_gicr_wait_for_rwp(void *redist_base)
static inline volatile void *gicr_base_cpu(uint32_t cpu)
{
/* Redistributors are laid out sequentially, two 64K frames per CPU */
return GICR_BASE_GVA + cpu * SZ_64K * 2;
}
static void gicv3_gicr_wait_for_rwp(uint32_t cpu)
{
unsigned int count = 100000; /* 1s */
while (readl(redist_base + GICR_CTLR) & GICR_CTLR_RWP) {
while (readl(gicr_base_cpu(cpu) + GICR_CTLR) & GICR_CTLR_RWP) {
GUEST_ASSERT(count--);
udelay(10);
}
@ -56,7 +71,7 @@ static void gicv3_wait_for_rwp(uint32_t cpu_or_dist)
if (cpu_or_dist & DIST_BIT)
gicv3_gicd_wait_for_rwp();
else
gicv3_gicr_wait_for_rwp(gicv3_data.redist_base[cpu_or_dist]);
gicv3_gicr_wait_for_rwp(cpu_or_dist);
}
static enum gicv3_intid_range get_intid_range(unsigned int intid)
@ -116,15 +131,15 @@ static void gicv3_set_eoi_split(bool split)
uint32_t gicv3_reg_readl(uint32_t cpu_or_dist, uint64_t offset)
{
void *base = cpu_or_dist & DIST_BIT ? gicv3_data.dist_base
: sgi_base_from_redist(gicv3_data.redist_base[cpu_or_dist]);
volatile void *base = cpu_or_dist & DIST_BIT ? GICD_BASE_GVA
: sgi_base_from_redist(gicr_base_cpu(cpu_or_dist));
return readl(base + offset);
}
void gicv3_reg_writel(uint32_t cpu_or_dist, uint64_t offset, uint32_t reg_val)
{
void *base = cpu_or_dist & DIST_BIT ? gicv3_data.dist_base
: sgi_base_from_redist(gicv3_data.redist_base[cpu_or_dist]);
volatile void *base = cpu_or_dist & DIST_BIT ? GICD_BASE_GVA
: sgi_base_from_redist(gicr_base_cpu(cpu_or_dist));
writel(reg_val, base + offset);
}
@ -263,7 +278,7 @@ static bool gicv3_irq_get_pending(uint32_t intid)
return gicv3_read_reg(intid, GICD_ISPENDR, 32, 1);
}
static void gicv3_enable_redist(void *redist_base)
static void gicv3_enable_redist(volatile void *redist_base)
{
uint32_t val = readl(redist_base + GICR_WAKER);
unsigned int count = 100000; /* 1s */
@ -278,21 +293,15 @@ static void gicv3_enable_redist(void *redist_base)
}
}
static inline void *gicr_base_cpu(void *redist_base, uint32_t cpu)
static void gicv3_cpu_init(unsigned int cpu)
{
/* Align all the redistributors sequentially */
return redist_base + cpu * SZ_64K * 2;
}
static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
{
void *sgi_base;
volatile void *sgi_base;
unsigned int i;
void *redist_base_cpu;
volatile void *redist_base_cpu;
GUEST_ASSERT(cpu < gicv3_data.nr_cpus);
redist_base_cpu = gicr_base_cpu(redist_base, cpu);
redist_base_cpu = gicr_base_cpu(cpu);
sgi_base = sgi_base_from_redist(redist_base_cpu);
gicv3_enable_redist(redist_base_cpu);
@ -310,7 +319,7 @@ static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
writel(GICD_INT_DEF_PRI_X4,
sgi_base + GICR_IPRIORITYR0 + i);
gicv3_gicr_wait_for_rwp(redist_base_cpu);
gicv3_gicr_wait_for_rwp(cpu);
/* Enable the GIC system register (ICC_*) access */
write_sysreg_s(read_sysreg_s(SYS_ICC_SRE_EL1) | ICC_SRE_EL1_SRE,
@ -320,18 +329,15 @@ static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
write_sysreg_s(ICC_PMR_DEF_PRIO, SYS_ICC_PMR_EL1);
/* Enable non-secure Group-1 interrupts */
write_sysreg_s(ICC_IGRPEN1_EL1_ENABLE, SYS_ICC_GRPEN1_EL1);
gicv3_data.redist_base[cpu] = redist_base_cpu;
write_sysreg_s(ICC_IGRPEN1_EL1_MASK, SYS_ICC_IGRPEN1_EL1);
}
static void gicv3_dist_init(void)
{
void *dist_base = gicv3_data.dist_base;
unsigned int i;
/* Disable the distributor until we set things up */
writel(0, dist_base + GICD_CTLR);
writel(0, GICD_BASE_GVA + GICD_CTLR);
gicv3_gicd_wait_for_rwp();
/*
@ -339,33 +345,32 @@ static void gicv3_dist_init(void)
* Also, deactivate and disable them.
*/
for (i = 32; i < gicv3_data.nr_spis; i += 32) {
writel(~0, dist_base + GICD_IGROUPR + i / 8);
writel(~0, dist_base + GICD_ICACTIVER + i / 8);
writel(~0, dist_base + GICD_ICENABLER + i / 8);
writel(~0, GICD_BASE_GVA + GICD_IGROUPR + i / 8);
writel(~0, GICD_BASE_GVA + GICD_ICACTIVER + i / 8);
writel(~0, GICD_BASE_GVA + GICD_ICENABLER + i / 8);
}
/* Set a default priority for all the SPIs */
for (i = 32; i < gicv3_data.nr_spis; i += 4)
writel(GICD_INT_DEF_PRI_X4,
dist_base + GICD_IPRIORITYR + i);
GICD_BASE_GVA + GICD_IPRIORITYR + i);
/* Wait for the settings to sync-in */
gicv3_gicd_wait_for_rwp();
/* Finally, enable the distributor globally with ARE */
writel(GICD_CTLR_ARE_NS | GICD_CTLR_ENABLE_G1A |
GICD_CTLR_ENABLE_G1, dist_base + GICD_CTLR);
GICD_CTLR_ENABLE_G1, GICD_BASE_GVA + GICD_CTLR);
gicv3_gicd_wait_for_rwp();
}
static void gicv3_init(unsigned int nr_cpus, void *dist_base)
static void gicv3_init(unsigned int nr_cpus)
{
GUEST_ASSERT(nr_cpus <= GICV3_MAX_CPUS);
gicv3_data.nr_cpus = nr_cpus;
gicv3_data.dist_base = dist_base;
gicv3_data.nr_spis = GICD_TYPER_SPIS(
readl(gicv3_data.dist_base + GICD_TYPER));
readl(GICD_BASE_GVA + GICD_TYPER));
if (gicv3_data.nr_spis > 1020)
gicv3_data.nr_spis = 1020;
@ -396,3 +401,27 @@ const struct gic_common_ops gicv3_ops = {
.gic_irq_get_pending = gicv3_irq_get_pending,
.gic_irq_set_config = gicv3_irq_set_config,
};
void gic_rdist_enable_lpis(vm_paddr_t cfg_table, size_t cfg_table_size,
vm_paddr_t pend_table)
{
volatile void *rdist_base = gicr_base_cpu(guest_get_vcpuid());
u32 ctlr;
u64 val;
val = (cfg_table |
GICR_PROPBASER_InnerShareable |
GICR_PROPBASER_RaWaWb |
((ilog2(cfg_table_size) - 1) & GICR_PROPBASER_IDBITS_MASK));
writeq_relaxed(val, rdist_base + GICR_PROPBASER);
val = (pend_table |
GICR_PENDBASER_InnerShareable |
GICR_PENDBASER_RaWaWb);
writeq_relaxed(val, rdist_base + GICR_PENDBASER);
ctlr = readl_relaxed(rdist_base + GICR_CTLR);
ctlr |= GICR_CTLR_ENABLE_LPIS;
writel_relaxed(ctlr, rdist_base + GICR_CTLR);
}
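
Each vCPU is expected to call this before any LPI targeting it is mapped. A short guest-side sketch, assuming the host has allocated both tables and handed their GPAs to the guest (names hypothetical):

/* Point this vCPU's redistributor at the LPI tables, then set ENABLE_LPIS */
gic_rdist_enable_lpis(lpi_prop_table_gpa, SZ_64K, lpi_pend_table_gpa);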


@ -0,0 +1,248 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Guest ITS library, generously donated by drivers/irqchip/irq-gic-v3-its.c
* over in the kernel tree.
*/
#include <linux/kvm.h>
#include <linux/sizes.h>
#include <asm/kvm_para.h>
#include <asm/kvm.h>
#include "kvm_util.h"
#include "vgic.h"
#include "gic.h"
#include "gic_v3.h"
#include "processor.h"
static u64 its_read_u64(unsigned long offset)
{
return readq_relaxed(GITS_BASE_GVA + offset);
}
static void its_write_u64(unsigned long offset, u64 val)
{
writeq_relaxed(val, GITS_BASE_GVA + offset);
}
static u32 its_read_u32(unsigned long offset)
{
return readl_relaxed(GITS_BASE_GVA + offset);
}
static void its_write_u32(unsigned long offset, u32 val)
{
writel_relaxed(val, GITS_BASE_GVA + offset);
}
static unsigned long its_find_baser(unsigned int type)
{
int i;
for (i = 0; i < GITS_BASER_NR_REGS; i++) {
u64 baser;
unsigned long offset = GITS_BASER + (i * sizeof(baser));
baser = its_read_u64(offset);
if (GITS_BASER_TYPE(baser) == type)
return offset;
}
GUEST_FAIL("Couldn't find an ITS BASER of type %u", type);
return -1;
}
static void its_install_table(unsigned int type, vm_paddr_t base, size_t size)
{
unsigned long offset = its_find_baser(type);
u64 baser;
baser = ((size / SZ_64K) - 1) |
GITS_BASER_PAGE_SIZE_64K |
GITS_BASER_InnerShareable |
base |
GITS_BASER_RaWaWb |
GITS_BASER_VALID;
its_write_u64(offset, baser);
}
static void its_install_cmdq(vm_paddr_t base, size_t size)
{
u64 cbaser;
cbaser = ((size / SZ_4K) - 1) |
GITS_CBASER_InnerShareable |
base |
GITS_CBASER_RaWaWb |
GITS_CBASER_VALID;
its_write_u64(GITS_CBASER, cbaser);
}
void its_init(vm_paddr_t coll_tbl, size_t coll_tbl_sz,
vm_paddr_t device_tbl, size_t device_tbl_sz,
vm_paddr_t cmdq, size_t cmdq_size)
{
u32 ctlr;
its_install_table(GITS_BASER_TYPE_COLLECTION, coll_tbl, coll_tbl_sz);
its_install_table(GITS_BASER_TYPE_DEVICE, device_tbl, device_tbl_sz);
its_install_cmdq(cmdq, cmdq_size);
ctlr = its_read_u32(GITS_CTLR);
ctlr |= GITS_CTLR_ENABLE;
its_write_u32(GITS_CTLR, ctlr);
}
struct its_cmd_block {
union {
u64 raw_cmd[4];
__le64 raw_cmd_le[4];
};
};
static inline void its_fixup_cmd(struct its_cmd_block *cmd)
{
/* The ITS consumes little-endian commands; fix them up for BE guests */
cmd->raw_cmd_le[0] = cpu_to_le64(cmd->raw_cmd[0]);
cmd->raw_cmd_le[1] = cpu_to_le64(cmd->raw_cmd[1]);
cmd->raw_cmd_le[2] = cpu_to_le64(cmd->raw_cmd[2]);
cmd->raw_cmd_le[3] = cpu_to_le64(cmd->raw_cmd[3]);
}
static void its_mask_encode(u64 *raw_cmd, u64 val, int h, int l)
{
u64 mask = GENMASK_ULL(h, l);
*raw_cmd &= ~mask;
*raw_cmd |= (val << l) & mask;
}
static void its_encode_cmd(struct its_cmd_block *cmd, u8 cmd_nr)
{
its_mask_encode(&cmd->raw_cmd[0], cmd_nr, 7, 0);
}
static void its_encode_devid(struct its_cmd_block *cmd, u32 devid)
{
its_mask_encode(&cmd->raw_cmd[0], devid, 63, 32);
}
static void its_encode_event_id(struct its_cmd_block *cmd, u32 id)
{
its_mask_encode(&cmd->raw_cmd[1], id, 31, 0);
}
static void its_encode_phys_id(struct its_cmd_block *cmd, u32 phys_id)
{
its_mask_encode(&cmd->raw_cmd[1], phys_id, 63, 32);
}
static void its_encode_size(struct its_cmd_block *cmd, u8 size)
{
its_mask_encode(&cmd->raw_cmd[1], size, 4, 0);
}
static void its_encode_itt(struct its_cmd_block *cmd, u64 itt_addr)
{
its_mask_encode(&cmd->raw_cmd[2], itt_addr >> 8, 51, 8);
}
static void its_encode_valid(struct its_cmd_block *cmd, int valid)
{
its_mask_encode(&cmd->raw_cmd[2], !!valid, 63, 63);
}
static void its_encode_target(struct its_cmd_block *cmd, u64 target_addr)
{
its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16);
}
static void its_encode_collection(struct its_cmd_block *cmd, u16 col)
{
its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
}
#define GITS_CMDQ_POLL_ITERATIONS 0
static void its_send_cmd(void *cmdq_base, struct its_cmd_block *cmd)
{
u64 cwriter = its_read_u64(GITS_CWRITER);
struct its_cmd_block *dst = cmdq_base + cwriter;
u64 cbaser = its_read_u64(GITS_CBASER);
size_t cmdq_size;
u64 next;
int i;
cmdq_size = ((cbaser & 0xFF) + 1) * SZ_4K;
its_fixup_cmd(cmd);
WRITE_ONCE(*dst, *cmd);
dsb(ishst);
next = (cwriter + sizeof(*cmd)) % cmdq_size;
its_write_u64(GITS_CWRITER, next);
/*
* Polling isn't necessary considering KVM's ITS emulation at the time
* of writing this, as the CMDQ is processed synchronously after a write
* to CWRITER.
*/
for (i = 0; its_read_u64(GITS_CREADR) != next; i++) {
__GUEST_ASSERT(i < GITS_CMDQ_POLL_ITERATIONS,
"ITS didn't process command at offset %lu after %d iterations\n",
cwriter, i);
cpu_relax();
}
}
void its_send_mapd_cmd(void *cmdq_base, u32 device_id, vm_paddr_t itt_base,
size_t itt_size, bool valid)
{
struct its_cmd_block cmd = {};
its_encode_cmd(&cmd, GITS_CMD_MAPD);
its_encode_devid(&cmd, device_id);
its_encode_size(&cmd, ilog2(itt_size) - 1);
its_encode_itt(&cmd, itt_base);
its_encode_valid(&cmd, valid);
its_send_cmd(cmdq_base, &cmd);
}
void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool valid)
{
struct its_cmd_block cmd = {};
its_encode_cmd(&cmd, GITS_CMD_MAPC);
its_encode_collection(&cmd, collection_id);
its_encode_target(&cmd, vcpu_id);
its_encode_valid(&cmd, valid);
its_send_cmd(cmdq_base, &cmd);
}
void its_send_mapti_cmd(void *cmdq_base, u32 device_id, u32 event_id,
u32 collection_id, u32 intid)
{
struct its_cmd_block cmd = {};
its_encode_cmd(&cmd, GITS_CMD_MAPTI);
its_encode_devid(&cmd, device_id);
its_encode_event_id(&cmd, event_id);
its_encode_phys_id(&cmd, intid);
its_encode_collection(&cmd, collection_id);
its_send_cmd(cmdq_base, &cmd);
}
void its_send_invall_cmd(void *cmdq_base, u32 collection_id)
{
struct its_cmd_block cmd = {};
its_encode_cmd(&cmd, GITS_CMD_INVALL);
its_encode_collection(&cmd, collection_id);
its_send_cmd(cmdq_base, &cmd);
}
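
End to end, a guest might drive the library as below. This is a hedged sketch: the *_gpa values are host-allocated placeholders, cmdq_va is the guest mapping of the command queue, and 8192 is the first valid LPI INTID:

its_init(coll_tbl_gpa, SZ_64K, device_tbl_gpa, SZ_64K, cmdq_gpa, SZ_64K);
its_send_mapd_cmd(cmdq_va, 0, itt_gpa, SZ_4K, true);	/* device 0 -> ITT */
its_send_mapc_cmd(cmdq_va, 0, 0, true);			/* collection 0 -> vcpu 0 */
its_send_mapti_cmd(cmdq_va, 0, 0, 0, 8192);		/* event 0 -> LPI 8192 */
its_send_invall_cmd(cmdq_va, 0);			/* sync collection 0 */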


@ -3,8 +3,10 @@
* ARM Generic Interrupt Controller (GIC) v3 host support
*/
#include <linux/kernel.h>
#include <linux/kvm.h>
#include <linux/sizes.h>
#include <asm/cputype.h>
#include <asm/kvm_para.h>
#include <asm/kvm.h>
@ -19,8 +21,6 @@
* Input args:
* vm - KVM VM
* nr_vcpus - Number of vCPUs supported by this VM
* gicd_base_gpa - Guest Physical Address of the Distributor region
* gicr_base_gpa - Guest Physical Address of the Redistributor region
*
* Output args: None
*
@ -30,11 +30,10 @@
* redistributor regions of the guest. Since it depends on the number of
* vCPUs for the VM, it must be called after all the vCPUs have been created.
*/
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
uint64_t gicd_base_gpa, uint64_t gicr_base_gpa)
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs)
{
int gic_fd;
uint64_t redist_attr;
uint64_t attr;
struct list_head *iter;
unsigned int nr_gic_pages, nr_vcpus_created = 0;
@ -60,18 +59,19 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
attr = GICD_BASE_GPA;
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_V3_ADDR_TYPE_DIST, &gicd_base_gpa);
KVM_VGIC_V3_ADDR_TYPE_DIST, &attr);
nr_gic_pages = vm_calc_num_guest_pages(vm->mode, KVM_VGIC_V3_DIST_SIZE);
virt_map(vm, gicd_base_gpa, gicd_base_gpa, nr_gic_pages);
virt_map(vm, GICD_BASE_GPA, GICD_BASE_GPA, nr_gic_pages);
/* Redistributor setup */
redist_attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, gicr_base_gpa, 0, 0);
attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, GICR_BASE_GPA, 0, 0);
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &redist_attr);
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &attr);
nr_gic_pages = vm_calc_num_guest_pages(vm->mode,
KVM_VGIC_V3_REDIST_SIZE * nr_vcpus);
virt_map(vm, gicr_base_gpa, gicr_base_gpa, nr_gic_pages);
virt_map(vm, GICR_BASE_GPA, GICR_BASE_GPA, nr_gic_pages);
kvm_device_attr_set(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
@ -168,3 +168,21 @@ void kvm_irq_write_isactiver(int gic_fd, uint32_t intid, struct kvm_vcpu *vcpu)
{
vgic_poke_irq(gic_fd, intid, vcpu, GICD_ISACTIVER);
}
int vgic_its_setup(struct kvm_vm *vm)
{
int its_fd = kvm_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_ITS);
u64 attr;
attr = GITS_BASE_GPA;
kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
KVM_VGIC_ITS_ADDR_TYPE, &attr);
kvm_device_attr_set(its_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
virt_map(vm, GITS_BASE_GPA, GITS_BASE_GPA,
vm_calc_num_guest_pages(vm->mode, KVM_VGIC_V3_ITS_SIZE));
return its_fd;
}
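
Ordering matters on the host side: the ITS is a child of an existing vGICv3, so a caller would plausibly do the following (a sketch; nr_irqs chosen arbitrarily, __TEST_REQUIRE as used elsewhere in the selftests):

int gic_fd = vgic_v3_setup(vm, nr_vcpus, 64);

__TEST_REQUIRE(gic_fd >= 0, "Failed to create vGICv3");
vgic_its_setup(vm);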


@ -1292,6 +1292,12 @@ static void kvm_destroy_devices(struct kvm *kvm)
* We do not need to take the kvm->lock here, because nobody else
* has a reference to the struct kvm at this point and therefore
* cannot access the devices list anyhow.
*
* The device list is generally managed as an rculist, but list_del()
* is used intentionally here. If a bug in KVM introduced a reader that
* was not backed by a reference on the kvm struct, the hope is that
* it'd consume the poisoned forward pointer instead of suffering a
* use-after-free, even though this cannot be guaranteed.
*/
list_for_each_entry_safe(dev, tmp, &kvm->devices, vm_node) {
list_del(&dev->vm_node);
@ -4688,7 +4694,8 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
if (dev->ops->release) {
mutex_lock(&kvm->lock);
list_del(&dev->vm_node);
list_del_rcu(&dev->vm_node);
synchronize_rcu();
dev->ops->release(dev);
mutex_unlock(&kvm->lock);
}
@ -4771,7 +4778,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
kfree(dev);
return ret;
}
list_add(&dev->vm_node, &kvm->devices);
list_add_rcu(&dev->vm_node, &kvm->devices);
mutex_unlock(&kvm->lock);
if (ops->init)
@ -4782,7 +4789,8 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
if (ret < 0) {
kvm_put_kvm_no_destroy(kvm);
mutex_lock(&kvm->lock);
list_del(&dev->vm_node);
list_del_rcu(&dev->vm_node);
synchronize_rcu();
if (ops->release)
ops->release(dev);
mutex_unlock(&kvm->lock);
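
The list_add_rcu()/list_del_rcu() conversion above is what would let a lockless reader walk kvm->devices safely; no such reader exists in this diff, so the pattern below is purely illustrative (inspect_device() is hypothetical):

struct kvm_device *dev;

rcu_read_lock();
list_for_each_entry_rcu(dev, &kvm->devices, vm_node)
	inspect_device(dev);	/* dev stays valid until synchronize_rcu() */
rcu_read_unlock();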


@ -366,6 +366,8 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
struct kvm_device *tmp;
struct kvm_vfio *kv;
lockdep_assert_held(&dev->kvm->lock);
/* Only one VFIO "device" per VM */
list_for_each_entry(tmp, &dev->kvm->devices, vm_node)
if (tmp->ops == &kvm_vfio_ops)