// SPDX-License-Identifier: GPL-2.0-only
/*
 * KVM PMU support for AMD
 *
 * Copyright 2015, Red Hat, Inc. and/or its affiliates.
 *
 * Author:
 *   Wei Huang <wei@redhat.com>
 *
 * Implementation is based on pmu_intel.c file
 */
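
/*
 * Prefix printks with the module name (e.g. "kvm_amd: ") so that output is
 * formatted consistently across KVM's common x86, Intel, and AMD code.
 */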
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/types.h>
#include <linux/kvm_host.h>
#include <linux/perf_event.h>
#include "x86.h"
#include "cpuid.h"
#include "lapic.h"
#include "pmu.h"
#include "svm.h"

enum pmu_type {
	PMU_TYPE_COUNTER = 0,
	PMU_TYPE_EVNTSEL,
};
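
/*
 * Translate a raw counter index to its kvm_pmc.  Used for both KVM-internal
 * lookups and guest RDPMC; AMD has no fixed counters, so every valid index
 * maps to a general purpose counter.
 */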
static struct kvm_pmc *amd_pmu_get_pmc(struct kvm_pmu *pmu, int pmc_idx)
{
	unsigned int num_counters = pmu->nr_arch_gp_counters;

	if (pmc_idx >= num_counters)
		return NULL;

	return &pmu->gp_counters[array_index_nospec(pmc_idx, num_counters)];
}
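
/*
 * Map a PERFCTR/EVNTSEL MSR to its kvm_pmc.  Returns NULL if the vPMU is
 * disabled, if the MSR doesn't match the requested type, or if the MSR
 * doesn't map to a counter the guest's CPUID model can see.
 */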
static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
					     enum pmu_type type)
{
	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
	unsigned int idx;

	if (!vcpu->kvm->arch.enable_pmu)
		return NULL;

	switch (msr) {
	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
		if (!guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
			return NULL;
		/*
		 * Each PMU counter has a pair of CTL and CTR MSRs. CTLn
		 * MSRs (accessed via EVNTSEL) are even, CTRn MSRs are odd.
		 */
		idx = (unsigned int)((msr - MSR_F15H_PERF_CTL0) / 2);
		if (!(msr & 0x1) != (type == PMU_TYPE_EVNTSEL))
			return NULL;
		break;
	case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
		if (type != PMU_TYPE_EVNTSEL)
			return NULL;
		idx = msr - MSR_K7_EVNTSEL0;
		break;
	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
		if (type != PMU_TYPE_COUNTER)
			return NULL;
		idx = msr - MSR_K7_PERFCTR0;
		break;
	default:
		return NULL;
	}

	return amd_pmu_get_pmc(pmu, idx);
}
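
/*
 * On AMD, exceptions on RDPMC take priority over the VM-Exit, i.e. an out of
 * range index must #GP in the guest.  This early check exists only to honor
 * that priority; it is not a generic "is this PMC valid?" helper.
 */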
static int amd_check_rdpmc_early(struct kvm_vcpu *vcpu, unsigned int idx)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

	if (idx >= pmu->nr_arch_gp_counters)
		return -EINVAL;

	return 0;
}

/* idx is the ECX register of RDPMC instruction */
static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
					    unsigned int idx, u64 *mask)
{
	return amd_pmu_get_pmc(vcpu_to_pmu(vcpu), idx);
}

static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	struct kvm_pmc *pmc;

	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
	pmc = pmc ? pmc : get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);

	return pmc;
}
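
/*
 * Legacy K7 MSRs exist for any non-zero PMU version, the PerfCtrCore MSRs
 * require the PERFCTR_CORE CPUID bit, and the global control/status MSRs
 * (plus CTL/CTR pairs beyond the first six) require PMU version 2.
 */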
static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

	switch (msr) {
	case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
		return pmu->version > 0;
	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
		return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
		return pmu->version > 1;
	default:
		if (msr > MSR_F15H_PERF_CTR5 &&
		    msr < MSR_F15H_PERF_CTL0 + 2 * pmu->nr_arch_gp_counters)
			return pmu->version > 1;
		break;
	}

	return amd_msr_idx_to_pmc(vcpu, msr);
}

static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	struct kvm_pmc *pmc;
	u32 msr = msr_info->index;

	/* MSR_PERFCTRn */
	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
	if (pmc) {
		msr_info->data = pmc_read_counter(pmc);
		return 0;
	}
	/* MSR_EVNTSELn */
	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
	if (pmc) {
		msr_info->data = pmc->eventsel;
		return 0;
	}

	return 1;
}

static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	struct kvm_pmc *pmc;
	u32 msr = msr_info->index;
	u64 data = msr_info->data;

	/* MSR_PERFCTRn */
	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
	if (pmc) {
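		/*
		 * pmc_write_counter() truncates the written value to the
		 * counter's width so the stored count always fits in the
		 * architected number of bits.
		 */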
		pmc_write_counter(pmc, data);
		return 0;
	}
	/* MSR_EVNTSELn */
	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
	if (pmc) {
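		/*
		 * Hardware doesn't #GP on reserved PerfEvtSeln bits, and
		 * guests may set bits KVM doesn't support (e.g. Host/Guest
		 * Only), so clear reserved bits instead of injecting a #GP.
		 */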
		data &= ~pmu->reserved_bits;
		if (data != pmc->eventsel) {
			pmc->eventsel = data;
			kvm_pmu_request_counter_reprogram(pmc);
		}
		return 0;
	}

	return 1;
}
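
/*
 * Model the vPMU from guest CPUID: PerfMonV2 guests report their counter
 * count in CPUID 0x80000022.EBX, PerfCtrCore guests get six counters, and
 * legacy guests get four.  Counters are emulated as 48 bits wide, and the
 * global control/status MSRs are exposed only for PMU version 2.
 */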
static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	union cpuid_0x80000022_ebx ebx;

	pmu->version = 1;
	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
		pmu->version = 2;
		/*
		 * Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
		 * CPUID entry is guaranteed to be non-NULL.
		 */
		BUILD_BUG_ON(x86_feature_cpuid(X86_FEATURE_PERFMON_V2).function != 0x80000022 ||
			     x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
		ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
		pmu->nr_arch_gp_counters = ebx.split.num_core_pmc;
	} else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
	} else {
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
	}

	pmu->nr_arch_gp_counters = min_t(unsigned int, pmu->nr_arch_gp_counters,
					 kvm_pmu_cap.num_counters_gp);

	if (pmu->version > 1) {
		pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
		pmu->global_status_mask = pmu->global_ctrl_mask;
	}

	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
	pmu->reserved_bits = 0xfffffff000280000ull;
	pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
	/* Fixed counters are not applicable to AMD; clear them to prevent any fallout. */
	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
	pmu->nr_arch_fixed_counters = 0;
	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
}

static void amd_pmu_init(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	int i;

	BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC > AMD64_NUM_COUNTERS_CORE);
	BUILD_BUG_ON(KVM_AMD_PMC_MAX_GENERIC > INTEL_PMC_MAX_GENERIC);

	for (i = 0; i < KVM_AMD_PMC_MAX_GENERIC; i++) {
		pmu->gp_counters[i].type = KVM_PMC_GP;
		pmu->gp_counters[i].vcpu = vcpu;
		pmu->gp_counters[i].idx = i;
		pmu->gp_counters[i].current_config = 0;
	}
}

struct kvm_pmu_ops amd_pmu_ops __initdata = {
	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
	.check_rdpmc_early = amd_check_rdpmc_early,
	.is_valid_msr = amd_is_valid_msr,
	.get_msr = amd_pmu_get_msr,
	.set_msr = amd_pmu_set_msr,
	.refresh = amd_pmu_refresh,
	.init = amd_pmu_init,
	.EVENTSEL_EVENT = AMD64_EVENTSEL_EVENT,
	.MAX_NR_GP_COUNTERS = KVM_AMD_PMC_MAX_GENERIC,
	.MIN_NR_GP_COUNTERS = AMD64_NUM_COUNTERS,
};