Commit Graph

136 Commits

Author SHA1 Message Date
Sean Christopherson 83bdfe04c9 KVM: x86/pmu: Avoid CPL lookup if PMC enabline for USER and KERNEL is the same
Don't bother querying the CPL if a PMC is (not) counting for both USER and
KERNEL, i.e. if the end result is guaranteed to be the same regardless of
the CPL.  Querying the CPL on Intel requires a VMREAD, i.e. isn't free,
and a single CMP+Jcc is cheap.

Link: https://lore.kernel.org/r/20231110022857.1273836-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson e35529fb4a KVM: x86/pmu: Check eventsel first when emulating (branch) insns retired
When triggering events, i.e. emulating PMC events in software, check for
a matching event selector before checking the event is allowed.  The "is
allowed" check *might* be cheap, but it could also be very costly, e.g. if
userspace has defined a large PMU event filter.  The event selector check
on the other hand is all but guaranteed to be <10 uops, e.g. looks
something like:

   0xffffffff8105e615 <+5>:	movabs $0xf0000ffff,%rax
   0xffffffff8105e61f <+15>:	xor    %rdi,%rsi
   0xffffffff8105e622 <+18>:	test   %rax,%rsi
   0xffffffff8105e625 <+21>:	sete   %al

Link: https://lore.kernel.org/r/20231110022857.1273836-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson afda2d7666 KVM: x86/pmu: Expand the comment about what bits are check emulating events
Expand the comment about what bits are and aren't checked when emulating
PMC events in software.  As pointed out by Jim, AMD's mask includes bits
35:32, which on Intel overlap with the IN_TX and IN_TXCP bits (32 and 33)
as well as reserved bits (34 and 45).

Checking The IN_TX* bits is actually correct, as it's safe to assert that
the vCPU can't be in an HLE/RTM transaction if KVM is emulating an
instruction, i.e. KVM *shouldn't count if either of those bits is set.

For the reserved bits, KVM is has equal odds of being right if Intel adds
new behavior, i.e. ignoring them is just as likely to be correct as
checking them.

Opportunistically explain *why* the other flags aren't checked.

Suggested-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20231110022857.1273836-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson f19063b1ca KVM: x86/pmu: Snapshot event selectors that KVM emulates in software
Snapshot the event selectors for the events that KVM emulates in software,
which is currently instructions retired and branch instructions retired.
The event selectors a tied to the underlying CPU, i.e. are constant for a
given platform even though perf doesn't manage the mappings as such.

Getting the event selectors from perf isn't exactly cheap, especially if
mitigations are enabled, as at least one indirect call is involved.

Snapshot the values in KVM instead of optimizing perf as working with the
raw event selectors will be required if KVM ever wants to emulate events
that aren't part of perf's uABI, i.e. that don't have an "enum perf_hw_id"
entry.

Link: https://lore.kernel.org/r/20231110022857.1273836-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson d2b321ea93 KVM: x86/pmu: Process only enabled PMCs when emulating events in software
Mask off disabled counters based on PERF_GLOBAL_CTRL *before* iterating
over PMCs to emulate (branch) instruction required events in software.  In
the common case where the guest isn't utilizing the PMU, pre-checking for
enabled counters turns a relatively expensive search into a few AND uops
and a Jcc.

Sadly, PMUs without PERF_GLOBAL_CTRL, e.g. most existing AMD CPUs, are out
of luck as there is no way to check that a PMC isn't being used without
checking the PMC's event selector.

Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Link: https://lore.kernel.org/r/20231110022857.1273836-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson e5a65d4f72 KVM: x86/pmu: Add macros to iterate over all PMCs given a bitmap
Add and use kvm_for_each_pmc() to dedup a variety of open coded for-loops
that iterate over valid PMCs given a bitmap (and because seeing checkpatch
whine about bad macro style is always amusing).

No functional change intended.

Link: https://lore.kernel.org/r/20231110022857.1273836-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson 004a0aa56e KVM: x86/pmu: Snapshot and clear reprogramming bitmap before reprogramming
Refactor the handling of the reprogramming bitmap to snapshot and clear
to-be-processed bits before doing the reprogramming, and then explicitly
set bits for PMCs that need to be reprogrammed (again).  This will allow
adding a macro to iterate over all valid PMCs without having to add
special handling for the reprogramming bit, which (a) can have bits set
for non-existent PMCs and (b) needs to clear such bits to avoid wasting
cycles in perpetuity.

Note, the existing behavior of clearing bits after reprogramming does NOT
have a race with kvm_vm_ioctl_set_pmu_event_filter().  Setting a new PMU
filter synchronizes SRCU _before_ setting the bitmap, i.e. guarantees that
the vCPU isn't in the middle of reprogramming with a stale filter prior to
setting the bitmap.

Link: https://lore.kernel.org/r/20231110022857.1273836-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:48 -08:00
Sean Christopherson b31880ca2f KVM: x86/pmu: Move pmc_idx => pmc translation helper to common code
Add a common helper for *internal* PMC lookups, and delete the ops hook
and Intel's implementation.  Keep AMD's implementation, but rename it to
amd_pmu_get_pmc() to make it somewhat more obvious that it's suited for
both KVM-internal and guest-initiated lookups.

Because KVM tracks all counters in a single bitmap, getting a counter
when iterating over a bitmap, e.g. of all valid PMCs, requires a small
amount of math, that while simple, isn't super obvious and doesn't use the
same semantics as PMC lookups from RDPMC!  Although AMD doesn't support
fixed counters, the common PMU code still behaves as if there a split, the
high half of which just happens to always be empty.

Opportunstically add a comment to explain both what is going on, and why
KVM uses a single bitmap, e.g. the boilerplate for iterating over separate
bitmaps could be done via macros, so it's not (just) about deduplicating
code.

Link: https://lore.kernel.org/r/20231110022857.1273836-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:35:47 -08:00
Sean Christopherson be6b067dae KVM: x86/pmu: Add common define to capture fixed counters offset
Add a common define to "officially" solidify KVM's split of counters,
i.e. to commit to using bits 31:0 to track general purpose counters and
bits 63:32 to track fixed counters (which only Intel supports).  KVM
already bleeds this behavior all over common PMU code, and adding a KVM-
defined macro allows clarifying that the value is a _base_, as oppposed to
the _flag_ that is used to access fixed PMCs via RDPMC (which perf
confusingly calls INTEL_PMC_FIXED_RDPMC_BASE).

No functional change intended.

Link: https://lore.kernel.org/r/20231110022857.1273836-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:34:31 -08:00
Sean Christopherson f933b88e20 KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled
Move the purging of common PMU metadata from intel_pmu_refresh() to
kvm_pmu_refresh(), and invoke the vendor refresh() hook if and only if
the VM is supposed to have a vPMU.

KVM already denies access to the PMU based on kvm->arch.enable_pmu, as
get_gp_pmc_amd() returns NULL for all PMCs in that case, i.e. KVM already
violates AMD's architecture by not virtualizing a PMU (kernels have long
since learned to not panic when the PMU is unavailable).  But configuring
the PMU as if it were enabled causes unwanted side effects, e.g. calls to
kvm_pmu_trigger_event() waste an absurd number of cycles due to the
all_valid_pmc_idx bitmap being non-zero.

Fixes: b1d66dad65 ("KVM: x86/svm: Add module param to control PMU virtualization")
Reported-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes: https://lore.kernel.org/all/20231109180646.2963718-2-khorenko@virtuozzo.com
Link: https://lore.kernel.org/r/20231110022857.1273836-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01 09:34:31 -08:00
Sean Christopherson d652981db0 KVM: x86/pmu: Apply "fast" RDPMC only to Intel PMUs
Move the handling of "fast" RDPMC instructions, which drop bits 63:32 of
the count, to Intel.  The "fast" flag, and all modifiers for that matter,
are Intel-only and aren't supported by AMD.

Opportunistically replace open coded bit crud with proper #defines, and
add comments to try and disentangle the flags vs. values mess for
non-architectural vs. architectural PMUs.

Fixes: ca724305a2 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-30 15:28:02 -08:00
Sean Christopherson 7bb7fce136 KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index
Apply the pre-intercepts RDPMC validity check only to AMD, and rename all
relevant functions to make it as clear as possible that the check is not a
standard PMC index check.  On Intel, the basic rule is that only invalid
opcodes and privilege/permission/mode checks have priority over VM-Exit,
i.e. RDPMC with an invalid index should VM-Exit, not #GP.  While the SDM
doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a
non-existent MSR as an example where VM-Exit has priority over #GP, and
RDPMC is effectively just a variation of RDMSR.

Manually testing on various Intel CPUs confirms this behavior, and the
inverted priority was introduced for SVM compatibility, i.e. was not an
intentional change for Intel PMUs.  On AMD, *all* exceptions on RDPMC have
priority over VM-Exit.

Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0
static call so as to provide a convenient location to document the
difference between Intel and AMD, and to again try to make it as obvious
as possible that the early check is a one-off thing, not a generic "is
this PMC valid?" helper.

Fixes: 8061252ee0 ("KVM: SVM: Add intercept checks for remaining twobyte instructions")
Cc: Jim Mattson <jmattson@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-30 15:28:02 -08:00
Sean Christopherson cbbd1aa891 KVM: x86/pmu: Allow programming events that match unsupported arch events
Remove KVM's bogus restriction that the guest can't program an event whose
encoding matches an unsupported architectural event.  The enumeration of
an architectural event only says that if a CPU supports an architectural
event, then the event can be programmed using the architectural encoding.
The enumeration does NOT say anything about the encoding when the CPU
doesn't report support the architectural event.

Preventing the guest from counting events whose encoding happens to match
an architectural event breaks existing functionality whenever Intel adds
an architectural encoding that was *ever* used for a CPU that doesn't
enumerate support for the architectural event, even if the encoding is for
the exact same event!

E.g. the architectural encoding for Top-Down Slots is 0x01a4.  Broadwell
CPUs, which do not support the Top-Down Slots architectural event, 0x01a4
is a valid, model-specific event.  Denying guest usage of 0x01a4 if/when
KVM adds support for Top-Down slots would break any Broadwell-based guest.

Reported-by: Kan Liang <kan.liang@linux.intel.com>
Closes: https://lore.kernel.org/all/2004baa6-b494-462c-a11f-8104ea152c6a@linux.intel.com
Fixes: a21864486f ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-30 15:28:02 -08:00
Sean Christopherson fd89499a51 KVM: x86/pmu: Track emulated counter events instead of previous counter
Explicitly track emulated counter events instead of using the common
counter value that's shared with the hardware counter owned by perf.
Bumping the common counter requires snapshotting the pre-increment value
in order to detect overflow from emulation, and the snapshot approach is
inherently flawed.

Snapshotting the previous counter at every increment assumes that there is
at most one emulated counter event per emulated instruction (or rather,
between checks for KVM_REQ_PMU).  That's mostly holds true today because
KVM only emulates (branch) instructions retired, but the approach will
fall apart if KVM ever supports event types that don't have a 1:1
relationship with instructions.

And KVM already has a relevant bug, as handle_invalid_guest_state()
emulates multiple instructions without checking KVM_REQ_PMU, i.e. could
miss an overflow event due to clobbering pmc->prev_counter.  Not checking
KVM_REQ_PMU is problematic in both cases, but at least with the emulated
counter approach, the resulting behavior is delayed overflow detection,
as opposed to completely lost detection.

Tracking the emulated count fixes another bug where the snapshot approach
can signal spurious overflow due to incorporating both the emulated count
and perf's count in the check, i.e. if overflow is detected by perf, then
KVM's emulation will also incorrectly signal overflow.  Add a comment in
the related code to call out the need to process emulated events *after*
pausing the perf event (big kudos to Mingwei for figuring out that
particular wrinkle).

Cc: Mingwei Zhang <mizhang@google.com>
Cc: Roman Kagan <rkagan@amazon.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20231103230541.352265-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:55 -08:00
Sean Christopherson 89acf1237b KVM: x86/pmu: Update sample period in pmc_write_counter()
Update a PMC's sample period in pmc_write_counter() to deduplicate code
across all callers of pmc_write_counter().  Opportunistically move
pmc_write_counter() into pmc.c now that it's doing more work.  WRMSR isn't
such a hot path that an extra CALL+RET pair will be problematic, and the
order of function definitions needs to be changed anyways, i.e. now is a
convenient time to eat the churn.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:55 -08:00
Sean Christopherson ec61b2306d KVM: x86/pmu: Remove manual clearing of fields in kvm_pmu_init()
Remove code that unnecessarily clears event_count and need_cleanup in
kvm_pmu_init(), the entire kvm_pmu is zeroed just a few lines earlier.
Vendor code doesn't set event_count or need_cleanup during .init(), and
if either VMX or SVM did set those fields it would be a flagrant bug.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Sean Christopherson f2f63f7ec6 KVM: x86/pmu: Stop calling kvm_pmu_reset() at RESET (it's redundant)
Drop kvm_vcpu_reset()'s call to kvm_pmu_reset(), the call is performed
only for RESET, which is really just the same thing as vCPU creation,
and kvm_arch_vcpu_create() *just* called kvm_pmu_init(), i.e. there can't
possibly be any work to do.

Unlike Intel, AMD's amd_pmu_refresh() does fill all_valid_pmc_idx even if
guest CPUID is empty, but everything that is at all dynamic is guaranteed
to be '0'/NULL, e.g. it should be impossible for KVM to have already
created a perf event.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Sean Christopherson 1647b52757 KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing
Stop all counters and release all perf events before refreshing the vPMU,
i.e. before reconfiguring the vPMU to respond to changes in the vCPU
model.

Clear need_cleanup in kvm_pmu_reset() as well so that KVM doesn't
prematurely stop counters, e.g. if KVM enters the guest and enables
counters before the vCPU is scheduled out.

Cc: stable@vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Sean Christopherson cbb359d81a KVM: x86/pmu: Move PMU reset logic to common x86 code
Move the common (or at least "ignored") aspects of resetting the vPMU to
common x86 code, along with the stop/release helpers that are no used only
by the common pmu.c.

There is no need to manually handle fixed counters as all_valid_pmc_idx
tracks both fixed and general purpose counters, and resetting the vPMU is
far from a hot path, i.e. the extra bit of overhead to the PMC from the
index is a non-issue.

Zero fixed_ctr_ctrl in common code even though it's Intel specific.
Ensuring it's zero doesn't harm AMD/SVM in any way, and stopping the fixed
counters via all_valid_pmc_idx, but not clearing the associated control
bits, would be odd/confusing.

Make the .reset() hook optional as SVM no longer needs vendor specific
handling.

Cc: stable@vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Jim Mattson 73554b29bd KVM: x86/pmu: Synthesize at most one PMI per VM-exit
When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
VM-exit that also invokes __kvm_perf_overflow() as a result of
instruction emulation, kvm_pmu_deliver_pmi() will be called twice
before the next VM-entry.

Calling kvm_pmu_deliver_pmi() twice is unlikely to be problematic now that
KVM sets the LVTPC mask bit when delivering a PMI.  But using IRQ work to
trigger the PMI is still broken, albeit very theoretically.

E.g. if the self-IPI to trigger IRQ work is be delayed long enough for the
vCPU to be migrated to a different pCPU, then it's possible for
kvm_pmi_trigger_fn() to race with the kvm_pmu_deliver_pmi() from
KVM_REQ_PMI and still generate two PMIs.

KVM could set the mask bit using an atomic operation, but that'd just be
piling on unnecessary code to workaround what is effectively a hack.  The
*only* reason KVM uses IRQ work is to ensure the PMI is treated as a wake
event, e.g. if the vCPU just executed HLT.

Remove the irq_work callback for synthesizing a PMI, and all of the
logic for invoking it. Instead, to prevent a vcpu from leaving C0 with
a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events().

Fixes: 9cd803d496 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Jim Mattson <jmattson@google.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20230925173448.3518223-2-mizhang@google.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-25 14:42:52 -07:00
Sean Christopherson 6de2ccc169 KVM: x86/pmu: Move .hw_event_available() check out of PMC filter helper
Move the call to kvm_x86_pmu.hw_event_available(), which has nothing to
with the userspace PMU filter, out of check_pmu_event_filter() and into
its sole caller pmc_event_is_allowed().  pmc_event_is_allowed() didn't
exist when commit 7aadaa988c ("KVM: x86/pmu: Drop amd_event_mapping[]
in the KVM context"), so presumably the motivation for invoking
.hw_event_available() from check_pmu_event_filter() was to avoid having
to add multiple call sites.

Link: https://lore.kernel.org/r/20230607010206.1425277-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:44:36 -07:00
Like Xu 4a2771895c KVM: x86/svm/pmu: Add AMD PerfMonV2 support
If AMD Performance Monitoring Version 2 (PerfMonV2) is detected by
the guest, it can use a new scheme to manage the Core PMCs using the
new global control and status registers.

In addition to benefiting from the PerfMonV2 functionality in the same
way as the host (higher precision), the guest also can reduce the number
of vm-exits by lowering the total number of MSRs accesses.

In terms of implementation details, amd_is_valid_msr() is resurrected
since three newly added MSRs could not be mapped to one vPMC.
The possibility of emulating PerfMonV2 on the mainframe has also
been eliminated for reasons of precision.

Co-developed-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
[sean: drop "Based on the observed HW." comments]
Link: https://lore.kernel.org/r/20230603011058.1038821-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06 17:31:44 -07:00
Like Xu 13afa29ae4 KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code
Move the Intel PMU implementation of pmc_is_enabled() to common x86 code
as pmc_is_globally_enabled(), and drop AMD's implementation.  AMD PMU
currently supports only v1, and thus not PERF_GLOBAL_CONTROL, thus the
semantics for AMD are unchanged.  And when support for AMD PMU v2 comes
along, the common behavior will also Just Work.

Signed-off-by: Like Xu <likexu@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230603011058.1038821-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06 17:31:44 -07:00
Like Xu c85cdc1cc1 KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86
Move the handling of GLOBAL_CTRL, GLOBAL_STATUS, and GLOBAL_OVF_CTRL,
a.k.a. GLOBAL_STATUS_RESET, from Intel PMU code to generic x86 PMU code.
AMD PerfMonV2 defines three registers that have the same semantics as
Intel's variants, just with different names and indices.  Conveniently,
since KVM virtualizes GLOBAL_CTRL on Intel only for PMU v2 and above, and
AMD's version shows up in v2, KVM can use common code for the existence
check as well.

Signed-off-by: Like Xu <likexu@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230603011058.1038821-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06 17:31:44 -07:00
Paolo Bonzini 48b1893ae3 KVM x86 PMU changes for 6.4:
- Disallow virtualizing legacy LBRs if architectural LBRs are available,
    the two are mutually exclusive in hardware
 
  - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
    after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
    validate PERF_CAPABILITIES
 
  - Apply PMU filters to emulated events and add test coverage to the
    pmu_event_filter selftest
 
  - Misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRGtd4SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5Z9kP/i3WZ40hevvQvB/5cEpxxmxYDwCYnnjM
 hiQgK5jT4SrMTmVjLgkNdI2PogQoS4CX+GC7lcA9bvse84hjuPvgOflb2B+p2UQi
 Ytbr9g/tfKNIpnKIk9mcPcSObN9vm2Kgt7n28rtPrHWj89eQzgc66eijqdpKBLxA
 c3crVR8krwYAQK0tmzHq1+H6hB369YbHAHyTTRRI/bNWnqKblnvUbt0NL2aBusa9
 rNMaOdRtinLpy2dmuX/b3japRB8QTnlf7zpPIF4cBEhbYXy5woClZpf1D2fCA6Er
 XFbEoYawMVd9UeJYbW4z5yErLT83eYoGp4U0eFXWp6fvh8nZlgCGvBKE9g4mmqwj
 aSLaTR5eVN2qlw6jXVeg3unCo8Eyl36AwYwve2L6sFmBvZvNV5iz2eQ7rrOe4oE3
 dnTUaLQ8I2SVg04MbYmCq5W+frTL/I7kqNpbccL1Z3R5WO4y5gz63mug6NfLIvhR
 t45TAIaifxBfcXQsBZM3v2KUK/xQrD3AbJmFKh54L2CKqiGaNWsMLX+6NZ7LZWgf
 8rEqsVkkQDgF7z8eXai4TR26nYfSX6g9gDqtOH73L87aJ7PJk5cRoDWQ1sWs1e/l
 4HA/L0Bo/3pnKAa0ZWxJOixmzqY49gNQf3dj8gt3jk3y2ijbAivshiSpPBmIxn0u
 QLeOf/LGvipl
 =m18F
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEAD

KVM x86 PMU changes for 6.4:

 - Disallow virtualizing legacy LBRs if architectural LBRs are available,
   the two are mutually exclusive in hardware

 - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
   after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
   validate PERF_CAPABILITIES

 - Apply PMU filters to emulated events and add test coverage to the
   pmu_event_filter selftest

 - Misc cleanups and fixes
2023-04-26 15:53:36 -04:00
Aaron Lewis dfdeda67ea KVM: x86/pmu: Prevent the PMU from counting disallowed events
When counting "Instructions Retired" (0xc0) in a guest, KVM will
occasionally increment the PMU counter regardless of if that event is
being filtered. This is because some PMU events are incremented via
kvm_pmu_trigger_event(), which doesn't know about the event filter. Add
the event filter to kvm_pmu_trigger_event(), so events that are
disallowed do not increment their counters.

Fixes: 9cd803d496 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230307141400.1486314-2-aaronlewis@google.com
[sean: prepend "pmc" to the new function]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07 09:24:16 -07:00
Like Xu 4fa5843d81 KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()
Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam().

Fixes: 68fb4757e8 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230310113349.31799-1-likexu@tencent.com
[sean: trim the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07 09:07:41 -07:00
Like Xu cdd2fbf636 KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()
The name of function pmc_is_enabled() is a bit misleading. A PMC can
be disabled either by PERF_CLOBAL_CTRL or by its corresponding EVTSEL.
Append global semantics to its name.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230214050757.9623-2-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06 15:57:19 -07:00
Sean Christopherson 3a6de51a43 KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run
Now that KVM disallows changing feature MSRs, i.e. PERF_CAPABILITIES,
after running a vCPU, WARN and bug the VM if the PMU is refreshed after
the vCPU has run.

Note, KVM has disallowed CPUID updates after running a vCPU since commit
feb627e8d6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN"), i.e.
PERF_CAPABILITIES was the only remaining way to trigger a PMU refresh
after KVM_RUN.

Cc: Like Xu <like.xu.linux@gmail.com>
Link: https://lore.kernel.org/r/20230311004618.920745-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06 14:58:43 -07:00
Like Xu 7e768ce827 KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshed
The kvm_pmu_refresh() may be called repeatedly (e.g. configure guest
CPUID repeatedly or update MSR_IA32_PERF_CAPABILITIES) and each
call will use the last pmu->all_valid_pmc_idx value, with the residual
bits introducing additional overhead later in the vPMU emulation.

Fixes: b35e5548b4 ("KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230404071759.75376-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05 16:33:10 -07:00
Binbin Wu 607475cfa0 KVM: x86: Add helpers to query individual CR0/CR4 bits
Add helpers to check if a specific CR0/CR4 bit is set to avoid a plethora
of implicit casts from the "unsigned long" return of kvm_read_cr*_bits(),
and to make each caller's intent more obvious.

Defer converting helpers that do truly ugly casts from "unsigned long" to
"int", e.g. is_pse(), to a future commit so that their conversion is more
isolated.

Opportunistically drop the superfluous pcid_enabled from kvm_set_cr3();
the local variable is used only once, immediately after its declaration.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20230322045824.22970-2-binbin.wu@linux.intel.com
[sean: move "obvious" conversions to this commit, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22 10:10:53 -07:00
Mathias Krause 74cdc83691 KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
Make use of the kvm_read_cr{0,4}_bits() helper functions when we only
want to know the state of certain bits instead of the whole register.

This not only makes the intent cleaner, it also avoids a potential
VMREAD in case the tested bits aren't guest owned.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20230322013731.102955-5-minipli@grsecurity.net
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22 07:47:25 -07:00
Paolo Bonzini 157ed9cb04 KVM x86 PMU changes for 6.3:
- Add support for created masked events for the PMU filter to allow
    userspace to heavily restrict what events the guest can use without
    needing to create an absurd number of events
 
  - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
    support is disabled
 
  - Add PEBS support for Intel SPR
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmPsFZ4SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5eKEP/0qeZsOQot53wkf+wNiGh1X6qDacBPFP
 A8GAPC70fEisxAt776DeKEBwikHpARPglCt1Il9dFvkG+0jgYpvPu8UGF1LpouKX
 cD/7itr2k8GZlXZBg2Rgu3TRyFBJEGHT6tAu7PBhZyL6yWQDUxao8FPFrRGfmJ7O
 Z6eFMo1cERNHICQm+W/2TBd1xguiF+m4CXKlA70R4wzM37aPF9o5HvmIwAvPzyhU
 w4WzcIQbjVPs1VpBTzwPqRmyZ8omSlDYo7VqmsDiRtJbucqgbhFI2wR+nyImFCa9
 D2pI5TV3CFTt0fvd8SZpH19nR3S6cMLCXONOsijmvR2BmS3PhJMP4dMm5m4R06nF
 RBtnTj9fkbeL1ghFEkMxHBZVTG3bBlO4ySOxIqNHCvPjqQ37mJ+xP4C8kcIC9p5F
 +xL3AvZ7zenPv3A29SY9YH+QvZLBwyDJzAsveLeYkLFoJxoDT4glOY/Wpi1rkZ17
 /zHDZWoF49l1Eu3Bql0hFetkCreUNFGpa4moUmEC0evYOvV2WCb+39TDXZ8CPCGD
 +cDiRnD8MFQpBw47F03EnFheFHxiJoL0Clv5vvM3C+xOq2J9WVG9mqQWCk+4ta2B
 Um4D++0a9lwvJhOImaR7uyiV3K7oVm+rU8+46x+nTNGaIP2bnE+vronY+b6KGeUx
 7+xzTKlYygGe
 =ev5v
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-pmu-6.3' of https://github.com/kvm-x86/linux into HEAD

KVM x86 PMU changes for 6.3:

 - Add support for created masked events for the PMU filter to allow
   userspace to heavily restrict what events the guest can use without
   needing to create an absurd number of events

 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled

 - Add PEBS support for Intel SPR
2023-02-15 08:23:24 -05:00
Michal Luczaj 95744a90db KVM: x86: Optimize kvm->lock and SRCU interaction (KVM_SET_PMU_EVENT_FILTER)
Reduce time spent holding kvm->lock: unlock mutex before calling
synchronize_srcu_expedited().  There is no need to hold kvm->lock until
all vCPUs have been kicked, KVM only needs to guarantee that all vCPUs
will switch to the new filter before exiting to userspace.  Protecting
the write to __reprogram_pmi is also unnecessary as a vCPU may process
a set bit before receiving the final KVM_REQ_PMU, but the per-vCPU writes
are guaranteed to occur after all vCPUs have switched to the new filter.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-2-mhal@rbox.co
[sean: expand changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-03 15:19:22 -08:00
Like Xu 974850be01 KVM: x86/pmu: Add PRIR++ and PDist support for SPR and later models
The pebs capability on the SPR is basically the same as Ice Lake Server
with the exception of two special facilities that have been enhanced and
require special handling.

Upon triggering a PEBS assist, there will be a finite delay between the
time the counter overflows and when the microcode starts to carry out
its data collection obligations. Even if the delay is constant in core
clock space, it invariably manifest as variable "skids" in instruction
address space.

On the Ice Lake Server, the Precise Distribution of Instructions Retire
(PDIR) facility mitigates the "skid" problem by providing an early
indication of when the counter is about to overflow. On SPR, the PDIR
counter available (Fixed 0) is unchanged, but the capability is enhanced
to Instruction-Accurate PDIR (PDIR++), where PEBS is taken on the
next instruction after the one that caused the overflow.

SPR also introduces a new Precise Distribution (PDist) facility only on
general programmable counter 0. Per Intel SDM, PDist eliminates any
skid or shadowing effects from PEBS. With PDist, the PEBS record will
be generated precisely upon completion of the instruction or operation
that causes the counter to overflow (there is no "wait for next occurrence"
by default).

In terms of KVM handling, when guest accesses those special counters,
the KVM needs to request the same index counters via the perf_event
kernel subsystem to ensure that the guest uses the correct pebs hardware
counter (PRIR++ or PDist). This is mainly achieved by adjusting the
event precise level to the maximum, where the semantics of this magic
number is mainly defined by the internal software context of perf_event
and it's also backwards compatible as part of the user space interface.

Opportunistically, refine confusing comments on TNT+, as the only
ones that currently support pebs_ept are Ice Lake server and SPR (GLC+).

Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20221109082802.27543-3-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-01 16:42:36 -08:00
Aaron Lewis 14329b825f KVM: x86/pmu: Introduce masked events to the pmu event filter
When building a list of filter events, it can sometimes be a challenge
to fit all the events needed to adequately restrict the guest into the
limited space available in the pmu event filter.  This stems from the
fact that the pmu event filter requires each event (i.e. event select +
unit mask) be listed, when the intention might be to restrict the
event select all together, regardless of it's unit mask.  Instead of
increasing the number of filter events in the pmu event filter, add a
new encoding that is able to do a more generalized match on the unit mask.

Introduce masked events as another encoding the pmu event filter
understands.  Masked events has the fields: mask, match, and exclude.
When filtering based on these events, the mask is applied to the guest's
unit mask to see if it matches the match value (i.e. umask & mask ==
match).  The exclude bit can then be used to exclude events from that
match.  E.g. for a given event select, if it's easier to say which unit
mask values shouldn't be filtered, a masked event can be set up to match
all possible unit mask values, then another masked event can be set up to
match the unit mask values that shouldn't be filtered.

Userspace can query to see if this feature exists by looking for the
capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS.

This feature is enabled by setting the flags field in the pmu event
filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS.

Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY().

It is an error to have a bit set outside the valid bits for a masked
event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in
such cases, including the high bits of the event select (35:32) if
called on Intel.

With these updates the filter matching code has been updated to match on
a common event.  Masked events were flexible enough to handle both event
types, so they were used as the common event.  This changes how guest
events get filtered because regardless of the type of event used in the
uAPI, they will be converted to masked events.  Because of this there
could be a slight performance hit because instead of matching the filter
event with a lookup on event select + unit mask, it does a lookup on event
select then walks the unit masks to find the match.  This shouldn't be a
big problem because I would expect the set of common event selects to be
small, and if they aren't the set can likely be reduced by using masked
events to generalize the unit mask.  Using one type of event when
filtering guest events allows for a common code path to be used.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:06:12 -08:00
Aaron Lewis c5a287fa0d KVM: x86/pmu: prepare the pmu event filter for masked events
Refactor check_pmu_event_filter() in preparation for masked events.

No functional changes intended

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-4-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:06:11 -08:00
Aaron Lewis 8589827fd5 KVM: x86/pmu: Remove impossible events from the pmu event filter
If it's not possible for an event in the pmu event filter to match a
pmu event being programmed by the guest, it's pointless to have it in
the list.  Opt for a shorter list by removing those events.

Because this is established uAPI the pmu event filter can't outright
rejected these events as garbage and return an error.  Instead, play
nice and remove them from the list.

Also, opportunistically rewrite the comment when the filter is set to
clarify that it guards against *all* TOCTOU attacks on the verified
data.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-3-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:06:11 -08:00
Aaron Lewis 6a5cba7bed KVM: x86/pmu: Correct the mask used in a pmu event filter lookup
When checking if a pmu event the guest is attempting to program should
be filtered, only consider the event select + unit mask in that
decision. Use an architecture specific mask to mask out all other bits,
including bits 35:32 on Intel.  Those bits are not part of the event
select and should not be considered in that decision.

Fixes: 66bb8a065f ("KVM: x86: PMU Event Filter")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:06:10 -08:00
Sean Christopherson 8d20bd6381 KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code.  In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.

Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.

Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.

Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl().  But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:47:35 -05:00
Like Xu 55c590adfe KVM: x86/pmu: Prevent zero period event from being repeatedly released
The current vPMU can reuse the same pmc->perf_event for the same
hardware event via pmc_pause/resume_counter(), but this optimization
does not apply to a portion of the TSX events (e.g., "event=0x3c,in_tx=1,
in_tx_cp=1"), where event->attr.sample_period is legally zero at creation,
thus making the perf call to perf_event_period() meaningless (no need to
adjust sample period in this case), and instead causing such reusable
perf_events to be repeatedly released and created.

Avoid releasing zero sample_period events by checking is_sampling_event()
to follow the previously enable/disable optimization.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20221207071506.15733-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-23 12:06:45 -05:00
Like Xu de0f619564 KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter
Defer reprogramming counters and handling overflow via KVM_REQ_PMU
when incrementing counters.  KVM skips emulated WRMSR in the VM-Exit
fastpath, the fastpath runs with IRQs disabled, skipping instructions
can increment and reprogram counters, reprogramming counters can
sleep, and sleeping is disallowed while IRQs are disabled.

 [*] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
 [*] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2981888, name: CPU 15/KVM
 [*] preempt_count: 1, expected: 0
 [*] RCU nest depth: 0, expected: 0
 [*] INFO: lockdep is turned off.
 [*] irq event stamp: 0
 [*] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 [*] hardirqs last disabled at (0): [<ffffffff8121222a>] copy_process+0x146a/0x62d0
 [*] softirqs last  enabled at (0): [<ffffffff81212269>] copy_process+0x14a9/0x62d0
 [*] softirqs last disabled at (0): [<0000000000000000>] 0x0
 [*] Preemption disabled at:
 [*] [<ffffffffc2063fc1>] vcpu_enter_guest+0x1001/0x3dc0 [kvm]
 [*] CPU: 17 PID: 2981888 Comm: CPU 15/KVM Kdump: 5.19.0-rc1-g239111db364c-dirty #2
 [*] Call Trace:
 [*]  <TASK>
 [*]  dump_stack_lvl+0x6c/0x9b
 [*]  __might_resched.cold+0x22e/0x297
 [*]  __mutex_lock+0xc0/0x23b0
 [*]  perf_event_ctx_lock_nested+0x18f/0x340
 [*]  perf_event_pause+0x1a/0x110
 [*]  reprogram_counter+0x2af/0x1490 [kvm]
 [*]  kvm_pmu_trigger_event+0x429/0x950 [kvm]
 [*]  kvm_skip_emulated_instruction+0x48/0x90 [kvm]
 [*]  handle_fastpath_set_msr_irqoff+0x349/0x3b0 [kvm]
 [*]  vmx_vcpu_run+0x268e/0x3b80 [kvm_intel]
 [*]  vcpu_enter_guest+0x1d22/0x3dc0 [kvm]

Add a field to kvm_pmc to track the previous counter value in order
to defer overflow detection to kvm_pmu_handle_event() (the counter must
be paused before handling overflow, and that may increment the counter).

Opportunistically shrink sizeof(struct kvm_pmc) a bit.

Suggested-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 9cd803d496 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-6-likexu@tencent.com
[sean: avoid re-triggering KVM_REQ_PMU on overflow, tweak changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:36 -05:00
Like Xu 68fb4757e8 KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
Batch reprogramming PMU counters by setting KVM_REQ_PMU and thus
deferring reprogramming kvm_pmu_handle_event() to avoid reprogramming
a counter multiple times during a single VM-Exit.

Deferring programming will also allow KVM to fix a bug where immediately
reprogramming a counter can result in sleeping (taking a mutex) while
interrupts are disabled in the VM-Exit fastpath.

Introduce kvm_pmu_request_counter_reprogam() to make it obvious that
KVM is _requesting_ a reprogram and not actually doing the reprogram.

Opportunistically refine related comments to avoid misunderstandings.

Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-5-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:36 -05:00
Sean Christopherson dcbb816a28 KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed
When reprogramming a counter, clear the counter's "reprogram pending" bit
if the counter is disabled (by the guest) or is disallowed (by the
userspace filter).  In both cases, there's no need to re-attempt
programming on the next coincident KVM_REQ_PMU as enabling the counter by
either method will trigger reprogramming.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:35 -05:00
Sean Christopherson f1c5651fda KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
Force vCPUs to reprogram all counters on a PMU filter change to provide
a sane ABI for userspace.  Use the existing KVM_REQ_PMU to do the
programming, and take advantage of the fact that the reprogram_pmi bitmap
fits in a u64 to set all bits in a single atomic update.  Note, setting
the bitmap and making the request needs to be done _after_ the SRCU
synchronization to ensure that vCPUs will reprogram using the new filter.

KVM's current "lazy" approach is confusing and non-deterministic.  It's
confusing because, from a developer perspective, the code is buggy as it
makes zero sense to let userspace modify the filter but then not actually
enforce the new filter.  The lazy approach is non-deterministic because
KVM enforces the filter whenever a counter is reprogrammed, not just on
guest WRMSRs, i.e. a guest might gain/lose access to an event at random
times depending on what is going on in the host.

Note, the resulting behavior is still non-determinstic while the filter
is in flux.  If userspace wants to guarantee deterministic behavior, all
vCPUs should be paused during the filter update.

Jim Mattson <jmattson@google.com>

Fixes: 66bb8a065f ("KVM: x86: PMU Event Filter")
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:35 -05:00
Like Xu 4f1fa2a1bb KVM: x86/pmu: Limit the maximum number of supported Intel GP counters
The Intel Architectural IA32_PMCx MSRs addresses range allows for a
maximum of 8 GP counters, and KVM cannot address any more.  Introduce a
local macro (named KVM_INTEL_PMC_MAX_GENERIC) and use it consistently to
refer to the number of counters supported by KVM, thus avoiding possible
out-of-bound accesses.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220919091008.60695-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:26:53 -05:00
Like Xu cf52de619c KVM: x86/pmu: Avoid using PEBS perf_events for normal counters
The check logic in the pmc_resume_counter() to determine whether
a perf_event is reusable is partial and flawed, especially when it
comes to a pseudocode sequence (contrived, but valid) like:

  - enabling a counter and its PEBS bit
  - enable global_ctrl
  - run workload
  - disable only the PEBS bit, leaving the global_ctrl bit enabled

In this corner case, a perf_event created for PEBS can be reused by
a normal counter before it has been released and recreated, and when this
normal counter overflows, it triggers a PEBS interrupt (precise_ip != 0).

To address this issue, reprogram all affected counters when PEBS_ENABLE
change and reuse a counter if and only if PEBS exactly matches precise.

Fixes: 79f3e3b583 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-4-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28 12:47:22 -07:00
Like Xu f331601c65 KVM: x86/pmu: Don't generate PEBS records for emulated instructions
KVM will accumulate an enabled counter for at least INSTRUCTIONS or
BRANCH_INSTRUCTION hw event from any KVM emulated instructions,
generating emulated overflow interrupt on counter overflow, which
in theory should also happen when the PEBS counter overflows but
it currently lacks this part of the underlying support (e.g. through
software injection of records in the irq context or a lazy approach).

In this case, KVM skips the injection of this BUFFER_OVF PMI (effectively
dropping one PEBS record) and let the overflow counter move on. The loss
of a single sample does not introduce a loss of accuracy, but is easily
noticeable for certain specific instructions.

This issue is expected to be addressed along with the issue
of PEBS cross-mapped counters with a slow-path proposal.

Fixes: 79f3e3b583 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-3-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28 12:47:21 -07:00
Sean Christopherson 545feb96c0 Revert "KVM: x86: always allow host-initiated writes to PMU MSRs"
Revert the hack to allow host-initiated accesses to all "PMU" MSRs,
as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether
or not it has a snowball's chance in hell of actually being a PMU MSR.

That mostly gets papered over by the actual get/set helpers only handling
MSRs that they knows about, except there's the minor detail that
kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled.
I.e. KVM will happy allow reads and writes to _any_ MSR if the PMU is
disabled, either via module param or capability.

This reverts commit d1c88a4020.

Fixes: d1c88a4020 ("KVM: x86: always allow host-initiated writes to PMU MSRs")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20 11:49:46 -04:00
Sean Christopherson 5d4283df5a Revert "KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if !enable_pmu"
Eating reads and writes to all "PMU" MSRs when there is no PMU is wildly
broken as it results in allowing accesses to _any_ MSR on Intel CPUs
as intel_is_valid_msr() returns true for all host_initiated accesses.

A revert of commit d1c88a4020 ("KVM: x86: always allow host-initiated
writes to PMU MSRs") will soon follow.

This reverts commit 8e6a58e28b.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20 11:49:35 -04:00