Commit graph

Jing Liu
61f208134a kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
This saves one unnecessary VM-exit in the guest #NM handler, given that
the MSR is already restored with the guest value before the guest is
resumed.
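
A hedged sketch of the change, assuming the existing VMX helper
vmx_set_intercept_for_msr(); the exact call site is an assumption:

    /* Pass through reads of IA32_XFD_ERR when the guest has XFD;
     * writes stay intercepted so KVM can keep its cached value. */
    if (kvm_cpu_cap_has(X86_FEATURE_XFD))
            vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
                                      !guest_cpuid_has(vcpu, X86_FEATURE_XFD));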

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-15-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:33 -05:00
Jing Liu
548e83650a kvm: x86: Emulate IA32_XFD_ERR for guest
Emulate read/write to IA32_XFD_ERR MSR.

Only the saved value in the guest_fpu container is touched in the
emulation handler. The actual MSR update is handled right before entering
the guest (with preemption disabled).
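
A minimal sketch of the write side, assuming the usual
kvm_set_msr_common() switch statement:

    case MSR_IA32_XFD_ERR:
            if (!msr_info->host_initiated &&
                !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
                    return 1;
            /* Only the cached value is touched here; the hardware MSR is
             * written right before VM-entry with preemption disabled. */
            vcpu->arch.guest_fpu.xfd_err = data;
            break;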

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-14-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:26 -05:00
Jing Liu
ec5be88ab2 kvm: x86: Intercept #NM for saving IA32_XFD_ERR
Guest IA32_XFD_ERR is generally modified in two places:

  - Set by CPU when #NM is triggered;
  - Cleared by guest in its #NM handler;

Intercept #NM for the first case when a nonzero value is written
to IA32_XFD. Nonzero indicates that the guest is willing to do
dynamic fpstate expansion for certain xfeatures, thus KVM needs to
manage and virtualize guest XFD_ERR properly. The vcpu exception
bitmap is updated in XFD write emulation according to guest_fpu::xfd.

Save the current XFD_ERR value to the guest_fpu container in the #NM
VM-exit handler. This must be done with interrupts disabled, otherwise
the unsaved MSR value may be clobbered by host activity.

The saving operation is conducted conditionally, only when guest_fpu::xfd
contains a nonzero value. Doing so also avoids a misread on a platform
which doesn't support XFD but where #NM is triggered due to L1 interception.

Queueing #NM to the guest is postponed to handle_exception_nmi(). This
goes through the nested_vmx check so a virtual vmexit is queued instead
when #NM is triggered in L2 but L1 wants to intercept it.

Restore the host value (always ZERO outside of the host #NM
handler) before enabling interrupts.

Restore the guest value from the guest_fpu container right before
entering the guest (with interrupts disabled).
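
A hedged sketch of the flow described above; the exact call sites are
assumptions, only rdmsrl()/wrmsrl() are existing helpers:

    /* #NM VM-exit handler, interrupts still disabled: latch the guest
     * value before host activity can clobber XFD_ERR. */
    if (vcpu->arch.guest_fpu.fpstate->xfd)
            rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

    /* Before interrupts are re-enabled: restore the host value (zero). */
    if (vcpu->arch.guest_fpu.xfd_err)
            wrmsrl(MSR_IA32_XFD_ERR, 0);

    /* Right before VM-entry, interrupts disabled: reinstate the guest
     * value from the guest_fpu container. */
    if (vcpu->arch.guest_fpu.xfd_err)
            wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);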

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-13-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:18 -05:00
Jing Liu
1df4fd834e x86/fpu: Prepare xfd_err in struct fpu_guest
When XFD causes an instruction to generate #NM, IA32_XFD_ERR
contains information about which disabled state components are
being accessed. The #NM handler is expected to check this
information and then enable the state components by clearing
IA32_XFD for the faulting task (if it has permission).

If the XFD_ERR value generated in the guest is consumed/clobbered
by the host before the guest itself does so, it may lead to a
non-XFD-related #NM being treated as an XFD #NM in the host (due to
a non-zero value in XFD_ERR), or an XFD-related #NM being treated as
a non-XFD #NM in the guest (XFD_ERR cleared by the host #NM handler).

Introduce a new field in fpu_guest to save the guest xfd_err value.
KVM is expected to save the guest xfd_err before interrupts are enabled
and to restore it right before entering the guest (with interrupts
disabled).
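
A sketch of the new member, per the description above (other members
elided):

    struct fpu_guest {
            /* @xfd_err: guest XFD_ERR, saved in the #NM VM-exit handler
             *           and restored right before VM-entry. */
            u64 xfd_err;
    };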

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-12-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:44:08 -05:00
Jing Liu
820a6ee944 kvm: x86: Add emulation for IA32_XFD
Intel's eXtended Feature Disable (XFD) feature allows software
to dynamically adjust the fpstate buffer size for XSAVE features which
have large state.

Because the guest fpstate has been expanded for all possible dynamic
xstates at KVM_SET_CPUID2, emulation of the IA32_XFD MSR is
straightforward. For writes, just call fpu_update_guest_xfd() to
update the guest fpu container once all the sanity checks pass.
For reads, simply return the cached value in the container.
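
A minimal sketch of the write emulation, assuming a
kvm_guest_supported_xfd() helper that computes the guest's permitted
XFD bits:

    case MSR_IA32_XFD:
            if (!msr_info->host_initiated &&
                !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
                    return 1;
            /* Reject attempts to set bits for unsupported xfeatures. */
            if (data & ~kvm_guest_supported_xfd(vcpu))
                    return 1;
            fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
            break;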

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-11-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:29 -05:00
Kevin Tian
8eb9a48ac1 x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
Guest XFD can be updated either in the emulation path or in the
restore path.

Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest
fpstate is currently in use, also update the per-cpu xfd cache and
the actual MSR.
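
A sketch consistent with the description above; fpregs_lock() and
xfd_update_state() are existing FPU-core helpers:

    void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
    {
            fpregs_lock();
            guest_fpu->fpstate->xfd = xfd;
            /* If this guest fpstate is live, refresh the per-cpu XFD
             * cache and the hardware MSR too. */
            if (guest_fpu->fpstate->in_use)
                    xfd_update_state(guest_fpu->fpstate);
            fpregs_unlock();
    }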

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-10-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:22 -05:00
Jing Liu
5ab2f45bba kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
KVM can request fpstate expansion in two ways:

  1) When intercepting guest updates to XCR0 and the XFD MSR;

  2) Before the vcpu runs (e.g. at KVM_SET_CPUID2);

The first option doesn't waste memory for a legacy guest that doesn't
support XFD. However, doing so introduces more complexity and also
imposes an ordering requirement on the restore path, i.e. XCR0/XFD
must be restored before XSTATE.

Given that, the agreement is to take the static approach. This is
considered a better tradeoff even though it does waste 8K of memory for
a legacy guest whose CPUID includes dynamically-enabled xfeatures.

Successful fpstate expansion requires userspace VMM to acquire
guest xstate permissions before calling KVM_SET_CPUID2.
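
A hedged sketch of the expansion step in kvm_check_cpuid(); helper
names follow the surrounding patches:

    best = cpuid_entry2_find(entries, nent, 0xd, 0);
    if (best) {
            xfeatures = best->eax | ((u64)best->edx << 32);
            xfeatures &= XFEATURE_MASK_USER_DYNAMIC;
            if (xfeatures)
                    return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu,
                                                         xfeatures);
    }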

Also take the chance to adjust the indent in kvm_set_cpuid().

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-9-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:21 -05:00
Sean Christopherson
0781d60f65 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
Provide a wrapper for expanding the guest fpstate buffer according
to requested xfeatures. KVM wants to call this wrapper to manage
any dynamic xstate used by the guest.
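
A sketch of the wrapper, per the description above:

    int fpu_enable_guest_xfd_features(struct fpu_guest *guest_fpu, u64 xfeatures)
    {
            lockdep_assert_preemption_enabled();

            /* Nothing to do if all requested features are already enabled. */
            xfeatures &= ~guest_fpu->xfeatures;
            if (!xfeatures)
                    return 0;

            return __xfd_enable_feature(xfeatures, guest_fpu);
    }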

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220105123532.12586-8-yang.zhong@intel.com>
[Remove unnecessary 32-bit check. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:21 -05:00
Thomas Gleixner
c270ce393d x86/fpu: Add guest support to xfd_enable_feature()
Guest support for dynamically enabled FPU features requires a few
modifications to the enablement function which is currently invoked from
the #NM handler:

  1) Use guest permissions and sizes for the update

  2) Update fpu_guest state accordingly

  3) Take into account that the enabling can be triggered either from a
     running guest via XSETBV and MSR_IA32_XFD write emulation or from
     a guest restore. In the latter case the guest's fpstate is not the
     current task's active fpstate.

Split the function and implement the guest mechanics throughout the
callchain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-7-yang.zhong@intel.com>
[Add 32-bit stub for __xfd_enable_feature. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:43:11 -05:00
Jing Liu
b0237dad2d x86/fpu: Make XFD initialization in __fpstate_reset() a function argument
vCPU threads are different from native tasks with regard to the initial
XFD value. While all native tasks follow a fixed value (init_fpstate::xfd)
established by the FPU core at boot, vCPU threads need to obey the reset
value (i.e. ZERO) defined by the specification, to meet the expectation of
the guest.

Let the caller supply an argument and adjust the host and guest related
invocations accordingly.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14 13:40:57 -05:00
Jing Liu
445ecdf79b kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
KVM_GET_SUPPORTED_CPUID should not include any dynamic xstates in
CPUID[0xD] if they have not been requested with prctl. Otherwise
a process which directly passes KVM_GET_SUPPORTED_CPUID to
KVM_SET_CPUID2 would now fail even if it doesn't intend to use a
dynamically enabled feature. Userspace must know that prctl is
required and allocate a >4K xstate buffer before setting any dynamic
bit.
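
A hedged sketch of the CPUID[0xD] masking, assuming the
xstate_get_guest_group_perm() accessor from the earlier prctl() patch:

    /* Only advertise dynamic xstates that userspace has requested. */
    u64 permitted_xcr0 = supported_xcr0 & xstate_get_guest_group_perm();

    entry->eax &= permitted_xcr0;
    entry->edx &= permitted_xcr0 >> 32;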

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:04 -05:00
Jing Liu
cc04b6a21d kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
CPUID.0xD.1.EBX enumerates the size of the XSAVE area (in compacted
format) required by XSAVES. If CPUID.0xD.i.ECX[1] is set for a state
component (i), this state component should be located on the next
64-byte boundary following the preceding state component in the
compacted layout.

Fix xstate_required_size() to follow the alignment rule. AMX is the
first state component with 64-byte alignment, which is what exposed
this bug.
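
A sketch of the fix inside xstate_required_size(), per the rule above:

    cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx);
    if (compacted)
            /* ECX[1]: component starts on the next 64-byte boundary */
            offset = (ecx & 0x2) ? ALIGN(ret, 64) : ret;
    else
            offset = ebx;
    ret = max(ret, offset + eax);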

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:04 -05:00
Thomas Gleixner
36487e6228 x86/fpu: Prepare guest FPU for dynamically enabled FPU features
To support dynamically enabled FPU features for guests, prepare the
guest pseudo FPU container to keep track of the currently enabled
xfeatures and the guest permissions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:03 -05:00
Thomas Gleixner
980fe2fddc x86/fpu: Extend fpu_xstate_prctl() with guest permissions
KVM requires a clear separation of host user space and guest permissions
for dynamic XSTATE components.

Add a guest permissions member to struct fpu and a separate set of prctl()
arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM.

The semantics are equivalent to the host user space permission control
except for the following constraints:

  1) Permissions have to be requested before the first vCPU is created

  2) Permissions are frozen when the first vCPU is created to ensure
     consistency. Any attempt to expand permissions via the prctl() after
     that point is rejected.
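
A userspace sketch of the new prctl() pair; the literal values mirror
the existing ARCH_*_XCOMP_* numbering and should be taken from the uapi
headers in practice:

    #include <err.h>
    #include <sys/prctl.h>

    #ifndef ARCH_REQ_XCOMP_GUEST_PERM
    #define ARCH_GET_XCOMP_GUEST_PERM 0x1024
    #define ARCH_REQ_XCOMP_GUEST_PERM 0x1025
    #endif

    /* Request a dynamic xfeature (e.g. bit 18, AMX XTILEDATA) for guests;
     * this must happen before the first vCPU is created. */
    if (prctl(ARCH_REQ_XCOMP_GUEST_PERM, 18))
            err(1, "ARCH_REQ_XCOMP_GUEST_PERM");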

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:33:03 -05:00
Michael Roth
96c1a62855 kvm: selftests: move ucall declarations into ucall_common.h
Now that the core kvm_util declarations have a special home in
kvm_util_base.h, move the ucall-related declarations out into a separate
header.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20211210164620.11636-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:25:33 -05:00
Michael Roth
7d9a662ed9 kvm: selftests: move base kvm_util.h declarations to kvm_util_base.h
Between helper macros and interfaces that will be introduced in
subsequent patches, much of kvm_util.h would end up being declarations
specific to ucall. Ideally these could be separated out into a separate
header since they are not strictly required for writing guest tests and
are mostly self-contained interfaces other than a reliance on a few
core declarations like struct kvm_vm. This doesn't make a big
difference as far as how tests will be compiled/written, since all these
interfaces will still be packaged up into a single/common libkvm.a used
by all tests, but it is still nice to be able to compartmentalize, to
improve readability and reduce merge conflicts in the future for common
tasks like adding new interfaces to kvm_util.h.

Furthermore, some of the ucall declarations will be arch-specific,
requiring various #ifdef'ery in kvm_util.h. Ideally these declarations
could live in separate arch-specific headers, e.g.
include/<arch>/ucall.h, which would handle arch-specific declarations
as well as pulling in common ucall-related declarations shared by all
archs.

One simple way to do this would be to #include ucall.h at the bottom of
kvm_util.h, after declarations it relies upon like struct kvm_vm.
This is brittle however, and doesn't scale easily to other sets of
interfaces that may be added in the future.

Instead, move all declarations currently in kvm_util.h into
kvm_util_base.h, then have kvm_util.h #include it. With this change,
non-base declarations can be selectively moved/introduced into separate
headers, which can then be included in kvm_util.h so that individual
tests don't need to be touched. Subsequent patches will then move
ucall-related declarations into a separate header to meet the above
goals.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20211210164620.11636-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 13:25:33 -05:00
Michael Roth
405329fc9a KVM: SVM: include CR3 in initial VMSA state for SEV-ES guests
Normally guests will set up CR3 themselves, but some guests, such as
kselftests, and potentially CONFIG_PVH guests, rely on being booted
with paging enabled and CR3 initialized to a pre-allocated page table.

Currently CR3 updates via KVM_SET_SREGS* are not loaded into the guest
VMCB until just prior to entering the guest. For SEV-ES/SEV-SNP, this
is too late, since it will have switched over to using the VMSA page
prior to that point, with the VMSA CR3 copied from the VMCB initial
CR3 value: 0.

Address this by syncing the CR3 value into the VMCB save area
immediately when KVM_SET_SREGS* is issued, so it will find its way into
the initial VMSA.
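
A hedged sketch of the SVM side of the new hook; the function name and
body are assumptions based on the description:

    static void sev_post_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
    {
            struct vcpu_svm *svm = to_svm(vcpu);

            /* For SEV-ES the initial VMSA is built from the VMCB save
             * area, so the CR3 value must land there immediately. */
            if (sev_es_guest(vcpu->kvm))
                    svm->vmcb->save.cr3 = cr3;
    }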

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20211216171358.61140-10-michael.roth@amd.com>
[Remove vmx_post_set_cr3; add a remark about kvm_set_cr3 not calling the
 new hook. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:46 -05:00
Peter Zijlstra
907d139318 KVM: VMX: Provide vmread version using asm-goto-with-outputs
Use asm-goto-output for smaller fast path code.

Message-Id: <YbcbbGW2GcMx6KpD@hirez.programming.kicks-ass.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:46 -05:00
David Woodhouse
55749769fe KVM: x86: Fix wall clock writes in Xen shared_info not to mark page dirty
When dirty ring logging is enabled, any dirty logging without an active
vCPU context will cause a kernel oops. But we've already declared that
the shared_info page doesn't get dirty tracking anyway, since it would
be kind of insane to mark it dirty every time we deliver an event channel
interrupt. Userspace is supposed to just assume it's always dirty any
time a vCPU can run or event channels are routed.

So stop using the generic kvm_write_wall_clock() and just write directly
through the gfn_to_pfn_cache that we already have set up.

We can make kvm_write_wall_clock() static in x86.c again now, but let's
not remove the 'sec_hi_ofs' argument even though it's not used yet. At
some point we *will* want to use that for KVM guests too.

Fixes: 629b534884 ("KVM: x86/xen: update wallclock region")
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:45 -05:00
David Woodhouse
14243b3871 KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
This adds basic support for delivering 2 level event channels to a guest.

Initially, it only supports delivery via the IRQ routing table, triggered
by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast()
function which will use the pre-mapped shared_info page if it already
exists and is still valid, while the slow path through the irqfd_inject
workqueue will remap the shared_info page if necessary.

It sets the bits in the shared_info page but not the vcpu_info; that is
deferred to __kvm_xen_has_interrupt() which raises the vector to the
appropriate vCPU.

Add a 'verbose' mode to xen_shinfo_test while adding test cases for this.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:45 -05:00
David Woodhouse
1cfc9c4b9d KVM: x86/xen: Maintain valid mapping of Xen shared_info page
Use the newly reinstated gfn_to_pfn_cache to maintain a kernel mapping
of the Xen shared_info page so that it can be accessed in atomic context.

Note that we do not participate in dirty tracking for the shared info
page and we do not explicitly mark it dirty every single time we deliver
an event channel interrupt. We wouldn't want to do that even if we *did*
have a valid vCPU context with which to do so.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-4-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:45 -05:00
David Woodhouse
982ed0de47 KVM: Reinstate gfn_to_pfn_cache with invalidation support
This can be used in two modes. There is an atomic mode where the cached
mapping is accessed while holding the rwlock, and a mode where the
physical address is used by a vCPU in guest mode.
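
A hedged sketch of the atomic-mode usage pattern, assuming the
check/refresh API described above:

    read_lock(&gpc->lock);
    while (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, PAGE_SIZE)) {
            read_unlock(&gpc->lock);
            if (kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE))
                    return;         /* refresh failed */
            read_lock(&gpc->lock);
    }
    /* ... access the mapping via gpc->khva ... */
    read_unlock(&gpc->lock);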

For the latter case, an invalidation will wake the vCPU with the new
KVM_REQ_GPC_INVALIDATE, and the architecture will need to refresh any
caches it still needs to access before entering guest mode again.

Only one vCPU can be targeted by the wake requests; it's simple enough
to make it wake all vCPUs or even a mask but I don't see a use case for
that additional complexity right now.

Invalidation happens from the invalidate_range_start MMU notifier, which
needs to be able to sleep in order to wake the vCPU and wait for it.

This means that revalidation potentially needs to "wait" for the MMU
operation to complete and the invalidate_range_end notifier to be
invoked. Like the vCPU when it takes a page fault in that period, we
just spin — fixing that in a future patch by implementing an actual
*wait* may be another part of shaving this particularly hirsute yak.

As noted in the comments in the function itself, the only case where
the invalidate_range_start notifier is expected to be called *without*
being able to sleep is when the OOM reaper is killing the process. In
that case, we expect the vCPU threads already to have exited, and thus
there will be nothing to wake, and no reason to wait. So we clear the
KVM_REQUEST_WAIT bit and send the request anyway, then complain loudly
if there actually *was* anything to wake up.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:44 -05:00
David Woodhouse
2efd61a608 KVM: Warn if mark_page_dirty() is called without an active vCPU
The various kvm_write_guest() and mark_page_dirty() functions must only
ever be called in the context of an active vCPU, because if dirty ring
tracking is enabled it may simply oops when kvm_get_running_vcpu()
returns NULL for the vcpu and then kvm_dirty_ring_get() dereferences it.
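
A sketch of the guard this adds in mark_page_dirty_in_slot():

    struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

    #ifdef CONFIG_HAVE_KVM_DIRTY_RING
            if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
                    return;
    #endif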

This oops was reported by "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/

That actual bug will be fixed under separate cover but this warning
should help to prevent new ones from being added.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:44 -05:00
David Woodhouse
f3f26dae05 x86/kvm: Silence per-cpu pr_info noise about KVM clocks and steal time
I made the actual CPU bringup go nice and fast... and then Linux spends
half a minute printing stupid nonsense about clocks and steal time for
each of 256 vCPUs. Don't do that. Nobody cares.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211209150938.3518-12-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:43 -05:00
Eric Hankland
018d70ffcf KVM: x86: Update vPMCs when retiring branch instructions
When KVM retires a guest branch instruction through emulation,
increment any vPMCs that are configured to monitor "branch
instructions retired," and update the sample period of those counters
so that they will overflow at the right time.

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Moved/consolidated the calls to kvm_pmu_trigger_event() in the
    emulation of VMLAUNCH/VMRESUME to accommodate the evolution of
    that code.
]
Fixes: f5132b0138 ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20211130074221.93635-7-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:43 -05:00
Eric Hankland
9cd803d496 KVM: x86: Update vPMCs when retiring instructions
When KVM retires a guest instruction through emulation, increment any
vPMCs that are configured to monitor "instructions retired," and
update the sample period of those counters so that they will overflow
at the right time.
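
A hedged sketch of the trigger point at the end of successful
emulation; kvm_pmu_trigger_event() is the helper this series settles on:

    /* Count the emulated instruction against any vPMC programmed to
     * monitor "instructions retired". */
    kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);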

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Added 'static' to kvm_pmu_incr_counter() definition.
  - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
    PERF_EVENT_STATE_ACTIVE.
]
Fixes: f5132b0138 ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
[likexu:
  - Drop checks for pmc->perf_event or event state or event type
  - Increase a counter once its umask bits and the first 8 select bits are matched
  - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
  - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
  - Add counter enable and CPL check for kvm_pmu_trigger_event();
]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:42 -05:00
Like Xu
40ccb96d54 KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()
Depending on whether intr should be triggered or not, KVM registers
two different event overflow callbacks in the perf_event context.

The code skeleton of these two functions is very similar, so
pmc->intr can be stored in the pmc from pmc_reprogram_counter(),
which gives a smaller instruction footprint for the
micro-architectural branch predictor.

__kvm_perf_overflow() can be called in non-NMI contexts,
and a flag is needed to distinguish the caller context and thus
avoid a check on kvm_is_in_guest(), otherwise we might get
warnings from suspicious RCU or check_preemption_disabled().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:42 -05:00
Like Xu
6ed1298eb0 KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()
Since we set the same semantic event value for the fixed counter in
pmc->eventsel, returning the perf_hw_id for the fixed counter via
find_fixed_event() can be painlessly replaced by pmc_perf_hw_id()
with the help of a pmc_is_fixed() check.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-4-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:42 -05:00
Like Xu
7c174f305c KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
find_arch_event() returns an "unsigned int" value,
which is used by pmc_reprogram_counter() to
program a PERF_TYPE_HARDWARE type perf_event.

The returned value is actually the kernel-defined generic
perf_hw_id, so rename the function to pmc_perf_hw_id(), with simpler
incoming parameters, for better self-explanation.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:41 -05:00
Like Xu
761875634a KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs
The current pmc->eventsel for fixed counters is underutilised.
pmc->eventsel can be set up for all known available fixed counters,
since we have a mapping between the fixed pmc index and
the intel_arch_events array.

For either gp or fixed counters, this will simplify the later checks for
consistency between eventsel and perf_hw_id.
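
A sketch of the setup loop, assuming the existing fixed_pmc_events[]
to intel_arch_events[] mapping:

    for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
            struct kvm_pmc *pmc = &pmu->fixed_counters[i];
            u32 event = fixed_pmc_events[i];

            pmc->eventsel = (intel_arch_events[event].unit_mask << 8) |
                             intel_arch_events[event].eventsel;
    }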

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:41 -05:00
Paolo Bonzini
006a0f0607 KVM: x86: avoid out of bounds indices for fixed performance counters
Because IceLake has 4 fixed performance counters but KVM only
supports 3, it is possible for reprogram_fixed_counters to pass
to reprogram_fixed_counter an index that is out of bounds for the
fixed_pmc_events array.

Ultimately intel_find_fixed_event, which is the only place that uses
fixed_pmc_events, handles this correctly because it checks against the
size of fixed_pmc_events anyway.  Every other place operates on the
fixed_counters[] array which is sized according to INTEL_PMC_MAX_FIXED.
However, it is cleaner if the unsupported performance counters are culled
early on in reprogram_fixed_counters.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:40 -05:00
Lai Jiangshan
5b61178cd2 KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT + !URG
When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but GUEST_CR3 is
still vmx->ept_identity_map_addr if EPT + !URG.  So VCPU_EXREG_CR3 is
considered to be dirty and GUEST_CR3 needs to be updated in this case.
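
A hedged sketch of the fix in vmx_set_cr0(), per the description above:

    /* !CR0_PG -> CR0_PG: vcpu->arch.cr3 becomes active but GUEST_CR3
     * may still hold the identity-map address, so force a resync. */
    if (!(old_cr0_pg & X86_CR0_PG) && (cr0 & X86_CR0_PG))
            kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);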

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-4-jiangshanlai@gmail.com>
Fixes: c62c7bd4f9 ("KVM: VMX: Update vmcs.GUEST_CR3 only when the guest CR3 is dirty")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:40 -05:00
Lai Jiangshan
6b123c3a89 KVM: x86/mmu: Reconstruct shadow page root if the guest PDPTEs is changed
For shadow paging, the page table needs to be reconstructed before the
coming VMENTER if the guest PDPTEs are changed.

But not all paths that call load_pdptrs() will cause the page tables to be
reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
are used to trigger the later reconstruction.

The commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and
CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
when changing CR0.CD and CR0.NW.

The commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
load_pdptrs() when rewriting the CR3 with the same value.

The commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
load_pdptrs() when changing CR4.PGE.

Guests like Linux keep the PDPTEs unchanged for every instance of a
page table, so this missing reconstruction causes no problem for Linux
guests.

Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:40 -05:00
Lai Jiangshan
a9f2705ec8 KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()
The host CR3 in the vcpu thread can only be changed when scheduling,
so commit 15ad9762d6 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
changed vmx.c to only save it in vmx_prepare_switch_to_guest().

However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS.
vmx_set_host_fs_gs() is called in both places, so rename it to
vmx_set_vmcs_host_state() and make it update HOST_CR3.

Fixes: 15ad9762d6 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:39 -05:00
Paolo Bonzini
46cbc0400f Revert "KVM: X86: Update mmu->pdptrs only when it is changed"
This reverts commit 24cd19a28c.
Sean Christopherson reports:

"Commit 24cd19a28c ('KVM: X86: Update mmu->pdptrs only when it is
changed') breaks nested VMs with EPT in L0 and PAE shadow paging in L2.
Reproducing is trivial, just disable EPT in L1 and run a VM.  I haven't
investigated how it breaks things."

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:39 -05:00
Peter Gonda
a6fec53947 selftests: KVM: sev_migrate_tests: Add mirror command tests
Add tests to confirm that mirror VMs can only run the correct subset of
commands.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-4-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:38 -05:00
Peter Gonda
427d046a41 selftests: KVM: sev_migrate_tests: Fix sev_ioctl()
The TEST_ASSERT in the SEV ioctl helper was allowing errors because it
checked that the return value was good OR that the FW error code was OK.
This TEST_ASSERT should require that both (i.e. AND) values are OK. Also
remove the LAUNCH_START from the mirror VM, because that call correctly
fails since mirror VMs cannot issue this command. Currently, issues with
the PSP driver functions mean the firmware error is not always reset to
SEV_RET_SUCCESS when a call is successful; mainly, sev_platform_init()
doesn't correctly set the fw error if the platform has already been
initialized.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-3-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:38 -05:00
Peter Gonda
4c66b56781 selftests: KVM: sev_migrate_tests: Fix test_sev_mirror()
Mirrors should not be able to call LAUNCH_START. Remove the call on the
mirror to correct the test before fixing sev_ioctl() to correctly assert
on this failed ioctl.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-2-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 10:44:38 -05:00
Paolo Bonzini
1b0c9d00aa KVM/riscv changes for 5.17, take #1
- Use common KVM implementation of MMU memory caches
 - SBI v0.2 support for Guest
 - Initial KVM selftests support
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 - Update email address for Anup and Atish
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmHW6GcACgkQrUjsVaLH
 LAcA+Q//bRKyuC2JGn0qN0e4WOcb8zpDUw5zep8WqlWviFiNjxxVjHeroT//cDtr
 7apwTCJogDlgnkcH0e88CzD3M0Gh/NJ7JAZ/Z1gBMhBMz7afcahnADqXcTuottMf
 x0stMIQxKlQhess2IQa502KGb23uitbLfiY2MzaPVnXbxfBbM08YUPAcIhSSl+iP
 ZXtvweqhrafUoUvaEFXSHkA27QMWEH+vZq4JlRwLSy7y3U3Hd/51nH04Fxp/n4Qh
 5XyWO0mPqmiTb6Dz5I/hx7sZLZ5ErMpFI5II22sZYcOqtrrL59f5I9gvYQOYc7im
 GjyBshD8bB4SVEciMGEJq9QucOw41M6cTFmdiaQR+NCHfMa/A5RPwf0zT+15Xrtg
 zEkNQCRdWgDhxb/cYqKaAQXERfeposr0xS398qoSUT29GXFvv0N+P2N/WAgQqg+D
 2cnhGRMlsdUJEVWUXCJjZ1u/Wwx6gkxJbjvRY48vvvB76eZzr82sOXYEIF8MBrPG
 co5wa/mzUl3CfgzHO4fESvR+hNTbXiPLbW/FPzSdNNMWxB5GOREP42vcDn9be1Xf
 IXBcKlpL2MhExPh+J6DjM3BqMdV+8qywcn0iR0f3W7CuAiJS0sF7Pyn8Ur7GpU6U
 YFwwLWYBQmdPmkXZuH+8fAju2GMjAHxJBCOfpKahhDcWk7TECdE=
 =Ao1E
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-5.17-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 5.17, take #1

- Use common KVM implementation of MMU memory caches
- SBI v0.2 support for Guest
- Initial KVM selftests support
- Fix to avoid spurious virtual interrupts after clearing hideleg CSR
- Update email address for Anup and Atish
2022-01-07 10:43:02 -05:00
Paolo Bonzini
7fd55a02a4 KVM/arm64 updates for Linux 5.16
- Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmHYIpgPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDndsP/RsBmX6bmQnDEhaaqfGAxOETyq/my1eT9r/V
 3Ax4fEqSFfD5yHbYvqNRC8ueycH4r8WAr4ACWDAI6XpS/pYx00nx2N+HCSgjGyQR
 FeXqITuGPEsn4NkGuPci0PFmI8rVUzanl1ugRGQAETVrZo2ZVH2uqKVGT8XOlu0J
 FB/0x6Z4vMuIgEXyfa+DZ8WdW1aCRgPU2oyOdSdWE57/grjyLJqk6EdMmLyaQ19E
 vz6vXuRnA/GQwOtByqYEnQ8a4VXsQedCMqg/f9mj0BxpDzxC1ps8Nrpv36aJXKUN
 LEXapP9bCWPW9LqaKAOZnQYrUIIEFHsCUom0n3reDHrgObA+jivpz75L8GEr3CdC
 Bv78N04Yymjpp2WA6CuO3r9HjL1nJ6tYqobXU2pvqln4nNC3Ukucjq9ZVuWgS6Hx
 qOZXgPcZ/HpS3l/U+dAu8yIcV2SchQXDudaq8BsfLd8M1bD+oirSBolZFSvz7MYZ
 6+jtEDLUOEO5s4rXiJF46+MauxiELcjaewAEK4WwrS8NBwEyhYe9EPsYcQ5pcrQF
 QwAd1+y7oLfhpGHv5KJKWswfvbtlLCm6NOAhawq0UXM8bS+79tu0dGjiDzVPBuSf
 SyA3VtBSKxcpvCrljw9ubtjxvKrviE0MDvlmTP2B1NU+lwm8xRBiwUwOH29qP9zU
 HDeUj2fy
 =HkZk
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.16

- Simplification of the 'vcpu first run' by integrating it into
  KVM's 'pid change' flow

- Refactoring of the FP and SVE state tracking, also leading to
  a simpler state and less shared data between EL1 and EL2 in
  the nVHE case

- Tidy up the header file usage for the nvhe hyp object

- New HYP unsharing mechanism, finally allowing pages to be
  unmapped from the Stage-1 EL2 page-tables

- Various pKVM cleanups around refcounting and sharing

- A couple of vgic fixes for bugs that would trigger once
  the vcpu xarray rework is merged, but not sooner

- Add minimal support for ARMv8.7's PMU extension

- Rework kvm_pgtable initialisation ahead of the NV work

- New selftest for IRQ injection

- Teach selftests about the lack of default IPA space and
  page sizes

- Expand sysreg selftest to deal with Pointer Authentication

- The usual bunch of cleanups and doc update
2022-01-07 10:42:19 -05:00
Anup Patel
497685f2c7 MAINTAINERS: Update Anup's email address
I no longer work at Western Digital, so update my email address to my
personal one and add entries to .mailmap as well.

Signed-off-by: Anup Patel <anup@brainfault.org>
Acked-by: Atish Patra <atishp@rivosinc.com>
2022-01-06 15:18:22 +05:30
Vincent Chen
33e5b5746c KVM: RISC-V: Avoid spurious virtual interrupts after clearing hideleg CSR
When the last VM is terminated, the host kernel will invoke function
hardware_disable_nolock() on each CPU to disable the related virtualization
functions. Here, RISC-V currently only clears hideleg CSR and hedeleg CSR.
This behavior will cause the host kernel to receive spurious interrupts if
hvip CSR has pending interrupts and the corresponding enable bits in vsie
CSR are asserted. To avoid it, hvip CSR and vsie CSR must be cleared
before clearing hideleg CSR.
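
A sketch of the resulting ordering in kvm_arch_hardware_disable():

    void kvm_arch_hardware_disable(void)
    {
            /* Clear the interrupt enables and pending bits before the
             * delegation CSRs so nothing surfaces as a host interrupt. */
            csr_write(CSR_VSIE, 0);
            csr_write(CSR_HVIP, 0);
            csr_write(CSR_HEDELEG, 0);
            csr_write(CSR_HIDELEG, 0);
    }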

Fixes: 99cdc6c18c ("RISC-V: Add initial skeletal KVM support")
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06 15:18:18 +05:30
Anup Patel
3e06cdf105 KVM: selftests: Add initial support for RISC-V 64-bit
We add initial support for RISC-V 64-bit in KVM selftests, with which
we can cross-compile and run arch-independent tests such as:

  - demand_paging_test
  - dirty_log_test
  - kvm_create_max_vcpus
  - kvm_page_table_test
  - set_memory_region_test
  - kvm_binary_stats_test

All VM guest modes defined in kvm_util.h require at least a 48-bit
guest virtual address, so to use the KVM RISC-V selftests the hardware
needs to support at least Sv48 MMU for the guest (i.e. VS-mode).

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
2022-01-06 15:17:50 +05:30
Anup Patel
788490e798 KVM: selftests: Add EXTRA_CFLAGS in top-level Makefile
We add EXTRA_CFLAGS to the common CFLAGS of the top-level Makefile,
which allows users to pass additional compile-time flags such as
"-static".

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com>
2022-01-06 15:17:46 +05:30
Anup Patel
a457fd5660 RISC-V: KVM: Add VM capability to allow userspace get GPA bits
The number of GPA bits supported for a RISC-V Guest/VM is based on the
MMU mode used by the G-stage translation. KVM RISC-V will detect and
use the best possible MMU mode for the G-stage in kvm_arch_init().

We add a generic VM capability KVM_CAP_VM_GPA_BITS which can be used by
the KVM userspace to get the number of GPA (guest physical address) bits
supported for a Guest/VM.
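
A userspace sketch of querying the new capability on a VM fd:

    /* Number of guest physical address bits supported for this VM. */
    int gpa_bits = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_GPA_BITS);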

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
2022-01-06 15:16:58 +05:30
Anup Patel
ef8949a986 RISC-V: KVM: Forward SBI experimental and vendor extensions
The SBI experimental extension space is for temporary (or experimental)
stuff whereas the SBI vendor extension space is for hardware vendor
specific stuff. Neither of these SBI extension spaces will be
standardized by the SBI specification, so let's blindly forward such
SBI calls to userspace.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: Atish Patra <atishp@rivosinc.com>
2022-01-06 15:14:33 +05:30
Jisheng Zhang
637ad6551b RISC-V: KVM: make kvm_riscv_vcpu_fp_clean() static
There are no users outside vcpu_fp.c so make kvm_riscv_vcpu_fp_clean()
static.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06 15:13:58 +05:30
Atish Patra
4abed558b2 MAINTAINERS: Update Atish's email address
I am no longer employed by Western Digital. Update my email address to
my personal one and add entries to .mailmap as well.

Signed-off-by: Atish Patra <atishp@atishpatra.org>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06 15:13:54 +05:30
Atish Patra
3e1d86569c RISC-V: KVM: Add SBI HSM extension in KVM
The SBI HSM extension allows the OS to start/stop harts at any time. It
also allows ordered booting of harts instead of random booting.

Implement the SBI HSM extension and designate vcpu 0 as the boot vcpu.
All other non-zero, non-booting vcpus should be brought up by the OS
implementing the HSM extension. If the guest OS doesn't implement the
HSM extension, only a single vcpu will be available to the OS.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06 15:12:47 +05:30
Atish Patra
5f862df558 RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2
SBI v0.2 contains improved versions of the required v0.1
extensions, such as remote fence, timer and IPI.

This patch implements those extensions.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-06 15:12:15 +05:30