Commit graph

1043108 commits

Author SHA1 Message Date
Oliver Upton
5000653934 tools: arch: x86: pull in pvclock headers
Copy over approximately clean versions of the pvclock headers into
tools. Reconcile headers/symbols missing in tools that are unneeded.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181555.973085-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:45 -04:00
Oliver Upton
828ca89628 KVM: x86: Expose TSC offset controls to userspace
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.

Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:45 -04:00
Oliver Upton
58d4277be9 KVM: x86: Refactor tsc synchronization code
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-7-oupton@google.com>
[Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are
 equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by
 Maxim Levitsky. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:45 -04:00
Paolo Bonzini
869b44211a kvm: x86: protect masterclock with a seqcount
Protect the reference point for kvmclock with a seqcount, so that
kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
updates will also run in parallel and not bounce the kvmclock cacheline.

Of the variables that were protected by pvclock_gtod_sync_lock,
nr_vcpus_matched_tsc is different because it is updated outside
pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
need to keep it protected by a spinlock.  In fact it must now
be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
write-side of a seqcount, is non-preemptible.  Since we already
have tsc_write_lock which is a raw spinlock, we can just use
tsc_write_lock as the lock that protects the write-side of the
seqcount.

Co-developed-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:44 -04:00
Oliver Upton
c68dc1b577 KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:44 -04:00
Paolo Bonzini
3d5e7a28b1 KVM: x86: avoid warning with -Wbitwise-instead-of-logical
This is a new warning in clang top-of-tree (will be clang 14):

In file included from arch/x86/kvm/mmu/mmu.c:27:
arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        return __is_bad_mt_xwr(rsvd_check, spte) |
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                 ||
arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning

The code is fine, but change it anyway to shut up this clever clogs
of a compiler.

Reported-by: torvic9@mailbox.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:43:42 -04:00
Paolo Bonzini
a25c78d04c Merge commit 'kvm-pagedata-alloc-fixes' into HEAD 2021-10-18 14:13:37 -04:00
Paolo Bonzini
fa13843d15 KVM: X86: fix lazy allocation of rmaps
If allocation of rmaps fails, but some of the pointers have already been written,
those pointers can be cleaned up when the memslot is freed, or even reused later
for another attempt at allocating the rmaps.  Therefore there is no need to
WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
skipped lest KVM will overwrite the previous pointer and will indeed leak memory.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 14:07:17 -04:00
Andrei Vagin
a7cc099f2e KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returned
This looks like a typo in 8f32d5e563. This change didn't intend to do
any functional changes.

The problem was caught by gVisor tests.

Fixes: 8f32d5e563 ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Message-Id: <20211015163221.472508-1-avagin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18 08:19:18 -04:00
Paolo Bonzini
e2b6d941ec KVM/arm64 fixes for 5.15, take #2
- Properly refcount pages used as a concatenated stage-2 PGD
 - Fix missing unlock when detecting the use of MTE+VM_SHARED
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmFpPQwPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDasoP/iNiTIEw7zrHs37Vx4dtTjz15RNF9gmFd2iR
 EXCg76V+2VN8/87bIcYdKeHkjXtERJ2tTCOJq9X/dn8MixvyShhCxJnk5chc1eZE
 2W4GY0gkuKO6E5rDAe10kl+zeFKVAd77zAeUezZYGGfRlm1Ly8sXrwzfR7ZXtvDH
 1DL89adycUJlh28lwcX73ScXWpiAsFXdWFjsVpHJngDZ1z3aYUqtPJmESH+mwtuK
 J85tTq4PZCsnubmUpuCYopgBJnogjh4KTulVOyenykvQyXZ+EVBjTA4Xh8iCyi77
 LMxqazvVkFnL3tBw2H7OCWPhQ/xfmWclpKcngDvnfPMIW6b3Tl5dCIY6uNWoLMnV
 8COh4ggahhqgsswHVCiFPBkJ/J3G+L8vLz2PHXtHJw6yOzPTEPD5yXH8oXoJhQ+Y
 U2aT83bWdc5sK9ZvhRHSpVXFZ6bWSUnMztP9szz1DM4QzyT6RHgXJE7TxWXFHhPR
 Hs0VGX13hz1f/AYN/+BDxicCtH9r/SbG6hlTPrt9+IFUAiz9EZ5Xy28fT1+c93Cf
 ArylAulF+HQyZMMe4RV2quAEXx5q5ag8pMsm/KQrh9fn0srrO4AnaoBGtuIjKJu3
 ccBd/1vbIAU9AxLw82LHW558gAzdq3LskPApgjkTA19oj7ffdMEqqs/cIDkN7nDx
 QEDaVxFv
 =eQRY
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.15, take #2

- Properly refcount pages used as a concatenated stage-2 PGD
- Fix missing unlock when detecting the use of MTE+VM_SHARED
2021-10-15 04:47:55 -04:00
Paolo Bonzini
019057bd73 KVM: SEV-ES: fix length of string I/O
The size of the data in the scratch buffer is not divided by the size of
each port I/O operation, so vcpu->arch.pio.count ends up being larger
than it should be by a factor of size.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-15 04:47:36 -04:00
Quentin Perret
6e6a8ef088 KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
VM_SHARED mappings are currently forbidden in a memslot with MTE to
prevent two VMs racing to sanitise the same page. However, this check
is performed while holding current->mm's mmap_lock, but fails to release
it. Fix this by releasing the lock when needed.

Fixes: ea7fc1bb1c ("KVM: arm64: Introduce MTE VM feature")
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005122031.809857-1-qperret@google.com
2021-10-05 13:22:45 +01:00
Quentin Perret
7615c2a514 KVM: arm64: Report corrupted refcount at EL2
Some of the refcount manipulation helpers used at EL2 are instrumented
to catch a corrupted state, but not all of them are treated equally. Let's
make things more consistent by instrumenting hyp_page_ref_dec_and_test()
as well.

Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-6-qperret@google.com
2021-10-05 13:02:54 +01:00
Quentin Perret
1d58a17ef5 KVM: arm64: Fix host stage-2 PGD refcount
The KVM page-table library refcounts the pages of concatenated stage-2
PGDs individually. However, when running KVM in protected mode, the
host's stage-2 PGD is currently managed by EL2 as a single high-order
compound page, which can cause the refcount of the tail pages to reach 0
when they shouldn't, hence corrupting the page-table.

Fix this by introducing a new hyp_split_page() helper in the EL2 page
allocator (matching the kernel's split_page() function), and make use of
it from host_s2_zalloc_pages_exact().

Fixes: 1025c8c0c6 ("KVM: arm64: Wrap the host with a stage 2")
Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-5-qperret@google.com
2021-10-05 13:02:54 +01:00
Paolo Bonzini
542a2640a2 Initial KVM RISC-V support
Following features are supported by the initial KVM RISC-V support:
 1. No RISC-V specific KVM IOCTL
 2. Loadable KVM RISC-V module
 3. Minimal possible KVM world-switch which touches only GPRs and few CSRs
 4. Works on both RV64 and RV32 host
 5. Full Guest/VM switch via vcpu_get/vcpu_put infrastructure
 6. KVM ONE_REG interface for VCPU register access from KVM user-space
 7. Interrupt controller emulation in KVM user-space
 8. Timer and IPI emuation in kernel
 9. Both Sv39x4 and Sv48x4 supported for RV64 host
 10. MMU notifiers supported
 11. Generic dirty log supported
 12. FP lazy save/restore supported
 13. SBI v0.1 emulation for Guest/VM
 14. Forward unhandled SBI calls to KVM user-space
 15. Hugepage support for Guest/VM
 16. IOEVENTFD support for Vhost
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmFb7+kACgkQrUjsVaLH
 LAfvBRAAiD/sUCbrGNk4IAOpI3drxBUTJAC3cCClOAlffHjgUgSf3Qlz+9NMNfuV
 6h3Jk9cheIVwrsQLPG4Zawnk2bujFpaJ4rn+kO3aA1ArT+ECGR2XVie2CpBuJ5g7
 ysRlrhcr9YVXXlt8ZczUBwYSjDkDaM1gaDQCt3ep65RCm8NU++Y/TlJOK4xFqC2Z
 Km2jlX2FAxKQLjNJstCPFGiXHhvRK9tu9TK18Fl9Ibk+oO7/H+tZ6VwCivzjzdiW
 kYAa6xdX4NT80H6UGeR/R72b2CekD2RJDnyaN4LE2ticIzuysZpxcI3806gOazDm
 8zVXB9627LRzHzJt4JjdKw76BghuKE3DW8V8hg4WuTpVa6Nx6EHxgpOpGA2sFZDr
 QZPv4gWQeEGBsTxQJpVJ8D6sIEsQdNQQERUemQToAzX9Hl/kkKaSQSfocSOYqgVP
 iMNY4LsJfNg/vDEDzMub/kF9qH0jPH8pkQuNX/X0A8nAtJkznjxVN9NaTxDD25Tp
 udw5/easWD1qDGE6i4Kk/JyDb9NbVAK0ZK6CM5JjtqDtOAnVXCXy+rXgEPsNs3Td
 kZVa34hKAlTpVhMfI0yj0s5jX0d/2w5KuQX147C/IWi3pBdIYZLvcqAb/1cxSoff
 uyW4dryfU7cQQdOfbIKHgaLU3b221/q5wYmEi3zqtVbixhIUtNE=
 =MMFv
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-5.16-1' of git://github.com/kvm-riscv/linux into HEAD

Initial KVM RISC-V support

Following features are supported by the initial KVM RISC-V support:
1. No RISC-V specific KVM IOCTL
2. Loadable KVM RISC-V module
3. Minimal possible KVM world-switch which touches only GPRs and few CSRs
4. Works on both RV64 and RV32 host
5. Full Guest/VM switch via vcpu_get/vcpu_put infrastructure
6. KVM ONE_REG interface for VCPU register access from KVM user-space
7. Interrupt controller emulation in KVM user-space
8. Timer and IPI emuation in kernel
9. Both Sv39x4 and Sv48x4 supported for RV64 host
10. MMU notifiers supported
11. Generic dirty log supported
12. FP lazy save/restore supported
13. SBI v0.1 emulation for Guest/VM
14. Forward unhandled SBI calls to KVM user-space
15. Hugepage support for Guest/VM
16. IOEVENTFD support for Vhost
2021-10-05 04:19:24 -04:00
Anup Patel
24b699d12c RISC-V: KVM: Add MAINTAINERS entry
Add myself as maintainer for KVM RISC-V and Atish as designated reviewer.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:14:10 +05:30
Anup Patel
da40d85805 RISC-V: KVM: Document RISC-V specific parts of KVM API
Document RISC-V specific parts of the KVM API, such as:
 - The interrupt numbers passed to the KVM_INTERRUPT ioctl.
 - The states supported by the KVM_{GET,SET}_MP_STATE ioctls.
 - The registers supported by the KVM_{GET,SET}_ONE_REG interface
   and the encoding of those register ids.
 - The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to
   userspace tool.

CC: Jonathan Corbet <corbet@lwn.net>
CC: linux-doc@vger.kernel.org
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:12:34 +05:30
Atish Patra
dea8ee31a0 RISC-V: KVM: Add SBI v0.1 support
The KVM host kernel is running in HS-mode needs so we need to handle
the SBI calls coming from guest kernel running in VS-mode.

This patch adds SBI v0.1 support in KVM RISC-V. Almost all SBI v0.1
calls are implemented in KVM kernel module except GETCHAR and PUTCHART
calls which are forwarded to user space because these calls cannot be
implemented in kernel space. In future, when we implement SBI v0.2 for
Guest, we will forward SBI v0.2 experimental and vendor extension calls
to user space.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:11:30 +05:30
Atish Patra
4d9c5c072f RISC-V: KVM: Implement ONE REG interface for FP registers
Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for floating
point registers such as F0-F31 and FCSR. This support is added for
both 'F' and 'D' extensions.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:09:32 +05:30
Atish Patra
5de52d4a23 RISC-V: KVM: FP lazy save/restore
This patch adds floating point (F and D extension) context save/restore
for guest VCPUs. The FP context is saved and restored lazily only when
kernel enter/exits the in-kernel run loop and not during the KVM world
switch. This way FP save/restore has minimal impact on KVM performance.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:08:23 +05:30
Atish Patra
3a9f66cb25 RISC-V: KVM: Add timer functionality
The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

This patch adds guest VCPU timer implementation along with ONE_REG
interface to access VCPU timer state from user space.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:07:16 +05:30
Anup Patel
9955371cc0 RISC-V: KVM: Implement MMU notifiers
This patch implements MMU notifiers for KVM RISC-V so that Guest
physical address space is in-sync with Host physical address space.

This will allow swapping, page migration, etc to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:03:39 +05:30
Anup Patel
9d05c1fee8 RISC-V: KVM: Implement stage2 page table programming
This patch implements all required functions for programming
the stage2 page table for each Guest/VM.

At high-level, the flow of stage2 related functions is similar
from KVM ARM/ARM64 implementation but the stage2 page table
format is quite different for KVM RISC-V.

[jiangyifei: stage2 dirty log support]
Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:02:19 +05:30
Anup Patel
fd7bb4a251 RISC-V: KVM: Implement VMID allocator
We implement a simple VMID allocator for Guests/VMs which:
1. Detects number of VMID bits at boot-time
2. Uses atomic number to track VMID version and increments
   VMID version whenever we run-out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run-out
   of VMIDs
4. Force updates HW Stage2 VMID for each Guest VCPU whenever
   VMID changes using VCPU request KVM_REQ_UPDATE_HGATP

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:01:04 +05:30
Anup Patel
5a5d79acd7 RISC-V: KVM: Handle WFI exits for VCPU
We get illegal instruction trap whenever Guest/VM executes WFI
instruction.

This patch handles WFI trap by blocking the trapped VCPU using
kvm_vcpu_block() API. The blocked VCPU will be automatically
resumed whenever a VCPU interrupt is injected from user-space
or from in-kernel IRQCHIP emulation.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:55:01 +05:30
Anup Patel
9f70132651 RISC-V: KVM: Handle MMIO exits for VCPU
We will get stage2 page faults whenever Guest/VM access SW emulated
MMIO device or unmapped Guest RAM.

This patch implements MMIO read/write emulation by extracting MMIO
details from the trapped load/store instruction and forwarding the
MMIO read/write to user-space. The actual MMIO emulation will happen
in user-space and KVM kernel module will only take care of register
updates before resuming the trapped VCPU.

The handling for stage2 page faults for unmapped Guest RAM will be
implemeted by a separate patch later.

[jiangyifei: ioeventfd and in-kernel mmio device support]
Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:51:47 +05:30
Anup Patel
34bde9d8b9 RISC-V: KVM: Implement VCPU world-switch
This patch implements the VCPU world-switch for KVM RISC-V.

The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly
switches general purpose registers, SSTATUS, STVEC, SSCRATCH and
HSTATUS CSRs. Other CSRs are switched via vcpu_load() and vcpu_put()
interface in kvm_arch_vcpu_load() and kvm_arch_vcpu_put() functions
respectively.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:49:57 +05:30
Anup Patel
92ad82002c RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
VCPU config and registers from user-space.

We have three types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE   - these are VCPU general purpose registers
3. CSR    - these are VCPU control and status registers

The CONFIG register available to user-space is ISA. The ISA register is
a read and write register where user-space can only write the desired
VCPU ISA capabilities before running the VCPU.

The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
PC and MODE. The PC register represents program counter whereas the MODE
register represent VCPU privilege mode (i.e. S/U-mode).

The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.

In future, more VCPU register types will be added (such as FP) for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:46:22 +05:30
Anup Patel
cce69aff68 RISC-V: KVM: Implement VCPU interrupts and requests handling
This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_SLEEP       - set whenever VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET  - set whenever VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by SBI emulation (added
later) to power-up a VCPU in power-off state. The user-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set power state of a VCPU.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:42:43 +05:30
Anup Patel
a33c72faf2 RISC-V: KVM: Implement VCPU create, init and destroy functions
This patch implements VCPU create, init and destroy functions
required by generic KVM module. We don't have much dynamic
resources in struct kvm_vcpu_arch so these functions are quite
simple for KVM RISC-V.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:41:17 +05:30
Anup Patel
99cdc6c18c RISC-V: Add initial skeletal KVM support
This patch adds initial skeletal KVM RISC-V support which has:
1. A simple implementation of arch specific VM functions
   except kvm_vm_ioctl_get_dirty_log() which will implemeted
   in-future as part of stage2 page loging.
2. Stubs of required arch specific VCPU functions except
   kvm_arch_vcpu_ioctl_run() which is semi-complete and
   extended by subsequent patches.
3. Stubs for required arch specific stage2 MMU functions.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:40:08 +05:30
Anup Patel
3f2401f47d RISC-V: Add hypervisor extension related CSR defines
This patch adds asm/kvm_csr.h for RISC-V hypervisor extension
related defines.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20210927114016.1089328-2-anup.patel@wdc.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-04 04:54:55 -04:00
Paolo Bonzini
2353e593a1 KVM: s390: allow to compile without warning with W=1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmFVscoACgkQEXu8gLWm
 HHzC2A/8DXRlixedn2tC5RIrrm0dSWtz6XJBATVH0SiY3R8Z0u4zWs05jt2S5Z2w
 SpVzIgSm5HRZP68xRnJ/kiZeLiW/gKbxMRYwniok483DK0GcJiTIegJMYYl/jLZH
 aIFXxzPyAPdRwa503EUX61ehDE4HCaO4gctB7ZU1+jmTk27bV81NYAWsXSvH6FkC
 ZSenC/NZ6iZVMB78CLeb9GxykEaztc04/FhhtjwsPek62g0hsOe6nOlIFrf9N0q6
 PiZoNFalJUxxX1Qgaj4Jl7QEV+o7re4xalOlt+X7Rnh2AhGyNfoe8ZuCk0fRikjU
 GNbBEMy24r3DsMiHRpBXbPSrtCeahSRXNe9ewxGi7TYSpZvrFVEHb46HPBW+Kpux
 lh2Eur3Cd+M+vHeqAowBwT1iRvxQjlJ5Vhh0bhs+2t3iWxFlPwIjq2QzHMFPaii8
 M9lnrWhzQ9/IySjoMJEN5scrd7ZsVquVENm2B8zXCkvCT+pfAV/p+xrMIzuUt56B
 p0ESsuRoF0bYNqnheHJlv+AomO5talPxaBeMcmqFP7E2rlhqzd/+WIwOfYn7INhz
 A8a5u06oEDn96vixF1rAYCJg8JzXPQ3EGe4NAAchWRlHZc+z5ZH/gM6dFKiGAQed
 M6jmuEMe6DthtzpTF83lXwLktnyfv8bbKsVRSYhSrwC0VvkiqNc=
 =A8Ck
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: allow to compile without warning with W=1
2021-10-04 03:58:25 -04:00
David Stevens
deae4a10f1 KVM: x86: only allocate gfn_track when necessary
Avoid allocating the gfn_track arrays if nothing needs them. If there
are no external to KVM users of the API (i.e. no GVT-g), then page
tracking is only needed for shadow page tables. This means that when tdp
is enabled and there are no external users, then the gfn_track arrays
can be lazily allocated when the shadow MMU is actually used. This avoid
allocations equal to .05% of guest memory when nested virtualization is
not used, if the kernel is compiled without GVT-g.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210922045859.2011227-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:58 -04:00
David Stevens
e9d0c0c4f7 KVM: x86: add config for non-kvm users of page tracking
Add a config option that allows kvm to determine whether or not there
are any external users of page tracking.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210922045859.2011227-2-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:57 -04:00
Krish Sadhukhan
174a921b69 nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB
According to section "TLB Flush" in APM vol 2,

    "Support for TLB_CONTROL commands other than the first two, is
     optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid].

     All encodings of TLB_CONTROL not defined in the APM are reserved."

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210920235134.101970-3-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:57 -04:00
Juergen Gross
78b497f2e6 kvm: use kvfree() in kvm_arch_free_vm()
By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
use the common variant. This can be accomplished by adding another
macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.

Further simplification can be achieved by adding __kvm_arch_free_vm()
doing the common part.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-Id: <20210903130808.30142-5-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:57 -04:00
Babu Moger
b73a54321a KVM: x86: Expose Predictive Store Forwarding Disable
Predictive Store Forwarding: AMD Zen3 processors feature a new
technology called Predictive Store Forwarding (PSF).

PSF is a hardware-based micro-architectural optimization designed
to improve the performance of code execution by predicting address
dependencies between loads and stores.

How PSF works:

It is very common for a CPU to execute a load instruction to an address
that was recently written by a store. Modern CPUs implement a technique
known as Store-To-Load-Forwarding (STLF) to improve performance in such
cases. With STLF, data from the store is forwarded directly to the load
without having to wait for it to be written to memory. In a typical CPU,
STLF occurs after the address of both the load and store are calculated
and determined to match.

PSF expands on this by speculating on the relationship between loads and
stores without waiting for the address calculation to complete. With PSF,
the CPU learns over time the relationship between loads and stores. If
STLF typically occurs between a particular store and load, the CPU will
remember this.

In typical code, PSF provides a performance benefit by speculating on
the load result and allowing later instructions to begin execution
sooner than they otherwise would be able to.

The details of security analysis of AMD predictive store forwarding is
documented here.
https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf

Predictive Store Forwarding controls:
There are two hardware control bits which influence the PSF feature:
- MSR 48h bit 2 – Speculative Store Bypass (SSBD)
- MSR 48h bit 7 – Predictive Store Forwarding Disable (PSFD)

The PSF feature is disabled if either of these bits are set.  These bits
are controllable on a per-thread basis in an SMT system. By default, both
SSBD and PSFD are 0 meaning that the speculation features are enabled.

While the SSBD bit disables PSF and speculative store bypass, PSFD only
disables PSF.

PSFD may be desirable for software which is concerned with the
speculative behavior of PSF but desires a smaller performance impact than
setting SSBD.

Support for PSFD is indicated in CPUID Fn8000_0008 EBX[28].
All processors that support PSF will also support PSFD.

Linux kernel does not have the interface to enable/disable PSFD yet. Plan
here is to expose the PSFD technology to KVM so that the guest kernel can
make use of it if they wish to.

Signed-off-by: Babu Moger <Babu.Moger@amd.com>
Message-Id: <163244601049.30292.5855870305350227855.stgit@bmoger-ubuntu>
[Keep feature private to KVM, as requested by Borislav Petkov. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:57 -04:00
David Matlack
53597858db KVM: x86/mmu: Avoid memslot lookup in make_spte and mmu_try_to_unsync_pages
mmu_try_to_unsync_pages checks if page tracking is active for the given
gfn, which requires knowing the memslot. We can pass down the memslot
via make_spte to avoid this lookup.

The memslot is also handy for make_spte's marking of the gfn as dirty:
we can test whether dirty page tracking is enabled, and if so ensure that
pages are mapped as writable with 4K granularity.  Apart from the warning,
no functional change is intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:56 -04:00
David Matlack
8a9f566ae4 KVM: x86/mmu: Avoid memslot lookup in rmap_add
Avoid the memslot lookup in rmap_add, by passing it down from the fault
handling code to mmu_set_spte and then to rmap_add.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:56 -04:00
Paolo Bonzini
a12f43818b KVM: MMU: pass struct kvm_page_fault to mmu_set_spte
mmu_set_spte is called for either PTE prefetching or page faults.  The
three boolean arguments write_fault, speculative and host_writable are
always respectively false/true/true for prefetching and coming from
a struct kvm_page_fault for page faults.

Let mmu_set_spte distinguish these two situation by accepting a
possibly NULL struct kvm_page_fault argument.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:56 -04:00
Paolo Bonzini
7158bee4b4 KVM: MMU: pass kvm_mmu_page struct to make_spte
The level and A/D bit support of the new SPTE can be found in the role,
which is stored in the kvm_mmu_page struct.  This merges two arguments
into one.

For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does
not use it if the SPTE is already present) so we fetch it just before
calling make_spte.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:55 -04:00
Paolo Bonzini
87e888eafd KVM: MMU: set ad_disabled in TDP MMU role
Prepare for removing the ad_disabled argument of make_spte; instead it can
be found in the role of a struct kvm_mmu_page.  First of all, the TDP MMU
must set the role accurately.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:55 -04:00
Paolo Bonzini
eb5cd7ffe1 KVM: MMU: remove unnecessary argument to mmu_set_spte
The level of the new SPTE can be found in the kvm_mmu_page struct; there
is no need to pass it down.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:55 -04:00
Paolo Bonzini
ad67e4806e KVM: MMU: clean up make_spte return value
Now that make_spte is called directly by the shadow MMU (rather than
wrapped by set_spte), it only has to return one boolean value.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:54 -04:00
Paolo Bonzini
4758d47e0d KVM: MMU: inline set_spte in FNAME(sync_page)
Since the two callers of set_spte do different things with the results,
inlining it actually makes the code simpler to reason about.  For example,
FNAME(sync_page) already has a struct kvm_mmu_page *, but set_spte had to
fish it back out of sptep's private page data.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:54 -04:00
Paolo Bonzini
d786c7783b KVM: MMU: inline set_spte in mmu_set_spte
Since the two callers of set_spte do different things with the results,
inlining it actually makes the code simpler to reason about.  For example,
mmu_set_spte looks quite like tdp_mmu_map_handle_target_level, but the
similarity is hidden by set_spte.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:54 -04:00
David Matlack
888104138c KVM: x86/mmu: Avoid memslot lookup in page_fault_handle_page_track
Now that kvm_page_fault has a pointer to the memslot it can be passed
down to the page tracking code to avoid a redundant slot lookup.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:53 -04:00
David Matlack
e710c5f6be KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault
The memslot for the faulting gfn is used throughout the page fault
handling code, so capture it in kvm_page_fault as soon as we know the
gfn and use it in the page fault handling code that has direct access
to the kvm_page_fault struct.  Replace various tests using is_noslot_pfn
with more direct tests on fault->slot being NULL.

This, in combination with the subsequent patch, improves "Populate
memory time" in dirty_log_perf_test by 5% when using the legacy MMU.
There is no discerable improvement to the performance of the TDP MMU.

No functional change intended.

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:53 -04:00
Paolo Bonzini
6ccf443882 KVM: MMU: unify tdp_mmu_map_set_spte_atomic and tdp_mmu_set_spte_atomic_no_dirty_log
tdp_mmu_map_set_spte_atomic is not taking care of dirty logging anymore,
the only difference that remains is that it takes a vCPU instead of
the struct kvm.  Merge the two functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 03:44:53 -04:00