Commit graph

778 commits

Author SHA1 Message Date
Paolo Bonzini
609e36d372 KVM: x86: pass host_initiated to functions that read MSRs
SMBASE is only readable from SMM for the VCPU, but it must be always
accessible if userspace is accessing it.  Thus, all functions that
read MSRs are changed to accept a struct msr_data; the host_initiated
and index fields are pre-initialized, while the data field is filled
on return.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04 16:01:00 +02:00
Paolo Bonzini
a9b4fb7e79 Merge branch 'kvm-master' into kvm-next
Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU
deactivation patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:31:37 +02:00
Paolo Bonzini
0fdd74f778 Revert "KVM: x86: drop fpu_activate hook"
This reverts commit 4473b570a7.  We'll
use the hook again.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:15 +02:00
Nadav Amit
3db176d5b4 KVM: x86: Fix DR7 mask on task-switch while debugging
If the host sets hardware breakpoints to debug the guest, and a task-switch
occurs in the guest, the architectural DR7 will not be updated. The effective
DR7 would be updated instead.

This fix puts the DR7 update during task-switch emulation, so it now uses the
standard DR setting mechanism instead of the one that was previously used. As a
bonus, the update of DR7 will now be effective for AMD as well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:35 +02:00
Jan Kiszka
8a9781f7ad KVM: nVMX: Fix host crash when loading MSRs with userspace irqchip
vcpu->arch.apic is NULL when a userspace irqchip is active. But instead
of letting the test incorrectly depend on in-kernel irqchip mode,
open-code it to catch also userspace x2APICs.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-08 10:51:45 +02:00
Paolo Bonzini
4eb64dce8d KVM: x86: dump VMCS on invalid entry
Code and format roughly based on Xen's vmcs_dump_vcpu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 11:30:40 +02:00
Radim Krčmář
74545705cb KVM: x86: fix initial PAT value
PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT.
VMX used a wrong value (host's PAT) and while SVM used the right one,
it never got to arch.pat.

This is not an issue with QEMU as it will force the correct value.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 11:29:46 +02:00
Nadav Amit
d28bc9dd25 KVM: x86: INIT and reset sequences are different
x86 architecture defines differences between the reset and INIT sequences.
INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU,
MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP.

References (from Intel SDM):

"If the MP protocol has completed and a BSP is chosen, subsequent INITs (either
to a specific processor or system wide) do not cause the MP protocol to be
repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions]

[Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT]

"If the processor is reset by asserting the INIT# pin, the x87 FPU state is not
changed." [9.2: X87 FPU INITIALIZATION]

"The state of the local APIC following an INIT reset is the same as it is after
a power-up or hardware reset, except that the APIC ID and arbitration ID
registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset
("Wait-for-SIPI" State)]

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07 11:29:43 +02:00
Ben Serebrin
085e68eeaf KVM: VMX: Preserve host CR4.MCE value while in guest mode.
The host's decision to enable machine check exceptions should remain
in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.

Tested: Built.
On earlier version, tested by injecting machine check
while a guest is spinning.

Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.

Signed-off-by: Ben Serebrin <serebrin@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-21 19:01:44 +02:00
Linus Torvalds
9003601310 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64.
ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling
 vhost, too), page aging
 
 s390: interrupt handling rework, allowing to inject all local interrupts
 via new ioctl and to get/set the full local irq state for migration
 and introspection.  New ioctls to access memory by virtual address,
 and to get/set the guest storage keys.  SIMD support.
 
 MIPS: FPU and MIPS SIMD Architecture (MSA) support.  Includes some patches
 from Ralf Baechle's MIPS tree.
 
 x86: bugfixes (notably for pvclock, the others are small) and cleanups.
 Another small latency improvement for the TSC deadline timer.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX
 k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ
 Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW
 wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj
 RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7
 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs=
 =yu2b
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.1

  The most interesting bit here is irqfd/ioeventfd support for ARM and
  ARM64.

  Summary:

  ARM/ARM64:
     fixes for live migration, irqfd and ioeventfd support (enabling
     vhost, too), page aging

  s390:
     interrupt handling rework, allowing to inject all local interrupts
     via new ioctl and to get/set the full local irq state for migration
     and introspection.  New ioctls to access memory by virtual address,
     and to get/set the guest storage keys.  SIMD support.

  MIPS:
     FPU and MIPS SIMD Architecture (MSA) support.  Includes some
     patches from Ralf Baechle's MIPS tree.

  x86:
     bugfixes (notably for pvclock, the others are small) and cleanups.
     Another small latency improvement for the TSC deadline timer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  KVM: use slowpath for cross page cached accesses
  kvm: mmu: lazy collapse small sptes into large sptes
  KVM: x86: Clear CR2 on VCPU reset
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: simplify kvm_apic_map
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: use MDA for interrupt matching
  kvm/ppc/mpic: drop unused IRQ_testbit
  KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
  KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
  KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
  KVM: vmx: pass error code with internal error #2
  x86: vdso: fix pvclock races with task migration
  KVM: remove kvm_read_hva and kvm_read_hva_atomic
  KVM: x86: optimize delivery of TSC deadline timer interrupt
  KVM: x86: extract blocking logic from __vcpu_run
  kvm: x86: fix x86 eflags fixed bit
  KVM: s390: migrate vcpu interrupt state
  ...
2015-04-13 09:47:01 -07:00
Nadav Amit
58d269d8cc KVM: x86: BSP in MSR_IA32_APICBASE is writable
After reset, the CPU can change the BSP, which will be used upon INIT.  Reset
should return the BSP which QEMU asked for, and therefore handled accordingly.

To quote: "If the MP protocol has completed and a BSP is chosen, subsequent
INITs (either to a specific processor or system wide) do not cause the MP
protocol to be repeated."
[Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions]

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08 10:47:02 +02:00
Eugene Korenevsky
92d71bc695 KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
After speed-up of cpuid_maxphyaddr() it can be called frequently:
instead of heavyweight enumeration of CPUID entries it returns a cached
pre-computed value. It is also inlined now. So caching its result became
unnecessary and can be removed.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150329205644.GA1258@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08 10:46:58 +02:00
Eugene Korenevsky
9090422f1c KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
On each VM-entry CPU should check the following VMCS fields for zero bits
beyond physical address width:
-  APIC-access address
-  virtual-APIC address
-  posted-interrupt descriptor address
This patch adds these checks required by Intel SDM.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150329205627.GA1244@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08 10:46:57 +02:00
Radim Krčmář
80f0e95d1b KVM: vmx: pass error code with internal error #2
Exposing the on-stack error code with internal error is cheap and
potentially useful.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <1428001865-32280-1-git-send-email-rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08 10:46:56 +02:00
Paolo Bonzini
bf0fb67cf9 KVM/ARM changes for v4.1:
- fixes for live migration
 - irqfd support
 - kvm-io-bus & vgic rework to enable ioeventfd
 - page ageing for stage-2 translation
 - various cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVHQ0kAAoJECPQ0LrRPXpDHKQQALjw6STaZd7n20OFopNgHd4P
 qVeWYEKBxnsiSvL4p3IOSlZlEul+7x08aZqtyxWQRQcDT4ggTI+3FKKfc+8yeRpH
 WV6YJP0bGqz7039PyMLuIgs48xkSZtntePw69hPJfHZh4C1RBlP5T2SfE8mU8VZX
 fWToiU3W12QfKnmN7JFgxZopnGhrYCrG0EexdTDziAZu0GEMlDrO4wnyTR60WCvT
 4TEF73R0kpAz4yplKuhcDHuxIG7VFhQ4z7b09M1JtR0gQ3wUvfbD3Wqqi49SwHkv
 NQOStcyLsIlDosSRcLXNCwb3IxjObXTBcAxnzgm2Aoc1xMMZX1ZPQNNs6zFZzycb
 2c6QMiQ35zm7ellbvrG+bT+BP86JYWcAtHjWcaUFgqSJjb8MtqcMtsCea/DURhqx
 /kictqbPYBBwKW6SKbkNkisz59hPkuQnv35fuf992MRCbT9LAXLPRLbcirucCzkE
 p1MOotsWoO3ldJMZaVn0KYk3sQf6mCIfbYPEdOcw3fhJlvyy3NdjVkLOFbA5UUg1
 rQ7Ru2rTemBc0ExVrymngNTMpMB4XcEeJzXfhcgMl3DWbDj60Ku/O26sDtZ6bsFv
 JuDYn8FVDHz9gpEQHgiUi1YMsBKXLhnILa1ppaa6AflykU3BRfYjAk1SXmX84nQK
 mJUJEdFuxi6pHN0UKxUI
 =avA4
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next'

KVM/ARM changes for v4.1:

- fixes for live migration
- irqfd support
- kvm-io-bus & vgic rework to enable ioeventfd
- page ageing for stage-2 translation
- various cleanups
2015-04-07 18:09:20 +02:00
Joe Perches
1d804d079a x86: Use bool function return values of true/false not 1/0
Use the normal return values for bool functions

Signed-off-by: Joe Perches <joe@perches.com>
Message-Id: <9f593eb2f43b456851cd73f7ed09654ca58fb570.1427759009.git.joe@perches.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-31 18:05:09 +02:00
Nadav Amit
b32a991800 KVM: x86: Remove redundant definitions
Some constants are redfined in emulate.c. Avoid it.

s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK
s/SELECTOR_TI_MASK/SEGMENT_TI_MASK

No functional change.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427635984-8113-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-30 16:46:42 +02:00
Jan Kiszka
b3a2a9076d KVM: nVMX: Add support for rdtscp
If the guest CPU is supposed to support rdtscp and the host has rdtscp
enabled in the secondary execution controls, we can also expose this
feature to L1. Just extend nested_vmx_exit_handled to properly route
EXIT_REASON_RDTSCP.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-26 22:33:48 -03:00
Nikolay Nikolaev
e32edf4fd0 KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:11 +00:00
Radim Krčmář
0790ec172d KVM: nVMX: mask unrestricted_guest if disabled on L0
If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
L0.  L1 triple faulted when running L2 guest that required emulation.

Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
in L0's dmesg:
  WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()

Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
the host doesn't have it enabled.

Fixes: 78051e3b7e ("KVM: nVMX: Disable unrestricted mode if ept=0")
Cc: stable@vger.kernel.org
Tested-By: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-17 22:09:17 -03:00
Jan Kiszka
ae1f576707 KVM: nVMX: Do not emulate #UD while in guest mode
While in L2, leave all #UD to L2 and do not try to emulate it. If L1 is
interested in doing this, it reports its interest via the exception
bitmap, and we never get into handle_exception of L0 anyway.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-13 13:44:43 -03:00
Wincy Van
670125bda1 KVM: VMX: Set msr bitmap correctly if vcpu is in guest mode
In commit 3af18d9c5f ("KVM: nVMX: Prepare for using hardware MSR bitmap"),
we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This
is not enough since the field will be modified by following vmx_set_efer.

Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is
in guest mode.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-13 09:24:51 -03:00
Joel Schopp
5cb56059c9 kvm: x86: make kvm_emulate_* consistant
Currently kvm_emulate() skips the instruction but kvm_emulate_* sometimes
don't.  The end reult is the caller ends up doing the skip themselves.
Let's make them consistant.

Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-10 20:29:15 -03:00
Radim Krčmář
21bc8dc5b7 KVM: VMX: fix build without CONFIG_SMP
'apic' is not defined if !CONFIG_X86_64 && !CONFIG_X86_LOCAL_APIC.
Posted interrupt makes no sense without CONFIG_SMP, and
CONFIG_X86_LOCAL_APIC will be set with it.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-23 22:28:48 +01:00
Linus Torvalds
37507717de Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf updates from Ingo Molnar:
 "This series tightens up RDPMC permissions: currently even highly
  sandboxed x86 execution environments (such as seccomp) have permission
  to execute RDPMC, which may leak various perf events / PMU state such
  as timing information and other CPU execution details.

  This 'all is allowed' RDPMC mode is still preserved as the
  (non-default) /sys/devices/cpu/rdpmc=2 setting.  The new default is
  that RDPMC access is only allowed if a perf event is mmap-ed (which is
  needed to correctly interpret RDPMC counter values in any case).

  As a side effect of these changes CR4 handling is cleaned up in the
  x86 code and a shadow copy of the CR4 value is added.

  The extra CR4 manipulation adds ~ <50ns to the context switch cost
  between rdpmc-capable and rdpmc-non-capable mms"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
  perf/x86: Only allow rdpmc if a perf_event is mapped
  perf: Pass the event to arch_perf_update_userpage()
  perf: Add pmu callbacks to track event mapping and unmapping
  x86: Add a comment clarifying LDT context switching
  x86: Store a per-cpu shadow copy of CR4
  x86: Clean up cr4 manipulation
2015-02-16 14:58:12 -08:00
Radim Krčmář
dab2087def KVM: x86: fix build with !CONFIG_SMP
<asm/apic.h> isn't included directly and without CONFIG_SMP, an option
that automagically pulls it can't be enabled.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-10 08:53:18 +01:00
Andy Lutomirski
1e02ce4ccc x86: Store a per-cpu shadow copy of CR4
Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.

To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm.  The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:42 +01:00
Andy Lutomirski
375074cc73 x86: Clean up cr4 manipulation
CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:41 +01:00
Wincy Van
705699a139 KVM: nVMX: Enable nested posted interrupt processing
If vcpu has a interrupt in vmx non-root mode, injecting that interrupt
requires a vmexit.  With posted interrupt processing, the vmexit
is not needed, and interrupts are fully taken care of by hardware.
In nested vmx, this feature avoids much more vmexits than non-nested vmx.

When L1 asks L0 to deliver L1's posted interrupt vector, and the target
VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV
to the target vCPU.  Using POSTED_INTR_NV avoids unexpected interrupts
if a concurrent vmexit happens and L1's vector is different with L0's.
The IPI triggers posted interrupt processing in the target physical CPU.

In case the target vCPU was not in guest mode, complete the posted
interrupt delivery on the next entry to L2.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:15:08 +01:00
Wincy Van
608406e290 KVM: nVMX: Enable nested virtual interrupt delivery
With virtual interrupt delivery, the hardware lets KVM use a more
efficient mechanism for interrupt injection. This is an important feature
for nested VMX, because it reduces vmexits substantially and they are
much more expensive with nested virtualization.  This is especially
important for throughput-bound scenarios.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:07:38 +01:00
Wincy Van
82f0dd4b27 KVM: nVMX: Enable nested apic register virtualization
We can reduce apic register virtualization cost with this feature,
it is also a requirement for virtual interrupt delivery and posted
interrupt processing.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:07:03 +01:00
Wincy Van
b9c237bb1d KVM: nVMX: Make nested control MSRs per-cpu
To enable nested apicv support, we need per-cpu vmx
control MSRs:
  1. If in-kernel irqchip is enabled, we can enable nested
     posted interrupt, we should set posted intr bit in
     the nested_vmx_pinbased_ctls_high.
  2. If in-kernel irqchip is disabled, we can not enable
     nested posted interrupt, the posted intr bit
     in the nested_vmx_pinbased_ctls_high will be cleared.

Since there would be different settings about in-kernel
irqchip between VMs, different nested control MSRs
are needed.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:06:51 +01:00
Wincy Van
f2b93280ed KVM: nVMX: Enable nested virtualize x2apic mode
When L2 is using x2apic, we can use virtualize x2apic mode to
gain higher performance, especially in apicv case.

This patch also introduces nested_vmx_check_apicv_controls
for the nested apicv patches.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:06:17 +01:00
Wincy Van
3af18d9c5f KVM: nVMX: Prepare for using hardware MSR bitmap
Currently, if L1 enables MSR_BITMAP, we will emulate this feature, all
of L2's msr access is intercepted by L0.  Features like "virtualize
x2apic mode" require that the MSR bitmap is enabled, or the hardware
will exit and for example not virtualize the x2apic MSRs.  In order to
let L1 use these features, we need to build a merged bitmap that only
not cause a VMEXIT if 1) L1 requires that 2) the bit is not required by
the processor for APIC virtualization.

For now the guests are still run with MSR bitmap disabled, but this
patch already introduces nested_vmx_merge_msr_bitmap for future use.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:02:32 +01:00
Marcelo Tosatti
2e6d015799 KVM: x86: revert "add method to test PIR bitmap vector"
Revert 7c6a98dfa1, given
that testing PIR is not necessary anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-02 18:36:34 +01:00
Paolo Bonzini
ad15a29647 kvm: vmx: fix oops with explicit flexpriority=0 option
A function pointer was not NULLed, causing kvm_vcpu_reload_apic_access_page to
go down the wrong path and OOPS when doing put_page(NULL).

This did not happen on old processors, only when setting the module option
explicitly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 16:18:49 +01:00
Kai Huang
843e433057 KVM: VMX: Add PML support in VMX
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow user to enable/disable it manually.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 09:39:54 +01:00
Rickard Strandqvist
0c55d6d931 x86: kvm: vmx: Remove some unused functions
Removes some functions that are not used anywhere:
cpu_has_vmx_eptp_writeback() cpu_has_vmx_eptp_uncacheable()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-19 11:09:36 +01:00
Paolo Bonzini
ad896af0b5 KVM: x86: mmu: remove argument to kvm_init_shadow_mmu and kvm_init_shadow_ept_mmu
The initialization function in mmu.c can always use walk_mmu, which
is known to be vcpu->arch.mmu.  Only init_kvm_nested_mmu is used to
initialize vcpu->arch.nested_mmu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:48:02 +01:00
Marcelo Tosatti
7c6a98dfa1 KVM: x86: add method to test PIR bitmap vector
kvm_x86_ops->test_posted_interrupt() returns true/false depending
whether 'vector' is set.

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:18 +01:00
Tiejun Chen
b4eef9b36d kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv
In most cases calling hwapic_isr_update(), we always check if
kvm_apic_vid_enabled() == 1, but actually,
kvm_apic_vid_enabled()
    -> kvm_x86_ops->vm_has_apicv()
        -> vmx_vm_has_apicv() or '0' in svm case
            -> return enable_apicv && irqchip_in_kernel(kvm)

So its a little cost to recall vmx_vm_has_apicv() inside
hwapic_isr_update(), here just NULL out hwapic_isr_update() in
case of !enable_apicv inside hardware_setup() then make all
related stuffs follow this. Note we don't check this under that
condition of irqchip_in_kernel() since we should make sure
definitely any caller don't work  without in-kernel irqchip.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:17 +01:00
Eugene Korenevsky
19d5f10b3a KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit
When generating #PF VM-exit, check equality:
(PFEC & PFEC_MASK) == PFEC_MATCH
If there is equality, the 14 bit of exception bitmap is used to take decision
about generating #PF VM-exit. If there is inequality, inverted 14 bit is used.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:16 +01:00
Eugene Korenevsky
e9ac033e6b KVM: nVMX: Improve nested msr switch checking
This patch improve checks required by Intel Software Developer Manual.
 - SMM MSRs are not allowed.
 - microcode MSRs are not allowed.
 - check x2apic MSRs only when LAPIC is in x2apic mode.
 - MSR switch areas must be aligned to 16 bytes.
 - address of first and last byte in MSR switch areas should not set any bits
   beyond the processor's physical-address width.

Also it adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:15 +01:00
Wincy Van
ff651cb613 KVM: nVMX: Add nested msr load/restore algorithm
Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:14 +01:00
Tiejun Chen
baa035227b kvm: x86: vmx: reorder some msr writing
The commit 34a1cd60d1, "x86: vmx: move some vmx setting from
vmx_init() to hardware_setup()", tried to refactor some codes
specific to vmx hardware setting into hardware_setup(), but some
msr writing should depend on our previous setting condition like
enable_apicv, enable_ept and so on.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-27 21:52:10 +01:00
Bandan Das
78051e3b7e KVM: nVMX: Disable unrestricted mode if ept=0
If L0 has disabled EPT, don't advertise unrestricted
mode at all since it depends on EPT to run real mode code.

Fixes: 92fbc7b195
Cc: stable@vger.kernel.org
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-11 12:26:15 +01:00
Wanpeng Li
81dc01f749 kvm: vmx: add nested virtualization support for xsaves
Add nested virtualization support for xsaves.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:44 +01:00
Wanpeng Li
203000993d kvm: vmx: add MSR logic for XSAVES
Add logic to get/set the XSS model-specific register.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:39 +01:00
Wanpeng Li
f53cd63c2d kvm: x86: handle XSAVES vmcs and vmexit
Initialize the XSS exit bitmap.  It is zero so there should be no XSAVES
or XRSTORS exits.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:33 +01:00
Wanpeng Li
55412b2eda kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest
Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:57:16 +01:00