Commit graph

494893 commits

Author SHA1 Message Date
Wincy Van
3af18d9c5f KVM: nVMX: Prepare for using hardware MSR bitmap
Currently, if L1 enables MSR_BITMAP, we will emulate this feature, all
of L2's msr access is intercepted by L0.  Features like "virtualize
x2apic mode" require that the MSR bitmap is enabled, or the hardware
will exit and for example not virtualize the x2apic MSRs.  In order to
let L1 use these features, we need to build a merged bitmap that only
not cause a VMEXIT if 1) L1 requires that 2) the bit is not required by
the processor for APIC virtualization.

For now the guests are still run with MSR bitmap disabled, but this
patch already introduces nested_vmx_merge_msr_bitmap for future use.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03 17:02:32 +01:00
Marcelo Tosatti
2e6d015799 KVM: x86: revert "add method to test PIR bitmap vector"
Revert 7c6a98dfa1, given
that testing PIR is not necessary anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-02 18:36:34 +01:00
Marcelo Tosatti
f933986038 KVM: x86: fix lapic_timer_int_injected with APIC-v
With APICv, LAPIC timer interrupt is always delivered via IRR:
apic_find_highest_irr syncs PIR to IRR.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-02 18:36:25 +01:00
Paolo Bonzini
ad15a29647 kvm: vmx: fix oops with explicit flexpriority=0 option
A function pointer was not NULLed, causing kvm_vcpu_reload_apic_access_page to
go down the wrong path and OOPS when doing put_page(NULL).

This did not happen on old processors, only when setting the module option
explicitly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 16:18:49 +01:00
Radim Krčmář
8a395363e2 KVM: x86: fix x2apic logical address matching
We cannot hit the bug now, but future patches will expose this path.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 12:26:46 +01:00
Radim Krčmář
3697f302ab KVM: x86: replace 0 with APIC_DEST_PHYSICAL
To make the code self-documenting.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 12:26:46 +01:00
Radim Krčmář
9368b56762 KVM: x86: cleanup kvm_apic_match_*()
The majority of this patch turns
  result = 0; if (CODE) result = 1; return result;
into
  return CODE;
because we return bool now.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 12:26:45 +01:00
Radim Krčmář
52c233a440 KVM: x86: return bool from kvm_apic_match*()
And don't export the internal ones while at it.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 12:26:45 +01:00
Kai Huang
843e433057 KVM: VMX: Add PML support in VMX
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added
to allow user to enable/disable it manually.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30 09:39:54 +01:00
Kai Huang
88178fd4f7 KVM: x86: Add new dirty logging kvm_x86_ops for PML
This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty
logging for particular memory slot, and to flush potentially logged dirty GPAs
before reporting slot->dirty_bitmap to userspace.

kvm x86 common code calls these hooks when they are available so PML logic can
be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL
there.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:31:41 +01:00
Kai Huang
1c91cad423 KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
This patch changes the second parameter of kvm_mmu_slot_remove_write_access from
'slot id' to 'struct kvm_memory_slot *' to align with kvm_x86_ops dirty logging
hooks, which will be introduced in further patch.

Better way is to change second parameter of kvm_arch_commit_memory_region from
'struct kvm_userspace_memory_region *' to 'struct kvm_memory_slot * new', but it
requires changes on other non-x86 ARCH too, so avoid it now.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:31:37 +01:00
Kai Huang
9b51a63024 KVM: MMU: Explicitly set D-bit for writable spte.
This patch avoids unnecessary dirty GPA logging to PML buffer in EPT violation
path by setting D-bit manually prior to the occurrence of the write from guest.

We only set D-bit manually in set_spte, and leave fast_page_fault path
unchanged, as fast_page_fault is very unlikely to happen in case of PML.

For the hva <-> pa change case, the spte is updated to either read-only (host
pte is read-only) or be dropped (host pte is writeable), and both cases will be
handled by above changes, therefore no change is necessary.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:31:33 +01:00
Kai Huang
f4b4b18086 KVM: MMU: Add mmu help functions to support PML
This patch adds new mmu layer functions to clear/set D-bit for memory slot, and
to write protect superpages for memory slot.

In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU
updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages,
instead, we only need to clear D-bit in order to log that GPA.

For superpages, we still write protect it and let page fault code to handle
dirty page logging, as we still need to split superpage to 4K pages in PML.

As PML is always enabled during guest's lifetime, to eliminate unnecessary PML
GPA logging, we set D-bit manually for the slot with dirty logging disabled.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:31:29 +01:00
Kai Huang
3b0f1d01e5 KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirty
We don't have to write protect guest memory for dirty logging if architecture
supports hardware dirty logging, such as PML on VMX, so rename it to be more
generic.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:30:38 +01:00
Tiejun Chen
b0165f1b41 kvm: update_memslots: clean flags for invalid memslots
Indeed, any invalid memslots should be new->npages = 0,
new->base_gfn = 0 and new->flags = 0 at the same time.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-27 21:31:44 +01:00
Joerg Roedel
128ca093cc kvm: iommu: Add cond_resched to legacy device assignment code
When assigning devices to large memory guests (>=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-27 21:31:12 +01:00
Nadav Amit
82268083fa KVM: x86: Emulation of call may use incorrect stack size
On long-mode, when far call that changes cs.l takes place, the stack size is
determined by the new mode.  For instance, if we go from 32-bit mode to 64-bit
mode, the stack-size if 64.  KVM uses the old stack size.

Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:17:34 +01:00
Nadav Amit
bac155310b KVM: x86: 32-bit wraparound read/write not emulated correctly
If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and
writes should be successful. It just needs to be done in two segments.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:15:18 +01:00
Nadav Amit
2b42fce695 KVM: x86: Fix defines in emulator.c
Unnecassary define was left after commit 7d882ffa81 ("KVM: x86: Revert
NoBigReal patch in the emulator").

Commit 39f062ff51 ("KVM: x86: Generate #UD when memory operand is required")
was missing undef.

Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:15:03 +01:00
Nadav Amit
2276b5116e KVM: x86: ARPL emulation can cause spurious exceptions
ARPL and MOVSXD are encoded the same and their execution depends on the
execution mode.  The operand sizes of each instruction are different.
Currently, ARPL is detected too late, after the decoding was already done, and
therefore may result in spurious exception (instead of failed emulation).

Introduce a group to the emulator to handle instructions according to execution
mode (32/64 bits). Note: in order not to make changes that may affect
performance, the new ModeDual can only be applied to instructions with ModRM.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:49 +01:00
Nadav Amit
801806d956 KVM: x86: IRET emulation does not clear NMI masking
The IRET instruction should clear NMI masking, but the current implementation
does not do so.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:42 +01:00
Nadav Amit
16794aaaab KVM: x86: Wrong operand size for far ret
Indeed, Intel SDM specifically states that for the RET instruction "In 64-bit
mode, the default operation size of this instruction is the stack-address size,
i.e. 64 bits."

However, experiments show this is not the case. Here is for example objdump of
small 64-bit asm:

  4004f1:	ca 14 00             	lret   $0x14
  4004f4:	48 cb                	lretq
  4004f6:	48 ca 14 00          	lretq  $0x14

Therefore, remove the Stack flag from far-ret instructions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:25 +01:00
Nadav Amit
2fcf5c8ae2 KVM: x86: Dirty the dest op page on cmpxchg emulation
Intel SDM says for CMPXCHG: "To simplify the interface to the processor’s bus,
the destination operand receives a write cycle without regard to the result of
the comparison.". This means the destination page should be dirtied.

Fix it to by writing back the original value if cmpxchg failed.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:18 +01:00
Paolo Bonzini
8fff5e374a KVM: s390: fixes and features for kvm/next (3.20)
1. Generic
 - sparse warning (make function static)
 - optimize locking
 - bugfixes for interrupt injection
 - fix MVPG addressing modes
 
 2. hrtimer/wakeup fun
 A recent change can cause KVM hangs if adjtime is used in the host.
 The hrtimer might wake up too early or too late. Too early is fatal
 as vcpu_block will see that the wakeup condition is not met and
 sleep again. This CPU might never wake up again.
 This series addresses this problem. adjclock slowing down the host
 clock will result in too late wakeups. This will require more work.
 In addition to that we also change the hrtimer from REALTIME to
 MONOTONIC to avoid similar problems with timedatectl set-time.
 
 3. sigp rework
 We will move all "slow" sigps to QEMU (protected with a capability that
 can be enabled) to avoid several races between concurrent SIGP orders.
 
 4. Optimize the shadow page table
 Provide an interface to announce the maximum guest size. The kernel
 will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.
 
 5. Provide an interface to set the guest TOD
 We now use two vm attributes instead of two oneregs, as oneregs are
 vcpu ioctl and we don't want to call them from other threads.
 
 6. Protected key functions
 The real HMC allows to enable/disable protected key CPACF functions.
 Lets provide an implementation + an interface for QEMU to activate
 this the protected key instructions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJUwj60AAoJEBF7vIC1phx8iV0QAKq1LZRTmgTLS2fd0oyWKZeN
 ShWUIUiB+7IUiuogYXZMfqOm61oogxwc95Ti+3tpSWYwkzUWagpS/RJQze7E1HOc
 3pHpXwrR01ueUT6uVV4xc/vmVIlQAIl/ScRDDPahlAT2crCleWcKVC9l0zBs/Kut
 IrfzN9pJcrkmXD178CDP8/VwXsn02ptLQEpidGibGHCd03YVFjp3X0wfwNdQxMbU
 qOwNYCz3SLfDm5gsybO2DG+aVY3AbM2ZOJt/qLv2j4Phz4XB4t4W9iJnAefSz7JA
 W4677wbMQpfZlUQYhI78H/Cl9SfWAuLug1xk83O/+lbEiR5u+8zLxB69dkFTiBaH
 442OY957T6TQZ/V9d0jDo2XxFrcaU9OONbVLsfBQ56Vwv5cAg9/7zqG8eqH7Nq9R
 gU3fQesgD4N0Kpa77T9k45TT/hBRnUEtsGixAPT6QYKyE6cK4AJATHKSjMSLbdfj
 ELbt0p2mVtKhuCcANfEx54U2CxOrg5ElBmPz8hRw0OkXdwpqh1sGKmt0govcHP1I
 BGSzE9G4mswwI1bQ7cqcyTk/lwL8g3+KQmRJoOcgCveQlnY12X5zGD5DhuPMPiIT
 VENqbcTzjlxdu+4t7Enml+rXl7ySsewT9L231SSrbLsTQVgCudD1B9m72WLu5ZUT
 9/Z6znv6tkeKV5rM9DYE
 =zLjR
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

KVM: s390: fixes and features for kvm/next (3.20)

1. Generic
- sparse warning (make function static)
- optimize locking
- bugfixes for interrupt injection
- fix MVPG addressing modes

2. hrtimer/wakeup fun
A recent change can cause KVM hangs if adjtime is used in the host.
The hrtimer might wake up too early or too late. Too early is fatal
as vcpu_block will see that the wakeup condition is not met and
sleep again. This CPU might never wake up again.
This series addresses this problem. adjclock slowing down the host
clock will result in too late wakeups. This will require more work.
In addition to that we also change the hrtimer from REALTIME to
MONOTONIC to avoid similar problems with timedatectl set-time.

3. sigp rework
We will move all "slow" sigps to QEMU (protected with a capability that
can be enabled) to avoid several races between concurrent SIGP orders.

4. Optimize the shadow page table
Provide an interface to announce the maximum guest size. The kernel
will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.

5. Provide an interface to set the guest TOD
We now use two vm attributes instead of two oneregs, as oneregs are
vcpu ioctl and we don't want to call them from other threads.

6. Protected key functions
The real HMC allows to enable/disable protected key CPACF functions.
Lets provide an implementation + an interface for QEMU to activate
this the protected key instructions.
2015-01-23 14:33:36 +01:00
Paolo Bonzini
1c6007d59a KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added
trace symbols, and adding an explicit VGIC init device control IOCTL.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUwhsKAAoJEEtpOizt6ddyuSEH/ia2uf07N0i+C1dPKYiqhKEd
 nFqBvgrhAMVztWLmy1Wq4SnO9YNd+CrPYATrfCiYsYQ9aKc09+qDq+uo06bVpZXz
 KsHjVGUsdyJ4qRqjDixkPvZviGIXa6C//+hcwg1XH2nit1uHmXVupzB9dDz3ZM2l
 GCwApdRdaaUVDt5Ud2ljqIWZa18Qf/5/HD8MdPXpmotDOKucL6pBr/1R1XWueCU/
 ejRs/qy3EFyMWdEdfGFAMCa0ZvHbPmsJmvB/EgkyUnuJj77ptA0jNo1jtzSfEyis
 53x4ffWnIsPl9yqhk0oKerIALVUvV4A7/me2ya6tsQ5fiBX7lJ3+qwggvCkWQzw=
 =fMS2
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next

KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added
trace symbols, and adding an explicit VGIC init device control IOCTL.

Conflicts:
	arch/arm64/include/asm/kvm_arm.h
	arch/arm64/kvm/handle_exit.c
2015-01-23 13:39:51 +01:00
Jens Freimann
0eb135ff9f KVM: s390: remove redundant setting of interrupt type
Setting inti->type again is unnecessary here, so let's
remove this.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:42 +01:00
Jens Freimann
94d1f564ad KVM: s390: fix bug in interrupt parameter check
When we convert interrupt data from struct kvm_s390_interrupt to
struct kvm_s390_irq we need to check the data in the input parameter
not the output parameter.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:42 +01:00
David Hildenbrand
428d53be5e KVM: s390: avoid memory leaks if __inject_vm() fails
We have to delete the allocated interrupt info if __inject_vm() fails.

Otherwise user space can keep flooding kvm with floating interrupts and
provoke more and more memory leaks.

Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:41 +01:00
Tony Krowiak
a374e892c3 KVM: s390/cpacf: Enable/disable protected key functions for kvm guest
Created new KVM device attributes for indicating whether the AES and
DES/TDES protected key functions are available for programs running
on the KVM guest.  The attributes are used to set up the controls in
the guest SIE block that specify whether programs running on the
guest will be given access to the protected key functions available
on the s390 hardware.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
2015-01-23 13:25:40 +01:00
Jason J. Herne
72f250206f KVM: s390: Provide guest TOD Clock Get/Set Controls
Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.

Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.

TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:40 +01:00
Jens Freimann
556cc0dab1 KVM: s390: trace correct values for set prefix and machine checks
When injecting SIGP set prefix or a machine check, we trace
the values in our per-vcpu local_int data structure instead
of the parameters passed to the function.

Fix this by changing the trace statement to use the correct values.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:39 +01:00
Jens Freimann
49538d1238 KVM: s390: fix bug in sigp emergency signal injection
Currently we are always setting the wrong bit in the
bitmap for pending emergency signals. Instead of using
emerg.code from the passed in irq parameter, we use the
value in our per-vcpu local_int structure, which is always zero.
That means all emergency signals will have address 0 as parameter.
If two CPUs send a SIGP to the same target, one might be lost.

Let's fix this by using the value from the parameter and
also trace the correct value.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:39 +01:00
Thomas Huth
3cfad02380 KVM: s390: Take addressing mode into account for MVPG interception
The handler for MVPG partial execution interception does not take
the current CPU addressing mode into account yet, so addresses are
always treated as 64-bit addresses. For correct behaviour, we should
properly handle 24-bit and 31-bit addresses, too.

Since MVPG is defined to work with logical addresses, we can simply
use guest_translate_address() to achieve the required behaviour
(since DAT is disabled here, guest_translate_address() skips the MMU
translation and only translates the address via kvm_s390_logical_to_effective()
and kvm_s390_real_to_abs(), which is exactly what we want here).

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:38 +01:00
Christian Borntraeger
69a8d45626 KVM: s390: no need to hold the kvm->mutex for floating interrupts
The kvm mutex was (probably) used to protect against cpu hotplug.
The current code no longer needs to protect against that, as we only
rely on CPU data structures that are guaranteed to be available
if we can access the CPU. (e.g. vcpu_create will put the cpu
in the array AFTER the cpu is ready).

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand
2444b352c3 KVM: s390: forward most SIGP orders to user space
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand
9fbd80828c KVM: s390: clear the pfault queue if user space sets the invalid token
We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).

This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand
ea5f496925 KVM: s390: only one external call may be pending at a time
Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.

SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.

SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.

If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand
d614be05c8 s390/sclp: introduce check for the SIGP Interpretation Facility
This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:35 +01:00
David Hildenbrand
a3a9c59a68 KVM: s390: SIGP SET PREFIX cleanup
This patch cleanes up the the SIGP SET PREFIX code.

A SIGP SET PREFIX irq may only be injected if the target vcpu is
stopped. Let's move the checking code into the injection code and
return -EBUSY if the target vcpu is not stopped.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:34 +01:00
David Hildenbrand
9a022067ad KVM: s390: a VCPU may only stop when no interrupts are left pending
As a SIGP STOP is an interrupt with the least priority, it may only result
in stop of the vcpu when no other interrupts are left pending.

To detect whether a non-stop irq is pending, we need a way to mask out
stop irqs from the general kvm_cpu_has_interrupt() function. For this
reason, the existing function (with an outdated name) is replaced by
kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:34 +01:00
David Hildenbrand
6cddd432e3 KVM: s390: handle stop irqs without action_bits
This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.

The new local interrupt infrastructure is used to track pending stop
requests.

STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).

If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).

Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand
2822545f9f KVM: s390: new parameter for SIGP STOP irqs
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand
2d00f75942 KVM: s390: forward hrtimer if guest ckc not pending yet
Patch 0759d0681c ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.

This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.

As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.

As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.

A proper fix might be to introduce a RAW based hrtimer.

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:32 +01:00
David Hildenbrand
0ac96caf0f KVM: s390: base hrtimer on a monotonic clock
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.

This patch changes our hrtimer to be based on a monotonic clock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:32 +01:00
David Hildenbrand
bda343ef14 KVM: s390: prevent sleep duration underflows in handle_wait()
We sometimes get an underflow for the sleep duration, which most
likely won't result in the short sleep time we wanted.

So let's check for sleep duration underflows and directly continue
to run the guest if we get one.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:31 +01:00
Dominik Dingel
8c0a7ce606 KVM: s390: Allow userspace to limit guest memory size
With commit c6c956b80b ("KVM: s390/mm: support gmap page tables with less
than 5 levels") we are able to define a limit for the guest memory size.

As we round up the guest size in respect to the levels of page tables
we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB.
We currently limit the guest size to 16 TB, which means we end up
creating a page table structure supporting guest sizes up to 8192 TB.

This patch introduces an interface that allows userspace to tune
this limit. This may bring performance improvements for small guests.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Dominik Dingel
dafd032a15 KVM: s390: move vcpu specific initalization to a later point
As we will allow in a later patch to recreate gmaps with new limits,
we need to make sure that vcpus get their reference for that gmap
after they increased the online_vcpu counter, so there is no possible race.

While we are doing this, we also can simplify the vcpu_init function, by
moving ucontrol specifics to an own function.
That way we also start now setting the kvm_valid_regs for the ucontrol path.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Christian Borntraeger
0675d92dcf KVM: s390: make local function static
sparse rightfully complains about
warning: symbol '__inject_extcall' was not declared. Should it be static?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:29 +01:00
Dominik Dingel
31928aa586 KVM: remove unneeded return value of vcpu_postcreate
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller.  This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed.  But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:24:52 +01:00
Paolo Bonzini
c6156df9d3 Merge branch 'arm64/common-esr-macros' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into kvm-next
ESR_ELx definitions clean-up from Mark Rutland.

* 'arm64/common-esr-macros' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux:
  arm64: kvm: decode ESR_ELx.EC when reporting exceptions
  arm64: kvm: remove ESR_EL2_* macros
  arm64: remove ESR_EL1_* macros
  arm64: kvm: move to ESR_ELx macros
  arm64: decode ESR_ELx.EC when reporting exceptions
  arm64: move to ESR_ELx macros
  arm64: introduce common ESR_ELx_* definitions

This is required by the patch "arm/arm64: KVM: add tracing support for
arm64 exit handler" in Christoffer's pull request.
2015-01-23 13:09:15 +01:00