Commit graph

95111 commits

Author SHA1 Message Date
Thomas Huth
a3fb577e48 KVM: s390: Improve is_valid_psw()
As a program status word is also invalid (and thus generates an
specification exception) if the instruction address is not even,
we should test this in is_valid_psw(), too. This patch also exports
the function so that it becomes available for other parts of the
S390 KVM code as well.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:18 +02:00
Martin Schwidefsky
3a801517ad KVM: s390: correct locking for s390_enable_skey
Use the mm semaphore to serialize multiple invocations of s390_enable_skey.
The second CPU faulting on a storage key operation needs to wait for the
completion of the page table update. Taking the mm semaphore writable
has the positive side-effect that it prevents any host faults from
taking place which does have implications on keys vs PGSTE.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:17 +02:00
Jan Kiszka
d9f89b88f5 KVM: x86: Fix CR3 reserved bits check in long mode
Regression of 346874c9: PAE is set in long mode, but that does not mean
we have valid PDPTRs.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-12 20:04:01 +02:00
Gabriel L. Somlo
87c00572ba kvm: x86: emulate monitor and mwait instructions as nop
Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.

Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.

Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-08 15:40:49 +02:00
Michael S. Tsirkin
b63cf42fd1 kvm/x86: implement hv EOI assist
It seems that it's easy to implement the EOI assist
on top of the PV EOI feature: simply convert the
page address to the format expected by PV EOI.

Notes:
-"No EOI required" is set only if interrupt injected
 is edge triggered; this is true because level interrupts are going
 through IOAPIC which disables PV EOI.
 In any case, if guest triggers EOI the bit will get cleared on exit.
-For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets
 KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE
 seems sufficient
 In any case, bit is cleared on exit so worst case it's never re-enabled
-no handling of PV EOI data is performed at HV_X64_MSR_EOI write;
 HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
 replacement that lets you do EOI with an MSR and not IO.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07 18:00:49 +02:00
Nadav Amit
5f7dde7bbb KVM: x86: Mark bit 7 in long-mode PDPTE according to 1GB pages support
In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are
supported by the CPU. Currently the bit is considered by KVM as always
reserved.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07 17:25:22 +02:00
Nadav Amit
a4ab9d0cf1 KVM: vmx: handle_dr does not handle RSP correctly
The RSP register is not automatically cached, causing mov DR instruction with
RSP to fail.  Instead the regular register accessing interface should be used.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07 17:24:59 +02:00
Bandan Das
4291b58885 KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr
Some checks are common to all, and moreover,
according to the spec, the check for whether any bits
beyond the physical address width are set are also
applicable to all of them

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 19:00:43 +02:00
Bandan Das
96ec146330 KVM: nVMX: fail on invalid vmclear/vmptrld pointer
The spec mandates that if the vmptrld or vmclear
address is equal to the vmxon region pointer, the
instruction should fail with error "VMPTRLD with
VMXON pointer" or "VMCLEAR with VMXON pointer"

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 19:00:37 +02:00
Bandan Das
3573e22cfe KVM: nVMX: additional checks on vmxon region
Currently, the vmxon region isn't used in the nested case.
However, according to the spec, the vmxon instruction performs
additional sanity checks on this region and the associated
pointer. Modify emulated vmxon to better adhere to the spec
requirements

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 19:00:27 +02:00
Bandan Das
19677e32fe KVM: nVMX: rearrange get_vmx_mem_address
Our common function for vmptr checks (in 2/4) needs to fetch
the memory address

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 18:59:57 +02:00
Paolo Bonzini
2ce316f0b9 1. Fixes an error return code for the breakpoint setup
2. External interrupt fixes
 2.1. Some interrupt conditions like cpu timer or clock comparator
 stay pending even after the interrupt is injected. If the external
 new PSW is enabled for interrupts this will result in an endless
 loop. Usually this indicates a programming error in the guest OS.
 Lets detect such situations and go to userspace. We will provide
 a QEMU patch that sets the guest in panicked/crashed state to avoid
 wasting CPU cycles.
 2.2 Resend external interrupts back to the guest if the HW could
 not do it.
 -
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTaOGKAAoJEBF7vIC1phx8IqkP/0zQ3gWbYdGV20UEvIB+oHsO
 u7OysZdyfXS3wx6rysTWepQJ6rtWJ/yQSyzTt+RnCTYxUnyhMVPKMJOmoztyhkD5
 37I9ricqMS/Ob5A3pKGEW2p/TojPYL5o8svCRt+UWbyxz05AQiCEPteeD7MrcOK+
 ASULR2z2h95EYfrMhZSeFjFoXHrPfeMoR5OVESP8gef7uGTlqIZO1mZ6QkAFqL/b
 VtqCI74oTc+XpNj7jxnvxznilqnvjD31oaci2oK+AX+DQcwOnTIGuUlU1bS+XOwm
 WFbDKUbksNC/QQ2hPqcCvZTtK+U7XlPZz7pRyEdvHYRckaNDzLbiLzYHvRGgCHoq
 uy9u429L1pthoj1vQvUY2ZD4HyI4K/UusApie5x3hmYlePNSEcC7TNDt2SvdjrID
 yX6X9zWC9ffHSmKLBI11PWNs5R1EUrUlBcZ7CFDDmJDCeKRmwmY1+nuYSm7x80iB
 ctfpXTJG4Ajrbbki5LCdoLPU0piR/IkSEwxeEY0u/5XLcdEiY/Z3SEJzlWeuIPf6
 bNuWQK8YP6ane8p3Vc/UwmtMgaCEsnAwYrcRfmjOEQfVDxmRzHARIxbIFs0EsM54
 S+6SH6LN1HCeFsG3zvpwPrm9gK2GojvJ0tCwZ78UZZx5m4CrgtHVHHfbspygftv8
 6L/YJ/Q0PQja0s3lx/Eh
 =R95o
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140506' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Fixes an error return code for the breakpoint setup

2. External interrupt fixes
2.1. Some interrupt conditions like cpu timer or clock comparator
stay pending even after the interrupt is injected. If the external
new PSW is enabled for interrupts this will result in an endless
loop. Usually this indicates a programming error in the guest OS.
Lets detect such situations and go to userspace. We will provide
a QEMU patch that sets the guest in panicked/crashed state to avoid
wasting CPU cycles.
2.2 Resend external interrupts back to the guest if the HW could
not do it.
-
2014-05-06 17:20:37 +02:00
Thomas Huth
f14d82e06a KVM: s390: Fix external interrupt interception
The external interrupt interception can only occur in rare cases, e.g.
when the PSW of the interrupt handler has a bad value. The old handler
for this interception simply ignored these events (except for increasing
the exit_external_interrupt counter), but for proper operation we either
have to inject the interrupts manually or we should drop to userspace in
case of errors.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:58:10 +02:00
Thomas Huth
e029ae5b78 KVM: s390: Add clock comparator and CPU timer IRQ injection
Add an interface to inject clock comparator and CPU timer interrupts
into the guest. This is needed for handling the external interrupt
interception.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:58:05 +02:00
Dan Carpenter
fcc9aec3de KVM: s390: return -EFAULT if copy_from_user() fails
When copy_from_user() fails, this code returns the number of bytes
remaining instead of a negative error code.  The positive number is
returned to the user but otherwise it is harmless.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:57:59 +02:00
Ulrich Obergfell
1171903d89 KVM: x86: improve the usability of the 'kvm_pio' tracepoint
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated()
and emulator_pio_out_emulated(), and it adds an argument (a pointer to
the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched
from 'pio_data' (depending on 'size'), and the value is included in the
trace record ('val'). If 'count' is greater than one, this is indicated
by the string "(...)" in the trace output.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-05 22:42:05 +02:00
Paolo Bonzini
57b5981cd3 1. Guest handling fixes
The handling of MVPG, PFMF and Test Block is fixed to better follow
 the architecture. None of these fixes is critical for any current
 Linux guests, but let's play safe.
 
 2. Optimization for single CPU guests
 We can enable the IBS facility if only one VCPU is running (!STOPPED
 state). We also enable this optimization for guest > 1 VCPU as soon
 as all but one VCPU is in stopped state. Thus will help guests that
 have tools like cpuplugd (from s390-utils) that do dynamic offline/
 online of CPUs.
 
 3. NOTES
 There is one non-s390 change in include/linux/kvm_host.h that
 introduces 2 defines for VCPU requests:
 define KVM_REQ_ENABLE_IBS        23
 define KVM_REQ_DISABLE_IBS       24
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTX6eTAAoJEBF7vIC1phx8S8YQAKRQ1Oe75qLS+F1yipNHi905
 7byi0K2nuGF/K4NSkQeWyv8mqjlWIEE+a8PR69mrgx6biae/Sn6l2ZZiV3Ml0flH
 9FEIlu0PU/RvyPfERT9MxUsbY2Dbec4r3Q3U+RLftlAht6oA/AaiLaY6cKSzwXqa
 vTqm7VORgTsM7JUkdoC/BdwNH6+94I7IM6CGaWMmWqELAhKq6SUCmc8g26bvEjBd
 94bkUEBYgHuceEPYAmDA1r7QSStpnU+cgj0haHRI12g4y1PhuBuFkmAGPv/E60wT
 iGOkhQ5XchR6dmZBLC/zRbjObi5NqqRojRYcI8RZfbdfvxtD8+xKg7G++KXw+VkK
 lR2CyJfrXms9r90mo7Oi3oCEuy0gC5ToOtRbOb/Xo8CYjCd6pS8DJraaQQfUhsoX
 koWiPVu87Gk6GMALZdJYCAdHgOhwC/dglKVGHThsFFVwLgsC7ZKsluWYA/Y0FRtA
 B2A7DBIC5O1NNXs3CYIv+v0M3jNF+4tt7wxV4omNPiSZTb4IAhG+0ucvU+hIZFbm
 i07a1sULAOSAX7qeizmHamomrC5NeEOUQeQ2ciQbSH9L+GEZQj10YaOXXKQh79C+
 7LXTtXWzESzx+EXBuCPcFsNt84GH294YoFTitHzN0hAburkVqwQzIAEAby7ylgX2
 WUC6UjhyheZr0rBqO2g9
 =Lxxg
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140429' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Guest handling fixes
The handling of MVPG, PFMF and Test Block is fixed to better follow
the architecture. None of these fixes is critical for any current
Linux guests, but let's play safe.

2. Optimization for single CPU guests
We can enable the IBS facility if only one VCPU is running (!STOPPED
state). We also enable this optimization for guest > 1 VCPU as soon
as all but one VCPU is in stopped state. Thus will help guests that
have tools like cpuplugd (from s390-utils) that do dynamic offline/
online of CPUs.

3. NOTES
There is one non-s390 change in include/linux/kvm_host.h that
introduces 2 defines for VCPU requests:
define KVM_REQ_ENABLE_IBS        23
define KVM_REQ_DISABLE_IBS       24
2014-04-30 12:29:41 +02:00
Marcelo Tosatti
e4c9a5a175 KVM: x86: expose invariant tsc cpuid bit (v2)
Invariant TSC is a property of TSC, no additional
support code necessary.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-29 15:22:43 +02:00
David Hildenbrand
8ad3575517 KVM: s390: enable IBS for single running VCPUs
This patch enables the IBS facility when a single VCPU is running.
The facility is dynamically turned on/off as soon as other VCPUs
enter/leave the stopped state.

When this facility is operating, some instructions can be executed
faster for single-cpu guests.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:54 +02:00
David Hildenbrand
6852d7b69b KVM: s390: introduce kvm_s390_vcpu_{start,stop}
This patch introduces two new functions to set/clear the CPUSTAT_STOPPED bit and
makes use of it at all applicable places. These functions prepare the additional
execution of code when starting/stopping a vcpu.

The CPUSTAT_STOPPED bit should not be touched outside of these functions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:54 +02:00
Thomas Huth
e45efa28e5 KVM: s390: Add low-address protection to TEST BLOCK
TEST BLOCK is also subject to the low-address protection, so we need
to check the destination address in our handler.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:53 +02:00
Thomas Huth
fb34c60365 KVM: s390: Fixes for PFMF
Add a check for low-address protection to the PFMF handler and
convert real-addresses to absolute if necessary, as it is defined
in the Principles of Operations specification.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:53 +02:00
Thomas Huth
f8232c8cf7 KVM: s390: Add a function for checking the low-address protection
The s390 architecture has a special protection mechanism that can
be used to prevent write access to the vital data in the low-core
memory area. This patch adds a new helper function that can be used
to check for such write accesses and in case of protection, it also
sets up the exception data accordingly.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:52 +02:00
Thomas Huth
9a558ee3cc KVM: s390: Handle MVPG partial execution interception
When the guest executes the MVPG instruction with DAT disabled,
and the source or destination page is not mapped in the host,
the so-called partial execution interception occurs. We need to
handle this event by setting up a mapping for the corresponding
user pages.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:51 +02:00
Xiao Guangrong
198c74f43f KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
Now we can flush all the TLBs out of the mmu lock without TLB corruption when
write-proect the sptes, it is because:
- we have marked large sptes readonly instead of dropping them that means we
  just change the spte from writable to readonly so that we only need to care
  the case of changing spte from present to present (changing the spte from
  present to nonpresent will flush all the TLBs immediately), in other words,
  the only case we need to care is mmu_spte_update()

- in mmu_spte_update(), we haved checked
  SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that
  means it does not depend on PT_WRITABLE_MASK anymore

Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:52 -03:00
Xiao Guangrong
7f31c9595e KVM: MMU: flush tlb if the spte can be locklessly modified
Relax the tlb flush condition since we will write-protect the spte out of mmu
lock. Note lockless write-protection only marks the writable spte to readonly
and the spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (that are tested by spte_is_locklessly_modifiable)

This patch is used to avoid this kind of race:

      VCPU 0                         VCPU 1
lockless wirte protection:
      set spte.w = 0
                                 lock mmu-lock

                                 write protection the spte to sync shadow page,
                                 see spte.w = 0, then without flush tlb

				 unlock mmu-lock

                                 !!! At this point, the shadow page can still be
                                     writable due to the corrupt tlb entry
     Flush all TLB

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:51 -03:00
Xiao Guangrong
c126d94f2c KVM: MMU: lazily drop large spte
Currently, kvm zaps the large spte if write-protected is needed, the later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making them un-present, the page fault caused by read access can
be avoided

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

This version has fixed the issue reported in 6b73a9606, the reason of that
issue is that fast_page_fault() directly sets the readonly large spte to
writable but only dirty the first page into the dirty-bitmap that means
other pages are missed. Fixed it by only the normal sptes (on the
PT_PAGE_TABLE_LEVEL level) can be fast fixed

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:50 -03:00
Xiao Guangrong
92a476cbfc KVM: MMU: properly check last spte in fast_page_fault()
Using sp->role.level instead of @level since @level is not got from the
page table hierarchy

There is no issue in current code since the fast page fault currently only
fixes the fault caused by dirty-log that is always on the last level
(level = 1)

This patch makes the code more readable and avoids potential issue in the
further development

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:49 -03:00
Xiao Guangrong
a086f6a1eb Revert "KVM: Simplify kvm->tlbs_dirty handling"
This reverts commit 5befdc385d.

Since we will allow flush tlb out of mmu-lock in the later
patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:48 -03:00
Nadav Amit
42bf549f3c KVM: x86: Processor mode may be determined incorrectly
If EFER.LMA is off, cs.l does not determine execution mode.
Currently, the emulation engine assumes differently.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:47:00 -03:00
Nadav Amit
e6e39f0438 KVM: x86: IN instruction emulation should ignore REP-prefix
The IN instruction is not be affected by REP-prefix as INS is.  Therefore, the
emulation should ignore the REP prefix as well.  The current emulator
implementation tries to perform writeback when IN instruction with REP-prefix
is emulated. This causes it to perform wrong memory write or spurious #GP
exception to be injected to the guest.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:46:59 -03:00
Nadav Amit
346874c950 KVM: x86: Fix CR3 reserved bits
According to Intel specifications, PAE and non-PAE does not have any reserved
bits.  In long-mode, regardless to PCIDE, only the high bits (above the
physical address) are reserved.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:46:57 -03:00
Nadav Amit
671bd9934a KVM: x86: Fix wrong/stuck PMU when guest does not use PMI
If a guest enables a performance counter but does not enable PMI, the
hypervisor currently does not reprogram the performance counter once it
overflows.  As a result the host performance counter is kept with the original
sampling period which was configured according to the value of the guest's
counter when the counter was enabled.

Such behaviour can cause very bad consequences. The most distrubing one can
cause the guest not to make any progress at all, and keep exiting due to host
PMI before any guest instructions is exeucted. This situation occurs when the
performance counter holds a very high value when the guest enables the
performance counter. As a result the host's sampling period is configured to be
very short. The host then never reconfigures the sampling period and get stuck
at entry->PMI->exit loop. We encountered such a scenario in our experiments.

The solution is to reprogram the counter even if the guest does not use PMI.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:46:52 -03:00
Bandan Das
e0ba1a6ffc KVM: nVMX: Advertise support for interrupt acknowledgement
Some Type 1 hypervisors such as XEN won't enable VMX without it present

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22 18:41:34 -03:00
Bandan Das
77b0f5d67f KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to
This feature emulates the "Acknowledge interrupt on exit" behavior.
We can safely emulate it for L1 to run L2 even if L0 itself has it
disabled (to run L1).

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22 18:41:33 -03:00
Bandan Das
4b85507860 KVM: nVMX: Don't advertise single context invalidation for invept
For single context invalidation, we fall through to global
invalidation in handle_invept() except for one case - when
the operand supplied by L1 is different from what we have in
vmcs12. However, typically hypervisors will only call invept
for the currently loaded eptp, so the condition will
never be true.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22 18:41:28 -03:00
Huw Davies
fd2a445a94 KVM: VMX: Advance rip to after an ICEBP instruction
When entering an exception after an ICEBP, the saved instruction
pointer should point to after the instruction.

This fixes the bug here: https://bugs.launchpad.net/qemu/+bug/1119686

Signed-off-by: Huw Davies <huw@codeweavers.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-22 18:37:43 -03:00
Marcelo Tosatti
63b5cf04f4 Lazy storage key handling
-------------------------
 Linux does not use the ACC and F bits of the storage key. Newer Linux
 versions also do not use the storage keys for dirty and reference
 tracking. We can optimize the guest handling for those guests for faults
 as well as page-in and page-out by simply not caring about the guest
 visible storage key. We trap guest storage key instruction to enable
 those keys only on demand.
 
 Migration bitmap
 
 Until now s390 never provided a proper dirty bitmap.  Let's provide a
 proper migration bitmap for s390. We also change the user dirty tracking
 to a fault based mechanism. This makes the host completely independent
 from the storage keys. Long term this will allow us to back guest memory
 with large pages.
 
 per-VM device attributes
 ------------------------
 To avoid the introduction of new ioctls, let's provide the
 attribute semanantic also on the VM-"device".
 
 Userspace controlled CMMA
 -------------------------
 The CMMA assist is changed from "always on" to "on if requested" via
 per-VM device attributes. In addition a callback to reset all usage
 states is provided.
 
 Proper guest DAT handling for intercepts
 ----------------------------------------
 While instructions handled by SIE take care of all addressing aspects,
 KVM/s390 currently does not care about guest address translation of
 intercepts. This worked out fine, because
 - the s390 Linux kernel has a 1:1 mapping between kernel virtual<->real
  for all pages up to memory size
 - intercepts happen only for a small amount of cases
 - all of these intercepts happen to be in the kernel text for current
   distros
 
 Of course we need to be better for other intercepts, kernel modules etc.
 We provide the infrastructure and rework all in-kernel intercepts to work
 on logical addresses (paging etc) instead of real ones. The code has
 been running internally for several months now, so it is time for going
 public.
 
 GDB support
 -----------
 We provide breakpoints, single stepping and watchpoints.
 
 Fixes/Cleanups
 --------------
 - Improve program check delivery
 - Factor out the handling of transactional memory  on program checks
 - Use the existing define __LC_PGM_TDB
 - Several cleanups in the lowcore structure
 - Documentation
 
 NOTES
 -----
 - All patches touching base s390 are either ACKed or written by the s390
   maintainers
 - One base KVM patch "KVM: add kvm_is_error_gpa() helper"
 - One patch introduces the notion of VM device attributes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTVlHZAAoJEBF7vIC1phx8REgP/1P0EUzfBpoS53z1v60n2uLT
 lW79LY9Op4/ZacEgHtU9LzmGa88X0arDsIpBZQsTNLF77AGFcMCCV3X2il/lQrRG
 KSE+ycKLoFjCcES442DwF4gHoGldD+KL/+5LPWSQZtvb9dDpHDft9aeMRBbpUL0Q
 M2kKQDlmJ2XqQu3D5PwSHgVRByHiHOzmTe2ejSSbdppkwBpaiqSBBBk0jVYDW9Jh
 eqUnBcrrYW2p+QS37ELM6hOkfDXN/vXoHBQeyca19TuZVCPNA7HeJaPc2mJ/GZk9
 wrNWEmY3f/lY0lk0zMwBwsDOS5K7jbtvXzcex6m+NsIqQuOvKsmPBy1BWb/axcK5
 uZq/JGFC0fxsFU+7ImtvQrJ/DMHnVuvSKF4WUVle2GdMlDIqkguwX27WwHSiH4/r
 Au02KlVIMUZdLAEUrw/W/S4MPLeZYoGfetHGCOmSaP2qGc97BVFedZaqekDlUgMw
 3gIoQmSIBcfrgF4k9N4nLjdhAX2S4gkviwF3pTlIkecNfa7RcI3Xk7U9mVPmIhL4
 IquVqjdXZH4m0e4gViBMtQ0IPwGt1qFlV6Wv3O9MExhfi7VQ8M8TMYNhEvtGpY75
 cuZwZYGM4FqszDAy9hbk0avTLqCxqlTiBKi3tHoQMappQmsJPrIdxIpev3MZPHCp
 vZMkbzhM9l3eefNJVw66
 =jxBp
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140422' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue

Lazy storage key handling
-------------------------
Linux does not use the ACC and F bits of the storage key. Newer Linux
versions also do not use the storage keys for dirty and reference
tracking. We can optimize the guest handling for those guests for faults
as well as page-in and page-out by simply not caring about the guest
visible storage key. We trap guest storage key instruction to enable
those keys only on demand.

Migration bitmap

Until now s390 never provided a proper dirty bitmap.  Let's provide a
proper migration bitmap for s390. We also change the user dirty tracking
to a fault based mechanism. This makes the host completely independent
from the storage keys. Long term this will allow us to back guest memory
with large pages.

per-VM device attributes
------------------------
To avoid the introduction of new ioctls, let's provide the
attribute semanantic also on the VM-"device".

Userspace controlled CMMA
-------------------------
The CMMA assist is changed from "always on" to "on if requested" via
per-VM device attributes. In addition a callback to reset all usage
states is provided.

Proper guest DAT handling for intercepts
----------------------------------------
While instructions handled by SIE take care of all addressing aspects,
KVM/s390 currently does not care about guest address translation of
intercepts. This worked out fine, because
- the s390 Linux kernel has a 1:1 mapping between kernel virtual<->real
 for all pages up to memory size
- intercepts happen only for a small amount of cases
- all of these intercepts happen to be in the kernel text for current
  distros

Of course we need to be better for other intercepts, kernel modules etc.
We provide the infrastructure and rework all in-kernel intercepts to work
on logical addresses (paging etc) instead of real ones. The code has
been running internally for several months now, so it is time for going
public.

GDB support
-----------
We provide breakpoints, single stepping and watchpoints.

Fixes/Cleanups
--------------
- Improve program check delivery
- Factor out the handling of transactional memory  on program checks
- Use the existing define __LC_PGM_TDB
- Several cleanups in the lowcore structure
- Documentation

NOTES
-----
- All patches touching base s390 are either ACKed or written by the s390
  maintainers
- One base KVM patch "KVM: add kvm_is_error_gpa() helper"
- One patch introduces the notion of VM device attributes

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Conflicts:
	include/uapi/linux/kvm.h
2014-04-22 10:51:06 -03:00
Michael Mueller
e325fe69aa KVM: s390: Factor out handle_itdb to handle TX aborts
Factor out the new function handle_itdb(), which copies the ITDB into
guest lowcore to fully handle a TX abort.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:54 +02:00
Michael Mueller
a86dcc2482 KVM: s390: replace TDB_ADDR by __LC_PGM_TDB
The generically assembled low core labels already contain the
address for the TDB.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:53 +02:00
Christian Borntraeger
67335e63c9 KVM: s390: Drop pending interrupts on guest exit
On hard exits (abort, sigkill) we have have some kvm_s390_interrupt_info
structures hanging around. Delete those on exit to avoid memory leaks.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
2014-04-22 13:24:53 +02:00
David Hildenbrand
f71d0dc508 KVM: s390: no timer interrupts when single-stepping a guest
When a guest is single-stepped, we want to disable timer interrupts. Otherwise,
the guest will continuously execute the external interrupt handler and make
debugging of code where timer interrupts are enabled almost impossible.

The delivery of timer interrupts can be enforced in such sections by setting a
breakpoint and continuing execution.

In order to disable timer interrupts, they are disabled in the control register
of the guest just before SIE entry and are suppressed in the interrupt
check/delivery methods.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:52 +02:00
David Hildenbrand
bb78c5ec91 KVM: s390: move timer interrupt checks into own functions
This patch moves the checks for enabled timer (clock-comparator) interrupts and pending
timer interrupts into own functions, making the code better readable and easier to
maintain.

The method kvm_cpu_has_pending_timer is filled with life.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:52 +02:00
David Hildenbrand
27291e2165 KVM: s390: hardware support for guest debugging
This patch adds support to debug the guest using the PER facility on s390.
Single-stepping, hardware breakpoints and hardware watchpoints are supported. In
order to use the PER facility of the guest without it noticing it, the control
registers of the guest have to be patched and access to them has to be
intercepted(stctl, stctg, lctl, lctlg).

All PER program interrupts have to be intercepted and only the relevant PER
interrupts for the guest have to be given back. Special care has to be taken
about repeated exits on the same hardware breakpoint. The intervention of the
host in the guests PER configuration is not fully transparent. PER instruction
nullification can not be used by the guest and too many storage alteration
events may be reported to the guest (if it is activated for special address
ranges only) when the host concurrently debugging it.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:51 +02:00
David Hildenbrand
af1827e773 KVM: s390: kernel header addition for guest debugging
This patch adds the structs to the kernel headers needed to pass information
from/to userspace in order to debug a guest on s390 with hardware support.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:50 +02:00
David Hildenbrand
aba0750889 KVM: s390: emulate stctl and stctg
Introduce the methods to emulate the stctl and stctg instruction. Added tracing
code.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:50 +02:00
David Hildenbrand
8712836b30 KVM: s390: deliver program irq parameters and use correct ilc
When a program interrupt was to be delivered until now, no program interrupt
parameters were stored in the low-core of the target vcpu.

This patch enables the delivery of those program interrupt parameters, takes
care of concurrent PER events which can be injected in addition to any program
interrupt and uses the correct instruction length code (depending on the
interception code) for the injection of program interrupts.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:49 +02:00
David Hildenbrand
439716a5ca KVM: s390: extract irq parameters of intercepted program irqs
Whenever a program interrupt is intercepted, some parameters are stored in the
sie control block. These parameters have to be extracted in order to be
reinjected correctly. This patch also takes care of intercepted PER events which
can occurr in addition to any program interrupt.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:49 +02:00
Jens Freimann
da7cf2570c s390: add fields to lowcore definition
This patch adds fields which are currently missing but needed for the correct
injection of interrupts.

This is based on a patch by David Hildenbrand

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:49 +02:00
Jens Freimann
21ee7ffd17 s390: rename and split lowcore field per_perc_atmid
per_perc_atmid is currently a two-byte field that combines two
fields, the PER code and the PER Addressing-and-Translation-Mode
Identification (ATMID)

Let's make them accessible indepently and also rename per_cause to
per_code.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:48 +02:00
Jens Freimann
3d53b46ce8 s390: fix name of lowcore field at offset 0xa3
According to the Principles of Operation, at offset 0xA3
in the lowcore we have the "Architectural-Mode identification",
not an "access identification".

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:48 +02:00
Heiko Carstens
e497a96ae8 KVM: s390: cleanup kvm_s390_real_to_abs()
Add kerneldoc comment to kvm_s390_real_to_abs() and change the code
so it matches the coding style of the rest of gaccess.h.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:47 +02:00
Heiko Carstens
3263bd1637 KVM: s390: remove old guest access functions
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:47 +02:00
Heiko Carstens
645c5bc1d5 KVM: s390: convert handle_stsi()
Convert handle_stsi() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:46 +02:00
Heiko Carstens
f987a3eef0 KVM: s390: convert handle lctl[g]()
Convert handle lctl[g]() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:46 +02:00
Heiko Carstens
7d777d7824 KVM: s390: convert handle_stidp()
Convert handle_stidp() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:45 +02:00
Heiko Carstens
2d8bcaeda1 KVM: s390: convert handle_lpsw[e]()
Convert handle_lpsw[e]() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:45 +02:00
Cornelia Huck
2f32d4ea28 KVM: s390: reinject io interrupt on tpi failure
The tpi instruction should be suppressed on addressing and protection
exceptions, so we need to re-inject the dequeued io interrupt in that
case.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:45 +02:00
Heiko Carstens
4799b557c9 KVM: s390: convert handle_tpi()
Convert handle_tpi() to new guest access functions.

The code now sets up a structure which is copied with a single call to
guest space instead of issuing several separate guest access calls.
This is necessary since the to be copied data may cross a page boundary.
If a protection exception happens while accessing any of the pages, the
instruction is suppressed and may not have modified any memory contents.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:44 +02:00
Heiko Carstens
ef23e7790e KVM: s390: convert handle_test_block()
Convert handle_test_block() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:44 +02:00
Heiko Carstens
8b96de0e03 KVM: s390: convert handle_store_cpu_address()
Convert handle_store_cpu_address() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:43 +02:00
Heiko Carstens
f748f4a7ec KVM: s390: convert handle_store_prefix()
Convert handle_store_prefix() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:43 +02:00
Heiko Carstens
0e7a3f9405 KVM: s390: convert handle_set_clock()
Convert handle_set_clock() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:42 +02:00
Heiko Carstens
665170cb47 KVM: s390: convert __sigp_set_prefix()/handle_set_prefix()
Convert __sigp_set_prefix() and handle_set_prefix() to new guest
access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:42 +02:00
Heiko Carstens
d0bce6054a KVM: s390: convert kvm_s390_store_status_unloaded()
Convert kvm_s390_store_status_unloaded() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:41 +02:00
Heiko Carstens
0040e7d20f KVM: s390: convert handle_prog()
Convert handle_prog() to new guest access functions.
Also make the code a bit more readable and look at the return code
of write_guest_lc() which was missing before.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:41 +02:00
Heiko Carstens
81480cc19c KVM: s390: convert pfault code
Convert pfault code to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:40 +02:00
Heiko Carstens
0f9701c6c2 KVM: s390: convert handle_stfl()
Convert handle_stfl() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:40 +02:00
Jens Freimann
1a03b76422 KVM: s390: convert local irqs in __do_deliver_interrupt()
Convert local irqs in __do_deliver_interrupt() to new guest
access functions.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:40 +02:00
Heiko Carstens
7988276df7 KVM: s390: convert __do_deliver_interrupt()
Convert __do_deliver_interrupt() to new guest access functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:39 +02:00
Heiko Carstens
8a242234b4 KVM: s390: make use of ipte lock
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:39 +02:00
Heiko Carstens
217a440683 KVM: s390/sclp: correctly set eca siif bit
Check if siif is available before setting.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:38 +02:00
Heiko Carstens
2293897805 KVM: s390: add architecture compliant guest access functions
The new guest memory access function write_guest() and read_guest() can be
used to access guest memory in an architecture compliant way.
These functions will look at the vcpu's PSW and select the correct address
space for memory access and also perform correct address wrap around.
In case DAT is turned on, page tables will be walked otherwise access will
happen to real or absolute memory.

Any access exception will be recognized and exception data will be stored
in the vcpu's kvm_vcpu_arch.pgm member. Subsequently an exception can be
injected if necessary.

Missing are:
- key protection checks
- access register mode support
- program event recording support

This patch also adds write_guest_real(), read_guest_real(),
write_guest_absolute() and read_guest_absolute() guest functions which can
be used to access real and absolute storage. These functions currently do
not perform any access checks, since there is no use case (yet?).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:38 +02:00
Heiko Carstens
d95fb12ff4 KVM: s390: add lowcore access functions
put_guest_lc, read_guest_lc and write_guest_lc are guest access
functions which shall only be used to access the lowcore of a vcpu.
These functions should be used for e.g. interrupt handlers where no
guest memory access protection facilities, like key or low address
protection, are applicable.

At a later point guest vcpu lowcore access should happen via pinned
prefix pages, so that these pages can be accessed directly via the
kernel mapping. All of these *_lc functions can be removed then.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:37 +02:00
Heiko Carstens
1b0462e574 KVM: s390: add 'pgm' member to kvm_vcpu_arch and helper function
Add a 'struct kvm_s390_pgm_info pgm' member to kvm_vcpu_arch. This
structure will be used if during instruction emulation in the context
of a vcpu exception data needs to be stored somewhere.

Also add a helper function kvm_s390_inject_prog_cond() which can inject
vcpu's last exception if needed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:37 +02:00
Heiko Carstens
072c9878ee KVM: s390: add kvm_s390_logical_to_effective() helper
Add kvm_s390_logical_to_effective() helper which converts a guest vcpu's
logical storage address to a guest vcpu effective address by applying the
rules of the vcpu's addressing mode defined by PSW bits 31 and 32
(extendended and basic addressing mode).
Depending on the vcpu's addressing mode the upper 40 bits (24 bit addressing
mode), 33 bits (31 bit addressing mode) or no bits (64 bit addressing mode)
will be zeroed and the remaining bits will be returned.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:36 +02:00
Heiko Carstens
5f4e87a227 s390/ctl_reg: add union type for control register 0
Add 'union ctlreg0_bits' to easily allow setting and testing bits of
control register 0 bits.
This patch only adds the bits needed for the new guest access functions.
Other bits and control registers can be added when needed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:36 +02:00
Heiko Carstens
1365632bde s390/ptrace: add struct psw and accessor function
Introduce a 'struct psw' which makes it easier to decode and test if
certain bits in a psw are set or are not set.
In addition also add a 'psw_bits()' helper define which allows to
directly modify and test a psw_t structure. E.g.

psw_t psw;
psw_bits(psw).t = 1; /* set dat bit */

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:36 +02:00
Heiko Carstens
280ef0f1f9 KVM: s390: export test_vfacility()
Make test_vfacility() available for other files. This is needed for the
new guest access functions, which test if certain facilities are available
for a guest.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:35 +02:00
Jens Freimann
bcd846837c KVM: s390: allow injecting every kind of interrupt
Add a new data structure and function that allows to inject
all kinds of interrupt as defined in the PoP

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:34 +02:00
Dominik Dingel
4f718eab26 KVM: s390: Exploiting generic userspace interface for cmma
To enable CMMA and to reset its state we use the vm kvm_device ioctls,
encapsulating attributes within the KVM_S390_VM_MEM_CTRL group.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:32 +02:00
Dominik Dingel
b31605c12f KVM: s390: make cmma usage conditionally
When userspace reset the guest without notifying kvm, the CMMA state
of the pages might be unused, resulting in guest data corruption.
To avoid this, CMMA must be enabled only if userspace understands
the implications.

CMMA must be enabled before vCPU creation. It can't be switched off
once enabled.  All subsequently created vCPUs will be enabled for
CMMA according to the CMMA state of the VM.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[remove now unnecessary calls to page_table_reset_pgste]
2014-04-22 13:24:13 +02:00
Dominik Dingel
f206165620 KVM: s390: Per-vm kvm device controls
We sometimes need to get/set attributes specific to a virtual machine
and so need something else than ONE_REG.

Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vm file descriptor.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 13:24:12 +02:00
Jason J. Herne
15f36ebd34 KVM: s390: Add proper dirty bitmap support to S390 kvm.
Replace the kvm_s390_sync_dirty_log() stub with code to construct the KVM
dirty_bitmap from S390 memory change bits.  Also add code to properly clear
the dirty_bitmap size when clearing the bitmap.

Signed-off-by: Jason J. Herne <jjherne@us.ibm.com>
CC: Dominik Dingel <dingel@linux.vnet.ibm.com>
[Dominik Dingel: use gmap_test_and_clear_dirty, locking fixes]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:28 +02:00
Dominik Dingel
a0bf4f149b KVM: s390/mm: new gmap_test_and_clear_dirty function
For live migration kvm needs to test and clear the dirty bit of guest pages.

That for is ptep_test_and_clear_user_dirty, to be sure we are not racing with
other code, we protect the pte. This needs to be done within
the architecture memory management code.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:27 +02:00
Martin Schwidefsky
0a61b222df KVM: s390/mm: use software dirty bit detection for user dirty tracking
Switch the user dirty bit detection used for migration from the hardware
provided host change-bit in the pgste to a fault based detection method.
This reduced the dependency of the host from the storage key to a point
where it becomes possible to enable the RCP bypass for KVM guests.

The fault based dirty detection will only indicate changes caused
by accesses via the guest address space. The hardware based method
can detect all changes, even those caused by I/O or accesses via the
kernel page table. The KVM/qemu code needs to take this into account.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:26 +02:00
Dominik Dingel
693ffc0802 KVM: s390: Don't enable skeys by default
The first invocation of storage key operations on a given cpu will be intercepted.

On these intercepts we will enable storage keys for the guest and remove the
previously added intercepts.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:26 +02:00
Dominik Dingel
934bc131ef KVM: s390: Allow skeys to be enabled for the current process
Introduce a new function s390_enable_skey(), which enables storage key
handling via setting the use_skey flag in the mmu context.

This function is only useful within the context of kvm.

Note that enabling storage keys will cause a one-time hickup when
walking the page table; however, it saves us special effort for cases
like clear reset while making it possible for us to be architecture
conform.

s390_enable_skey() takes the page table lock to prevent reseting
storage keys triggered from multiple vcpus.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:25 +02:00
Dominik Dingel
d4cb11340b KVM: s390: Clear storage keys
page_table_reset_pgste() already does a complete page table walk to
reset the pgste. Enhance it to initialize the storage keys to
PAGE_DEFAULT_KEY if requested by the caller. This will be used
for lazy storage key handling. Also provide an empty stub for
!CONFIG_PGSTE

Lets adopt the current code (diag 308) to not clear the keys.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:24 +02:00
Dominik Dingel
65eef33550 KVM: s390: Adding skey bit to mmu context
For lazy storage key handling, we need a mechanism to track if the
process ever issued a storage key operation.

This patch adds the basic infrastructure for making the storage
key handling optional, but still leaves it enabled for now by default.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22 09:36:23 +02:00
Nadav Amit
5c7411e293 KVM: x86: Fix CR3 and LDT sel should not be saved in TSS
According to Intel specifications, only general purpose registers and segment
selectors should be saved in the old TSS during 32-bit task-switch.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-21 17:33:49 -03:00
Michael S. Tsirkin
68c3b4d167 KVM: VMX: speed up wildcard MMIO EVENTFD
With KVM, MMIO is much slower than PIO, due to the need to
do page walk and emulation. But with EPT, it does not have to be: we
know the address from the VMCS so if the address is unique, we can look
up the eventfd directly, bypassing emulation.

Unfortunately, this only works if userspace does not need to match on
access length and data.  The implementation adds a separate FAST_MMIO
bus internally. This serves two purposes:
    - minimize overhead for old userspace that does not use eventfd with lengtth = 0
    - minimize disruption in other code (since we don't know the length,
      devices on the MMIO bus only get a valid address in write, this
      way we don't need to touch all devices to teach them to handle
      an invalid length)

At the moment, this optimization only has effect for EPT on x86.

It will be possible to speed up MMIO for NPT and MMU using the same
idea in the future.

With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
I was unable to detect any measureable slowdown to non-eventfd MMIO.

Making MMIO faster is important for the upcoming virtio 1.0 which
includes an MMIO signalling capability.

The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
pre-review and suggestions.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-17 14:01:43 -03:00
Michael S. Tsirkin
f848a5a8dc KVM: support any-length wildcard ioeventfd
It is sometimes benefitial to ignore IO size, and only match on address.
In hindsight this would have been a better default than matching length
when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind
of access can be optimized on VMX: there no need to do page lookups.
This can currently be done with many ioeventfds but in a suboptimal way.

However we can't change kernel/userspace ABI without risk of breaking
some applications.
Use len = 0 to mean "ignore length for matching" in a more optimal way.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-17 14:01:42 -03:00
Nadav Amit
cd9ae5fe47 KVM: x86: Fix page-tables reserved bits
KVM does not handle the reserved bits of x86 page tables correctly:
In PAE, bits 5:8 are reserved in the PDPTE.
In IA-32e, bit 8 is not reserved.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-16 18:59:23 -03:00
Linus Torvalds
0f689a33ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
 "An update to the oops output with additional information about the
  crash.  The renameat2 system call is enabled.  Two patches in regard
  to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
  s390/sclp_vt220: Fix kernel panic due to early terminal input
  s390/compat: fix typo
  s390/uaccess: fix possible register corruption in strnlen_user_srst()
  s390: add 31 bit warning message
  s390: wire up sys_renameat2
  s390: show_registers() should not map user space addresses to kernel symbols
  s390/mm: print control registers and page table walk on crash
  s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
  s390: fix control register update
2014-04-16 11:28:25 -07:00
Linus Torvalds
7d38cc0290 Small workaround for a rare, but annoying, erratum
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTTsl0AAoJEKurIx+X31iBlZgP/1B01Zsqbq/Oh2JFRwjnLJbW
 FSMDMoAYHbIoB+4tIvQdQf6vGxPz++oBtJq95CeDeez2wh/BPvGFj/LrdBfYYgHo
 uutrB4c6P0WBOQJnfRcXS/MC4frihzV5IaA+VXAXTIBTVKlEi3PGUBM3/WgUA8QS
 Aga73Q0ze3XCjbtGeqjwE3YYzuiOzELeb4MJbNjH3a40ZkXCCmZcerZT8Zp0O5Qi
 hYZr6OTPmmoXrgX4MeBX7kFcwhHe6zb86TvdRAoPIY8fwPJt/zrhQlqrTfoUzRH5
 xyX3y4ByU7HU0CEk0g8AMiiR6EZNmUBo8onei0xVwRP8Z2u884RKQ1cC6D059ars
 isyMO/7lH6Rn+48OOTiXb0e2NOmejr/TRZ3asrvtfvb9uyMGFkG+6g2/pYxbnTb2
 QC8jknY07PhCB9J738RTllAFX6OFRvhk8b3DtrtXC5GTfIiA7HkdkTQ3HISNTgif
 C5WalPhyjh6iCJbtFtdFg0uwEpXgQ5QX5sN08NQOkuDXiI9obftD7pw5EIPiZCTt
 GIIxXh1KDWiktoOHrFAiIm4bh+CAxNRG10WvbZQj3CY+m9GYcI30h/XFmkoil4ds
 IoqVMan42yNK5AwqSa8L9Eq6t5Za1Uic1OgOofX2iYEEqp84aklUI0m5b8woN+/u
 t0BK8ThF4msbaWlJgKRK
 =lloU
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull itanium erratum fix from Tony Luck:
 "Small workaround for a rare, but annoying, erratum #237"

* tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
2014-04-16 11:22:45 -07:00
Tony Luck
c0b5a64d93 [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
April 2014 Itanium processor specification update:

http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html

describes this erratum:

=========================================================================
237. Under a complex set of conditions, store to load forwarding for a
sub 8-byte load may complete incorrectly

Problem: A load instruction may complete incorrectly when a code sequence
using 4-byte or smaller load and store operations to the same address
is executed in combination with specific timing of all the following
concurrent conditions: store to load forwarding, alignment checking
enabled, a mis-predicted branch, and complex cache utilization activity.

Implication: The affected sub 8-byte instruction may complete
incorrectly resulting in unpredictable system behavior. There is an
extremely low probability of exposure due to the significant number of
complex microarchitectural concurrent conditions required to encounter
the erratum.

Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
Hyper-Threading will significantly reduce exposure to the conditions
that contribute to encountering the erratum.

Status: See the Summary Table of Changes for the affected steppings.
=========================================================================

[Table of changes essentially lists all models from McKinley to Tukwila]

The PSR.ac bit controls whether the processor will always generate
an unaligned reference trap (0x5a00) for a misaligned data access
(when PSR.ac=1) or if it will let the access succeed when running
on a cpu that implements logic to handle some unaligned accesses.

Way back in 2008 in commit b704882e70
  [IA64] Rationalize kernel mode alignment checking
we made the decision to always enable strict checking. We were
already doing so in trap/interrupt context because the common
preamble code set this bit - but the rest of supervisor code
(and by inheritance user code) ran with PSR.ac=0.

We now reverse that decision and set PSR.ac=0 everywhere in the
kernel (also inherited by user processes). This will avoid the
erratum using the method described in the Itanium specification
update.  Net effect for users is that the processor will handle
unaligned access when it can (typically with a tiny performance
bubble in the pipeline ... but much less invasive than taking a
trap and having the OS perform the access).

Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-04-16 10:20:34 -07:00
Linus Torvalds
55101e2d6c Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Marcelo Tosatti:
 - Fix for guest triggerable BUG_ON (CVE-2014-0155)
 - CR4.SMAP support
 - Spurious WARN_ON() fix

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: remove WARN_ON from get_kernel_ns()
  KVM: Rename variable smep to cr4_smep
  KVM: expose SMAP feature to guest
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
  KVM: Add SMAP support when setting CR4
  KVM: Remove SMAP bit from CR4_RESERVED_BITS
  KVM: ioapic: try to recover if pending_eoi goes out of range
  KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14 16:21:28 -07:00
Marcelo Tosatti
b351c39cc9 KVM: x86: remove WARN_ON from get_kernel_ns()
Function and callers can be preempted.

https://bugzilla.kernel.org/show_bug.cgi?id=73721

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14 17:50:43 -03:00
Feng Wu
66386ade2a KVM: Rename variable smep to cr4_smep
Rename variable smep to cr4_smep, which can better reflect the
meaning of the variable.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14 17:50:40 -03:00