Commit Graph

859 Commits

Author SHA1 Message Date
Paul Mackerras 5d00f66b86 KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells
POWER8 has support for hypervisor doorbell interrupts.  Though the
kernel doesn't use them for IPIs on the powernv platform yet, it
probably will in future, so this makes KVM cope gracefully if a
hypervisor doorbell interrupt arrives while in a guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:13 +01:00
Paul Mackerras e0622bd9f2 KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8
POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
registers to count when in the guest.  Set this bit.

POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
which is used to enable relocation-on interrupts.  Allow userspace to
set this field.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:11 +01:00
Paul Mackerras aa31e84322 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs
* SRR1 wake reason field for system reset interrupt on wakeup from nap
  is now a 4-bit field on P8, compared to 3 bits on P7.

* Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
  will wake us up.

* Waking up from nap because of a guest doorbell interrupt is not a
  reason to exit the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:10 +01:00
Paul Mackerras e3bbbbfa13 KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap
Currently in book3s_hv_rmhandlers.S we have three places where we
have woken up from nap mode and we check the reason field in SRR1
to see what event woke us up.  This consolidates them into a new
function, kvmppc_check_wake_reason.  It looks at the wake reason
field in SRR1, and if it indicates that an external interrupt caused
the wakeup, calls kvmppc_read_intr to check what sort of interrupt
it was.

This also consolidates the two places where we synthesize an external
interrupt (0x500 vector) for the guest.  Now, if the guest exit code
finds that there was an external interrupt which has been handled
(i.e. it was an IPI indicating that there is now an interrupt pending
for the guest), it jumps to deliver_guest_interrupt, which is in the
last part of the guest entry code, where we synthesize guest external
and decrementer interrupts.  That code has been streamlined a little
and now clears LPCR[MER] when appropriate as well as setting it.

The extra clearing of any pending IPI on a secondary, offline CPU
thread before going back to nap mode has been removed.  It is no longer
necessary now that we have code to read and acknowledge IPIs in the
guest exit path.

This fixes a minor bug in the H_CEDE real-mode handling - previously,
if we found that other threads were already exiting the guest when we
were about to go to nap mode, we would branch to the cede wakeup path
and end up looking in SRR1 for a wakeup reason.  Now we branch to a
point after we have checked the wakeup reason.

This also fixes a minor bug in kvmppc_read_intr - previously it could
return 0xff rather than 1, in the case where we find that a host IPI
is pending after we have cleared the IPI.  Now it returns 1.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:08 +01:00
Paul Mackerras 5557ae0ec7 KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8
This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
compatibility modes on a POWER8 processor.  (Note that transactional
memory is disabled for usermode if either or both of the PCR_TM_DIS
and PCR_ARCH_206 bits are set.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:06 +01:00
Michael Ellerman bd3048b80c KVM: PPC: Book3S HV: Add handler for HV facility unavailable
At present this should never happen, since the host kernel sets
HFSCR to allow access to all facilities.  It's better to be prepared
to handle it cleanly if it does ever happen, though.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:04 +01:00
Paul Mackerras ca25205513 KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8
POWER8 has 512 sets in the TLB, compared to 128 for POWER7, so we need
to do more tlbiel instructions when flushing the TLB on POWER8.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:02 +01:00
Michael Neuling b005255e12 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs
This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.

Note that DPDES (Directed Privileged Doorbell Exception State) is
shared between threads on a core; hence we store it in struct
kvmppc_vcore and have the master thread save and restore it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:01:00 +01:00
Paul Mackerras e0b7ec058c KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers
On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core.  Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers.  Physical threads are allocated starting at 0 on a first-come
first-served basis to runnable virtual threads (VCPUs).

POWER8 implements a new "msgsndp" instruction which guest kernels can
use to interrupt other threads in the same core or sub-core.  Since
the instruction takes the destination physical thread ID as a parameter,
it becomes necessary to align the physical thread IDs with the virtual
thread IDs, that is, to make sure virtual thread N within a virtual
core always runs on physical thread N.

This means that it's possible that thread 0, which is where we call
__kvmppc_vcore_entry, may end up running some other vcpu than the
one whose task called kvmppc_run_core(), or it may end up running
no vcpu at all, if for example thread 0 of the virtual core is
currently executing in userspace.  However, we do need thread 0
to be responsible for switching the MMU -- a previous version of
this patch that had other threads switching the MMU was found to
be responsible for occasional memory corruption and machine check
interrupts in the guest on POWER7 machines.

To accommodate this, we no longer pass the vcpu pointer to
__kvmppc_vcore_entry, but instead let the assembly code load it from
the PACA.  Since the assembly code will need to know the kvm pointer
and the thread ID for threads which don't have a vcpu, we move the
thread ID into the PACA and we add a kvm pointer to the virtual core
structure.

In the case where thread 0 has no vcpu to run, it still calls into
kvmppc_hv_entry in order to do the MMU switch, and then naps until
either its vcpu is ready to run in the guest, or some other thread
needs to exit the guest.  In the latter case, thread 0 jumps to the
code that switches the MMU back to the host.  This control flow means
that now we switch the MMU before loading any guest vcpu state.
Similarly, on guest exit we now save all the guest vcpu state before
switching the MMU back to the host.  This has required substantial
code movement, making the diff rather large.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:59 +01:00
Michael Neuling eee7ff9d2c KVM: PPC: Book3S HV: Don't set DABR on POWER8
POWER8 doesn't have the DABR and DABRX registers; instead it has
new DAWR/DAWRX registers, which will be handled in a later patch.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:57 +01:00
Scott Wood 6c85f52b10 kvm/ppc: IRQ disabling cleanup
Simplify the handling of lazy EE by going directly from fully-enabled
to hard-disabled.  This replaces the lazy_irq_pending() check
(including its misplaced kvm_guest_exit() call).

As suggested by Tiejun Chen, move the interrupt disabling into
kvmppc_prepare_to_enter() rather than have each caller do it.  Also
move the IRQ enabling on heavyweight exit into
kvmppc_prepare_to_enter().

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:55 +01:00
Mihai Caraman 70713fe315 KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()
Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:54 +01:00
Andreas Schwab 48eaef0518 KVM: PPC: Book3S HV: use xics_wake_cpu only when defined
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
CC: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:52 +01:00
Cédric Le Goater 736017752d KVM: PPC: Book3S: MMIO emulation support for little endian guests
MMIO emulation reads the last instruction executed by the guest
and then emulates. If the guest is running in Little Endian order,
or more generally in a different endian order of the host, the
instruction needs to be byte-swapped before being emulated.

This patch adds a helper routine which tests the endian order of
the host and the guest in order to decide whether a byteswap is
needed or not. It is then used to byteswap the last instruction
of the guest in the endian order of the host before MMIO emulation
is performed.

Finally, kvmppc_handle_load() of kvmppc_handle_store() are modified
to reverse the endianness of the MMIO if required.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[agraf: add booke handling]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27 16:00:39 +01:00
Zhouyi Zhou 47d45d9f53 KVM: PPC: NULL return of kvmppc_mmu_hpte_cache_next should be handled
NULL return of kvmppc_mmu_hpte_cache_next should be handled

Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:11 +01:00
Tiejun Chen 9bd880a2c8 KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
Rather than calling hard_irq_disable() when we're back in C code
we can just call RECONCILE_IRQ_STATE to soft disable IRQs while
we're already in hard disabled state.

This should be functionally equivalent to the code before, but
cleaner and faster.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
[agraf: fix comment, commit message]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:10 +01:00
Bharat Bhushan 08c9a188d0 kvm: powerpc: use caching attributes as per linux pte
KVM uses same WIM tlb attributes as the corresponding qemu pte.
For this we now search the linux pte for the requested page and
get these cache caching/coherency attributes from pte.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:08 +01:00
Bharat Bhushan 7c85e6b39c kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update()
lookup_linux_pte() is doing more than lookup, updating the pte,
so for clarity it is renamed to lookup_linux_pte_and_update()

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:06 +01:00
Bharat Bhushan 30a91fe24b kvm: booke: clear host tlb reference flag on guest tlb invalidation
On booke, "struct tlbe_ref" contains host tlb mapping information
(pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
for a guest tlb entry. So when a guest creates a TLB entry then
"struct tlbe_ref" is set to point to valid "pfn" and set attributes in
"flags" field of the above said structure. When a guest TLB entry is
invalidated then flags field of corresponding "struct tlbe_ref" is
updated to point that this is no more valid, also we selectively clear
some other attribute bits, example: if E500_TLB_BITMAP was set then we clear
E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this.

Ideally we should clear complete "flags" as this entry is invalid and does not
have anything to re-used. The other part of the problem is that when we use
the same entry again then also we do not clear (started doing or-ing etc).

So far it was working because the selectively clearing mentioned above
actually clears "flags" what was set during TLB mapping. But the problem
starts coming when we add more attributes to this then we need to selectively
clear them and which is not needed.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:04 +01:00
Paul Mackerras 595e4f7e69 KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exit
This modifies kvmppc_load_fp and kvmppc_save_fp to use the generic
FP/VSX and VMX load/store functions instead of open-coding the
FP/VSX/VMX load/store instructions.  Since kvmppc_load/save_fp don't
follow C calling conventions, we make them private symbols within
book3s_hv_rmhandlers.S.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:03 +01:00
Paul Mackerras 99dae3bad2 KVM: PPC: Load/save FP/VMX/VSX state directly to/from vcpu struct
Now that we have the vcpu floating-point and vector state stored in
the same type of struct as the main kernel uses, we can load that
state directly from the vcpu struct instead of having extra copies
to/from the thread_struct.  Similarly, when the guest state needs to
be saved, we can have it saved it directly to the vcpu struct by
setting the current->thread.fp_save_area and current->thread.vr_save_area
pointers.  That also means that we don't need to back up and restore
userspace's FP/vector state.  This all makes the code simpler and
faster.

Note that it's not necessary to save or modify current->thread.fpexc_mode,
since nothing in KVM uses or is affected by its value.  Nor is it
necessary to touch used_vr or used_vsr.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:02 +01:00
Paul Mackerras efff191223 KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures
This uses struct thread_fp_state and struct thread_vr_state to store
the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
This makes transferring the state to/from the thread_struct simpler
and allows us to unify the get/set_one_reg implementations for the
VSX registers.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:15:00 +01:00
Paul Mackerras 09548fdaf3 KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivec
The load_up_fpu and load_up_altivec functions were never intended to
be called from C, and do things like modifying the MSR value in their
callers' stack frames, which are assumed to be interrupt frames.  In
addition, on 32-bit Book S they require the MMU to be off.

This makes KVM use the new load_fp_state() and load_vr_state() functions
instead of load_up_fpu/altivec.  This means we can remove the assembler
glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
where load_up_fpu was called directly from C.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:14:59 +01:00
Gleb Natapov 458ff3c099 KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
XICS failed to free xics structure on error path. MPIC destroy handler
forgot to delete kvm_device structure.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:14:54 +01:00
Alexander Graf 398a76c677 KVM: PPC: Add devname:kvm aliases for modules
Systems that support automatic loading of kernel modules through
device aliases should try and automatically load kvm when /dev/kvm
gets opened.

Add code to support that magic for all PPC kvm targets, even the
ones that don't support modules yet.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09 10:14:00 +01:00
Liu Ping Fan 27025a602c powerpc: kvm: optimize "sc 1" as fast return
In some scene, e.g openstack CI, PR guest can trigger "sc 1" frequently,
this patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL
to HV guest, so powernv can return to HV guest without heavy exit, i.e,
no need to swap TLB, HTAB,.. etc

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-11-21 14:56:45 +01:00
Linus Torvalds f080480488 Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the HV and emulation flavors can now coexist in a single kernel
 is probably the most interesting change from a user point of view.
 On the x86 side there are nested virtualization improvements and a
 few bugfixes.  ARM got transparent huge page support, improved
 overcommit, and support for big endian guests.
 
 Finally, there is a new interface to connect KVM with VFIO.  This
 helps with devices that use NoSnoop PCI transactions, letting the
 driver in the guest execute WBINVD instructions.  This includes
 some nVidia cards on Windows, that fail to start without these
 patches and the corresponding userspace changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJShPAhAAoJEBvWZb6bTYbyl48P/297GgmELHAGBgjvb6q7yyGu
 L8+eHjKbh4XBAkPwyzbvUjuww5z2hM0N3JQ0BDV9oeXlO+zwwCEns/sg2Q5/NJXq
 XxnTeShaKnp9lqVBnE6G9rAOUWKoyLJ2wItlvUL8JlaO9xJ0Vmk0ta4n2Nv5GqDp
 db6UD7vju6rHtIAhNpvvAO51kAOwc01xxRixCVb7KUYOnmO9nvpixzoI/S0Rp1gu
 w/OWMfCosDzBoT+cOe79Yx1OKcpaVW94X6CH1s+ShCw3wcbCL2f13Ka8/E3FIcuq
 vkZaLBxio7vjUAHRjPObw0XBW4InXEbhI1DjzIvm8dmc4VsgmtLQkTCG8fj+jINc
 dlHQUq6Do+1F4zy6WMBUj8tNeP1Z9DsABp98rQwR8+BwHoQpGQBpAxW0TE0ZMngC
 t1caqyvjZ5pPpFUxSrAV+8Kg4AvobXPYOim0vqV7Qea07KhFcBXLCfF7BWdwq/Jc
 0CAOlsLL4mHGIQWZJuVGw0YGP7oATDCyewlBuDObx+szYCoV4fQGZVBEL0KwJx/1
 7lrLN7JWzRyw6xTgJ5VVwgYE1tUY4IFQcHu7/5N+dw8/xg9KWA3f4PeMavIKSf+R
 qteewbtmQsxUnvuQIBHLs8NRWPnBPy+F3Sc2ckeOLIe4pmfTte6shtTXcLDL+LqH
 NTmT/cfmYp2BRkiCfCiS
 =rWNf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM changes from Paolo Bonzini:
 "Here are the 3.13 KVM changes.  There was a lot of work on the PPC
  side: the HV and emulation flavors can now coexist in a single kernel
  is probably the most interesting change from a user point of view.

  On the x86 side there are nested virtualization improvements and a few
  bugfixes.

  ARM got transparent huge page support, improved overcommit, and
  support for big endian guests.

  Finally, there is a new interface to connect KVM with VFIO.  This
  helps with devices that use NoSnoop PCI transactions, letting the
  driver in the guest execute WBINVD instructions.  This includes some
  nVidia cards on Windows, that fail to start without these patches and
  the corresponding userspace changes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  kvm, vmx: Fix lazy FPU on nested guest
  arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
  arm/arm64: KVM: MMIO support for BE guest
  kvm, cpuid: Fix sparse warning
  kvm: Delete prototype for non-existent function kvm_check_iopl
  kvm: Delete prototype for non-existent function complete_pio
  hung_task: add method to reset detector
  pvclock: detect watchdog reset at pvclock read
  kvm: optimize out smp_mb after srcu_read_unlock
  srcu: API for barrier after srcu read unlock
  KVM: remove vm mmap method
  KVM: IOMMU: hva align mapping page size
  KVM: x86: trace cpuid emulation when called from emulator
  KVM: emulator: cleanup decode_register_operand() a bit
  KVM: emulator: check rex prefix inside decode_register()
  KVM: x86: fix emulation of "movzbl %bpl, %eax"
  kvm_host: typo fix
  KVM: x86: emulate SAHF instruction
  MAINTAINERS: add tree for kvm.git
  Documentation/kvm: add a 00-INDEX file
  ...
2013-11-15 13:51:36 +09:00
Linus Torvalds 66a173b926 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
 "The bulk of this is LE updates.  One should now be able to build an LE
  kernel and even run some things in it.

  I'm still sitting on a handful of patches to enable the new ABI that I
  *might* still send this merge window around, but due to the
  incertainty (they are pretty fresh) I want to keep them separate.

  Other notable changes are some infrastructure bits to better handle
  PCI pass-through under KVM, some bits and pieces added to the new
  PowerNV platform support such as access to the CPU SCOM bus via sysfs,
  and support for EEH error handling on PHB3 (Power8 PCIe).

  We also grew arch_get_random_long() for both pseries and powernv when
  running on P7+ and P8, exploiting the HW rng.

  And finally various embedded updates from freescale"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (154 commits)
  powerpc: Fix fatal SLB miss when restoring PPR
  powerpc/powernv: Reserve the correct PE number
  powerpc/powernv: Add PE to its own PELTV
  powerpc/powernv: Add support for indirect XSCOM via debugfs
  powerpc/scom: Improve debugfs interface
  powerpc/scom: Enable 64-bit addresses
  powerpc/boot: Properly handle the base "of" boot wrapper
  powerpc/bpf: Support MOD operation
  powerpc/bpf: Fix DIVWU instruction opcode
  of: Move definition of of_find_next_cache_node into common code.
  powerpc: Remove big endianness assumption in of_find_next_cache_node
  powerpc/tm: Remove interrupt disable in __switch_to()
  powerpc: word-at-a-time optimization for 64-bit Little Endian
  powerpc/bpf: BPF JIT compiler for 64-bit Little Endian
  powerpc: Only save/restore SDR1 if in hypervisor mode
  powerpc/pmu: Fix ADB_PMU_LED_IDE dependencies
  powerpc/nvram: Fix endian issue when using the partition length
  powerpc/nvram: Fix endian issue when reading the NVRAM size
  powerpc/nvram: Scan partitions only once
  powerpc/mpc512x: remove unnecessary #if
  ...
2013-11-12 14:34:19 +09:00
Gleb Natapov 95f328d3ad Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queue
Conflicts:
	arch/powerpc/include/asm/processor.h
2013-11-04 10:20:57 +02:00
Aneesh Kumar K.V a78b55d1c0 kvm: powerpc: book3s: drop is_hv_enabled
drop is_hv_enabled, because that should not be a callback property

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 18:43:34 +02:00
Aneesh Kumar K.V cbbc58d4fd kvm: powerpc: book3s: Allow the HV and PR selection per virtual machine
This moves the kvmppc_ops callbacks to be a per VM entity. This
enables us to select HV and PR mode when creating a VM. We also
allow both kvm-hv and kvm-pr kernel module to be loaded. To
achieve this we move /dev/kvm ownership to kvm.ko module. Depending on
which KVM mode we select during VM creation we take a reference
count on respective module

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix coding style]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 18:42:36 +02:00
Aneesh Kumar K.V 5587027ce9 kvm: Add struct kvm arg to memslot APIs
We will use that in the later patch to find the kvm ops handler

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:49:23 +02:00
Aneesh Kumar K.V 2ba9f0d887 kvm: powerpc: book3s: Support building HV and PR KVM as module
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in compile fix]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:45:35 +02:00
Aneesh Kumar K.V dba291f2ce kvm: powerpc: booke: Move booke related tracepoints to separate header
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:37:16 +02:00
Aneesh Kumar K.V 72c1253574 kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header
This patch moves PR related tracepoints to a separate header. This
enables in converting PR to a kernel module which will be done in
later patches

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:36:22 +02:00
Aneesh Kumar K.V 699cc87641 kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops
This help us to identify whether we are running with hypervisor mode KVM
enabled. The change is needed so that we can have both HV and PR kvm
enabled in the same kernel.

If both HV and PR KVM are included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest.

Allowing both PR and HV in the same kernel required some changes to
kvm_dev_ioctl_check_extension(), since the values returned now can't
be selected with #ifdefs as much as previously. We look at is_hv_enabled
to return the right value when checking for capabilities.For capabilities that
are only provided by HV KVM, we return the HV value only if
is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
we return the PR value only if is_hv_enabled is false.

NOTE: in later patch we replace is_hv_enabled with a static inline
function comparing kvm_ppc_ops

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:29:09 +02:00
Aneesh Kumar K.V dd96b2c2dc kvm: powerpc: book3s: Cleanup interrupt handling code
With this patch if HV is included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
in enabling both HV and PR, which we do in later patch

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:26:31 +02:00
Aneesh Kumar K.V 3a167beac0 kvm: powerpc: Add kvmppc_ops callback
This patch add a new callback kvmppc_ops. This will help us in enabling
both HV and PR KVM together in the same kernel. The actual change to
enable them together is done in the later patch in the series.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in booke changes]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:24:26 +02:00
Aneesh Kumar K.V 9975f5e369 kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLE
This help ups to select the relevant code in the kernel code
when we later move HV and PR bits as seperate modules. The patch
also makes the config options for PR KVM selectable

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:18:28 +02:00
Aneesh Kumar K.V 7aa79938f7 kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE
With later patches supporting PR kvm as a kernel module, the changes
that has to be built into the main kernel binary to enable PR KVM module
is now selected via KVM_BOOK3S_PR_POSSIBLE

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:17:49 +02:00
Paul Mackerras 066212e02a kvm: powerpc: book3s: move book3s_64_vio_hv.c into the main kernel binary
Since the code in book3s_64_vio_hv.c is called from real mode with HV
KVM, and therefore has to be built into the main kernel binary, this
makes it always built-in rather than part of the KVM module.  It gets
called from the KVM module by PR KVM, so this adds an EXPORT_SYMBOL_GPL().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:17:25 +02:00
Paul Mackerras 178db620ee kvm: powerpc: book3s: remove kvmppc_handler_highmem label
This label is not used now.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:15:56 +02:00
Bharat Bhushan ce11e48b7f KVM: PPC: E500: Add userspace debug stub support
This patch adds the debug stub support on booke/bookehv.
Now QEMU debug stub can use hw breakpoint, watchpoint and
software breakpoint to debug guest.

This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:

When QEMU is running
 -> thread->debug_reg == QEMU debug register context.
 -> Kernel will handle switching the debug register on context switch.
 -> no vcpu_load() called

QEMU makes ioctls (except RUN)
 -> This will call vcpu_load()
 -> should not change context.
 -> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs

QEMU Makes RUN ioctl
 -> Save thread->debug_reg on STACK
 -> Store thread->debug_reg == vcpu->debug_reg
 -> load thread->debug_reg
 -> RUN VCPU ( So thread points to vcpu context )

Context switch happens When VCPU running
 -> makes vcpu_load() should not load any context
 -> kernel loads the vcpu context as thread->debug_regs points to vcpu context.

On heavyweight_exit
 -> Load the context saved on stack in thread->debug_reg

Currently we do not support debug resource emulation to guest,
On debug exception, always exit to user space irrespective of
user space is expecting the debug exception or not. If this is
unexpected exception (breakpoint/watchpoint event not set by
userspace) then let us leave the action on user space. This
is similar to what it was before, only thing is that now we
have proper exit state available to user space.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:40 +02:00
Bharat Bhushan 547465ef8b KVM: PPC: E500: Using "struct debug_reg"
For KVM also use the "struct debug_reg" defined in asm/processor.h

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:39 +02:00
Bharat Bhushan b12c784123 KVM: PPC: E500: exit to user space on "ehpriv 1" instruction
"ehpriv 1" instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with "run->debug" have relevant information.

As this is the first point we are using run->debug, also defined
the run->debug structure.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:39 +02:00
Bharat Bhushan 84e4d632b5 kvm: powerpc: e500: mark page accessed when mapping a guest page
Mark the guest page as accessed so that there is likely
less chances of this page getting swap-out.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:38 +02:00
Bharat Bhushan ca8ccbd41d kvm: powerpc: allow guest control "G" attribute in mas2
"G" bit in MAS2 indicates whether the page is Guarded.
There is no reason to stop guest setting  "G", so allow him.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:37 +02:00
Bharat Bhushan fd75cb51f2 kvm: powerpc: allow guest control "E" attribute in mas2
"E" bit in MAS2 bit indicates whether the page is accessed
in Little-Endian or Big-Endian byte order.
There is no reason to stop guest setting  "E", so allow him."

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:37 +02:00
Paul Mackerras 44a3add863 KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode
When an interrupt or exception happens in the guest that comes to the
host, the CPU goes to hypervisor real mode (MMU off) to handle the
exception but doesn't change the MMU context.  After saving a few
registers, we then clear the "in guest" flag.  If, for any reason,
we get an exception in the real-mode code, that then gets handled
by the normal kernel exception handlers, which turn the MMU on.  This
is disastrous if the MMU is still set to the guest context, since we
end up executing instructions from random places in the guest kernel
with hypervisor privilege.

In order to catch this situation, we define a new value for the "in guest"
flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
mode with guest MMU context.  If the "in guest" flag is set to this value,
we branch off to an emergency handler.  For the moment, this just does
a branch to self to stop the CPU from doing anything further.

While we're here, we define another new flag value to indicate that we
are in a HV guest, as distinct from a PR guest.  This will be useful
when we have a kernel that can support both PR and HV guests concurrently.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:37 +02:00
Paul Mackerras f1378b1c0b kvm: powerpc: book3s hv: Fix vcore leak
add kvmppc_free_vcores() to free the kvmppc_vcore structures
that we allocate for a guest, which are currently being leaked.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 14:49:36 +02:00