Commit graph

9465 commits

Author SHA1 Message Date
Alexander Graf
c46dc9a861 KVM: PPC: Emulator: clean up instruction parsing
Instructions on PPC are pretty similarly encoded. So instead of
every instruction emulation code decoding the instruction fields
itself, we can move that code to more generic places and rely on
the compiler to optimize the unused bits away.

This has 2 advantages. It makes the code smaller and it makes the
code less error prone, as the instruction fields are always
available, so accidental misusage is reduced.

Functionally, this patch doesn't change anything.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:12 +02:00
Benjamin Herrenschmidt
5b74716eba kvm/powerpc: Add new ioctl to retreive server MMU infos
This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:12 +02:00
Benjamin Herrenschmidt
f31e65e117 kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
There is nothing in the code for emulating TCE tables in the kernel
that prevents it from working on "PR" KVM... other than ifdef's and
location of the code.

This and moves the bulk of the code there to a new file called
book3s_64_vio.c.

This speeds things up a bit on my G5.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix for hv kvm, 32bit, whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:11 +02:00
Mihai Caraman
4444aa5f78 KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
Guest r8 register is held in the scratch register and stored correctly,
so remove the instruction that clobbers it. Guest r13 was missing from vcpu,
store it there.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:11 +02:00
Alexander Graf
3b1d9d7d95 KVM: PPC: Book3S: Enable IRQs during exit handling
While handling an exit, we should listen for interrupts and make sure to
receive them when they arrive, to keep our latencies low.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:11 +02:00
Alexander Graf
11f7d6c2d1 KVM: PPC: Fix PR KVM on POWER7 bare metal
When running on a system that is HV capable, some interrupts use HSRR
SPRs instead of the normal SRR SPRs. These are also used in the Linux
handlers to jump back to code after an interrupt got processed.

Unfortunately, in our "jump back to the real host handler after we've
done the context switch" code, we were only setting the SRR SPRs,
rendering Linux to jump back to some invalid IP after it's processed
the interrupt.

This fixes random crashes on p7 opal mode with PR KVM for me.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:10 +02:00
Alexander Graf
978b4fae45 KVM: PPC: Fix stbux emulation
Stbux writes the address it's operating on to the register specified in ra,
not into the data source register.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:10 +02:00
Mihai Caraman
518f040c82 KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
Interrupt code used PPC_LL/PPC_STL macros to load/store some of u32 fields
which led to memory overflow on 64-bit. Use lwz/stw instead.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:09 +02:00
Alexander Graf
af415087d2 KVM: PPC: Book3S: PR: No isync in slbie path
While messing around with the SLBs we're running in real mode. The
entry to guest space goes through rfid, which is context synchronizing,
so there's no need to manually synchronize anything through isync.

With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2035607 to 2181301 exits per second.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:09 +02:00
Alexander Graf
8c2d0be7ef KVM: PPC: Book3S: PR: Optimize entry path
By shuffling a few instructions around we can execute more memory
loads in parallel, giving us a small performance boost.

With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2013052 to 2035607 exits per second.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:09 +02:00
Varun Sethi
30124906db KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.
For Guest accessible SPRGs 4-7, save/restore must be handled differently for 64bit and
non-64 bit case. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers.

Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:09 +02:00
Alexander Graf
3d4c6826ed KVM: PPC: Restrict PPC_[L|ST]D macro to asm code
We only want asm code macros to be accessible from asm code, so #ifdef it
depending on it.

Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:08 +02:00
Varun Sethi
185e4188da KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies.
Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64 bit copies.

Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:08 +02:00
Bharat Bhushan
6e35994d1f KVM: PPC: Use clockevent multiplier and shifter for decrementer
Time for which the hrtimer is started for decrementer emulation is calculated
using tb_ticks_per_usec. While hrtimer uses the clockevent for DEC
reprogramming (if needed) and which calculate timebase ticks using the
multiplier and shifter mechanism implemented within clockevent layer.

It was observed that this conversion (timebase->time->timebase) are not
correct because the mechanism are not consistent.
In our setup it adds 2% jitter.

With this patch clockevent multiplier and shifter mechanism are used when
starting hrtimer for decrementer emulation. Now the jitter is < 0.5%.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:07 +02:00
Bharat Bhushan
cc902ad4f2 KVM: Use minimum and maximum address mapped by TLB1
Keep track of minimum and maximum address mapped by tlb1.
This helps in TLBMISS handling in KVM to quick check whether the address lies in mapped range.
If address does not lies in this range then no need to look in each tlb1 entry of tlb1 array.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-05-06 16:19:07 +02:00
Marcelo Tosatti
eac0556750 Merge branch 'linus' into queue
Merge reason: development work has dependency on kvm patches merged
upstream.

Conflicts:
	Documentation/feature-removal-schedule.txt

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-19 17:06:26 -03:00
Linus Torvalds
7e06648972 irqdomain bug fixes for v3.4-rc3
This branch fixes a bug in irq_create_mapping() where an error return
 from irq_alloc_desc_from() gets ignored.  It also removes irq_virq_count
 to fix a bug on powerpc where the irqdomain code does not find irqs
 allocated above the CONFIG_NR_IRQS boundary.  The remaining patches get
 rid of an completely pointless export and fix some minor bugs in the
 irqdomain debug output.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPhni4AAoJEEFnBt12D9kBA/cP/jv3ENYDy2/g1/eE6W1aSkUf
 /7FlfpXsufS0Bl+wfk7sN8D1NLoB/36bLVU0TStup90vL03WT9A+BHl9tjogpZVz
 oDuLFYHSuVVOK40SSrcnOUc6rncKAni9tGjVjFCxVAx3FlqebTHWDu/Cl4BAaWBo
 +j2u4HHelHgr8oXCY5avWS0cOn3L7rIoJ54/Jqpn10OooqH2cgz9xYMb+1/ORfz1
 xjpJ4OiXKnSvuG7WD0S1EKPMbaiyak+jBoHYYNpEOriTMtcOTNg5hjz7b3jDfOrm
 gkNReffdDXCnsCPj/1gEhJlB4i+iTES0lTBVfOZ8M2luhF6wuGUYeRaiy+/m00DZ
 qYFXD5TaVM0+2USCeo71DPfag8now6YrJNIv93CGEY0fLGDJJg2yJI3oUN728p9a
 E88JLPs8f//8rxQaBatGtHmReD4wKwCevciVekSWZSROnPxnIP8PvBPq8e4Bf04r
 q+VBmr+gJh+oaDAZrIaRPsRCidHhwzIrexa4cv7rt84vnx2Hltq75ijaPNlR3JU7
 FFhZj1l8185HxXEsTJHEmiKN0J/drVIu/beGgHD7NbWWIdt8tqgtNOEUudVTisfM
 VgBdgjjbKFwQDuOxgaYgERwCkb1YXFT/kDKpgKaYnxl0yGaALjxO+ISd2fIJOuKO
 fzeVN4LDvVCysAQ/SeOG
 =6Ejq
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull irqdomain bug fixes from Grant Likely:
 "This branch fixes a bug in irq_create_mapping() where an error return
  from irq_alloc_desc_from() gets ignored.

  It also removes irq_virq_count to fix a bug on powerpc where the
  irqdomain code does not find irqs allocated above the CONFIG_NR_IRQS
  boundary.

  The remaining patches get rid of an completely pointless export and
  fix some minor bugs in the irqdomain debug output."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  irq_domain: Move irq_virq_count into NOMAP revmap
  irqdomain: Fix debugfs formatting
  irq_domain: correct the debugfs file name
  irq: Kill pointless irqd_to_hw export
  irq/irq_domain: Quit ignoring error returns from irq_alloc_desc_from().
2012-04-12 12:49:56 -07:00
Grant Likely
6fa6c8e25e irq_domain: Move irq_virq_count into NOMAP revmap
This patch replaces the old global setting of irq_virq_count that is only
used by the NOMAP mapping and instead uses a revmap_data property so that
the maximum NOMAP allocation can be set per NOMAP irq_domain.

There is exactly one user of irq_virq_count in-tree right now: PS3.
Also, irq_virq_count is only useful for the NOMAP mapping.  So,
instead of having a single global irq_virq_count values, this change
drops it entirely and added a max_irq argument to irq_domain_add_nomap().
That makes it a property of an individual nomap irq domain instead of
a global system settting.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
2012-04-12 00:37:48 -06:00
Linus Torvalds
45852766a0 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Benjamin Herrenschmidt:
 "Fixes for two nasty regression affecting powerpc in 3.4."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix typo in runlatch code
  powerpc: Fix page fault with lockdep regression
2012-04-11 20:56:28 -07:00
Scott Wood
9de6fe91af KVM: PPC: add CPU_FTR_EMB_HV to CPU table
e6500 support (commit 10241842fb,
"powerpc: Add initial e6500 cpu support" and the introduction of
CPU_FTR_EMB_HV (commit 73196cd364,
"KVM: PPC: e500mc support") collided during merge, leaving e6500's CPU
table entry missing CPU_FTR_EMB_HV.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-11 22:00:48 -03:00
Grant Likely
a699e4e49e irq: Kill pointless irqd_to_hw export
It makes no sense to export this trivial function.  Make it a static inline
instead.

This patch also drops virq_to_hw from arch/c6x since it is unused by that
architecture.

v2: Move irq_hw_number_t into types.h to fix ARM build failure

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-04-10 22:39:17 -06:00
Benjamin Herrenschmidt
fae2e0fb24 powerpc: Fix typo in runlatch code
Commit fe1952fc0a
"powerpc: Rework runlatch code" has a nasty typo
where it uses "TLF_RUNLATCH" instead of "_TLF_RUNLATCH"
(bit number instead of bit mask), causing some flags to
be potentially lost such as _TLF_RESTORE_SIGMASK

(Brown paper bag for me ! We should be able to make
that break at compile time with a bit of magic, any
volunteer ?)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-04-11 10:42:15 +10:00
Benjamin Herrenschmidt
08f1ec8a59 powerpc: Fix page fault with lockdep regression
commit a546498f3b
introduced a regression on 32-bit when irq tracing
is enabled by exposing an old bug in our irq tracing
code for exception entry.

The code would save and restore some GPRs around the
calls to the C lockdep code, however, it tries to be
too smart for its own good and restores some of the
GPRs from the exception frame (as saved there on
exception entry).

However, for page faults, we do replace those GPRs with
arguments to do_page_fault before we call transfer_to_handler
and so restoring from the exception frame is plain wrong in
this case.

This was fine as long as we didn't touch the interrupt state
when taking page fault, but when I started doing it, it would
trigger the lockdep calls and the bug.

This fixes it by cleaning up that code a bit. It did create
a small stack frame for the sake of backtraces, so let's
make it a bit bigger and use it to save and restore the
stuff we care about.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-04-10 17:21:35 +10:00
Benjamin Herrenschmidt
bbcc9c0669 powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64
When the kernel calls into RTAS, it switches to 32-bit mode. The
magic page was is longer accessible in that case, causing the
patched instructions in the RTAS call wrapper to crash.

This fixes it by making available a 32-bit mapping of the magic
page in that case. This mapping is flushed whenever we switch
the kernel back to 64-bit mode.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: add a check if the magic page is mapped]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:02:39 +03:00
Alexander Graf
966cd0f3bd KVM: PPC: Ignore unhalt request from kvm_vcpu_block
When running kvm_vcpu_block and it realizes that the CPU is actually good
to run, we get a request bit set for KVM_REQ_UNHALT. Right now, there's
nothing we can do with that bit, so let's unset it right after the call
again so we don't get confused in our later checks for pending work.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:02:38 +03:00
Alexander Graf
4f225ae06e KVM: PPC: Book3s: PR: Add HV traps so we can run in HV=1 mode on p7
When running PR KVM on a p7 system in bare metal, we get HV exits instead
of normal supervisor traps. Semantically they are identical though and the
HSRR vs SRR difference is already taken care of in the exit code.

So all we need to do is handle them in addition to our normal exits.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:02:00 +03:00
Alexander Graf
6df79df5b2 KVM: PPC: Emulate tw and td instructions
There are 4 conditional trapping instructions: tw, twi, td, tdi. The
ones with an i take an immediate comparison, the others compare two
registers. All of them arrive in the emulator when the condition to
trap was successfully fulfilled.

Unfortunately, we were only implementing the i versions so far, so
let's also add support for the other two.

This fixes kernel booting with recents book3s_32 guest kernels.

Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:57 +03:00
Alexander Graf
6020c0f6e7 KVM: PPC: Pass EA to updating emulation ops
When emulating updating load/store instructions (lwzu, stwu, ...) we need to
write the effective address of the load/store into a register.

Currently, we write the physical address in there, which is very wrong. So
instead let's save off where the virtual fault was on MMIO and use that
information as value to put into the register.

While at it, also move the XOP variants of the above instructions to the new
scheme of using the already known vaddr instead of calculating it themselves.

Reported-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:37 +03:00
Paul Mackerras
8943633cf9 KVM: PPC: Work around POWER7 DABR corruption problem
It turns out that on POWER7, writing to the DABR can cause a corrupted
value to be written if the PMU is active and updating SDAR in continuous
sampling mode.  To work around this, we make sure that the PMU is inactive
and SDAR updates are disabled (via MMCRA) when we are context-switching
DABR.

When the guest sets DABR via the H_SET_DABR hypercall, we use a slightly
different workaround, which is to read back the DABR and write it again
if it got corrupted.

While we are at it, make it consistent that the saving and restoring
of the guest's non-volatile GPRs and the FPRs are done with the guest
setup of the PMU active.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:36 +03:00
Paul Mackerras
7657f4089b KVM: PPC: Book 3S: Fix compilation for !HV configs
Commits 2f5cdd5487 ("KVM: PPC: Book3S HV: Make secondary threads more
robust against stray IPIs") and 1c2066b0f7 ("KVM: PPC: Book3S HV: Make
virtual processor area registration more robust") added fields to
struct kvm_vcpu_arch inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions,
and added lines to arch/powerpc/kernel/asm-offsets.c to generate
assembler constants for their offsets.  Unfortunately this led to
compile errors on Book 3S machines for configs that had KVM enabled
but not CONFIG_KVM_BOOK3S_64_HV.  This fixes the problem by moving
the offending lines inside #ifdef CONFIG_KVM_BOOK3S_64_HV regions.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:34 +03:00
Bharat Bhushan
c0fe7b0999 Restore guest CR after exit timing calculation
No instruction which can change Condition Register (CR) should be executed after
Guest CR is loaded. So the guest CR is restored after the Exit Timing in
lightweight_exit executes cmpw, which can clobber CR.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:31 +03:00
Paul Mackerras
0456ec4ff2 KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace log
This adds code to measure "stolen" time per virtual core in units of
timebase ticks, and to report the stolen time to the guest using the
dispatch trace log (DTL).  The guest can register an area of memory
for the DTL for a given vcpu.  The DTL is a ring buffer where KVM
fills in one entry every time it enters the guest for that vcpu.

Stolen time is measured as time when the virtual core is not running,
either because the vcore is not runnable (e.g. some of its vcpus are
executing elsewhere in the kernel or in userspace), or when the vcpu
thread that is running the vcore is preempted.  This includes time
when all the vcpus are idle (i.e. have executed the H_CEDE hypercall),
which is OK because the guest accounts stolen time while idle as idle
time.

Each vcpu keeps a record of how much stolen time has been reported to
the guest for that vcpu so far.  When we are about to enter the guest,
we create a new DTL entry (if the guest vcpu has a DTL) and report the
difference between total stolen time for the vcore and stolen time
reported so far for the vcpu as the "enqueue to dispatch" time in the
DTL entry.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:29 +03:00
Paul Mackerras
2e25aa5f64 KVM: PPC: Book3S HV: Make virtual processor area registration more robust
The PAPR API allows three sorts of per-virtual-processor areas to be
registered (VPA, SLB shadow buffer, and dispatch trace log), and
furthermore, these can be registered and unregistered for another
virtual CPU.  Currently we just update the vcpu fields pointing to
these areas at the time of registration or unregistration.  If this
is done on another vcpu, there is the possibility that the target vcpu
is using those fields at the time and could end up using a bogus
pointer and corrupting memory.

This fixes the race by making the target cpu itself do the update, so
we can be sure that the update happens at a time when the fields
aren't being used.  Each area now has a struct kvmppc_vpa which is
used to manage these updates.  There is also a spinlock which protects
access to all of the kvmppc_vpa structs, other than to the pinned_addr
fields.  (We could have just taken the spinlock when using the vpa,
slb_shadow or dtl fields, but that would mean taking the spinlock on
every guest entry and exit.)

This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry',
which is what the rest of the kernel uses.

Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out
the need to initialize vcpu->arch.vpa_update_lock.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:27 +03:00
Paul Mackerras
f0888f7015 KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs
Currently on POWER7, if we are running the guest on a core and we don't
need all the hardware threads, we do nothing to ensure that the unused
threads aren't executing in the kernel (other than checking that they
are offline).  We just assume they're napping and we don't do anything
to stop them trying to enter the kernel while the guest is running.
This means that a stray IPI can wake up the hardware thread and it will
then try to enter the kernel, but since the core is in guest context,
it will execute code from the guest in hypervisor mode once it turns the
MMU on, which tends to lead to crashes or hangs in the host.

This fixes the problem by adding two new one-byte flags in the
kvmppc_host_state structure in the PACA which are used to interlock
between the primary thread and the unused secondary threads when entering
the guest.  With these flags, the primary thread can ensure that the
unused secondaries are not already in kernel mode (i.e. handling a stray
IPI) and then indicate that they should not try to enter the kernel
if they do get woken for any reason.  Instead they will go into KVM code,
find that there is no vcpu to run, acknowledge and clear the IPI and go
back to nap mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:20 +03:00
Alexander Graf
f6127716c3 KVM: PPC: Save/Restore CR over vcpu_run
On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls.
We didn't respect that for any architecture until Paul spotted it in his
patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 14:01:02 +03:00
Matt Evans
3aaefef200 KVM: PPC: Book3s: PR: Add SPAPR H_BULK_REMOVE support
SPAPR support includes various in-kernel hypercalls, improving performance
by cutting out the exit to userspace.  H_BULK_REMOVE is implemented in this
patch.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:31 +03:00
Alexander Graf
03660ba270 KVM: PPC: Booke: only prepare to enter when we enter
So far, we've always called prepare_to_enter even when all we did was return
to the host. This patch changes that semantic to only call prepare_to_enter
when we actually want to get back into the guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:29 +03:00
Alexander Graf
7cc1e8ee78 KVM: PPC: booke: Reinject performance monitor interrupts
When we get a performance monitor interrupt, we need to make sure that
the host receives it. So reinject it like we reinject the other host
destined interrupts.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:28 +03:00
Alexander Graf
4e642ccbd6 KVM: PPC: booke: expose good state on irq reinject
When reinjecting an interrupt into the host interrupt handler after we're
back in host kernel land, we need to tell the kernel where the interrupt
happened. We can't tell it that we were in guest state, because that might
lead to random code walking host addresses. So instead, we tell it that
we came from the interrupt reinject code.

This helps getting reasonable numbers out of perf.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:26 +03:00
Alexander Graf
95f2e92144 KVM: PPC: booke: Support perfmon interrupts
When during guest context we get a performance monitor interrupt, we
currently bail out and oops. Let's route it to its correct handler
instead.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:24 +03:00
Alexander Graf
c6b3733bef KVM: PPC: e500: fix typo in tlb code
The tlbncfg registers should be populated with their respective TLB's
values. Fix the obvious typo.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:22 +03:00
Alexander Graf
55cdf08b9a KVM: PPC: bookehv: remove unused code
There was some unused code in the exit code path that must have been
a leftover from earlier iterations. While it did no harm, it's superfluous
and thus should be removed.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:21 +03:00
Alexander Graf
0268597c81 KVM: PPC: booke: add GS documentation for program interrupt
The comment for program interrupts triggered when using bookehv was
misleading. Update it to mention why MSR_GS indicates that we have
to inject an interrupt into the guest again, not emulate it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:19 +03:00
Alexander Graf
c35c9d84cf KVM: PPC: booke: Readd debug abort code for machine check
When during guest execution we get a machine check interrupt, we don't
know how to handle it yet. So let's add the error printing code back
again that we dropped accidently earlier and tell user space that something
went really wrong.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:17 +03:00
Alexander Graf
5fd8505ea4 KVM: PPC: bookehv: add comment about shadow_msr
For BookE HV the guest visible MSR is shared->msr and is identical to
the MSR that is in use while the guest is running, because we can't trap
reads from/to MSR.

So shadow_msr is unused there. Indicate that with a comment.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:15 +03:00
Alexander Graf
e9ba39c1f3 KVM: PPC: bookehv: disable MAS register updates early
We need to make sure that no MAS updates happen automatically while we
have the guest MAS registers loaded. So move the disabling code a bit
higher up so that it covers the full time we have guest values in MAS
registers.

The race this patch fixes should never occur, but it makes the code a
bit more logical to do it this way around.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:14 +03:00
Alexander Graf
8a3da55784 KVM: PPC: bookehv: remove SET_VCPU
The SET_VCPU macro is a leftover from times when the vcpu struct wasn't
stored in the thread on vcpu_load/put. It's not needed anymore. Remove it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:12 +03:00
Alexander Graf
8764b46ee3 KVM: PPC: bookehv: remove negation for CONFIG_64BIT
Instead if doing

  #ifndef CONFIG_64BIT
  ...
  #else
  ...
  #endif

we should rather do

  #ifdef CONFIG_64BIT
  ...
  #else
  ...
  #endif

which is a lot easier to read. Change the bookehv implementation to
stick with this rule.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:10 +03:00
Alexander Graf
73ede8d32b KVM: PPC: bookehv: fix exit timing
When using exit timing stats, we clobber r9 in the NEED_EMU case,
so better move that part down a few lines and fix it that way.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:08 +03:00
Alexander Graf
8b3a00fcd3 KVM: PPC: booke: BOOKE_IRQPRIO_MAX is n+1
The semantics of BOOKE_IRQPRIO_MAX changed to denote the highest available
irqprio + 1, so let's reflect that in the code too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:55:06 +03:00