Commit graph

768086 commits

Author SHA1 Message Date
Junaid Shahid
c9470a2e28 kvm: x86: Propagate guest PCIDs to host PCIDs
When using shadow paging mode, propagate the guest's PCID value to
the shadow CR3 in the host instead of always using PCID 0.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:56 +02:00
Junaid Shahid
afe828d1de kvm: x86: Add ability to skip TLB flush when switching CR3
Remove the implicit flush from the set_cr3 handlers, so that the
callers are able to decide whether to flush the TLB or not.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:55 +02:00
Junaid Shahid
50c28f21d0 kvm: x86: Use fast CR3 switch for nested VMX
Use the fast CR3 switch mechanism to locklessly change the MMU root
page when switching between L1 and L2. The switch from L2 to L1 should
always go through the fast path, while the switch from L1 to L2 should
go through the fast path if L1's CR3/EPTP for L2 hasn't changed
since the last time.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:54 +02:00
Junaid Shahid
1c53da3fa3 kvm: x86: Support resetting the MMU context without resetting roots
This adds support for re-initializing the MMU context in a different
mode while preserving the active root_hpa and the prev_root.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:54 +02:00
Junaid Shahid
0aab33e4f9 kvm: x86: Add support for fast CR3 switch across different MMU modes
This generalizes the lockless CR3 switch path to be able to work
across different MMU modes (e.g. nested vs non-nested) by checking
that the expected page role of the new root page matches the page role
of the previously stored root page in addition to checking that the new
CR3 matches the previous CR3. Furthermore, instead of loading the
hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(),
as by that time the MMU context would be up-to-date with the VCPU mode.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:53 +02:00
Junaid Shahid
6e42782f51 kvm: x86: Introduce KVM_REQ_LOAD_CR3
The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the
current root_hpa.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:52 +02:00
Junaid Shahid
9fa72119b2 kvm: x86: Introduce kvm_mmu_calc_root_page_role()
These functions factor out the base role calculation from the
corresponding kvm_init_*_mmu() functions. The new functions return
what would be the role assigned to a root page in the current VCPU
state. This can be masked with mmu_base_role_mask to derive the base
role.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:51 +02:00
Junaid Shahid
7c390d350f kvm: x86: Add fast CR3 switch code path
When using shadow paging, a CR3 switch in the guest results in a VM Exit.
In the common case, that VM exit doesn't require much processing by KVM.
However, it does acquire the MMU lock, which can start showing signs of
contention under some workloads even on a 2 VCPU VM when the guest is
using KPTI. Therefore, we add a fast path that avoids acquiring the MMU
lock in the most common cases e.g. when switching back and forth between
the kernel and user mode CR3s used by KPTI with no guest page table
changes in between.

For now, this fast path is implemented only for 64-bit guests and hosts
to avoid the handling of PDPTEs, but it can be extended later to 32-bit
guests and/or hosts as well.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:51 +02:00
Junaid Shahid
578e1c4db2 kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is needed
kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just
bail out if it isn't.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:50 +02:00
Junaid Shahid
5ce4786f75 kvm: x86: Make sync_page() flush remote TLBs once only
sync_page() calls set_spte() from a loop across a page table. It would
work better if set_spte() left the TLB flushing to its callers, so that
sync_page() can aggregate into a single call.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:49 +02:00
Peter Xu
42522d08cd KVM: MMU: drop vcpu param in gpte_access
It's never used.  Drop it.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:48 +02:00
Liran Alon
abfc52c612 KVM: nVMX: Separate logic allocating shadow vmcs to a function
No functionality change.
This is done as a preparation for VMCS shadowing virtualization.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:48 +02:00
Liran Alon
491a603845 KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate shadow vmcs
No functionality change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:47 +02:00
Liran Alon
32c7acf044 KVM: nVMX: Expose VMCS shadowing to L1 guest
Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU,
whether or not VMCS shadowing is supported by the physical CPU.
(VMCS shadowing emulation)

Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a
VM-exit to L1.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:46 +02:00
Liran Alon
a7cde481b6 KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps
This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:45 +02:00
Liran Alon
6d894f498f KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2
This is done as a preparation to VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:44 +02:00
Paolo Bonzini
9a78bdf31d KVM: selftests: add tests for shadow VMCS save/restore
This includes setting up the shadow VMCS and the secondary execution
controls in lib/vmx.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:44 +02:00
Paolo Bonzini
fa58a9fa74 KVM: nVMX: include shadow vmcs12 in nested state
The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE,
because at that point guest memory is assumed by userspace to
be immutable.  Capture the cache in vmx_get_nested_state, adding
another page at the end if there is an active shadow vmcs12.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:43 +02:00
Liran Alon
61ada7488f KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExit
This is done is done as a preparation to VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:42 +02:00
Liran Alon
f145d90d97 KVM: nVMX: Verify VMCS shadowing VMCS link pointer
Intel SDM considers these checks to be part of
"Checks on Guest Non-Register State".

Note that it is legal for vmcs->vmcs_link_pointer to be -1ull
when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to
shadowed-field sets the ALU flags for VMfailInvalid (i.e. CF=1).

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:41 +02:00
Liran Alon
a8a7c02bf7 KVM: nVMX: Verify VMCS shadowing controls
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:40 +02:00
Liran Alon
f792d2743e KVM: nVMX: Introduce nested_cpu_has_shadow_vmcs()
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:40 +02:00
Liran Alon
a6192d40d5 KVM: nVMX: Fail VMLAUNCH and VMRESUME on shadow VMCS
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:39 +02:00
Liran Alon
fa97d7dba7 KVM: nVMX: Allow VMPTRLD for shadow VMCS if vCPU supports VMCS shadowing
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:38 +02:00
Liran Alon
e253674227 KVM: VMX: Change vmcs12_{read,write}_any() to receive vmcs12 as parameter
No functionality change.
This is done as a preparation for VMCS shadowing emulation.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:37 +02:00
Liran Alon
392b2f25aa KVM: VMX: Create struct for VMCS header
No functionality change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:37 +02:00
Paolo Bonzini
cb5476379f kvm: selftests: add test for nested state save/restore
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:36 +02:00
Jim Mattson
8fcc4b5923 kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
             address previous comments.
           - handle nested.smm state.
           - rebase & a bit of refactoring.
           - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:58:30 +02:00
Paolo Bonzini
7f7f1ba33c KVM: x86: do not load vmcs12 pages while still in SMM
If the vCPU enters system management mode while running a nested guest,
RSM starts processing the vmentry while still in SMM.  In that case,
however, the pages pointed to by the vmcs12 might be incorrectly
loaded from SMRAM.  To avoid this, delay the handling of the pages
until just before the next vmentry.  This is done with a new request
and a new entry in kvm_x86_ops, which we will be able to reuse for
nested VMX state migration.

Extracted from a patch by Jim Mattson and KarimAllah Ahmed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:57:58 +02:00
Paolo Bonzini
fa3899add1 kvm: selftests: add basic test for state save and restore
The test calls KVM_RUN repeatedly, and creates an entirely new VM with the
old memory and vCPU state on every exit to userspace.  The kvm_util API is
expanded with two functions that manage the lifetime of a kvm_vm struct:
the first closes the file descriptors and leaves the memory allocated,
and the second opens the file descriptors and reuses the memory from
the previous incarnation of the kvm_vm struct.

For now the test is very basic, as it does not test for example XSAVE or
vCPU events.  However, it will test nested virtualization state starting
with the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:04 +02:00
Paolo Bonzini
0a505fe6f2 kvm: selftests: ensure vcpu file is released
The selftests were not munmap-ing the kvm_run area from the vcpu file descriptor.
The result was that kvm_vcpu_release was not called and a reference was left in the
parent "struct kvm".  Ultimately this was visible in the upcoming state save/restore
test as an error when KVM attempted to create a duplicate debugfs entry.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:04 +02:00
Paolo Bonzini
87ccb7dbb2 kvm: selftests: actually use all of lib/vmx.c
The allocation of the VMXON and VMCS is currently done twice, in
lib/vmx.c and in vmx_tsc_adjust_test.c.  Reorganize the code to
provide a cleaner and easier to use API to the tests.  lib/vmx.c
now does the complete setup of the VMX data structures, but does not
create the VM or set CPUID.  This has to be done by the caller.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:03 +02:00
Paolo Bonzini
2305339ee7 kvm: selftests: create a GDT and TSS
The GDT and the TSS base were left to zero, and this has interesting effects
when the TSS descriptor is later read to set up a VMCS's TR_BASE.  Basically
it worked by chance, and this patch fixes it by setting up all the protected
mode data structures properly.

Because the GDT and TSS addresses are virtual, the page tables now always
exist at the time of vcpu setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:02 +02:00
Paolo Bonzini
44883f01fe KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'd
Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back
to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you
they are only accepted under special conditions.  This makes the API a pain to
use.

To avoid this pain, this patch makes it so that the result of the get-list
ioctl can always be used for host-initiated get and set.  Since we don't have
a separate way to check for read-only MSRs, this means some Hyper-V MSRs are
ignored when written.  Arguably they should not even be in the result of
GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the
outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding
Hyper-V feature.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:01 +02:00
Sean Christopherson
cf81a7e580 KVM: vmx: remove save/restore of host BNDCGFS MSR
Linux does not support Memory Protection Extensions (MPX) in the
kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will
always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state()
is superfluous.  KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS,
i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading
BNDCFGS is also superfluous.

And in the event the MPX kernel support is added (unlikely given
that MPX for userspace is in its death throes[1]), BNDCFGS will
likely be common across all CPUs[2], and at the least shouldn't
change on a regular basis, i.e. saving the MSR on every VMENTRY is
completely unnecessary.

WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to
document that KVM does not preserve BNDCFGS and to serve as a hint
as to how BNDCFGS likely should be handled if MPX is used in the
kernel, e.g. BNDCFGS should be saved once during KVM setup.

[1] https://lkml.org/lkml/2018/4/27/1046
[2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:32:00 +02:00
KarimAllah Ahmed
86dafed50e KVM: Switch 'requests' to be 64-bit (explicitly)
Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to
use the size of "requests" instead of the hard-coded '32'.

That gives us a bit more room again for arch-specific requests as we
already ran out of space for x86 due to the hard-coded check.

The only exception here is ARM32 as it is still 32-bits.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:31:59 +02:00
Wei Huang
ca35906688 kvm: selftests: add cr4_cpuid_sync_test
KVM is supposed to update some guest VM's CPUID bits (e.g. OSXSAVE) when
CR4 is changed. A bug was found in KVM recently and it was fixed by
Commit c4d2188206 ("KVM: x86: Update cpuid properly when CR4.OSXAVE or
CR4.PKE is changed"). This patch adds a test to verify the synchronization
between guest VM's CR4 and CPUID bits.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06 17:31:59 +02:00
Paolo Bonzini
d2ce98ca0a Linux 4.18-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76
 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA
 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR
 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU
 mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2
 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt
 vJJlibI=
 =42a5
 -----END PGP SIGNATURE-----

Merge tag 'v4.18-rc6' into HEAD

Pull bug fixes into the KVM development tree to avoid nasty conflicts.
2018-08-06 17:31:36 +02:00
Paolo Bonzini
85eae57bbb KVM: s390: Features for 4.19
- initial version for host large page support. Must be enabled with
   module parameter hpage=1 and will conflict with the nested=1
   parameter.
 - enable etoken facility for guests
 - Fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbYAz8AAoJEBF7vIC1phx8r6cP+wa1xFphqa5ha7/qlr/upfWF
 aqIQ9IqyUcsYC83EQuEKTgEEgXkhOZ6eUM8B5dkmTP0iTroB8bFc5+AEKZfRoMzk
 rLBuxZti5cKI0l5nE4ChX20WNzjwVIRiHqkyh7Z+otpfVS5tEujkVPbOkzHTVZIL
 MJWAGb0m8PXbXShqAq3d5aPKcQz3L/y4uPHZsPJtv6vf/e06l5OlvP5gxiIFORFQ
 ss01RrzlqXGJ8KEUPD1kWT2I3mbSUZldb2Yd0oi/m4RK2csJ9WAEYNBjLstdXJ9/
 l0iqT7ZAbRfdAxRKHSUPxoj1WK/EPxEYfh+NbR+gbVgGrsMi9yeKPAIVC1ws/kG1
 qdjVHbJyMXCHRaKi/+ayyNwdcv6cWSIEFbfIwqFfKEm/5/AbSIg76uReuWh1oDM4
 tX7aQNJp7zmRWBbxGy9ezJlvOOvz+G2hMtf5MiD3rMBI602FE+eak8NBTXmOQ8Jv
 /EZw/xJFQBOnD7G/cNMQRiPGHOPRdhS4O/9N4DUh9Qw88rOtqLKQRRnkB5cTawWa
 LEVU+sOhqCROEr/pQdN4cID71bnWb2bsvtmTX5EOK1UF4LeaKLaEmgCJBIkOGjns
 VI9RecNa7meeeYYN67Mgp9VdGab38fc+ybT9TVejVheznrxcKrR8RlwCjcmLP2n4
 QpzmR/9E3SzwY6y/k11i
 =WSH5
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features for 4.19

- initial version for host large page support. Must be enabled with
  module parameter hpage=1 and will conflict with the nested=1
  parameter.
- enable etoken facility for guests
- Fixes
2018-08-02 13:57:29 +02:00
Paolo Bonzini
3a1174cd3e PPC KVM update for 4.19.
This update adds no new features; it just has some minor code cleanups
 and bug fixes, including a fix to allow us to create KVM_MAX_VCPUS
 vCPUs on POWER9 in all CPU threading modes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbX/AJAAoJEJ2a6ncsY3GfO7sH/jEq1yuCzIjegVFkO7qPFd0C
 LXwcXyFfM7EIB7kWfHLPzfavBaQZvILUR5N2J2nEjEkyeHzouZ3utwU0zUeVLmDm
 3zqb/OT1viNxpKV88qkYCIsW1aB1j9dIP714BTqXYF49z09tv/LXE5ErqWBLJoVi
 eK1o1CoivfudONTyz4SQlMbGrdA0UgR9hxQviIhPnQmgOJfJCWpQR5wTkQh7kxDn
 htBi2Qe0gpRL+9VMMGQbCVpHRw/VTolOfS+MDxfmrGlHh6ML9GjoOMBO5uraTeDg
 1he8PZS1In5KR62qrOhzfraJ49oscSn+dhxrQfYx2crCmxPygJsWkVIJrmGFiFU=
 =P447
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 4.19.

This update adds no new features; it just has some minor code cleanups
and bug fixes, including a fix to allow us to create KVM_MAX_VCPUS
vCPUs on POWER9 in all CPU threading modes.
2018-08-02 13:57:26 +02:00
Janosch Frank
2375846193 KVM: s390: initial host large page support
- must be enabled via module parameter hpage=1
 - cannot be used together with nested
 - does support migration
 - does support hugetlbfs
 - no THP yet
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbX4AwAAoJEBF7vIC1phx85eMP/ifsNHwqfAOrZBdlJuLVPla5
 47J8iY4i4DOKGhKI4YOTcJQhn1izKZhECXS8d8hghB/sQUCE2CLVr1X/r1Udy2Pq
 bpKG4apYtcJZBF6qn7yDMjBGkIRK4OCBD1pkuKEq2NyvUgPsHUVUgpuq2gngMTBk
 ZN9MIfRQMdIEJsT389D6T9as0lwABJ0MJap5AudkQwguN2dDhQGeZv8l0QYV8C2I
 WqRI2VsI1QEo3cJr1lJ5li/F9fC7q0l6QwlvPVocIHJAnq01zJvOekeAgQ4hzz16
 JIoQckJq8m4d4PqZ7aWmAaMEemoQ9llmCavovspJNtFT79jho6cWWtBEvq+t0GLQ
 qTsG9Yi20hONZMWAw+JIdSdOuFMD0HCpOWdUtSMjENFRbr8LLHUr91dGIxRLjF8Z
 gv3vDJrbGzCQ+b9qPA8SrAN7U3VNCZG384MEmobwTuv5hxOopWp6chcK7RCriV/m
 7cFDfO7+2pZymdW7D4DWlFiZl4mWpwOxip32C9tCt0CQveqeYSZsb5Qb9Pe+50vr
 JhpB74UL79Wffvd65InGlu5jx1SdGG0QAzmBOkdOsAhX+0WMmXRB1ddn4whu7HPU
 ssNtdKgLt9KkM/kIsB9RC/YLvUFK1lBVHrfnzUmLw3CBHP3QeO+V+arLwdVLVDjV
 PA/LPECBWtGtQtxGWb2H
 =Y0Wl
 -----END PGP SIGNATURE-----

Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next

KVM: s390: initial host large page support

- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
2018-07-30 23:20:48 +02:00
Janosch Frank
a449938297 KVM: s390: Add huge page enablement control
General KVM huge page support on s390 has to be enabled via the
kvm.hpage module parameter. Either nested or hpage can be enabled, as
we currently do not support vSIE for huge backed guests. Once the vSIE
support is added we will either drop the parameter or enable it as
default.

For a guest the feature has to be enabled through the new
KVM_CAP_S390_HPAGE_1M capability and the hpage module
parameter. Enabling it means that cmm can't be enabled for the vm and
disables pfmf and storage key interpretation.

This is due to the fact that in some cases, in upcoming patches, we
have to split huge pages in the guest mapping to be able to set more
granular memory protection on 4k pages. These split pages have fake
page tables that are not visible to the Linux memory management which
subsequently will not manage its PGSTEs, while the SIE will. Disabling
these features lets us manage PGSTE data in a consistent matter and
solve that problem.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 23:13:38 +02:00
Janosch Frank
a9e00d8349 s390/mm: Add huge page gmap linking support
Let's allow huge pmd linking when enabled through the
KVM_CAP_S390_HPAGE_1M capability. Also we can now restrict gmap
invalidation and notification to the cases where the capability has
been activated and save some cycles when that's not the case.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 23:13:38 +02:00
Dominik Dingel
7d735b9ae8 s390/mm: hugetlb pages within a gmap can not be freed
Guests backed by huge pages could theoretically free unused pages via
the diagnose 10 instruction. We currently don't allow that, so we
don't have to refault it once it's needed again.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30 23:13:38 +02:00
Janosch Frank
57cb198cfd KVM: s390: Beautify skey enable check
Let's introduce an explicit check if skeys have already been enabled
for the vcpu, so we don't have to check the mm context if we don't have
the storage key facility.

This lets us check for enablement without having to take the mm
semaphore and thus speedup skey emulation.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-07-30 17:05:52 +02:00
Janosch Frank
bd096f6443 KVM: s390: Add skey emulation fault handling
When doing skey emulation for huge guests, we now need to fault in
pmds, as we don't have PGSTES anymore to store them when we do not
have valid table entries.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30 11:20:18 +01:00
Janosch Frank
637ff9efe5 s390/mm: Add huge pmd storage key handling
Storage keys for guests with huge page mappings have to be managed in
hardware. There are no PGSTEs for PMDs that we could use to retain the
guests's logical view of the key.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:18 +01:00
Janosch Frank
3afdfca698 s390/mm: Clear skeys for newly mapped huge guest pmds
Similarly to the pte skey handling, where we set the storage key to
the default key for each newly mapped pte, we have to also do that for
huge pmds.

With the PG_arch_1 flag we keep track if the area has already been
cleared of its skeys.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-30 11:20:18 +01:00
Dominik Dingel
964c2c05c9 s390/mm: Clear huge page storage keys on enable_skey
When a guest starts using storage keys, we trap and set a default one
for its whole valid address space. With this patch we are now able to
do that for large pages.

To speed up the storage key insertion, we use
__storage_key_init_range, which in-turn will use sske_frame to set
multiple storage keys with one instruction. As it has been previously
used for debuging we have to get rid of the default key check and make
it quiescing.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
[replaced page_set_storage_key loop with __storage_key_init_range]
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:18 +01:00
Janosch Frank
0959e16867 s390/mm: Add huge page dirty sync support
To do dirty loging with huge pages, we protect huge pmds in the
gmap. When they are written to, we unprotect them and mark them dirty.

We introduce the function gmap_test_and_clear_dirty_pmd which handles
dirty sync for huge pages.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
2018-07-30 11:20:18 +01:00