Commit Graph

37 Commits

Author SHA1 Message Date
Linus Torvalds e8069f5a8e ARM64:
* Eager page splitting optimization for dirty logging, optionally
   allowing for a VM to avoid the cost of hugepage splitting in the stage-2
   fault path.
 
 * Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
   services that live in the Secure world. pKVM intervenes on FF-A calls
   to guarantee the host doesn't misuse memory donated to the hyp or a
   pKVM guest.
 
 * Support for running the split hypervisor with VHE enabled, known as
   'hVHE' mode. This is extremely useful for testing the split
   hypervisor on VHE-only systems, and paves the way for new use cases
   that depend on having two TTBRs available at EL2.
 
 * Generalized framework for configurable ID registers from userspace.
   KVM/arm64 currently prevents arbitrary CPU feature set configuration
   from userspace, but the intent is to relax this limitation and allow
   userspace to select a feature set consistent with the CPU.
 
 * Enable the use of Branch Target Identification (FEAT_BTI) in the
   hypervisor.
 
 * Use a separate set of pointer authentication keys for the hypervisor
   when running in protected mode, as the host is untrusted at runtime.
 
 * Ensure timer IRQs are consistently released in the init failure
   paths.
 
 * Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
   (FEAT_EVT), as it is a register commonly read from userspace.
 
 * Erratum workaround for the upcoming AmpereOne part, which has broken
   hardware A/D state management.
 
 RISC-V:
 
 * Redirect AMO load/store misaligned traps to KVM guest
 
 * Trap-n-emulate AIA in-kernel irqchip for KVM guest
 
 * Svnapot support for KVM Guest
 
 s390:
 
 * New uvdevice secret API
 
 * CMM selftest and fixes
 
 * fix racy access to target CPU for diag 9c
 
 x86:
 
 * Fix missing/incorrect #GP checks on ENCLS
 
 * Use standard mmu_notifier hooks for handling APIC access page
 
 * Drop now unnecessary TR/TSS load after VM-Exit on AMD
 
 * Print more descriptive information about the status of SEV and SEV-ES during
   module load
 
 * Add a test for splitting and reconstituting hugepages during and after
   dirty logging
 
 * Add support for CPU pinning in demand paging test
 
 * Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
   included along the way
 
 * Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage
   recovery threads (because nx_huge_pages=off can be toggled at runtime)
 
 * Move handling of PAT out of MTRR code and dedup SVM+VMX code
 
 * Fix output of PIC poll command emulation when there's an interrupt
 
 * Add a maintainer's handbook to document KVM x86 processes, preferred coding
   style, testing expectations, etc.
 
 * Misc cleanups, fixes and comments
 
 Generic:
 
 * Miscellaneous bugfixes and cleanups
 
 Selftests:
 
 * Generate dependency files so that partial rebuilds work as expected
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSgHrIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroORcAf+KkBlXwQMf+Q0Hy6Mfe0OtkKmh0Ae
 6HJ6dsuMfOHhWv5kgukh+qvuGUGzHq+gpVKmZg2yP3h3cLHOLUAYMCDm+rjXyjsk
 F4DbnJLfxq43Pe9PHRKFxxSecRcRYCNox0GD5UYL4PLKcH0FyfQrV+HVBK+GI8L3
 FDzUcyJkR12Lcj1qf++7fsbzfOshL0AJPmidQCoc6wkLJpUEr/nYUqlI1Kx3YNuQ
 LKmxFHS4l4/O/px3GKNDrLWDbrVlwciGIa3GZLS52PZdW3mAqT+cqcPcYK6SW71P
 m1vE80VbNELX5q3YSRoOXtedoZ3Pk97LEmz/xQAsJ/jri0Z5Syk0Ok0m/Q==
 =AMXp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Eager page splitting optimization for dirty logging, optionally
     allowing for a VM to avoid the cost of hugepage splitting in the
     stage-2 fault path.

   - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
     with services that live in the Secure world. pKVM intervenes on
     FF-A calls to guarantee the host doesn't misuse memory donated to
     the hyp or a pKVM guest.

   - Support for running the split hypervisor with VHE enabled, known as
     'hVHE' mode. This is extremely useful for testing the split
     hypervisor on VHE-only systems, and paves the way for new use cases
     that depend on having two TTBRs available at EL2.

   - Generalized framework for configurable ID registers from userspace.
     KVM/arm64 currently prevents arbitrary CPU feature set
     configuration from userspace, but the intent is to relax this
     limitation and allow userspace to select a feature set consistent
     with the CPU.

   - Enable the use of Branch Target Identification (FEAT_BTI) in the
     hypervisor.

   - Use a separate set of pointer authentication keys for the
     hypervisor when running in protected mode, as the host is untrusted
     at runtime.

   - Ensure timer IRQs are consistently released in the init failure
     paths.

   - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
     Traps (FEAT_EVT), as it is a register commonly read from userspace.

   - Erratum workaround for the upcoming AmpereOne part, which has
     broken hardware A/D state management.

  RISC-V:

   - Redirect AMO load/store misaligned traps to KVM guest

   - Trap-n-emulate AIA in-kernel irqchip for KVM guest

   - Svnapot support for KVM Guest

  s390:

   - New uvdevice secret API

   - CMM selftest and fixes

   - fix racy access to target CPU for diag 9c

  x86:

   - Fix missing/incorrect #GP checks on ENCLS

   - Use standard mmu_notifier hooks for handling APIC access page

   - Drop now unnecessary TR/TSS load after VM-Exit on AMD

   - Print more descriptive information about the status of SEV and
     SEV-ES during module load

   - Add a test for splitting and reconstituting hugepages during and
     after dirty logging

   - Add support for CPU pinning in demand paging test

   - Add support for AMD PerfMonV2, with a variety of cleanups and minor
     fixes included along the way

   - Add a "nx_huge_pages=never" option to effectively avoid creating NX
     hugepage recovery threads (because nx_huge_pages=off can be toggled
     at runtime)

   - Move handling of PAT out of MTRR code and dedup SVM+VMX code

   - Fix output of PIC poll command emulation when there's an interrupt

   - Add a maintainer's handbook to document KVM x86 processes,
     preferred coding style, testing expectations, etc.

   - Misc cleanups, fixes and comments

  Generic:

   - Miscellaneous bugfixes and cleanups

  Selftests:

   - Generate dependency files so that partial rebuilds work as
     expected"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
  Documentation/process: Add a maintainer handbook for KVM x86
  Documentation/process: Add a label for the tip tree handbook's coding style
  KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
  RISC-V: KVM: Remove unneeded semicolon
  RISC-V: KVM: Allow Svnapot extension for Guest/VM
  riscv: kvm: define vcpu_sbi_ext_pmu in header
  RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
  RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel emulation of AIA APLIC
  RISC-V: KVM: Implement device interface for AIA irqchip
  RISC-V: KVM: Skeletal in-kernel AIA irqchip support
  RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
  RISC-V: KVM: Add APLIC related defines
  RISC-V: KVM: Add IMSIC related defines
  RISC-V: KVM: Implement guest external interrupt line management
  KVM: x86: Remove PRIx* definitions as they are solely for user space
  s390/uv: Update query for secret-UVCs
  s390/uv: replace scnprintf with sysfs_emit
  s390/uvdevice: Add 'Lock Secret Store' UVC
  ...
2023-07-03 15:32:22 -07:00
Hugh Dickins 5c7f3bf04a s390: allow pte_offset_map_lock() to fail
In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Add comment on mm's contract with s390 above __zap_zero_pages(),
and fix old comment there: must be called after THP was disabled.

Link: https://lkml.kernel.org/r/3ff29363-336a-9733-12a1-5c31a45c8aeb@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:09 -07:00
Steffen Eiden db54dfc9f7 s390/uv: Update query for secret-UVCs
Update the query struct such that secret-UVC related
information can be parsed.
Add sysfs files for these new values.

'supp_add_secret_req_ver' notes the supported versions for the
Add Secret UVC. Bit 0 indicates that version 0x100 is supported,
bit 1 indicates 0x200, and so on.

'supp_add_secret_pcf' notes the supported plaintext flags for
the Add Secret UVC.

'supp_secret_types' notes the supported types of secrets.
Bit 0 indicates secret type 1, bit 1 indicates type 2, and so on.

'max_secrets' notes the maximum amount of secrets the secret store can
store per pv guest.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615100533.3996107-8-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20230615100533.3996107-8-seiden@linux.ibm.com>
2023-06-16 11:08:09 +02:00
Steffen Eiden 78d3326e72 s390/uv: replace scnprintf with sysfs_emit
Replace scnprintf(page, PAGE_SIZE, ...) with the page size aware
sysfs_emit(buf, ...) which adds some sanity checks.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615100533.3996107-7-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20230615100533.3996107-7-seiden@linux.ibm.com>
2023-06-16 11:08:09 +02:00
Steffen Eiden 4255ce0177 s390/uv: Always export uv_info
KVM needs the struct's values to be able to provide PV support.

The uvdevice is currently guest only and will need the struct's values
for call support checking and potential future expansions.

As uv.c is only compiled with CONFIG_PGSTE or
CONFIG_PROTECTED_VIRTUALIZATION_GUEST we don't need a second check in
the code. Users of uv_info will need to fence for these two config
options for the time being.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615100533.3996107-2-seiden@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20230615100533.3996107-2-seiden@linux.ibm.com>
2023-06-16 11:08:09 +02:00
Claudio Imbrenda c148dc8e2f KVM: s390: fix race in gmap_make_secure()
Fix a potential race in gmap_make_secure() and remove the last user of
follow_page() without FOLL_GET.

The old code is locking something it doesn't have a reference to, and
as explained by Jason and David in this discussion:
https://lore.kernel.org/linux-mm/Y9J4P%2FRNvY1Ztn0Q@nvidia.com/
it can lead to all kind of bad things, including the page getting
unmapped (MADV_DONTNEED), freed, reallocated as a larger folio and the
unlock_page() would target the wrong bit.
There is also another race with the FOLL_WRITE, which could race
between the follow_page() and the get_locked_pte().

The main point is to remove the last use of follow_page() without
FOLL_GET or FOLL_PIN, removing the races can be considered a nice
bonus.

Link: https://lore.kernel.org/linux-mm/Y9J4P%2FRNvY1Ztn0Q@nvidia.com/
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Fixes: 214d9bbcd3 ("s390/mm: provide memory management functions for protected KVM guests")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20230428092753.27913-2-imbrenda@linux.ibm.com>
2023-05-04 18:26:27 +02:00
Claudio Imbrenda afe20eb8df KVM: s390: pv: avoid export before import if possible
If the appropriate UV feature bit is set, there is no need to perform
an export before import.

The misc feature indicates, among other things, that importing a shared
page from a different protected VM will automatically also transfer its
ownership.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111170632.77622-5-imbrenda@linux.ibm.com
Message-Id: <20221111170632.77622-5-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-11-23 09:06:50 +00:00
Claudio Imbrenda 72b1daff26 KVM: s390: pv: add export before import
Due to upcoming changes, it will be possible to temporarily have
multiple protected VMs in the same address space, although only one
will be actually active.

In that scenario, it is necessary to perform an export of every page
that is to be imported, since the hardware does not allow a page
belonging to a protected guest to be imported into a different
protected guest.

This also applies to pages that are shared, and thus accessible by the
host.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220628135619.32410-7-imbrenda@linux.ibm.com
Message-Id: <20220628135619.32410-7-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-13 14:42:11 +00:00
Claudio Imbrenda a52c25848e KVM: s390: pv: handle secure storage violations for protected guests
A secure storage violation is triggered when a protected guest tries to
access secure memory that has been mapped erroneously, or that belongs
to a different protected guest or to the ultravisor.

With upcoming patches, protected guests will be able to trigger secure
storage violations in normal operation. This happens for example if a
protected guest is rebooted with deferred destroy enabled and the new
guest is also protected.

When the new protected guest touches pages that have not yet been
destroyed, and thus are accounted to the previous protected guest, a
secure storage violation is raised.

This patch adds handling of secure storage violations for protected
guests.

This exception is handled by first trying to destroy the page, because
it is expected to belong to a defunct protected guest where a destroy
should be possible. Note that a secure page can only be destroyed if
its protected VM does not have any CPUs, which only happens when the
protected VM is being terminated. If that fails, a normal export of
the page is attempted.

This means that pages that trigger the exception will be made
non-secure (in one way or another) before attempting to use them again
for a different secure guest.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220628135619.32410-3-imbrenda@linux.ibm.com
Message-Id: <20220628135619.32410-3-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-13 14:42:11 +00:00
Steffen Eiden 1b6abe95b5 s390: Add attestation query information
We have information about the supported attestation header version
and plaintext attestation flag bits.
Let's expose it via the sysfs files.

Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/lkml/20220601100245.3189993-1-seiden@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-11 11:29:51 +02:00
Janosch Frank 38c218259d s390/uv: Add dump fields to query
The new dump feature requires us to know how much memory is needed for
the "dump storage state" and "dump finalize" ultravisor call. These
values are reported via the UV query call.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-3-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-3-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Janosch Frank ac640db3a0 s390/uv: Add SE hdr query information
We have information about the supported se header version and pcf bits
so let's expose it via the sysfs files.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-2-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-2-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Heiko Carstens 15b5c1833a s390/uv: fix memblock virtual vs physical address confusion
memblock_alloc_try_nid() returns a virtual address, however in error
case the allocated memory is incorrectly freed with memblock_phys_free().
Properly use memblock_free() instead, and pass a physical address to
uv_init() to fix this.

Note: this doesn't fix a bug currently, since virtual and physical
addresses are identical.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-16 19:58:07 +01:00
Linus Torvalds 512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
Mike Rapoport 3ecc68349b memblock: rename memblock_free to memblock_phys_free
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Claudio Imbrenda 380d97bd02 KVM: s390: pv: properly handle page flags for protected guests
Introduce variants of the convert and destroy page functions that also
clear the PG_arch_1 bit used to mark them as secure pages.

The PG_arch_1 flag is always allowed to overindicate; using the new
functions introduced here allows to reduce the extent of overindication
and thus improve performance.

These new functions can only be called on pages for which a reference
is already being held.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210920132502.36111-7-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-27 07:55:53 +02:00
Claudio Imbrenda f0a1a0615a KVM: s390: pv: avoid stalls when making pages secure
Improve make_secure_pte to avoid stalls when the system is heavily
overcommitted. This was especially problematic in kvm_s390_pv_unpack,
because of the loop over all pages that needed unpacking.

Due to the locks being held, it was not possible to simply replace
uv_call with uv_call_sched. A more complex approach was
needed, in which uv_call is replaced with __uv_call, which does not
loop. When the UVC needs to be executed again, -EAGAIN is returned, and
the caller (or its caller) will try again.

When -EAGAIN is returned, the path is the same as when the page is in
writeback (and the writeback check is also performed, which is
harmless).

Fixes: 214d9bbcd3 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20210920132502.36111-5-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25 09:20:39 +02:00
David Hildenbrand 46c22ffd27 s390/uv: fully validate the VMA before calling follow_page()
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f260 ("mm: mmap: zap pages
with read mmap_sem in munmap").

find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.

Fixes: 214d9bbcd3 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20210909162248.14969-6-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2021-10-25 09:20:38 +02:00
Alexander Egorenkov 7f33565b25 s390/uv: de-duplicate checks for Protected Host Virtualization
De-duplicate checks for Protected Host Virtualization in decompressor and
kernel.

Set prot_virt_host=0 in the decompressor in *any* of the following cases
and hand it over to the decompressed kernel:
* No explicit prot_virt=1 is given on the kernel command-line
* Protected Guest Virtualization is enabled
* Hardware support not present
* kdump or stand-alone dump

The decompressed kernel needs to use only is_prot_virt_host() instead of
performing again all checks done by the decompressor.

Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-27 09:39:14 +02:00
Janosch Frank 85b18d7b5e s390: mm: Fix secure storage access exception handling
Turns out that the bit 61 in the TEID is not always 1 and if that's
the case the address space ID and the address are
unpredictable. Without an address and its address space ID we can't
export memory and hence we can only send a SIGSEGV to the process or
panic the kernel depending on who caused the exception.

Unfortunately bit 61 is only reliable if we have the "misc" UV feature
bit.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 084ea4d611 ("s390/mm: add (non)secure page access exceptions handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05 12:44:23 +02:00
Vasily Gorbik 0c4f2623b9 s390: setup kernel memory layout early
Currently there are two separate places where kernel memory layout has
to be known and adjusted:
1. early kasan setup.
2. paging setup later.

Those 2 places had to be kept in sync and adjusted to reflect peculiar
technical details of one another. With additional factors which influence
kernel memory layout like ultravisor secure storage limit, complexity
of keeping two things in sync grew up even more.

Besides that if we look forward towards creating identity mapping and
enabling DAT before jumping into uncompressed kernel - that would also
require full knowledge of and control over kernel memory layout.

So, de-duplicate and move kernel memory layout setup logic into
the decompressor.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-18 16:41:19 +02:00
zhongbaisong 644975179c s390/protvirt: fix error return code in uv_info_init()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Fixes: 37564ed834 ("s390/uv: add prot virt guest/host indication files")
Link: https://lore.kernel.org/r/2f7d62a4-3e75-b2b4-951b-75ef8ef59d16@huawei.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-04-12 12:46:41 +02:00
Janosch Frank df2e400e07 s390/uv: fix prot virt host indication compilation
prot_virt_host is only available if CONFIG_KVM is enabled. So lets use
a variable initialized to zero and overwrite it when that config
option is set with prot_virt_host.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: 37564ed834 ("s390/uv: add prot virt guest/host indication files")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-24 16:06:19 +01:00
Janosch Frank 37564ed834 s390/uv: add prot virt guest/host indication files
Let's export the prot_virt_guest and prot_virt_host variables into the
UV sysfs firmware interface to make them easily consumable by
administrators.

prot_virt_host being 1 indicates that we did the UV
initialization (opt-in)

prot_virt_guest being 1 indicates that the UV indicates the share and
unshare ultravisor calls which is an indication that we are running as
a protected guest.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-03-22 11:36:03 +01:00
Janosch Frank e82080e1f4 s390: uv: Fix sysfs max number of VCPUs reporting
The number reported by the query is N-1 and I think people reading the
sysfs file would expect N instead. For users creating VMs there's no
actual difference because KVM's limit is currently below the UV's
limit.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a0f60f8431 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information")
Cc: stable@vger.kernel.org
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-27 13:00:04 +01:00
Christian Borntraeger 4c80d05714 s390/uv: handle destroy page legacy interface
Older firmware can return rc=0x107 rrc=0xd for destroy page if the
page is already non-secure. This should be handled like a success
as already done by newer firmware.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 1a80b54d1c ("s390/uv: add destroy page call")
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
2020-11-18 13:08:49 +01:00
Vasily Gorbik c360c9a238 s390/kasan: support protvirt with 4-level paging
Currently the kernel crashes in Kasan instrumentation code if
CONFIG_KASAN_S390_4_LEVEL_PAGING is used on protected virtualization
capable machine where the ultravisor imposes addressing limitations on
the host and those limitations are lower then KASAN_SHADOW_OFFSET.

The problem is that Kasan has to know in advance where vmalloc/modules
areas would be. With protected virtualization enabled vmalloc/modules
areas are moved down to the ultravisor secure storage limit while kasan
still expects them at the very end of 4-level paging address space.

To fix that make Kasan recognize when protected virtualization is enabled
and predefine vmalloc/modules areas position which are compliant with
ultravisor secure storage limit.

Kasan shadow itself stays in place and might reside above that ultravisor
secure storage limit.

One slight difference compaired to a kernel without Kasan enabled is that
vmalloc/modules areas position is not reverted to default if ultravisor
initialization fails. It would still be below the ultravisor secure
storage limit.

Kernel layout with kasan, 4-level paging and protected virtualization
enabled (ultravisor secure storage limit is at 0x0000800000000000):
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400080000000
---[ vmemmap Area End ]---
---[ vmalloc Area Start ]---
0x00007fe000000000-0x00007fff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x00007fff80000000-0x0000800000000000
---[ Modules Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
0x001c000000000000-0x0020000000000000         1P PGD I

Kernel layout with kasan, 4-level paging and protected virtualization
disabled/unsupported:
---[ vmemmap Area Start ]---
0x0000400000000000-0x0000400060000000
---[ vmemmap Area End ]---
---[ Kasan Shadow Start ]---
0x0018000000000000-0x001c000000000000
---[ Kasan Shadow End ]---
---[ vmalloc Area Start ]---
0x001fffe000000000-0x001fffff80000000
---[ vmalloc Area End ]---
---[ Modules Area Start ]---
0x001fffff80000000-0x0020000000000000
---[ Modules Area End ]---

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:48 +02:00
Vasily Gorbik c2314cb2dd s390/protvirt: support ultravisor without secure storage limit
Avoid potential crash due to lack of secure storage limit. Check that
max_sec_stor_addr is not 0 before adjusting vmalloc position.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Vasily Gorbik 1d6671ae46 s390/protvirt: parse prot_virt option in the decompressor
To make early kernel address space layout definition possible parse
prot_virt option in the decompressor and pass it to the uncompressed
kernel. This enables kasan to take ultravisor secure storage limit into
consideration and pre-define vmalloc position correctly.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-16 14:08:47 +02:00
Janosch Frank 1a80b54d1c s390/uv: add destroy page call
We don't need to export pages if we destroy the VM configuration
afterwards anyway. Instead we can destroy the page which will zero it
and then make it accessible to the host.

Destroying is about twice as fast as the export.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/kvm/20200907124700.10374-2-frankja@linux.ibm.com/
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-14 11:38:35 +02:00
Chen Zhou 99448016ac s390/protvirt: use scnprintf() instead of snprintf()
snprintf() returns the number of bytes that would be written,
which may be greater than the the actual length to be written.

uv_query_facilities() should return the number of bytes printed
into the buffer. This is the return value of scnprintf().
The other functions are the same.

Link: https://lkml.kernel.org/r/20200509085608.41061-4-chenzhou10@huawei.com
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-06-16 13:44:05 +02:00
Michel Lespinasse d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Claudio Imbrenda 673deb0beb s390/protvirt: fix compilation issue
The kernel fails to compile with CONFIG_PROTECTED_VIRTUALIZATION_GUEST
set but CONFIG_KVM unset.

This patch fixes the issue by making the needed variable always available.

Link: https://lkml.kernel.org/r/20200423120114.2027410-1-imbrenda@linux.ibm.com
Fixes: a0f60f8431 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Philipp Rudo <prudo@linux.ibm.com>
Suggested-by: Philipp Rudo <prudo@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-25 10:17:24 +02:00
Janosch Frank a0f60f8431 s390/protvirt: Add sysfs firmware interface for Ultravisor information
That information, e.g. the maximum number of guests or installed
Ultravisor facilities, is interesting for QEMU, Libvirt and
administrators.

Let's provide an easily parsable API to get that information.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:44:40 +01:00
Claudio Imbrenda 214d9bbcd3 s390/mm: provide memory management functions for protected KVM guests
This provides the basic ultravisor calls and page table handling to cope
with secure guests:
- provide arch_make_page_accessible
- make pages accessible after unmapping of secure guests
- provide the ultravisor commands convert to/from secure
- provide the ultravisor commands pin/unpin shared
- provide callbacks to make pages secure (inacccessible)
 - we check for the expected pin count to only make pages secure if the
   host is not accessing them
 - we fence hugetlbfs for secure pages
- add missing radix-tree include into gmap.h

The basic idea is that a page can have 3 states: secure, normal or
shared. The hypervisor can call into a firmware function called
ultravisor that allows to change the state of a page: convert from/to
secure. The convert from secure will encrypt the page and make it
available to the host and host I/O. The convert to secure will remove
the host capability to access this page.
The design is that on convert to secure we will wait until writeback and
page refs are indicating no host usage. At the same time the convert
from secure (export to host) will be called in common code when the
refcount or the writeback bit is already set. This avoids races between
convert from and to secure.

Then there is also the concept of shared pages. Those are kind of secure
where the host can still access those pages. We need to be notified when
the guest "unshares" such a page, basically doing a convert to secure by
then. There is a call "pin shared page" that we use instead of convert
from secure when possible.

We do use PG_arch_1 as an optimization to minimize the convert from
secure/pin shared.

Several comments have been added in the code to explain the logic in
the relevant places.

Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:44:40 +01:00
Vasily Gorbik 29d37e5b82 s390/protvirt: add ultravisor initialization
Before being able to host protected virtual machines, donate some of
the memory to the ultravisor. Besides that the ultravisor might impose
addressing limitations for memory used to back protected VM storage. Treat
that limit as protected virtualization host's virtual memory limit.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:44:40 +01:00
Vasily Gorbik ecdc5d842b s390/protvirt: introduce host side setup
Add "prot_virt" command line option which controls if the kernel
protected VMs support is enabled at early boot time. This has to be
done early, because it needs large amounts of memory and will disable
some features like STP time sync for the lpar.

Extend ultravisor info definitions and expose it via uv_info struct
filled in during startup.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27 19:44:40 +01:00