Commit Graph

527 Commits

Author SHA1 Message Date
Paolo Bonzini e4922088f8 * Two more V!=R patches
* The last part of the cmpxchg patches
 * A few fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmPkwH0ACgkQ41TmuOI4
 ufhrshAAmv9OlCNVsGTmQLpEnGdnxGM2vBPDEygdi+oVHtpMBFn27R3fu295aUR0
 v0o3xsSImhaOU03OxWrsLqPanEL5BqnicLwkL4xou3NXXD4Wo0Zrstd3ykfaODhq
 bTDx7zC2zMQ5J+LPuwDaYUat5R0bHv7cULv1CKLdyISnPGafy0kpUPvC30nymJZi
 nV7/DjvDYbuOFfhdTEOklGRXvMSEBPLGhIJk/cYZzJECNeNJFUeSs+00uNJ8P6WO
 BQD/FLWie+Fn6lTGIUhulZCPf65KI4bHHLB6WFXA5Jy+O08urdtLiZwlBC4iNsFV
 NFIwangpJ/RnupJoOMwQfw31op5SZuiOYn91njaGIiLpHgvA9+iaERsqXtjp8NW7
 /ne1TZqtrGbYY71XvZ/yPQU5VGc/MG1CyCGX1CPNSQO7v4yl27BNChxdkBHzzm2u
 C0IuLZuXl25XwAt8xbdi65fb84pJOeWRU4Zoe4cUZ3drBy5cZsmFXe3lhEAqs7nf
 MB9XekTLpZ6pCqTE1u/BOrobVg5es/lDQiDeLCvDe1I3I5inSD6ehjJz7qjK0w8o
 3pn0rb+Kb4Ijzfi4RNbgJXmBNzkwwSSPPwYt4THHOZtr8p0fZMBeGHqq1wTJmKcq
 M/+9w4cZqgFpdyNqitj8NyTayX1Lj4LWayexCBYaGkLuHTD6cCk=
 =HOly
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-6.3-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

* Two more V!=R patches
* The last part of the cmpxchg patches
* A few fixes
2023-02-15 12:35:26 -05:00
Paolo Bonzini 33436335e9 KVM/riscv changes for 6.3
- Fix wrong usage of PGDIR_SIZE to check page sizes
 - Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
 - Redirect illegal instruction traps to guest
 - SBI PMU support for guest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmPifFIACgkQrUjsVaLH
 LAcEyxAAinMBaBhiPmwWZQvcCzh/UFmJo8BQCwAPuwoc/a4ZGAR7ylzd0oJilP8M
 wSgX6Ad8XF+CEW2VpxW9nwyi41N25ep1Lrf8vOaWy9L9QNUo0t15WrCIbXT2p399
 HrK9fz7HHKKIMsJy+rYb9EepdmMf55xtr1Y/EjyvhoDQbrEMlKsAODYz/SUoriQG
 Tn3cCYBzLdvzDzu0xXM9v+nsetWXdajK/v4je+mE3NQceXhePAO4oVWP4IpnoROd
 ZQm3evvVdf0WtKG9curxwMB7jjBqDBFrcLYl0qHGa7pi2o5PzVM7esgaV47KwetH
 IgA/Mrf1IfzpgM7VYDDax5wUHlKj63KisqU0J8rU3PUloQXaWqv7+ho51t9GzZ/i
 9x4uyO/evVntgyTw6HCbqmQJDgEtJiG1ydrR/ydBMYHLnh7LPY2UpKgcqmirtbkK
 1/DYDp84vikQ5VW1hc8IACdoBShh9Moh4xsEStzkTrIeHcZCjtORXUh8UIPZ0Mu2
 7Mnkktu9I55SLwA3rwH/EYT1ISrOV1G+q3wfqgeLpn8YUWwCIiqWQ5Ur0/WSMJse
 uJ3HedZDzj9T4n4khX+mKEYh6joAafQZag+4TID2lRSwd0S/mpeC22hYrViMdDmq
 yhE+JNin/sz4AVaHNzGwfqk2NC2RFl9aRn2X0xTwyBubif9pKMQ=
 =spUL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.3

- Fix wrong usage of PGDIR_SIZE to check page sizes
- Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
2023-02-15 12:33:28 -05:00
Janis Schoetterl-Glausch a7b0417328 Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG
Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for
absolute vm write memops which allows user space to perform (storage key
checked) cmpxchg operations on guest memory.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20230206164602.138068-14-scgl@linux.ibm.com
Message-Id: <20230206164602.138068-14-scgl@linux.ibm.com>
[frankja@de.ibm.com: Removed a line from an earlier version]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07 18:06:00 +01:00
Nico Boehr f2d3155e2a KVM: s390: disable migration mode when dirty tracking is disabled
Migration mode is a VM attribute which enables tracking of changes in
storage attributes (PGSTE). It assumes dirty tracking is enabled on all
memslots to keep a dirty bitmap of pages with changed storage attributes.

When enabling migration mode, we currently check that dirty tracking is
enabled for all memslots. However, userspace can disable dirty tracking
without disabling migration mode.

Since migration mode is pointless with dirty tracking disabled, disable
migration mode whenever userspace disables dirty tracking on any slot.

Also update the documentation to clarify that dirty tracking must be
enabled when enabling migration mode, which is already enforced by the
code in kvm_s390_vm_start_migration().

Also highlight in the documentation for KVM_S390_GET_CMMA_BITS that it
can now fail with -EINVAL when dirty tracking is disabled while
migration mode is on. Move all the error codes to a table so this stays
readable.

To disable migration mode, slots_lock should be held, which is taken
in kvm_set_memory_region() and thus held in
kvm_arch_prepare_memory_region().

Restructure the prepare code a bit so all the sanity checking is done
before disabling migration mode. This ensures migration mode isn't
disabled when some sanity check fails.

Cc: stable@vger.kernel.org
Fixes: 190df4a212 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127140532.230651-2-nrb@linux.ibm.com
Message-Id: <20230127140532.230651-2-nrb@linux.ibm.com>
[frankja@linux.ibm.com: fixed commit message typo, moved api.rst error table upwards]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2023-02-07 18:05:59 +01:00
Paolo Bonzini 25b72cf7da KVM/arm64 fixes for 6.2, take #3
- Yet another fix for non-CPU accesses to the memory backing
   the VGICv3 subsystem
 
 - A set of fixes for the setlftest checking for the S1PTW
   behaviour after the fix that went in ealier in the cycle
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmPWwGEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDi/wP/3kbZrZ+y/YNcYioQwqibRS5DKuACKXM1dbh
 sMX0e8t3frmkrfkHZ1FsBNjSWtDLmRbjANNDWi8ypAXaPVm7/0whFqkJgyPWDO+v
 /1VXYMwMjy2zpWfPGPu+/fQL0Ninp+EfLP3Y2/Lr8VW5rH21bfuQ1rm41ucK/jB5
 IsMiQ+YObZUTrSq22fHfNJKc8fysSqeMHW96bl0QnJxf6aDDieZFGF9rlRQf/faq
 lPux0faasgQC0VgXlokWGdU1x5kXIf3Ta4VtiKARKNwxziuG8B484+5hHXvoBR1h
 bXFJJUQjQs2qBuH75BJftini9fvWvQPgbk4NvkD1tlyMhlZ5w2MTTKB4QmuW/WDT
 OGuGXAcuP2stm0dUaSn1aCwzfYgtihssp+RCAB5DOoL64i/CtHl+FJgz8wZfDPRk
 UNXdK2JccDfD6bGv/kQqPJoozjI5e8Ha2ks1O4IPHIDpIsVMIWRRGULgIRvLaHaS
 iaR7Vx+XgzW50Knj++S85eak/aTSkVaykYZIiiB4DTai1/XuAZfMA79X6IvQLxHq
 419FHmXwhJmYdWZ/JFBXWnbR6wRJiv4TR23A5u8X6o/YgBn6fmwAt6o8Avk1quZQ
 mslRPHG45hM/7Z7uSEsIQnbVVnHPhbaKr3GmHlJJ4zXRI8GaSMe23wpnJdUj1q9a
 w1Oe0rpq
 =2l/n
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.2, take #3

- Yet another fix for non-CPU accesses to the memory backing
  the VGICv3 subsystem

- A set of fixes for the setlftest checking for the S1PTW
  behaviour after the fix that went in ealier in the cycle
2023-02-04 08:57:43 -05:00
Wyes Karny fbabc2eaef Documentation: KVM: Update AMD memory encryption link
Update AMD memory encryption white-paper document link.
Previous link is not available. Update new available link.

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Link: https://lore.kernel.org/r/20230125175948.21100-1-wyes.karny@amd.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-02-02 11:32:57 -07:00
Gavin Shan 6028acbe3a KVM: arm64: Allow no running vcpu on saving vgic3 pending table
We don't have a running VCPU context to save vgic3 pending table due
to KVM_DEV_ARM_VGIC_{GRP_CTRL, SAVE_PENDING_TABLES} command on KVM
device "kvm-arm-vgic-v3". The unknown case is caught by kvm-unit-tests.

   # ./kvm-unit-tests/tests/its-pending-migration
   WARNING: CPU: 120 PID: 7973 at arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3325 \
   mark_page_dirty_in_slot+0x60/0xe0
    :
   mark_page_dirty_in_slot+0x60/0xe0
   __kvm_write_guest_page+0xcc/0x100
   kvm_write_guest+0x7c/0xb0
   vgic_v3_save_pending_tables+0x148/0x2a0
   vgic_set_common_attr+0x158/0x240
   vgic_v3_set_attr+0x4c/0x5c
   kvm_device_ioctl+0x100/0x160
   __arm64_sys_ioctl+0xa8/0xf0
   invoke_syscall.constprop.0+0x7c/0xd0
   el0_svc_common.constprop.0+0x144/0x160
   do_el0_svc+0x34/0x60
   el0_svc+0x3c/0x1a0
   el0t_64_sync_handler+0xb4/0x130
   el0t_64_sync+0x178/0x17c

Use vgic_write_guest_lock() to save vgic3 pending table.

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230126235451.469087-5-gshan@redhat.com
2023-01-29 18:46:11 +00:00
Gavin Shan 2f8b1ad222 KVM: arm64: Allow no running vcpu on restoring vgic3 LPI pending status
We don't have a running VCPU context to restore vgic3 LPI pending status
due to command KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} on KVM
device "kvm-arm-vgic-its".

Use vgic_write_guest_lock() to restore vgic3 LPI pending status.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230126235451.469087-4-gshan@redhat.com
2023-01-29 18:46:11 +00:00
Wang Yong 608348285a Documentation: KVM: fix typos in running-nested-guests.rst
change "gues" to "guest" and remove redundant ")".

Signed-off-by: Wang Yong <yongw.kernel@gmail.com>
Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com>
Link: https://lore.kernel.org/r/20230110150046.549755-1-yongw.kernel@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-26 11:23:32 -07:00
SeongJae Park 941c95fdd6 Docs/subsystem-apis: Remove '[The ]Linux' prefixes from titles of listed documents
Some documents that listed on subsystem-apis have 'Linux' or 'The Linux'
title prefixes.  It's duplicated information, and makes finding the
document of interest with human eyes not easy.  Remove the prefixes from
the titles.

Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Iwona Winiarska <iwona.winiarska@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230122184834.181977-1-sj@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-24 15:27:08 -07:00
Aaron Lewis 14329b825f KVM: x86/pmu: Introduce masked events to the pmu event filter
When building a list of filter events, it can sometimes be a challenge
to fit all the events needed to adequately restrict the guest into the
limited space available in the pmu event filter.  This stems from the
fact that the pmu event filter requires each event (i.e. event select +
unit mask) be listed, when the intention might be to restrict the
event select all together, regardless of it's unit mask.  Instead of
increasing the number of filter events in the pmu event filter, add a
new encoding that is able to do a more generalized match on the unit mask.

Introduce masked events as another encoding the pmu event filter
understands.  Masked events has the fields: mask, match, and exclude.
When filtering based on these events, the mask is applied to the guest's
unit mask to see if it matches the match value (i.e. umask & mask ==
match).  The exclude bit can then be used to exclude events from that
match.  E.g. for a given event select, if it's easier to say which unit
mask values shouldn't be filtered, a masked event can be set up to match
all possible unit mask values, then another masked event can be set up to
match the unit mask values that shouldn't be filtered.

Userspace can query to see if this feature exists by looking for the
capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS.

This feature is enabled by setting the flags field in the pmu event
filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS.

Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY().

It is an error to have a bit set outside the valid bits for a masked
event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in
such cases, including the high bits of the event select (35:32) if
called on Intel.

With these updates the filter matching code has been updated to match on
a common event.  Masked events were flexible enough to handle both event
types, so they were used as the common event.  This changes how guest
events get filtered because regardless of the type of event used in the
uAPI, they will be converted to masked events.  Because of this there
could be a slight performance hit because instead of matching the filter
event with a lookup on event select + unit mask, it does a lookup on event
select then walks the unit masks to find the match.  This shouldn't be a
big problem because I would expect the set of common event selects to be
small, and if they aren't the set can likely be reduced by using masked
events to generalize the unit mask.  Using one type of event when
filtering guest events allows for a common code path to be used.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:06:12 -08:00
Paolo Bonzini f15a87c006 Merge branch 'kvm-lapic-fix-and-cleanup' into HEAD
The first half or so patches fix semi-urgent, real-world relevant APICv
and AVIC bugs.

The second half fixes a variety of AVIC and optimized APIC map bugs
where KVM doesn't play nice with various edge cases that are
architecturally legal(ish), but are unlikely to occur in most real world
scenarios

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-24 06:08:01 -05:00
Paolo Bonzini dc7c31e922 Merge branch 'kvm-v6.2-rc4-fixes' into HEAD
ARM:

* Fix the PMCR_EL0 reset value after the PMU rework

* Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

* Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

* Put the Apple M2 on the naughty list for not being able to
  correctly implement the vgic SEIS feature, just like the M1
  before it

* Reviewer updates: Alex is stepping down, replaced by Zenghui

x86:

* Fix various rare locking issues in Xen emulation and teach lockdep
  to detect them

* Documentation improvements

* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
2023-01-24 06:05:23 -05:00
Sean Christopherson 5b84b02917 KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
for x2APIC.  If 32-bit IDs are not enabled, disable the optimized map to
honor x86 architectural behavior if multiple vCPUs shared a physical APIC
ID.  As called out in the changelog that added the hack, all CPUs whose
(possibly truncated) APIC ID matches the target are supposed to receive
the IPI.

  KVM intentionally differs from real hardware, because real hardware
  (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
  accept the interrupt in xAPIC mode and it can deliver one interrupt to
  more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Applying the hack even when x2APIC is not fully enabled means KVM doesn't
correctly handle scenarios where the guest has aliased xAPIC IDs across
multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
interrupts.  It's extremely unlikely any real world guest aliases APIC
IDs, or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
vCPU is "normal".

Furthermore, the hack is _not_ guaranteed to work!  The hack works if and
only if the optimized APIC map is successfully allocated.  If the map
allocation fails (unlikely), KVM will fall back to its unoptimized
behavior, which _does_ honor the architectural behavior.

Pivot on 32-bit x2APIC IDs being enabled as that is required to take
advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
break existing setups unless they are way, way off in the weeds.

And an entry in KVM's errata to document the hack.  Alternatively, KVM
could provide an actual x2APIC quirk and document the hack that way, but
there's unlikely to ever be a use case for disabling the quirk.  Go the
errata route to avoid having to validate a quirk no one cares about.

Fixes: 5bd5db385b ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230106011306.85230-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13 10:45:30 -05:00
David Woodhouse 310bc39546 KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
In commit 14243b3871 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN
and event channel delivery") the clever version of me left some helpful
notes for those who would come after him:

       /*
        * For the irqfd workqueue, using the main kvm->lock mutex is
        * fine since this function is invoked from kvm_set_irq() with
        * no other lock held, no srcu. In future if it will be called
        * directly from a vCPU thread (e.g. on hypercall for an IPI)
        * then it may need to switch to using a leaf-node mutex for
        * serializing the shared_info mapping.
        */
       mutex_lock(&kvm->lock);

In commit 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
the other version of me ran straight past that comment without reading it,
and introduced a potential deadlock by taking vcpu->mutex and kvm->lock
in the wrong order.

Solve this as originally suggested, by adding a leaf-node lock in the Xen
state rather than using kvm->lock for it.

Fixes: 2fd6df2f2b ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20230111180651.14394-4-dwmw2@infradead.org>
[Rebase, add docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 17:45:58 -05:00
Paolo Bonzini 71d0393576 KVM/arm64 fixes for 6.2, take #1
- Fix the PMCR_EL0 reset value after the PMU rework
 
 - Correctly handle S2 fault triggered by a S1 page table walk
   by not always classifying it as a write, as this breaks on
   R/O memslots
 
 - Document why we cannot exit with KVM_EXIT_MMIO when taking
   a write fault from a S1 PTW on a R/O memslot
 
 - Put the Apple M2 on the naughty step for not being able to
   correctly implement the vgic SEIS feature, just liek the M1
   before it
 
 - Reviewer updates: Alex is stepping down, replaced by Zenghui
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmO27gQPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDwioP/A0UE7ujSxv3dlBstBhmtzOoX64pRufX01Kr
 1oF24M1VuTVLwl3pp1nWH10SVWv5kukYZJAJ/3tDJOaMt/Q9c0exPCPc95i2p/r7
 OC9j8rZVZnjGN6sAP5zazIT67tSanyLDeCC+j4J1pw20r2tB67LKSOoozEb5How7
 CX+Oa2OiEiI34jp33v3mFQ3VxY3714QUMBUK7n+L29IFXGmQp6dfbhn2iY3uNpoU
 YYrkPzBLUC1H//oCx0qoDDCXXeOKMGuWP1At5GIDz6ZSCBVpKdVbftCC59Dk7dDz
 7BdQ5JoEc15RTZajdopOog4RV4YHP8VszaClhCA1ML0Pd2Mf4UVLlPnn7F+3yR3r
 pMgjlOAlLJwHiwggJZ0EQ0wFdx9LuGeu3OwckGE/JxeEwaMdzGAEfcFoAGZV0ExZ
 7riiKS+NmtrkuE9wJfWOrpDiseymmUbuhHq+F/HDq/SP6UdezAylkcxZRuN/ZCRc
 9XVhTcWu/UPxoaSSd/sB4l9X8Ey/cZe28+kV7eE/m2g79bZKxHd4UUOUymb/aJxj
 og10A6i0B1DOWMtKJ9hEsB6wI6Hllrqcbo8ewX1znKoKbfHZDeU/N5D4ZvTz85sf
 zyqbsSZPDxMOwBPYTqZqG65tEWWw68HIJ9cqQzKDehN1Xm1coNIWSPrUnBMpSsWJ
 qDQNmIzf
 =XBtQ
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 6.2, take #1

- Fix the PMCR_EL0 reset value after the PMU rework

- Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

- Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

- Put the Apple M2 on the naughty step for not being able to
  correctly implement the vgic SEIS feature, just liek the M1
  before it

- Reviewer updates: Alex is stepping down, replaced by Zenghui
2023-01-11 13:31:53 -05:00
Paolo Bonzini 3a9ae31ac2 Documentation: kvm: fix SRCU locking order docs
kvm->srcu is taken in KVM_RUN and several other vCPU ioctls, therefore
vcpu->mutex is susceptible to the same deadlock that is documented
for kvm->slots_lock.  The same holds for kvm->lock, since kvm->lock
is held outside vcpu->mutex.  Fix the documentation and rearrange it
to highlight the difference between these locks and kvm->slots_arch_lock,
and how kvm->slots_arch_lock can be useful while processing a vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 13:31:33 -05:00
Paolo Bonzini 45e966fcca KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
Passing the host topology to the guest is almost certainly wrong
and will confuse the scheduler.  In addition, several fields of
these CPUID leaves vary on each processor; it is simply impossible to
return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
they can be passed to KVM_SET_CPUID2.

The values that will most likely prevent confusion are all zeroes.
Userspace will have to override it anyway if it wishes to present a
specific topology to the guest.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-09 05:35:21 -05:00
Marc Zyngier afbb1b1cae Merge branch kvm-arm64/s1ptw-write-fault into kvmarm-master/fixes
* kvm-arm64/s1ptw-write-fault:
  : .
  : Fix S1PTW fault handling that was until then always taken
  : as a write. From the cover letter:
  :
  : `Recent developments on the EFI front have resulted in guests that
  : simply won't boot if the page tables are in a read-only memslot and
  : that you're a bit unlucky in the way S2 gets paged in... The core
  : issue is related to the fact that we treat a S1PTW as a write, which
  : is close enough to what needs to be done. Until to get to RO memslots.
  :
  : The first patch fixes this and is definitely a stable candidate. It
  : splits the faulting of page tables in two steps (RO translation fault,
  : followed by a writable permission fault -- should it even happen).
  : The second one documents the slightly odd behaviour of PTW writes to
  : RO memslot, which do not result in a KVM_MMIO exit. The last patch is
  : totally optional, only tangentially related, and randomly repainting
  : stuff (maybe that's contagious, who knows)."
  :
  : .
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
  KVM: arm64: Fix S1PTW handling on RO memslots

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-01-05 15:25:54 +00:00
Marc Zyngier b8f8d190fa KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
Although the KVM API says that a write to a RO memslot must result
in a KVM_EXIT_MMIO describing the write, the arm64 architecture
doesn't provide the *data* written by a Stage-1 page table walk
(we only get the address).

Since there isn't much userspace can do with so little information
anyway, document the fact that such an access results in a guest
exception, not an exit. This is consistent with the guest being
terminally broken anyway.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-01-03 10:01:52 +00:00
Isaku Yamahata 0bf50497f0 KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock
Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock now
that KVM hooks CPU hotplug during the ONLINE phase, which can sleep.
Previously, KVM hooked the STARTING phase, which is not allowed to sleep
and thus could not take kvm_lock (a mutex).  This effectively allows the
task that's initiating hardware enabling/disabling to preempted and/or
migrated.

Note, the Documentation/virt/kvm/locking.rst statement that kvm_count_lock
is "raw" because hardware enabling/disabling needs to be atomic with
respect to migration is wrong on multiple fronts.  First, while regular
spinlocks can be preempted, the task holding the lock cannot be migrated.
Second, preventing migration is not required.  on_each_cpu() disables
preemption, which ensures that cpus_hardware_enabled correctly reflects
hardware state.  The task may be preempted/migrated between bumping
kvm_usage_count and invoking on_each_cpu(), but that's perfectly ok as
kvm_usage_count is still protected, e.g. other tasks that call
hardware_enable_all() will be blocked until the preempted/migrated owner
exits its critical section.

KVM does have lockless accesses to kvm_usage_count in the suspend/resume
flows, but those are safe because all tasks must be frozen prior to
suspending CPUs, and a task cannot be frozen while it holds one or more
locks (userspace tasks are frozen via a fake signal).

Preemption doesn't need to be explicitly disabled in the hotplug path.
The hotplug thread is pinned to the CPU that's being hotplugged, and KVM
only cares about having a stable CPU, i.e. to ensure hardware is enabled
on the correct CPU.  Lockep, i.e. check_preemption_disabled(), plays nice
with this state too, as is_percpu_thread() is true for the hotplug thread.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-45-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:48:34 -05:00
Sean Christopherson 3af4a9e61e KVM: x86: Serialize vendor module initialization (hardware setup)
Acquire a new mutex, vendor_module_lock, in kvm_x86_vendor_init() while
doing hardware setup to ensure that concurrent calls are fully serialized.
KVM rejects attempts to load vendor modules if a different module has
already been loaded, but doesn't handle the case where multiple vendor
modules are loaded at the same time, and module_init() doesn't run under
the global module_mutex.

Note, in practice, this is likely a benign bug as no platform exists that
supports both SVM and VMX, i.e. barring a weird VM setup, one of the
vendor modules is guaranteed to fail a support check before modifying
common KVM state.

Alternatively, KVM could perform an atomic CMPXCHG on .hardware_enable,
but that comes with its own ugliness as it would require setting
.hardware_enable before success is guaranteed, e.g. attempting to load
the "wrong" could result in spurious failure to load the "right" module.

Introduce a new mutex as using kvm_lock is extremely deadlock prone due
to kvm_lock being taken under cpus_write_lock(), and in the future, under
under cpus_read_lock().  Any operation that takes cpus_read_lock() while
holding kvm_lock would potentially deadlock, e.g. kvm_timer_init() takes
cpus_read_lock() to register a callback.  In theory, KVM could avoid
such problematic paths, i.e. do less setup under kvm_lock, but avoiding
all calls to cpus_read_lock() is subtly difficult and thus fragile.  E.g.
updating static calls also acquires cpus_read_lock().

Inverting the lock ordering, i.e. always taking kvm_lock outside
cpus_read_lock(), is not a viable option as kvm_lock is taken in various
callbacks that may be invoked under cpus_read_lock(), e.g. x86's
kvmclock_cpufreq_notifier().

The lockdep splat below is dependent on future patches to take
cpus_read_lock() in hardware_enable_all(), but as above, deadlock is
already is already possible.

  ======================================================
  WARNING: possible circular locking dependency detected
  6.0.0-smp--7ec93244f194-init2 #27 Tainted: G           O
  ------------------------------------------------------
  stable/251833 is trying to acquire lock:
  ffffffffc097ea28 (kvm_lock){+.+.}-{3:3}, at: hardware_enable_all+0x1f/0xc0 [kvm]

               but task is already holding lock:
  ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm]

               which lock already depends on the new lock.

               the existing dependency chain (in reverse order) is:

               -> #1 (cpu_hotplug_lock){++++}-{0:0}:
         cpus_read_lock+0x2a/0xa0
         __cpuhp_setup_state+0x2b/0x60
         __kvm_x86_vendor_init+0x16a/0x1870 [kvm]
         kvm_x86_vendor_init+0x23/0x40 [kvm]
         0xffffffffc0a4d02b
         do_one_initcall+0x110/0x200
         do_init_module+0x4f/0x250
         load_module+0x1730/0x18f0
         __se_sys_finit_module+0xca/0x100
         __x64_sys_finit_module+0x1d/0x20
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd

               -> #0 (kvm_lock){+.+.}-{3:3}:
         __lock_acquire+0x16f4/0x30d0
         lock_acquire+0xb2/0x190
         __mutex_lock+0x98/0x6f0
         mutex_lock_nested+0x1b/0x20
         hardware_enable_all+0x1f/0xc0 [kvm]
         kvm_dev_ioctl+0x45e/0x930 [kvm]
         __se_sys_ioctl+0x77/0xc0
         __x64_sys_ioctl+0x1d/0x20
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd

               other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(cpu_hotplug_lock);
                                 lock(kvm_lock);
                                 lock(cpu_hotplug_lock);
    lock(kvm_lock);

                *** DEADLOCK ***

  1 lock held by stable/251833:
   #0: ffffffffa2456828 (cpu_hotplug_lock){++++}-{0:0}, at: hardware_enable_all+0xf/0xc0 [kvm]

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:41:03 -05:00
Paolo Bonzini fc471e8310 Merge branch 'kvm-late-6.1' into HEAD
x86:

* Change tdp_mmu to a read-only parameter

* Separate TDP and shadow MMU page fault paths

* Enable Hyper-V invariant TSC control

selftests:

* Use TAP interface for kvm_binary_stats_test and tsc_msrs_test

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:36:47 -05:00
Paolo Bonzini a5496886eb Merge branch 'kvm-late-6.1-fixes' into HEAD
x86:

* several fixes to nested VMX execution controls

* fixes and clarification to the documentation for Xen emulation

* do not unnecessarily release a pmu event with zero period

* MMU fixes

* fix Coverity warning in kvm_hv_flush_tlb()

selftests:

* fixes for the ucall mechanism in selftests

* other fixes mostly related to compilation with clang
2022-12-28 07:19:14 -05:00
Paolo Bonzini 02d9a04da4 Documentation: kvm: clarify SRCU locking order
Currently only the locking order of SRCU vs kvm->slots_arch_lock
and kvm->slots_lock is documented.  Extend this to kvm->lock
since Xen emulation got it terribly wrong.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-28 06:02:54 -05:00
David Woodhouse af2808906a KVM: x86/xen: Documentation updates and clarifications
Most notably, the KVM_XEN_EVTCHN_RESET feature had escaped documentation
entirely. Along with how to turn most stuff off on SHUTDOWN_soft_reset.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-6-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27 06:01:50 -05:00
Sean Christopherson 23e528d9bc KVM: Delete extra block of "};" in the KVM API documentation
Delete an extra block of code/documentation that snuck in when KVM's
documentation was converted to ReST format.

Fixes: 106ee47dc6 ("docs: kvm: Convert api.txt to ReST format")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221207003637.2041211-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27 06:00:51 -05:00
Linus Torvalds 8fa590bf34 ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 * Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
   which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d:
   "Fix a number of issues with MTE, such as races on the tags being
   initialised vs the PG_mte_tagged flag as well as the lack of support
   for VM_SHARED when KVM is involved.  Patches from Catalin Marinas and
   Peter Collingbourne").
 
 * Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 * Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 * Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 * Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 s390:
 
 * Second batch of the lazy destroy patches
 
 * First batch of KVM changes for kernel virtual != physical address support
 
 * Removal of a unused function
 
 x86:
 
 * Allow compiling out SMM support
 
 * Cleanup and documentation of SMM state save area format
 
 * Preserve interrupt shadow in SMM state save area
 
 * Respond to generic signals during slow page faults
 
 * Fixes and optimizations for the non-executable huge page errata fix.
 
 * Reprogram all performance counters on PMU filter change
 
 * Cleanups to Hyper-V emulation and tests
 
 * Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
   running on top of a L1 Hyper-V hypervisor)
 
 * Advertise several new Intel features
 
 * x86 Xen-for-KVM:
 
 ** Allow the Xen runstate information to cross a page boundary
 
 ** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
 
 ** Add support for 32-bit guests in SCHEDOP_poll
 
 * Notable x86 fixes and cleanups:
 
 ** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
 
 ** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
    years back when eliminating unnecessary barriers when switching between
    vmcs01 and vmcs02.
 
 ** Clean up vmread_error_trampoline() to make it more obvious that params
    must be passed on the stack, even for x86-64.
 
 ** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
    of the current guest CPUID.
 
 ** Fudge around a race with TSC refinement that results in KVM incorrectly
    thinking a guest needs TSC scaling when running on a CPU with a
    constant TSC, but no hardware-enumerated TSC frequency.
 
 ** Advertise (on AMD) that the SMM_CTL MSR is not supported
 
 ** Remove unnecessary exports
 
 Generic:
 
 * Support for responding to signals during page faults; introduces
   new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
 
 Selftests:
 
 * Fix an inverted check in the access tracking perf test, and restore
   support for asserting that there aren't too many idle pages when
   running on bare metal.
 
 * Fix build errors that occur in certain setups (unsure exactly what is
   unique about the problematic setup) due to glibc overriding
   static_assert() to a variant that requires a custom message.
 
 * Introduce actual atomics for clear/set_bit() in selftests
 
 * Add support for pinning vCPUs in dirty_log_perf_test.
 
 * Rename the so called "perf_util" framework to "memstress".
 
 * Add a lightweight psuedo RNG for guest use, and use it to randomize
   the access pattern and write vs. read percentage in the memstress tests.
 
 * Add a common ucall implementation; code dedup and pre-work for running
   SEV (and beyond) guests in selftests.
 
 * Provide a common constructor and arch hook, which will eventually be
   used by x86 to automatically select the right hypercall (AMD vs. Intel).
 
 * A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
   breakpoints, stage-2 faults and access tracking.
 
 * x86-specific selftest changes:
 
 ** Clean up x86's page table management.
 
 ** Clean up and enhance the "smaller maxphyaddr" test, and add a related
    test to cover generic emulation failure.
 
 ** Clean up the nEPT support checks.
 
 ** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
 
 ** Fix an ordering issue in the AMX test introduced by recent conversions
    to use kvm_cpu_has(), and harden the code to guard against similar bugs
    in the future.  Anything that tiggers caching of KVM's supported CPUID,
    kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
    the caching occurs before the test opts in via prctl().
 
 Documentation:
 
 * Remove deleted ioctls from documentation
 
 * Clean up the docs for the x86 MSR filter.
 
 * Various fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
 KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
 mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
 yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
 Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
 MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
 =QAYX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM64:

   - Enable the per-vcpu dirty-ring tracking mechanism, together with an
     option to keep the good old dirty log around for pages that are
     dirtied by something other than a vcpu.

   - Switch to the relaxed parallel fault handling, using RCU to delay
     page table reclaim and giving better performance under load.

   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
     option, which multi-process VMMs such as crosvm rely on (see merge
     commit 382b5b87a97d: "Fix a number of issues with MTE, such as
     races on the tags being initialised vs the PG_mte_tagged flag as
     well as the lack of support for VM_SHARED when KVM is involved.
     Patches from Catalin Marinas and Peter Collingbourne").

   - Merge the pKVM shadow vcpu state tracking that allows the
     hypervisor to have its own view of a vcpu, keeping that state
     private.

   - Add support for the PMUv3p5 architecture revision, bringing support
     for 64bit counters on systems that support it, and fix the
     no-quite-compliant CHAIN-ed counter support for the machines that
     actually exist out there.

   - Fix a handful of minor issues around 52bit VA/PA support (64kB
     pages only) as a prefix of the oncoming support for 4kB and 16kB
     pages.

   - Pick a small set of documentation and spelling fixes, because no
     good merge window would be complete without those.

  s390:

   - Second batch of the lazy destroy patches

   - First batch of KVM changes for kernel virtual != physical address
     support

   - Removal of a unused function

  x86:

   - Allow compiling out SMM support

   - Cleanup and documentation of SMM state save area format

   - Preserve interrupt shadow in SMM state save area

   - Respond to generic signals during slow page faults

   - Fixes and optimizations for the non-executable huge page errata
     fix.

   - Reprogram all performance counters on PMU filter change

   - Cleanups to Hyper-V emulation and tests

   - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
     guest running on top of a L1 Hyper-V hypervisor)

   - Advertise several new Intel features

   - x86 Xen-for-KVM:

      - Allow the Xen runstate information to cross a page boundary

      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

      - Add support for 32-bit guests in SCHEDOP_poll

   - Notable x86 fixes and cleanups:

      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
        a few years back when eliminating unnecessary barriers when
        switching between vmcs01 and vmcs02.

      - Clean up vmread_error_trampoline() to make it more obvious that
        params must be passed on the stack, even for x86-64.

      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL
        irrespective of the current guest CPUID.

      - Fudge around a race with TSC refinement that results in KVM
        incorrectly thinking a guest needs TSC scaling when running on a
        CPU with a constant TSC, but no hardware-enumerated TSC
        frequency.

      - Advertise (on AMD) that the SMM_CTL MSR is not supported

      - Remove unnecessary exports

  Generic:

   - Support for responding to signals during page faults; introduces
     new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:

   - Fix an inverted check in the access tracking perf test, and restore
     support for asserting that there aren't too many idle pages when
     running on bare metal.

   - Fix build errors that occur in certain setups (unsure exactly what
     is unique about the problematic setup) due to glibc overriding
     static_assert() to a variant that requires a custom message.

   - Introduce actual atomics for clear/set_bit() in selftests

   - Add support for pinning vCPUs in dirty_log_perf_test.

   - Rename the so called "perf_util" framework to "memstress".

   - Add a lightweight psuedo RNG for guest use, and use it to randomize
     the access pattern and write vs. read percentage in the memstress
     tests.

   - Add a common ucall implementation; code dedup and pre-work for
     running SEV (and beyond) guests in selftests.

   - Provide a common constructor and arch hook, which will eventually
     be used by x86 to automatically select the right hypercall (AMD vs.
     Intel).

   - A bunch of added/enabled/fixed selftests for ARM64, covering
     memslots, breakpoints, stage-2 faults and access tracking.

   - x86-specific selftest changes:

      - Clean up x86's page table management.

      - Clean up and enhance the "smaller maxphyaddr" test, and add a
        related test to cover generic emulation failure.

      - Clean up the nEPT support checks.

      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.

      - Fix an ordering issue in the AMX test introduced by recent
        conversions to use kvm_cpu_has(), and harden the code to guard
        against similar bugs in the future. Anything that tiggers
        caching of KVM's supported CPUID, kvm_cpu_has() in this case,
        effectively hides opt-in XSAVE features if the caching occurs
        before the test opts in via prctl().

  Documentation:

   - Remove deleted ioctls from documentation

   - Clean up the docs for the x86 MSR filter.

   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
2022-12-15 11:12:21 -08:00
Sean Christopherson 549a715b98 KVM: x86: Add proper ReST tables for userspace MSR exits/flags
Add ReST formatting to the set of userspace MSR exits/flags so that the
resulting HTML docs generate a table instead of malformed gunk.  This
also fixes a warning that was introduced by a recent cleanup of the
relevant documentation (yay copy+paste).

 >> Documentation/virt/kvm/api.rst:7287: WARNING: Block quote ends
    without a blank line; unexpected unindent.

Fixes: 1ae099540e ("KVM: x86: Allow deflecting unknown MSR accesses to user space")
Fixes: 1f15814718 ("KVM: x86: Clean up KVM_CAP_X86_USER_SPACE_MSR documentation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221207000959.2035098-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-14 13:28:31 -05:00
Linus Torvalds a89ef2aa55 Add TDX guest attestation infrastructure and driver
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYmwACgkQaDWVMHDJ
 krD8hg/+J0hUTfljmlCctwGZyqVR3Y2E722wL9oTvbgYiUAtFrARzfPF0WNwvHi5
 Ywvod5hQ4unPoluthdVAD/uJqcPVhjIZ7CvNTGrS8J7ED5x5ydGLNWAL3Rn+9s6O
 xkz/DsV4zl+cPQ60XLsO+3Mc6RhwVs9DUthpUovl22epmgmRPCovkHWkvQsZajJq
 ceF/78ThfrkG4dDouaIXi1gsmKLLzU4KdHeBATMg0bgPQXFJZSGBCLaeJXWmLapq
 7N3SznUqDMn4Plr/IuP4XuMA6VTVojrakCcBmw5SGVqhkVWGM1/FMg7jHSQS7Z5V
 5uG7CkhTBqh17v9xKwDMPh34D51TLtNifA7jbecyL5155czFkj7BoSwEFINU/wCz
 agUO9NvK9j1chUnA2UGqGQigM3nWGZHMwaQjfgBWyq5gqF8HURUUrjx6XuunOfmB
 1byyrDu0g48u/zaQ/RpNfewz1ZY+WylDPcqOhYaVWF1PYThStML/VMBKpdsl1Ovw
 nytUdQsaBIjFHQdB+snizaF93+/0FG+FTGAlDnHYmey/8plL2LYuzrcDnDYnGEXa
 tN3HFd2lAi4JBLmvmgF39gH+BLXuKTLweIhwTXZTn91cfire3yxiXAnLd0tuptMP
 aXFddxKMdMpxTqzy2X+8gJjqCr2lZ9gZkxaPsWwrBM+xrJf0p2w=
 =JGnq
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 tdx updates from Dave Hansen:
 "This includes a single chunk of new functionality for TDX guests which
  allows them to talk to the trusted TDX module software and obtain an
  attestation report.

  This report can then be used to prove the trustworthiness of the guest
  to a third party and get access to things like storage encryption
  keys"

* tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/tdx: Test TDX attestation GetReport support
  virt: Add TDX guest driver
  x86/tdx: Add a wrapper to get TDREPORT0 from the TDX Module
2022-12-12 14:27:49 -08:00
Paolo Bonzini 9352e7470a Merge remote-tracking branch 'kvm/queue' into HEAD
x86 Xen-for-KVM:

* Allow the Xen runstate information to cross a page boundary

* Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured

* add support for 32-bit guests in SCHEDOP_poll

x86 fixes:

* One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

* Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
   years back when eliminating unnecessary barriers when switching between
   vmcs01 and vmcs02.

* Clean up the MSR filter docs.

* Clean up vmread_error_trampoline() to make it more obvious that params
  must be passed on the stack, even for x86-64.

* Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
  of the current guest CPUID.

* Fudge around a race with TSC refinement that results in KVM incorrectly
  thinking a guest needs TSC scaling when running on a CPU with a
  constant TSC, but no hardware-enumerated TSC frequency.

* Advertise (on AMD) that the SMM_CTL MSR is not supported

* Remove unnecessary exports

Selftests:

* Fix an inverted check in the access tracking perf test, and restore
  support for asserting that there aren't too many idle pages when
  running on bare metal.

* Fix an ordering issue in the AMX test introduced by recent conversions
  to use kvm_cpu_has(), and harden the code to guard against similar bugs
  in the future.  Anything that tiggers caching of KVM's supported CPUID,
  kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
  the caching occurs before the test opts in via prctl().

* Fix build errors that occur in certain setups (unsure exactly what is
  unique about the problematic setup) due to glibc overriding
  static_assert() to a variant that requires a custom message.

* Introduce actual atomics for clear/set_bit() in selftests

Documentation:

* Remove deleted ioctls from documentation

* Various fixes
2022-12-12 15:54:07 -05:00
Paolo Bonzini eb5618911a KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
   option to keep the good old dirty log around for pages that are
   dirtied by something other than a vcpu.
 
 - Switch to the relaxed parallel fault handling, using RCU to delay
   page table reclaim and giving better performance under load.
 
 - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
   option, which multi-process VMMs such as crosvm rely on.
 
 - Merge the pKVM shadow vcpu state tracking that allows the hypervisor
   to have its own view of a vcpu, keeping that state private.
 
 - Add support for the PMUv3p5 architecture revision, bringing support
   for 64bit counters on systems that support it, and fix the
   no-quite-compliant CHAIN-ed counter support for the machines that
   actually exist out there.
 
 - Fix a handful of minor issues around 52bit VA/PA support (64kB pages
   only) as a prefix of the oncoming support for 4kB and 16kB pages.
 
 - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
   stage-2 faults and access tracking. You name it, we got it, we
   probably broke it.
 
 - Pick a small set of documentation and spelling fixes, because no
   good merge window would be complete without those.
 
 As a side effect, this tag also drags:
 
 - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
   series
 
 - A shared branch with the arm64 tree that repaints all the system
   registers to match the ARM ARM's naming, and resulting in
   interesting conflicts
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz
 rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i
 KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX
 wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM
 AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk
 pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d
 1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc
 4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT
 H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR
 qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF
 K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC
 aWa6oGVC
 =iIPT
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  no-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no
  good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
  series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, and resulting in
  interesting conflicts
2022-12-09 09:12:12 +01:00
Marc Zyngier 86f27d849b Merge branch kvm-arm64/misc-6.2 into kvmarm-master/next
* kvm-arm64/misc-6.2:
  : .
  : Misc fixes for 6.2:
  :
  : - Fix formatting for the pvtime documentation
  :
  : - Fix a comment in the VHE-specific Makefile
  : .
  KVM: arm64: Fix typo in comment
  KVM: arm64: Fix pvtime documentation

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05 14:39:12 +00:00
Marc Zyngier 382b5b87a9 Merge branch kvm-arm64/mte-map-shared into kvmarm-master/next
* kvm-arm64/mte-map-shared:
  : .
  : Update the MTE support to allow the VMM to use shared mappings
  : to back the memslots exposed to MTE-enabled guests.
  :
  : Patches courtesy of Catalin Marinas and Peter Collingbourne.
  : .
  : Fix a number of issues with MTE, such as races on the tags
  : being initialised vs the PG_mte_tagged flag as well as the
  : lack of support for VM_SHARED when KVM is involved.
  :
  : Patches from Catalin Marinas and Peter Collingbourne.
  : .
  Documentation: document the ABI changes for KVM_CAP_ARM_MTE
  KVM: arm64: permit all VM_MTE_ALLOWED mappings with MTE enabled
  KVM: arm64: unify the tests for VMAs in memslots when MTE is enabled
  arm64: mte: Lock a page for MTE tag initialisation
  mm: Add PG_arch_3 page flag
  KVM: arm64: Simplify the sanitise_mte_tags() logic
  arm64: mte: Fix/clarify the PG_mte_tagged semantics
  mm: Do not enable PG_arch_2 for all 64-bit architectures

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05 14:38:24 +00:00
David Matlack 34e30ebbe4 KVM: Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns
Clarify the existing documentation about how KVM_CAP_HALT_POLL and
halt_poll_ns interact to make it clear that VMs using KVM_CAP_HALT_POLL
ignore halt_poll_ns.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20221201195249.3369720-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 13:20:30 -05:00
David Matlack b8b43a4c2e KVM: Move halt-polling documentation into common directory
Move halt-polling.rst into the common KVM documentation directory and
out of the x86-specific directory. Halt-polling is a common feature and
the existing documentation is already written as such.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20221201195249.3369720-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 13:20:30 -05:00
Paolo Bonzini b376144595 Misc KVM x86 fixes and cleanups for 6.2:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
 
  - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
    years back when eliminating unnecessary barriers when switching between
    vmcs01 and vmcs02.
 
  - Clean up the MSR filter docs.
 
  - Clean up vmread_error_trampoline() to make it more obvious that params
    must be passed on the stack, even for x86-64.
 
  - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
    of the current guest CPUID.
 
  - Fudge around a race with TSC refinement that results in KVM incorrectly
    thinking a guest needs TSC scaling when running on a CPU with a
    constant TSC, but no hardware-enumerated TSC frequency.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmOJQesSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5IR4QAKGPbLRykY/2FohV2HDu5fDPxA2Fe9nu
 5W7ZIptQu+tQtCTWKFEjcQdwYoNrLbr0hr1eGubVbIvBqJbwPQfH7G0765eOIcvy
 s6Zn2N24IisIoUxdkJGOL3Tt1UR7wCFbwC+ms0i4FQvMcw+TbM0BTHgJDdwR5laX
 mGN7ubz5iImwDFFE3Bd8Qy5I+FGL9CI60l+RzK6b7J8HYi1wOBMLU9QueF/dB7gR
 g+navZJAAnvN6YIkjP5j8yPBuvhDzni379ue5ATDq1ALvyyI7xlYALsxpUjCnLuo
 CkbvgmfmC94Vdm7pzFgsbazUN2oIjwoimjFQHP1bf8Jmd+770R282JpnwiD/ydCV
 Tl2ArwXA2zxVxNZm9g/XqPBwWBWWgWfYIQbuuxc065MnXCnHkY5UGGf0JNx41CDl
 hdtm9DHkft8+6kyBBmgkdKxd328Znljq02v3nLePUipfpDVaNj4VAUj9IpV6Lpuj
 GJjs4Wx7oqFwH1Im/LqZgnOGwgkSj3ObHtkYx2RSrQAQultbjuplFz2qZWP8PF6A
 FrJbcddKOmLINrdNOlvTd5WKCAjtV8vycjFkk+/7H67rpZdM8AI1StrzMP6gmwg4
 ARozZJ2UF8nTriRYFQbFQyNm9bBTZ7YQ/HajqfhvCuZLi7i1EaImhC0F1xn7IU5S
 6XhvQPvjRTgS
 =i6OA
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEAD

Misc KVM x86 fixes and cleanups for 6.2:

 - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).

 - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
   years back when eliminating unnecessary barriers when switching between
   vmcs01 and vmcs02.

 - Clean up the MSR filter docs.

 - Clean up vmread_error_trampoline() to make it more obvious that params
   must be passed on the stack, even for x86-64.

 - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
   of the current guest CPUID.

 - Fudge around a race with TSC refinement that results in KVM incorrectly
   thinking a guest needs TSC scaling when running on a CPU with a
   constant TSC, but no hardware-enumerated TSC frequency.
2022-12-02 12:56:25 -05:00
Javier Martinez Canillas 10c5e80b2c KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
The ioctls are missing an architecture property that is present in others.

Suggested-by: Sergio Lopez Pascual <slp@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-5-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 12:54:41 -05:00
Javier Martinez Canillas 30ee198ce4 KVM: Reference to kvm_userspace_memory_region in doc and comments
There are still references to the removed kvm_memory_region data structure
but the doc and comments should mention struct kvm_userspace_memory_region
instead, since that is what's used by the ioctl that replaced the old one
and this data structure support the same set of flags.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-4-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 12:54:40 -05:00
Javier Martinez Canillas 66a9221d73 KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-3-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 12:54:40 -05:00
Javier Martinez Canillas 61e15f8712 KVM: Delete all references to removed KVM_SET_MEMORY_REGION ioctl
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-2-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02 12:54:30 -05:00
Sean Christopherson 1f15814718 KVM: x86: Clean up KVM_CAP_X86_USER_SPACE_MSR documentation
Clean up the KVM_CAP_X86_USER_SPACE_MSR documentation to eliminate
misleading and/or inconsistent verbiage, and to actually document what
accesses are intercepted by which flags.

  - s/will/may since not all #GPs are guaranteed to be intercepted
  - s/deflect/intercept to align with common KVM terminology
  - s/user space/userspace to align with the majority of KVM docs
  - Avoid using "trap" terminology, as KVM exits to userspace _before_
    stepping, i.e. doesn't exhibit trap-like behavior
  - Actually document the flags

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220831001706.4075399-4-seanjc@google.com
2022-11-30 16:19:43 -08:00
Sean Christopherson b93d2ec34e KVM: x86: Reword MSR filtering docs to more precisely define behavior
Reword the MSR filtering documentatiion to more precisely define the
behavior of filtering using common virtualization terminology.

  - Explicitly document KVM's behavior when an MSR is denied
  - s/handled/allowed as there is no guarantee KVM will "handle" the
    MSR access
  - Drop the "fall back" terminology, which incorrectly suggests that
    there is existing KVM behavior to fall back to
  - Fix an off-by-one error in the range (the end is exclusive)
  - Call out the interaction between MSR filtering and
    KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER
  - Delete the redundant paragraph on what '0' and '1' in the bitmap
    means, it's covered by the sections on KVM_MSR_FILTER_{READ,WRITE}
  - Delete the clause on x2APIC MSR behavior depending on APIC base, this
    is covered by stating that KVM follows architectural behavior when
    emulating/virtualizing MSR accesses

Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220831001706.4075399-3-seanjc@google.com
2022-11-30 16:19:43 -08:00
Sean Christopherson 5c8c0b3273 KVM: x86: Delete documentation for READ|WRITE in KVM_X86_SET_MSR_FILTER
Delete the paragraph that describes the behavior when both
KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE are set for a range.  There is
nothing special about KVM's handling of this combination, whereas
explicitly documenting the combination suggests that there is some magic
behavior the user needs to be aware of.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220831001706.4075399-2-seanjc@google.com
2022-11-30 16:19:43 -08:00
David Woodhouse d8ba8ba4c8 KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
Closer inspection of the Xen code shows that we aren't supposed to be
using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
If we randomly set the top bit of ->state_entry_time for a guest that
hasn't asked for it and doesn't expect it, that could make the runtimes
fail to add up and confuse the guest. Without the flag it's perfectly
safe for a vCPU to read its own vcpu_runstate_info; just not for one
vCPU to read *another's*.

I briefly pondered adding a word for the whole set of VMASST_TYPE_*
flags but the only one we care about for HVM guests is this, so it
seemed a bit pointless.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-30 10:59:37 -05:00
Peter Collingbourne a4baf8d263 Documentation: document the ABI changes for KVM_CAP_ARM_MTE
Document both the restriction on VM_MTE_ALLOWED mappings and
the relaxation for shared mappings.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-9-pcc@google.com
2022-11-29 09:26:07 +00:00
Paolo Bonzini 1e79a9e3ab - Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address support
 - Removal of a unused function
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmN/eYwACgkQ41TmuOI4
 ufjoxA/9Et38aXO/IhmUt8v0QhA4yec+sc5GSFfQSYehej/1Vqhw0DXx+ORUiRgg
 +rtiXJSSqkuD2dL+BDffY2xoul6nzNdVf4AbkcnrWscfWr6xwVYlPvuL0ymGI6J2
 U/IPedRoKw0bHw/wHs05yV5PubrRwDFERKhtyXWYGbPJhX0w2n3IFOoKH1oWBhLW
 Dc8jEs6t3gDbJ71Er0xoeBUoiuu+PgZG06cpOvzBZ0KclRgjADXyISqqk8/4mu8w
 R+/Wf8NcrbQYV1jfCeq5zIsKC8uvnFj25UuyTLumn5vh+dNNsvE72Khe4tz7LI0I
 ZPZ+GZuemu7Yi12dKjw4Sw3ui0ejWH/5XL1SVB0X/xYIWrBqOot+Lq6538GCng+c
 tJt+zsu64VFgXCCZ8O9qO4uE4DBL70H3ThT7VZxIghSTZtY0xh3uFc64f3/3d9dy
 K4WTJHrmMxhXaA/rqtIa8I53JvFl8CztofZATiQQesyPuc7lZ01w1Co5el4xYaxe
 YknyMTq11qf/iYqVOW7sjoWW/YRuuMZ4+FhpI3o/SllVdN98iTwkk1kP3wcoBO5P
 bvzpm+WXHbv9OxifPrqkqv34+upbjfEmSogHudQzagBX4vl3rZRfBCdQGCAha0Uc
 ZYyg68kiil5sWmHI/Ln/ZjANYfbS5sF0CreuWxnmqcwKl2NSN/E=
 =/1yt
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-6.2-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address support
- Removal of a unused function
2022-11-28 13:34:47 -05:00
Claudio Imbrenda d9459922a1 KVM: s390: pv: api documentation for asynchronous destroy
Add documentation for the new commands added to the KVM_S390_PV_COMMAND
ioctl.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111170632.77622-3-imbrenda@linux.ibm.com
Message-Id: <20221111170632.77622-3-imbrenda@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-11-23 09:06:50 +00:00
Kuppuswamy Sathyanarayanan 6c8c1406a6 virt: Add TDX guest driver
TDX guest driver exposes IOCTL interfaces to service TDX guest
user-specific requests. Currently, it is only used to allow the user to
get the TDREPORT to support TDX attestation.

Details about the TDX attestation process are documented in
Documentation/x86/tdx.rst, and the IOCTL details are documented in
Documentation/virt/coco/tdx-guest.rst.

Operations like getting TDREPORT involves sending a blob of data as
input and getting another blob of data as output. It was considered
to use a sysfs interface for this, but it doesn't fit well into the
standard sysfs model for configuring values. It would be possible to
do read/write on files, but it would need multiple file descriptors,
which would be somewhat messy. IOCTLs seem to be the best fitting
and simplest model for this use case. The AMD sev-guest driver also
uses the IOCTL interface to support attestation.

[Bagas Sanjaya: Ack is for documentation portion]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/all/20221116223820.819090-3-sathyanarayanan.kuppuswamy%40linux.intel.com
2022-11-17 11:04:23 -08:00
Usama Arif 83f8a81dec KVM: arm64: Fix pvtime documentation
This includes table format and using reST labels for
cross-referencing to vcpu.rst.

Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221103131210.3603385-1-usama.arif@bytedance.com
2022-11-11 16:29:22 +00:00
Gavin Shan 9cb1096f85 KVM: arm64: Enable ring-based dirty memory tracking
Enable ring-based dirty memory tracking on ARM64:

  - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL.

  - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP.

  - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page
    offset.

  - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to
    keep the site of saving vgic/its tables out of the no-running-vcpu
    radar.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com
2022-11-10 13:11:58 +00:00
Gavin Shan 86bdf3ebcf KVM: Support dirty ring in conjunction with bitmap
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. It's conflicting with that ring-based dirty page tracking always
requires a running VCPU context.

Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before migrating
the VM to the target.

Use an additional capability to advertise this behavior. The newly added
capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-4-gshan@redhat.com
2022-11-10 13:11:58 +00:00
Nico Boehr 6973091d1b KVM: s390: pv: don't allow userspace to set the clock under PV
When running under PV, the guest's TOD clock is under control of the
ultravisor and the hypervisor isn't allowed to change it. Hence, don't
allow userspace to change the guest's TOD clock by returning
-EOPNOTSUPP.

When userspace changes the guest's TOD clock, KVM updates its
kvm.arch.epoch field and, in addition, the epoch field in all state
descriptions of all VCPUs.

But, under PV, the ultravisor will ignore the epoch field in the state
description and simply overwrite it on next SIE exit with the actual
guest epoch. This leads to KVM having an incorrect view of the guest's
TOD clock: it has updated its internal kvm.arch.epoch field, but the
ultravisor ignores the field in the state description.

Whenever a guest is now waiting for a clock comparator, KVM will
incorrectly calculate the time when the guest should wake up, possibly
causing the guest to sleep for much longer than expected.

With this change, kvm_s390_set_tod() will now take the kvm->lock to be
able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.

The function kvm_s390_set_tod_clock is now unused, hence remove it.
Update the documentation to indicate the TOD clock attr calls can now
return -EOPNOTSUPP.

Fixes: 0f30350471 ("KVM: s390: protvirt: Do only reset registers that are accessible")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20221011160712.928239-2-nrb@linux.ibm.com
Message-Id: <20221011160712.928239-2-nrb@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-11-07 10:14:15 +01:00
Linus Torvalds f311d498be ARM:
* Fixes for single-stepping in the presence of an async
   exception as well as the preservation of PSTATE.SS
 
 * Better handling of AArch32 ID registers on AArch64-only
   systems
 
 * Fixes for the dirty-ring API, allowing it to work on
   architectures with relaxed memory ordering
 
 * Advertise the new kvmarm mailing list
 
 * Various minor cleanups and spelling fixes
 
 RISC-V:
 
 * Improved instruction encoding infrastructure for
   instructions not yet supported by binutils
 
 * Svinval support for both KVM Host and KVM Guest
 
 * Zihintpause support for KVM Guest
 
 * Zicbom support for KVM Guest
 
 * Record number of signal exits as a VCPU stat
 
 * Use generic guest entry infrastructure
 
 x86:
 
 * Misc PMU fixes and cleanups.
 
 * selftests: fixes for Hyper-V hypercall
 
 * selftests: fix nx_huge_pages_test on TDP-disabled hosts
 
 * selftests: cleanups for fix_hypercall_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM7OcMUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAFgf/Rqc9hrXZVdbh2OZ+gScSsFsPK1zO
 DISUksLcXaYVYYsvQAEg/N2BPz3XbmO4jA+z8bIUrYTA7fC98we2C4jfR+EaX/fO
 +/Kzf0lAgu/nQZyFzUya+1jRsZqvVbC/HmDCI2kzN4u78e/LZ7NVcMijdV/ly6ib
 cq0b0LLqJHe/fcpJ806JZP3p5sndQhDmlUkZ2AWZf6CUKSEFcufbbYkt+84ZK4PL
 N9mEqXYQ3DXClLQmIBv+NZhtGlmADkWDE4BNouw8dVxhaXH7Hw/jfBHdb6SSHMRe
 tQ6Src1j8AYOhf5J35SMudgkbGcMelm0yeZ7Sizk+5Ft0EmdbZsnkvsGdQ==
 =4RA+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "The main batch of ARM + RISC-V changes, and a few fixes and cleanups
  for x86 (PMU virtualization and selftests).

  ARM:

   - Fixes for single-stepping in the presence of an async exception as
     well as the preservation of PSTATE.SS

   - Better handling of AArch32 ID registers on AArch64-only systems

   - Fixes for the dirty-ring API, allowing it to work on architectures
     with relaxed memory ordering

   - Advertise the new kvmarm mailing list

   - Various minor cleanups and spelling fixes

  RISC-V:

   - Improved instruction encoding infrastructure for instructions not
     yet supported by binutils

   - Svinval support for both KVM Host and KVM Guest

   - Zihintpause support for KVM Guest

   - Zicbom support for KVM Guest

   - Record number of signal exits as a VCPU stat

   - Use generic guest entry infrastructure

  x86:

   - Misc PMU fixes and cleanups.

   - selftests: fixes for Hyper-V hypercall

   - selftests: fix nx_huge_pages_test on TDP-disabled hosts

   - selftests: cleanups for fix_hypercall_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
  riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
  RISC-V: KVM: Use generic guest entry infrastructure
  RISC-V: KVM: Record number of signal exits as a vCPU stat
  RISC-V: KVM: add __init annotation to riscv_kvm_init()
  RISC-V: KVM: Expose Zicbom to the guest
  RISC-V: KVM: Provide UAPI for Zicbom block size
  RISC-V: KVM: Make ISA ext mappings explicit
  RISC-V: KVM: Allow Guest use Zihintpause extension
  RISC-V: KVM: Allow Guest use Svinval extension
  RISC-V: KVM: Use Svinval for local TLB maintenance when available
  RISC-V: Probe Svinval extension form ISA string
  RISC-V: KVM: Change the SBI specification version to v1.0
  riscv: KVM: Apply insn-def to hlv encodings
  riscv: KVM: Apply insn-def to hfence encodings
  riscv: Introduce support for defining instructions
  riscv: Add X register names to gpr-nums
  KVM: arm64: Advertise new kvmarm mailing list
  kvm: vmx: keep constant definition format consistent
  kvm: mmu: fix typos in struct kvm_arch
  KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
  ...
2022-10-11 20:07:44 -07:00
Linus Torvalds 3604a7f568 This update includes the following changes:
API:
 
 - Feed untrusted RNGs into /dev/random.
 - Allow HWRNG sleeping to be more interruptible.
 - Create lib/utils module.
 - Setting private keys no longer required for akcipher.
 - Remove tcrypt mode=1000.
 - Reorganised Kconfig entries.
 
 Algorithms:
 
 - Load x86/sha512 based on CPU features.
 - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher.
 
 Drivers:
 
 - Add HACE crypto driver aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmM785cACgkQxycdCkmx
 i6dveBAAmGVYtrPmcGfA6CmzZ8ps9KdZxhjHjzLKwuqrOMulZvE2IYeUV4QtNqpQ
 6NLY2+TkqL0XIbCXoByIk32lMYIlXBaJdMYdHHDTeo7E2wqZn/46SPSWeNKazyJx
 dkL8Oj62nqDc2s0LOi3vLvod+sENFQ69R+vkHOa0fZhX0UBsac3NIXo+74Y2A7bE
 0+iQFKTWdNnoQzQ0j4q8WMiolKYh21iPZ9l5sjgMgichLCaE6PrITlRcaWrtPhey
 U1OmJtbTPsg+5X1r9KyLtoAXtBDONl66GQyne+p/ZYD8cMhxomjJaPlMhwWE/n4d
 d2KJKvoXoPPo4c+yNIS9hBav07ZriPl0q0jd2M1rd6oYTmFpaodTgIBfjvxO+wfV
 GoqDS8PEc42U1uwkuKC/cvfr6pB8WiybfXy+vSXBm/jUgIOO3y+eqsC8Jx9ZoQeG
 F+d34PYfJrJbmDRtcA6ZKdzN0OmKq7aCilx1kGKGPg0D+uq64FBo7zsT6XzTK8HL
 2Za9AACPn87xLQwGrKDSBfyrlSSIJm2FaIIPayUXHEo7cyoiZwbTpXRRJ1mDR+v9
 jzI+xPEXCthtjysuRmufNhTkiZUv3lZ8ORfQ0QFKR53tjZUm+dVQo0V/N/ZSXoSV
 SyRvXYO+ToXePAofNWl1LcO1grX/vxtFNedMkDLHXooRcnCaIYo=
 =rq2f
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Feed untrusted RNGs into /dev/random
   - Allow HWRNG sleeping to be more interruptible
   - Create lib/utils module
   - Setting private keys no longer required for akcipher
   - Remove tcrypt mode=1000
   - Reorganised Kconfig entries

  Algorithms:
   - Load x86/sha512 based on CPU features
   - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher

  Drivers:
   - Add HACE crypto driver aspeed"

* tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits)
  crypto: aspeed - Remove redundant dev_err call
  crypto: scatterwalk - Remove unused inline function scatterwalk_aligned()
  crypto: aead - Remove unused inline functions from aead
  crypto: bcm - Simplify obtain the name for cipher
  crypto: marvell/octeontx - use sysfs_emit() to instead of scnprintf()
  hwrng: core - start hwrng kthread also for untrusted sources
  crypto: zip - remove the unneeded result variable
  crypto: qat - add limit to linked list parsing
  crypto: octeontx2 - Remove the unneeded result variable
  crypto: ccp - Remove the unneeded result variable
  crypto: aspeed - Fix check for platform_get_irq() errors
  crypto: virtio - fix memory-leak
  crypto: cavium - prevent integer overflow loading firmware
  crypto: marvell/octeontx - prevent integer overflows
  crypto: aspeed - fix build error when only CRYPTO_DEV_ASPEED is enabled
  crypto: hisilicon/qm - fix the qos value initialization
  crypto: sun4i-ss - use DEFINE_SHOW_ATTRIBUTE to simplify sun4i_ss_debugfs
  crypto: tcrypt - add async speed test for aria cipher
  crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher
  crypto: aria - prepare generic module for optimized implementations
  ...
2022-10-10 13:04:25 -07:00
Linus Torvalds ef688f8b8c The first batch of KVM patches, mostly covering x86, which I
am sending out early due to me travelling next week.  There is a
 lone mm patch for which Andrew gave an informal ack at
 https://lore.kernel.org/linux-mm/20220817102500.440c6d0a3fce296fdf91bea6@linux-foundation.org.
 
 I will send the bulk of ARM work, as well as other
 architectures, at the end of next week.
 
 ARM:
 
 * Account stage2 page table allocations in memory stats.
 
 x86:
 
 * Account EPT/NPT arm64 page table allocations in memory stats.
 
 * Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses.
 
 * Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
   Hyper-V now support eVMCS fields associated with features that are
   enumerated to the guest.
 
 * Use KVM's sanitized VMCS config as the basis for the values of nested VMX
   capabilities MSRs.
 
 * A myriad event/exception fixes and cleanups.  Most notably, pending
   exceptions morph into VM-Exits earlier, as soon as the exception is
   queued, instead of waiting until the next vmentry.  This fixed
   a longstanding issue where the exceptions would incorrecly become
   double-faults instead of triggering a vmexit; the common case of
   page-fault vmexits had a special workaround, but now it's fixed
   for good.
 
 * A handful of fixes for memory leaks in error paths.
 
 * Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow.
 
 * Never write to memory from non-sleepable kvm_vcpu_check_block()
 
 * Selftests refinements and cleanups.
 
 * Misc typo cleanups.
 
 Generic:
 
 * remove KVM_REQ_UNHALT
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM2zwcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNpbwf+MlVeOlzE5SBdrJ0TEnLmKUel1lSz
 QnZzP5+D65oD0zhCilUZHcg6G4mzZ5SdVVOvrGJvA0eXh25ruLNMF6jbaABkMLk/
 FfI1ybN7A82hwJn/aXMI/sUurWv4Jteaad20JC2DytBCnsW8jUqc49gtXHS2QWy4
 3uMsFdpdTAg4zdJKgEUfXBmQviweVpjjl3ziRyZZ7yaeo1oP7XZ8LaE1nR2l5m0J
 mfjzneNm5QAnueypOh5KhSwIvqf6WHIVm/rIHDJ1HIFbgfOU0dT27nhb1tmPwAcE
 +cJnnMUHjZqtCXteHkAxMClyRq0zsEoKk0OGvSOOMoq3Q0DavSXUNANOig==
 =/hqX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
     accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known
     versions of Hyper-V now support eVMCS fields associated with
     features that are enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of
     nested VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups. Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry. This fixed a
     longstanding issue where the exceptions would incorrecly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed for
     good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...
2022-10-09 09:39:55 -07:00
Paolo Bonzini fe4d9e4abf KVM/arm64 updates for v6.1
- Fixes for single-stepping in the presence of an async
   exception as well as the preservation of PSTATE.SS
 
 - Better handling of AArch32 ID registers on AArch64-only
   systems
 
 - Fixes for the dirty-ring API, allowing it to work on
   architectures with relaxed memory ordering
 
 - Advertise the new kvmarm mailing list
 
 - Various minor cleanups and spelling fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmM5hQcPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDoMUP/jra4HSmujLUB5G7Op8HxuurEecOc6xtw0Af
 AbDLlVc2Vs4rrdVh8GMc8D80atUAVitp8IFjdp/PzI2GTBTzWz43Gav2AbhgIJbJ
 xoFVHL8LkdHKyMbq10359DqGMqhIf41OFzGwhbzcx2V4pKNkSpjbCpu3bi/+Ybjg
 006ZpZc7NAU0rZgw9Flb/dhn0jw7RMc3orhoDQ4tBp1P/VhvqvgFt5bWipkvvBP7
 +lQK28ujG3ghST/hKRhg6ozgy5+6NEEHMuhErMYP8nIivRchX+pWF2Lb0qGH1e+U
 v2MZIZnIIUjyTV1vbYlxtltzfYmPuQ2MFNUBawI9tmlIOU9vJSCzeJS64uWK4KLV
 kbmk57OfC7rQoSNJH4jaKQp0YpIktrB9Vei97t4I7NwEmkjQj6cLTgg4tQrNqTiQ
 cFGeC9mE+lhFC8z1lCbna2eG631FxpPrB1SJ1/CU9wboam9dUfXGIvBPh+i2pvMZ
 vcxzUZJ11y+/uhp4k8i2PBwNno0iwRXd5MinwRUs2CR5vhs8qa5y7FVWKyqKpgI2
 xqr4lYTixJZL3mWkYyOQuClrTbT1zkoaPldLq6M7wvO08+QV8ryMeyKT+9s/gNQU
 dcYSwBCWZaOZm2nN8/zjxRb7VqZVu3cwyXi9XXUWNTCgIe/Q/SDPbXU/Hwbgzf8X
 UsQF7e9A
 =aNPK
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v6.1

- Fixes for single-stepping in the presence of an async
  exception as well as the preservation of PSTATE.SS

- Better handling of AArch32 ID registers on AArch64-only
  systems

- Fixes for the dirty-ring API, allowing it to work on
  architectures with relaxed memory ordering

- Advertise the new kvmarm mailing list

- Various minor cleanups and spelling fixes
2022-10-03 15:33:32 -04:00
Marc Zyngier 671c8c7f9f KVM: Document weakly ordered architecture requirements for dirty ring
Now that the kernel can expose to userspace that its dirty ring
management relies on explicit ordering, document these new requirements
for VMMs to do the right thing.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20220926145120.27974-5-maz@kernel.org
2022-09-29 10:23:08 +01:00
Akhil Raj 7f77ebbf75 Delete duplicate words from kernel docs
I have deleted duplicate words like

to, guest, trace, when, we

Signed-off-by: Akhil Raj <lf32.dev@gmail.com>
Link: https://lore.kernel.org/r/20220829065239.4531-1-lf32.dev@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:43 -06:00
Paolo Bonzini c59fb12758 KVM: remove KVM_REQ_UNHALT
KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20220921003201.1441511-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:37:21 -04:00
Aaron Lewis b5cb32b16c KVM: x86: Delete duplicate documentation for KVM_X86_SET_MSR_FILTER
Two copies of KVM_X86_SET_MSR_FILTER somehow managed to make it's way
into the documentation.  Remove one copy and merge the difference from
the removed copy into the copy that's being kept.

Fixes: fd49e8ee70 ("Merge branch 'kvm-sev-cgroup' into HEAD")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20220712001045.2364298-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:02:29 -04:00
Jacky Li d8da2da21f crypto: ccp - Initialize PSP when reading psp data file failed
Currently the OS fails the PSP initialization when the file specified at
'init_ex_path' does not exist or has invalid content. However the SEV
spec just requires users to allocate 32KB of 0xFF in the file, which can
be taken care of by the OS easily.

To improve the robustness during the PSP init, leverage the retry
mechanism and continue the init process:

Before the first INIT_EX call, if the content is invalid or missing,
continue the process by feeding those contents into PSP instead of
aborting. PSP will then override it with 32KB 0xFF and return
SEV_RET_SECURE_DATA_INVALID status code. In the second INIT_EX call,
this 32KB 0xFF content will then be fed and PSP will write the valid
data to the file.

In order to do this, sev_read_init_ex_file should only be called once
for the first INIT_EX call. Calling it again for the second INIT_EX call
will cause the invalid file content overwriting the valid 32KB 0xFF data
provided by PSP in the first INIT_EX call.

Co-developed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Jacky Li <jackyli@google.com>
Reported-by: Alper Gun <alpergun@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26 18:50:07 +08:00
Bagas Sanjaya 19a7cc817a KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table
There is unexpected warning on KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability
table, which cause the table to be rendered as paragraph text instead.

The warning is due to missing colon at capability name and returns keyword,
as well as improper alignment on multi-line returns field.

Fix the warning by adding missing colons and aligning the field.

Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29f8b ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-3-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-11 02:35:37 -04:00
Bagas Sanjaya b4aed4d85f Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline
Extend heading underline for KVM_CAP_VM_DISABLE_NX_HUGE_PAGE to match
the heading text length.

Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29f8b ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-2-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-11 02:35:36 -04:00
Linus Torvalds 7c5c3a6177 ARM:
* Unwinder implementations for both nVHE modes (classic and
   protected), complete with an overflow stack
 
 * Rework of the sysreg access from userspace, with a complete
   rewrite of the vgic-v3 view to allign with the rest of the
   infrastructure
 
 * Disagregation of the vcpu flags in separate sets to better track
   their use model.
 
 * A fix for the GICv2-on-v3 selftest
 
 * A small set of cosmetic fixes
 
 RISC-V:
 
 * Track ISA extensions used by Guest using bitmap
 
 * Added system instruction emulation framework
 
 * Added CSR emulation framework
 
 * Added gfp_custom flag in struct kvm_mmu_memory_cache
 
 * Added G-stage ioremap() and iounmap() functions
 
 * Added support for Svpbmt inside Guest
 
 s390:
 
 * add an interface to provide a hypervisor dump for secure guests
 
 * improve selftests to use TAP interface
 
 * enable interpretive execution of zPCI instructions (for PCI passthrough)
 
 * First part of deferred teardown
 
 * CPU Topology
 
 * PV attestation
 
 * Minor fixes
 
 x86:
 
 * Permit guests to ignore single-bit ECC errors
 
 * Intel IPI virtualization
 
 * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
 
 * PEBS virtualization
 
 * Simplify PMU emulation by just using PERF_TYPE_RAW events
 
 * More accurate event reinjection on SVM (avoid retrying instructions)
 
 * Allow getting/setting the state of the speaker port data bit
 
 * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
 
 * "Notify" VM exit (detect microarchitectural hangs) for Intel
 
 * Use try_cmpxchg64 instead of cmpxchg64
 
 * Ignore benign host accesses to PMU MSRs when PMU is disabled
 
 * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
 
 * Allow NX huge page mitigation to be disabled on a per-vm basis
 
 * Port eager page splitting to shadow MMU as well
 
 * Enable CMCI capability by default and handle injected UCNA errors
 
 * Expose pid of vcpu threads in debugfs
 
 * x2AVIC support for AMD
 
 * cleanup PIO emulation
 
 * Fixes for LLDT/LTR emulation
 
 * Don't require refcounted "struct page" to create huge SPTEs
 
 * Miscellaneous cleanups:
 ** MCE MSR emulation
 ** Use separate namespaces for guest PTEs and shadow PTEs bitmasks
 ** PIO emulation
 ** Reorganize rmap API, mostly around rmap destruction
 ** Do not workaround very old KVM bugs for L0 that runs with nesting enabled
 ** new selftests API for CPUID
 
 Generic:
 
 * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
 
 * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmLnyo4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtQQf/XjVWiRcWLPR9dqzRM/vvRXpiG+UL
 jU93R7m6ma99aqTtrxV/AE+kHgamBlma3Cwo+AcWk9uCVNbIhFjv2YKg6HptKU0e
 oJT3zRYp+XIjEo7Kfw+TwroZbTlG6gN83l1oBLFMqiFmHsMLnXSI2mm8MXyi3dNB
 vR2uIcTAl58KIprqNNsYJ2dNn74ogOMiXYx9XzoA9/5Xb6c0h4rreHJa5t+0s9RO
 Gz7Io3PxumgsbJngjyL1Ve5oxhlIAcZA8DU0PQmjxo3eS+k6BcmavGFd45gNL5zg
 iLpCh4k86spmzh8CWkAAwWPQE4dZknK6jTctJc0OFVad3Z7+X7n0E8TFrA==
 =PM8o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Quite a large pull request due to a selftest API overhaul and some
  patches that had come in too late for 5.19.

  ARM:

   - Unwinder implementations for both nVHE modes (classic and
     protected), complete with an overflow stack

   - Rework of the sysreg access from userspace, with a complete rewrite
     of the vgic-v3 view to allign with the rest of the infrastructure

   - Disagregation of the vcpu flags in separate sets to better track
     their use model.

   - A fix for the GICv2-on-v3 selftest

   - A small set of cosmetic fixes

  RISC-V:

   - Track ISA extensions used by Guest using bitmap

   - Added system instruction emulation framework

   - Added CSR emulation framework

   - Added gfp_custom flag in struct kvm_mmu_memory_cache

   - Added G-stage ioremap() and iounmap() functions

   - Added support for Svpbmt inside Guest

  s390:

   - add an interface to provide a hypervisor dump for secure guests

   - improve selftests to use TAP interface

   - enable interpretive execution of zPCI instructions (for PCI
     passthrough)

   - First part of deferred teardown

   - CPU Topology

   - PV attestation

   - Minor fixes

  x86:

   - Permit guests to ignore single-bit ECC errors

   - Intel IPI virtualization

   - Allow getting/setting pending triple fault with
     KVM_GET/SET_VCPU_EVENTS

   - PEBS virtualization

   - Simplify PMU emulation by just using PERF_TYPE_RAW events

   - More accurate event reinjection on SVM (avoid retrying
     instructions)

   - Allow getting/setting the state of the speaker port data bit

   - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls
     are inconsistent

   - "Notify" VM exit (detect microarchitectural hangs) for Intel

   - Use try_cmpxchg64 instead of cmpxchg64

   - Ignore benign host accesses to PMU MSRs when PMU is disabled

   - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

   - Allow NX huge page mitigation to be disabled on a per-vm basis

   - Port eager page splitting to shadow MMU as well

   - Enable CMCI capability by default and handle injected UCNA errors

   - Expose pid of vcpu threads in debugfs

   - x2AVIC support for AMD

   - cleanup PIO emulation

   - Fixes for LLDT/LTR emulation

   - Don't require refcounted "struct page" to create huge SPTEs

   - Miscellaneous cleanups:
      - MCE MSR emulation
      - Use separate namespaces for guest PTEs and shadow PTEs bitmasks
      - PIO emulation
      - Reorganize rmap API, mostly around rmap destruction
      - Do not workaround very old KVM bugs for L0 that runs with nesting enabled
      - new selftests API for CPUID

  Generic:

   - Fix races in gfn->pfn cache refresh; do not pin pages tracked by
     the cache

   - new selftests API using struct kvm_vcpu instead of a (vm, id)
     tuple"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits)
  selftests: kvm: set rax before vmcall
  selftests: KVM: Add exponent check for boolean stats
  selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test
  selftests: KVM: Check stat name before other fields
  KVM: x86/mmu: remove unused variable
  RISC-V: KVM: Add support for Svpbmt inside Guest/VM
  RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()
  RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
  KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
  RISC-V: KVM: Add extensible CSR emulation framework
  RISC-V: KVM: Add extensible system instruction emulation framework
  RISC-V: KVM: Factor-out instruction emulation into separate sources
  RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
  RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
  RISC-V: KVM: Fix variable spelling mistake
  RISC-V: KVM: Improve ISA extension by using a bitmap
  KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()
  KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog
  KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
  KVM: x86: Do not block APIC write for non ICR registers
  ...
2022-08-04 14:59:54 -07:00
Linus Torvalds aad26f55f4 This was a moderately busy cycle for documentation, but nothing all that
earth-shaking:
 
 - More Chinese translations, and an update to the Italian translations.
   The Japanese, Korean, and traditional Chinese translations are
   more-or-less unmaintained at this point, instead.
 
 - Some build-system performance improvements.
 
 - The removal of the archaic submitting-drivers.rst document, with the
   movement of what useful material that remained into other docs.
 
 - Improvements to sphinx-pre-install to, hopefully, give more useful
   suggestions.
 
 - A number of build-warning fixes
 
 Plus the usual collection of typo fixes, updates, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmLn9OwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YtrwIAJNZoDYJJIRuVHnFkAn5EJ4b/chnR1dSTBtn
 WdE/1zdAlMBWVlEGO48VZybph9Sk0v+cUGf+yviDgASQrfOhRRTkg/0u6XaBAYO0
 +C2D1QDd9DggGgajxsfJfTdD3IuB78mGmCQvP17XIJW+NK1CK9rXZBnj6WC5/HJw
 PCHzeeVreBxOS3W9GelMYa6vjVl7dv81x4DPllnsgU2AMk0/Ce0MVjeIZ695sOeP
 Ki6jZgC2GsgFSK5kBC35OiDe5q+fDzlLfek34EUCn4SIbMALSUYWO1db122w5Pme
 Ej0+UTBhD19WH1uB/rcVKnVWugi7UEUJexZsao+nC7UrdIVtYq0=
 =83BG
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.0' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This was a moderately busy cycle for documentation, but nothing
  all that earth-shaking:

   - More Chinese translations, and an update to the Italian
     translations.

     The Japanese, Korean, and traditional Chinese translations
     are more-or-less unmaintained at this point, instead.

   - Some build-system performance improvements.

   - The removal of the archaic submitting-drivers.rst document,
     with the movement of what useful material that remained into
     other docs.

   - Improvements to sphinx-pre-install to, hopefully, give more
     useful suggestions.

   - A number of build-warning fixes

  Plus the usual collection of typo fixes, updates, and more"

* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
  docs: efi-stub: Fix paths for x86 / arm stubs
  Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
  Docs/zh_CN: Update the translation of pci to 5.19-rc8
  Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
  Docs/zh_CN: Update the translation of usage to 5.19-rc8
  Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
  Docs/zh_CN: Update the translation of sparse to 5.19-rc8
  Docs/zh_CN: Update the translation of kasan to 5.19-rc8
  Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
  doc:it_IT: align Italian documentation
  docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
  Documentation: process: Update email client instructions for Thunderbird
  docs: ABI: correct QEMU fw_cfg spec path
  doc/zh_CN: remove submitting-driver reference from docs
  docs: zh_TW: align to submitting-drivers removal
  docs: zh_CN: align to submitting-drivers removal
  docs: ko_KR: howto: remove reference to removed submitting-drivers
  docs: ja_JP: howto: remove reference to removed submitting-drivers
  docs: it_IT: align to submitting-drivers removal
  docs: process: remove outdated submitting-drivers.rst
  ...
2022-08-02 19:24:24 -07:00
Linus Torvalds 0cec3f24a7 arm64 updates for 5.20
- Remove unused generic cpuidle support (replaced by PSCI version)
 
 - Fix documentation describing the kernel virtual address space
 
 - Handling of some new CPU errata in Arm implementations
 
 - Rework of our exception table code in preparation for handling
   machine checks (i.e. RAS errors) more gracefully
 
 - Switch over to the generic implementation of ioremap()
 
 - Fix lockdep tracking in NMI context
 
 - Instrument our memory barrier macros for KCSAN
 
 - Rework of the kPTI G->nG page-table repainting so that the MMU remains
   enabled and the boot time is no longer slowed to a crawl for systems
   which require the late remapping
 
 - Enable support for direct swapping of 2MiB transparent huge-pages on
   systems without MTE
 
 - Fix handling of MTE tags with allocating new pages with HW KASAN
 
 - Expose the SMIDR register to userspace via sysfs
 
 - Continued rework of the stack unwinder, particularly improving the
   behaviour under KASAN
 
 - More repainting of our system register definitions to match the
   architectural terminology
 
 - Improvements to the layout of the vDSO objects
 
 - Support for allocating additional bits of HWCAP2 and exposing
   FEAT_EBF16 to userspace on CPUs that support it
 
 - Considerable rework and optimisation of our early boot code to reduce
   the need for cache maintenance and avoid jumping in and out of the
   kernel when handling relocation under KASLR
 
 - Support for disabling SVE and SME support on the kernel command-line
 
 - Support for the Hisilicon HNS3 PMU
 
 - Miscellanous cleanups, trivial updates and minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmLeccUQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNCysB/4ml92RJLhVwRAofbtFfVgVz3JLTSsvob9x
 Z7FhNDxfM/G32wKtOHU9tHkGJ+PMVWOPajukzxkMhxmilfTyHBbiisNWVRjKQxj4
 wrd07DNXPIv3bi8SWzS1y2y8ZqujZWjNJlX8SUCzEoxCVtuNKwrh96kU1jUjrkFZ
 kBo4E4wBWK/qW29nClGSCgIHRQNJaB/jvITlQhkqIb0pwNf3sAUzW7QoF1iTZWhs
 UswcLh/zC4q79k9poegdCt8chV5OBDLtLPnMxkyQFvsLYRp3qhyCSQQY/BxvO5JS
 jT9QR6d+1ewET9BFhqHlIIuOTYBCk3xn/PR9AucUl+ZBQd2tO4B1
 =LVH0
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Highlights include a major rework of our kPTI page-table rewriting
  code (which makes it both more maintainable and considerably faster in
  the cases where it is required) as well as significant changes to our
  early boot code to reduce the need for data cache maintenance and
  greatly simplify the KASLR relocation dance.

  Summary:

   - Remove unused generic cpuidle support (replaced by PSCI version)

   - Fix documentation describing the kernel virtual address space

   - Handling of some new CPU errata in Arm implementations

   - Rework of our exception table code in preparation for handling
     machine checks (i.e. RAS errors) more gracefully

   - Switch over to the generic implementation of ioremap()

   - Fix lockdep tracking in NMI context

   - Instrument our memory barrier macros for KCSAN

   - Rework of the kPTI G->nG page-table repainting so that the MMU
     remains enabled and the boot time is no longer slowed to a crawl
     for systems which require the late remapping

   - Enable support for direct swapping of 2MiB transparent huge-pages
     on systems without MTE

   - Fix handling of MTE tags with allocating new pages with HW KASAN

   - Expose the SMIDR register to userspace via sysfs

   - Continued rework of the stack unwinder, particularly improving the
     behaviour under KASAN

   - More repainting of our system register definitions to match the
     architectural terminology

   - Improvements to the layout of the vDSO objects

   - Support for allocating additional bits of HWCAP2 and exposing
     FEAT_EBF16 to userspace on CPUs that support it

   - Considerable rework and optimisation of our early boot code to
     reduce the need for cache maintenance and avoid jumping in and out
     of the kernel when handling relocation under KASLR

   - Support for disabling SVE and SME support on the kernel
     command-line

   - Support for the Hisilicon HNS3 PMU

   - Miscellanous cleanups, trivial updates and minor fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
  arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
  arm64: fix KASAN_INLINE
  arm64/hwcap: Support FEAT_EBF16
  arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
  arm64/hwcap: Document allocation of upper bits of AT_HWCAP
  arm64: enable THP_SWAP for arm64
  arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
  arm64: errata: Remove AES hwcap for COMPAT tasks
  arm64: numa: Don't check node against MAX_NUMNODES
  drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
  perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
  docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
  arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
  mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
  mm: kasan: Skip unpoisoning of user pages
  mm: kasan: Ensure the tags are visible before the tag in page->flags
  drivers/perf: hisi: add driver for HNS3 PMU
  drivers/perf: hisi: Add description for HNS3 PMU driver
  drivers/perf: riscv_pmu_sbi: perf format
  perf/arm-cci: Use the bitmap API to allocate bitmaps
  ...
2022-08-01 10:37:00 -07:00
Paolo Bonzini 63f4b21041 Merge remote-tracking branch 'kvm/next' into kvm-next-5.20
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:

* Permit guests to ignore single-bit ECC errors

* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit (detect microarchitectural hangs) for Intel

* Cleanups for MCE MSR emulation

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to use TAP interface

* enable interpretive execution of zPCI instructions (for PCI passthrough)

* First part of deferred teardown

* CPU Topology

* PV attestation

* Minor fixes

Generic:

* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:

* Use try_cmpxchg64 instead of cmpxchg64

* Bugfixes

* Ignore benign host accesses to PMU MSRs when PMU is disabled

* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior

* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

* Port eager page splitting to shadow MMU as well

* Enable CMCI capability by default and handle injected UCNA errors

* Expose pid of vcpu threads in debugfs

* x2AVIC support for AMD

* cleanup PIO emulation

* Fixes for LLDT/LTR emulation

* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:

* Use separate namespaces for guest PTEs and shadow PTEs bitmasks

* PIO emulation

* Reorganize rmap API, mostly around rmap destruction

* Do not workaround very old KVM bugs for L0 that runs with nesting enabled

* new selftests API for CPUID
2022-08-01 03:21:00 -04:00
Paolo Bonzini a4850b5590 KVM: s390x: Fixes and features for 5.20
* First part of deferred teardown
 * CPU Topology
 * interpretive execution for PCI instructions
 * PV attestation
 * Minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoWuZBM6M3lCBSfTnuARItAMU6BMFAmLZaj0ACgkQuARItAMU
 6BPmPw/+PIZ7xxoiuUCCWlFi+RsfgTH1OWGoLL2B8zQHyJJ+IS4puGqKfVdusl56
 GhPPlTkBG9gbhRXUMDWnKgp4JLOIj8ngbgQ8zcD/xeH8tzm4iRy+uj7PxCe+kQGB
 CY3rL42HwEvFJUbLzi0/8ahLjz6t7+hq7okMdhq9/MoIJBp/w74QZb+chpbuCpi/
 /nwdfon85NYTLCK73g6kjjHzzA3hQfUmheVkOt288Qp3yKLh77blJ79SRl3xHeuH
 PBTqsLoxMiJyFfhJQ9U5HnKcIMAD6vA4ayoj1PAZgToYt40B3k8K6HQF+07TcZuy
 KABAqmUMNgbapUC74+U45d1eLAl1TGbZSuaq5ymb9jyFyMxGN2Vp6K9O6Ebd+ghy
 NsGc+CLRPDTiwi/D+QQpGrIksqcMEy+ocGGjbeIuRQxao6C9+KladP5nRxXJpDZf
 Vrupw+gDjBPozmQAA50VtR+ad5rdEdbSmUvo0nIeBpQ4VJxLFnA0tXl4XOEEFr2M
 214yv691KFXhVtaV0lHRKoCQ78OAix6XyjHI0rukGnqNNo7w+V3htU0oG0ifWEU9
 96BpKKpjfsCRhN1h2BF8MTPMEb6AGsI2hLCm3neGxrcT7KM1chk2Z3qlGdrc3xZT
 BIj7H23trC6HG0vCi1rHAn+S0zmcnt5TCaRUNQF5nUeLYfV1FC0=
 =IB9Y
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.20-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390x: Fixes and features for 5.20

* First part of deferred teardown
* CPU Topology
* interpretive execution for PCI instructions
* PV attestation
* Minor fixes
2022-07-22 03:14:32 -04:00
Pierre Morel f5ecfee944 KVM: s390: resetting the Topology-Change-Report
During a subsystem reset the Topology-Change-Report is cleared.

Let's give userland the possibility to clear the MTCR in the case
of a subsystem reset.

To migrate the MTCR, we give userland the possibility to
query the MTCR state.

We indicate KVM support for the CPU topology facility with a new
KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20220714194334.127812-1-pmorel@linux.ibm.com>
Link: https://lore.kernel.org/all/20220714194334.127812-1-pmorel@linux.ibm.com/
[frankja@linux.ibm.com: Simple conflict resolution in Documentation/virt/kvm/api.rst]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-07-20 10:57:37 +02:00
Oliver Upton 450a563924 KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats
commit 1b870fa557 ("kvm: stats: tell userspace which values are
boolean") added a new stat unit (boolean) but failed to raise
KVM_STATS_UNIT_MAX.

Fix by pointing UNIT_MAX at the new max value of UNIT_BOOLEAN.

Fixes: 1b870fa557 ("kvm: stats: tell userspace which values are boolean")
Reported-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220719125229.2934273-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-19 08:54:11 -04:00
Paolo Bonzini 942d9e8952 Documentation: kvm: clarify histogram units
In the case of histogram statistics, the values are always sample
counts; the unit instead applies to the bucket range.  For example,
halt_poll_success_hist is a nanosecond statistic because the buckets are
for 0ns, 1ns, 2-3ns, 4-7ns etc.  There isn't really any other sensible
interpretation, but clarify this anyway in the Documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 08:02:09 -04:00
Paolo Bonzini 1b870fa557 kvm: stats: tell userspace which values are boolean
Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).

Therefore, add "boolean value" as a new "unit".  While it is not exactly
a unit, it walks and quacks like one.  In particular, using the type
would be wrong because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").

Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14 08:01:59 -04:00
Sean Christopherson 43bb9e000e KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific
Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear
that the quirk only controls the #UD behavior of MONITOR/MWAIT.  KVM
doesn't currently enforce fault checks when MONITOR/MWAIT are supported,
but that could change in the future.  SVM also has a virtualization hole
in that it checks all faults before intercepts, and so "never faults" is
already a lie when running on SVM.

Fixes: bfbcc81bb8 ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com
2022-07-13 18:14:05 -07:00
Michael Kelley ab3e69fc4d Documentation: hyperv: Add overview of clocks and timers
Add documentation topic for clocks and timers when running as a
guest on Hyper-V.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1657561704-12631-4-git-send-email-mikelley@microsoft.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-13 14:04:48 -06:00
Michael Kelley ac1129e79e Documentation: hyperv: Add overview of VMbus
Add documentation topic for using VMbus when running as a guest
on Hyper-V.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1657561704-12631-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-13 14:04:48 -06:00
Michael Kelley ec7c56812f Documentation: hyperv: Add overview of Hyper-V enlightenments
Add an initial documentation topic for Linux enlightenments to
run as a guest on Microsoft's Hyper-V hypervisor, linked under
the "virt" documentation area. Update the virt doc index.rst
and the MAINTAINERS file.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1657561704-12631-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-13 14:04:48 -06:00
Bagas Sanjaya 5efab5cdf0 Documentation: kvm: extend KVM_S390_ZPCI_OP subheading underline
Stephen Rothwell reported the htmldocs warning:

Documentation/virt/kvm/api.rst:5959: WARNING: Title underline too short.

4.137 KVM_S390_ZPCI_OP
--------------------

The warning is due to subheading underline on KVM_S390_ZPCI_OP section is
short of 2 dashes.

Extend the underline to fix the warning.

Link: https://lore.kernel.org/linux-next/20220711205557.183c3b14@canb.auug.org.au/
Fixes: a0c4d1109d6cc5 ("KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: kvm@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220712092954.142027-4-bagasdotme@gmail.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-13 10:10:11 +02:00
Christian Borntraeger d41b5e0176 KVM: s390/pci: enable zPCI for interpretive execution
Add the necessary code in s390 base, pci and KVM to enable interpretion
 of PCI pasthru.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmLL7HcACgkQEXu8gLWm
 HHz0JA/8C/pG5JdeOfKA6ZgWuUtxh8NRAmn+XEh+sAPpdK1cmEc1Qt/UKteSFel4
 cmqfaCELalq/BaFxtPS7Wn8Rf4pY8/GwEzwM0dNiS09pTWv0YMXql6+013nr1TJU
 hWx5Pm9Za+T/UnbbHqlyJfjMf7/HELHmQYemDpCr6n1sIYMjsWIJI/P6ZsQiG/8V
 iDZQGIM8mfUC+PMzxsYAQZQB3nm6noZfnWlAcuChDCmgk2ZxdXSdZlHneiLLiYlb
 yZPOyTysA0H2iFgRGfXMI4Oz6vegr6xAcZ2c9mkc8lM42yKHQNpPa0PqEY+EzVV8
 0iaMT3LKWQRdjzTq6E4I5wb74KQn/t1TbTzM5wznOQ6GySRhPvnXVLOuYyUf5d+0
 PwtnfKyx2C5UtOn47Xuujp5FClP8NI8Se5uq6Myei5OtYAvrQtOFxiJAixLx8nCb
 ca/migenYr+R5zYn5g3o6oo2BUJfF3Y1Q8nazz602JRu42aZzVFu2GNB062YjleK
 w7SfIZNTh0picxSmoehSOQMVaiGY/C/ow7Xa+bLaCITQC3s8HY73m3gynaVOB23X
 2umrC3HkTnH2ymqvDC6O/5QG7IUlSfjbWzN0TdmPfV5KeM7BmBvP4vxqxRYyTY7b
 7UhFg820fZKZu4Ul740a2+HBNw73T8fc4xbZVJ6glJo3AdWQD5s=
 =YD+W
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-pci-5.20' into kernelorgnext

KVM: s390/pci: enable zPCI for interpretive execution

Add the necessary code in s390 base, pci and KVM to enable interpretion
of PCI pasthru.
2022-07-11 11:28:57 +02:00
Matthew Rosato db1c875e05 KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices
The KVM_S390_ZPCI_OP ioctl provides a mechanism for managing
hardware-assisted virtualization features for s390x zPCI passthrough.
Add the first 2 operations, which can be used to enable/disable
the specified device for Adapter Event Notification interpretation.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20220606203325.110625-21-mjrosato@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11 09:54:38 +02:00
Mauro Carvalho Chehab 8a5d192166 Documentation: KVM: update s390-diag.rst reference
Changeset daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
renamed: Documentation/virt/kvm/s390-diag.rst
to: Documentation/virt/kvm/s390/s390-diag.rst.

Update its cross-reference accordingly.

Fixes: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/85b81e4678bbe23d0e9692616798762a6465f0a3.1656234456.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07 13:10:00 -06:00
Mauro Carvalho Chehab 48b36e59ac Documentation: KVM: update msr.rst reference
Changeset daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
renamed: Documentation/virt/kvm/msr.rst
to: Documentation/virt/kvm/x86/msr.rst.

Update its cross-reference accordingly.

Fixes: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/5652b7f5caff3b817a660b75f1f319a2f8962380.1656234456.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07 13:09:59 -06:00
Mauro Carvalho Chehab 7ac3945d8e Documentation: KVM: update amd-memory-encryption.rst references
Changeset daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
renamed: Documentation/virt/kvm/amd-memory-encryption.rst
to: Documentation/virt/kvm/x86/amd-memory-encryption.rst.

Update the cross-references accordingly.

Fixes: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/fd80db889e34aae87a4ca88cad94f650723668f4.1656234456.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07 13:09:59 -06:00
Mauro Carvalho Chehab e38fd63749 Documentation: KVM: update s390-pv.rst reference
Changesets: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
and: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
renamed: Documentation/virt/kvm/s390-pv.rst
to: Documentation/virt/kvm/s390/s390-pv.rst.

Update its cross-reference accordingly.

Fixes: daec8d4083 ("Documentation: KVM: add separate directories for architecture-specific documentation")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/e2676f087d287db0bc31ae7c05c80ce5adf93333.1656234456.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07 13:09:59 -06:00
Marc Zyngier 7ddb0c3df7 arm64: Rename the VHE switch to "finalise_el2"
as we are about to perform a lot more in 'mutate_to_vhe' than
we currently do, this function really becomes the point where
we finalise the basic EL2 configuration.

Reflect this into the code by renaming a bunch of things:
- HVC_VHE_RESTART -> HVC_FINALISE_EL2
- switch_to_vhe --> finalise_el2
- mutate_to_vhe -> __finalise_el2

No functional changes.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220630160500.1536744-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-01 15:22:51 +01:00
Steven Lung 5b8d9ee003 docs: UML: fix typo
Replace 'absense' with 'absence'.

Signed-off-by: Steven Lung <1030steven@gmail.com>
Link: https://lore.kernel.org/r/20220621072910.4704-1-1030steven@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-06-24 13:18:21 -06:00
Ben Gardon 084cc29f8b KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
In some cases, the NX hugepage mitigation for iTLB multihit is not
needed for all guests on a host. Allow disabling the mitigation on a
per-VM basis to avoid the performance hit of NX hugepages on trusted
workloads.

In order to disable NX hugepages on a VM, ensure that the userspace
actor has permission to reboot the system. Since disabling NX hugepages
would allow a guest to crash the system, it is similar to reboot
permissions.

Ideally, KVM would require userspace to prove it has access to KVM's
nx_huge_pages module param, e.g. so that userspace can opt out without
needing full reboot permissions.  But getting access to the module param
file info is difficult because it is buried in layers of sysfs and module
glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.

Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24 04:51:49 -04:00
Sean Christopherson bfbcc81bb8 KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior
Add a quirk for KVM's behavior of emulating intercepted MONITOR/MWAIT
instructions a NOPs regardless of whether or not they are supported in
guest CPUID.  KVM's current behavior was likely motiviated by a certain
fruity operating system that expects MONITOR/MWAIT to be supported
unconditionally and blindly executes MONITOR/MWAIT without first checking
CPUID.  And because KVM does NOT advertise MONITOR/MWAIT to userspace,
that's effectively the default setup for any VMM that regurgitates
KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2.

Note, this quirk interacts with KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT.  The
behavior is actually desirable, as userspace VMMs that want to
unconditionally hide MONITOR/MWAIT from the guest can leave the
MISC_ENABLE quirk enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20 11:50:42 -04:00
Sean Christopherson 8deb03e75f KVM: Fix references to non-existent KVM_CAP_TRIPLE_FAULT_EVENT
The x86-only KVM_CAP_TRIPLE_FAULT_EVENT was (appropriately) renamed to
KVM_CAP_X86_TRIPLE_FAULT_EVENT when the patches were applied, but the
docs and selftests got left behind.  Fix them.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-11 10:14:28 -04:00
Paul Durrant b172862241 KVM: x86: PIT: Preserve state of speaker port data bit
Currently the state of the speaker port (0x61) data bit (bit 1) is not
saved in the exported state (kvm_pit_state2) and hence is lost when
re-constructing guest state.

This patch removes the 'speaker_data_port' field from kvm_kpit_state and
instead tracks the state using a new KVM_PIT_FLAGS_SPEAKER_DATA_ON flag
defined in the API.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Message-Id: <20220531124421.1427-1-pdurrant@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 13:06:20 -04:00
Tao Xu 2f4073e08f KVM: VMX: Enable Notify VM exit
There are cases that malicious virtual machines can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs.

VMM can enable notify VM exit that a VM exit generated if no event
window occurs in VM non-root mode for a specified amount of time (notify
window).

Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
  enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
  the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature in per-VM scope. The argument is a
  64bit value: bits 63:32 are used for notify window, and bits 31:0 are
  for flags. Current supported flags:
  - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
    window provided.
  - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
  threshold is added to vmcs.notify_window.

VM exit handling:
- Introduce a vcpu state notify_window_exits to records the count of
  notify VM exits and expose it through the debugfs.
- Notify VM exit can happen incident to delivery of a vector event.
  Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
  bit is set.

Nested handling
- Nested notify VM exits are not supported yet. Keep the same notify
  window control in vmcs02 as vmcs01, so that L1 can't escape the
  restriction of notify VM exits through launching L2 VM.

Notify VM exit is defined in latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.

Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 05:56:24 -04:00
Chenyi Qiang ed2351174e KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault
For the triple fault sythesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 05:20:53 -04:00
Zeng Guang 3587531638 KVM: x86: Allow userspace to set maximum VCPU id for VM
Introduce new max_vcpu_ids in KVM for x86 architecture. Userspace
can assign maximum possible vcpu id for current VM session using
KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl().

This is done for x86 only because the sole use case is to guide
memory allocation for PID-pointer table, a structure needed to
enable VMX IPI.

By default, max_vcpu_ids set as KVM_MAX_VCPU_IDS.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154444.11888-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 04:47:31 -04:00
Paolo Bonzini 5552de7b92 KVM: s390: pvdump and selftest improvements
- add an interface to provide a hypervisor dump for secure guests
 - improve selftests to show tests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmKXf2wACgkQEXu8gLWm
 HHzu1Q//WjEuOX5nBjklMUlDB2oB2+vFSyW9lE7x9m38EnFTH8QTfH695ChVoNN+
 j06Fhd4ENjxqTTYs7z67tP4TSQ/LhB/GsPydKCEOnB/63+k2cnYeS3wsv19213F0
 IyvpN6MkzxoktV4m1EtKhlvXGpEBoXZCczgBLj3FYlNQ7kO8RsSkF9rOnhuP9Yjh
 l2876bWHWlbU0qWmRSAu0spkwHWjtyh/bnQKzXotQyrQ9bo1yMQvhe2HH8HVTSio
 cjRlseWVi01rJKzKcs6D7MFMctLKr5y0onxBgGJnRh27KoBY195ICH2Jz2LfJoor
 EP57YcXZqfxzKCGHTGgVYMgFeixX6nzBgqTpDIHMQzvoM1IrQKl+d5riepO03xpS
 gZxHtJqZi8s+t8w0ZFBHj83VXkzFyLuCIeui9vo3cQ00K7bBrNUSw1BAdqT5HTzW
 K2R4jSQaszjw8mDz3R3G1+yg6PjMS6cDEU1+G2Id7xSYTV3lJnBDVzas7aEUNCC4
 LzIrD5c4dscyZzIjAp9huVwpZoCNLy6jtecRTaGhA2YiE0VMWtJlMJHwbShlSnM7
 5VhEn859namvoYtN8XBaTFa/jRDOxO+LHWuOy172oaBUgaVHBjZQLyrlit1FRQvT
 SVruCmgtJ7u7RD/8uVDfPNR05DTSWQYzklJoKx2avKZj5FIx7ms=
 =/6Ue
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: pvdump and selftest improvements

- add an interface to provide a hypervisor dump for secure guests
- improve selftests to show tests
2022-06-07 12:28:53 -04:00
Janosch Frank b0f46280d3 Documentation/virt/kvm/api.rst: Explain rc/rrc delivery
Let's explain in which situations the rc/rrc will set in struct
kvm_pv_cmd so it's clear that the struct members should be set to
0. rc/rrc are independent of the IOCTL return code.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-12-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-12-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Janosch Frank 437cfd714d Documentation/virt/kvm/api.rst: Add protvirt dump/info api descriptions
Time to add the dump API changes to the api documentation file.
Also some minor cleanup.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-11-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-11-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Janosch Frank 660a28653d Documentation: virt: Protected virtual machine dumps
Let's add a documentation file which describes the dump process. Since
we only copy the UV dump data from the UV to userspace we'll not go
into detail here and let the party which processes the data describe
its structure.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-10-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-10-frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-06-01 16:57:14 +02:00
Linus Torvalds bf9095424d S390:
* ultravisor communication device driver
 
 * fix TEID on terminating storage key ops
 
 RISC-V:
 
 * Added Sv57x4 support for G-stage page table
 
 * Added range based local HFENCE functions
 
 * Added remote HFENCE functions based on VCPU requests
 
 * Added ISA extension registers in ONE_REG interface
 
 * Updated KVM RISC-V maintainers entry to cover selftests support
 
 ARM:
 
 * Add support for the ARMv8.6 WFxT extension
 
 * Guard pages for the EL2 stacks
 
 * Trap and emulate AArch32 ID registers to hide unsupported features
 
 * Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 * Support for PSCI-initiated suspend in collaboration with userspace
 
 * GICv3 register-based LPI invalidation support
 
 * Move host PMU event merging into the vcpu data structure
 
 * GICv3 ITS save/restore fixes
 
 * The usual set of small-scale cleanups and fixes
 
 x86:
 
 * New ioctls to get/set TSC frequency for a whole VM
 
 * Allow userspace to opt out of hypercall patching
 
 * Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
 
 AMD SEV improvements:
 
 * Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
 
 * V_TSC_AUX support
 
 Nested virtualization improvements for AMD:
 
 * Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
   nested vGIF)
 
 * Allow AVIC to co-exist with a nested guest running
 
 * Fixes for LBR virtualizations when a nested guest is running,
   and nested LBR virtualization support
 
 * PAUSE filtering for nested hypervisors
 
 Guest support:
 
 * Decoupling of vcpu_is_preempted from PV spinlocks
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKN9M4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNLeAf+KizAlQwxEehHHeNyTkZuKyMawrD6
 zsqAENR6i1TxiXe7fDfPFbO2NR0ZulQopHbD9mwnHJ+nNw0J4UT7g3ii1IAVcXPu
 rQNRGMVWiu54jt+lep8/gDg0JvPGKVVKLhxUaU1kdWT9PhIOC6lwpP3vmeWkUfRi
 PFL/TMT0M8Nfryi0zHB0tXeqg41BiXfqO8wMySfBAHUbpv8D53D2eXQL6YlMM0pL
 2quB1HxHnpueE5vj3WEPQ3PCdy1M2MTfCDBJAbZGG78Ljx45FxSGoQcmiBpPnhJr
 C6UGP4ZDWpml5YULUoA70k5ylCbP+vI61U4vUtzEiOjHugpPV5wFKtx5nw==
 =ozWx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "S390:

   - ultravisor communication device driver

   - fix TEID on terminating storage key ops

  RISC-V:

   - Added Sv57x4 support for G-stage page table

   - Added range based local HFENCE functions

   - Added remote HFENCE functions based on VCPU requests

   - Added ISA extension registers in ONE_REG interface

   - Updated KVM RISC-V maintainers entry to cover selftests support

  ARM:

   - Add support for the ARMv8.6 WFxT extension

   - Guard pages for the EL2 stacks

   - Trap and emulate AArch32 ID registers to hide unsupported features

   - Ability to select and save/restore the set of hypercalls exposed to
     the guest

   - Support for PSCI-initiated suspend in collaboration with userspace

   - GICv3 register-based LPI invalidation support

   - Move host PMU event merging into the vcpu data structure

   - GICv3 ITS save/restore fixes

   - The usual set of small-scale cleanups and fixes

  x86:

   - New ioctls to get/set TSC frequency for a whole VM

   - Allow userspace to opt out of hypercall patching

   - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

  AMD SEV improvements:

   - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES

   - V_TSC_AUX support

  Nested virtualization improvements for AMD:

   - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
     nested vGIF)

   - Allow AVIC to co-exist with a nested guest running

   - Fixes for LBR virtualizations when a nested guest is running, and
     nested LBR virtualization support

   - PAUSE filtering for nested hypervisors

  Guest support:

   - Decoupling of vcpu_is_preempted from PV spinlocks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
  KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
  KVM: selftests: x86: Sync the new name of the test case to .gitignore
  Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
  x86, kvm: use correct GFP flags for preemption disabled
  KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
  x86/kvm: Alloc dummy async #PF token outside of raw spinlock
  KVM: x86: avoid calling x86 emulator without a decoded instruction
  KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
  x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
  s390/uv_uapi: depend on CONFIG_S390
  KVM: selftests: x86: Fix test failure on arch lbr capable platforms
  KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
  KVM: s390: selftest: Test suppression indication on key prot exception
  KVM: s390: Don't indicate suppression on dirtying, failing memop
  selftests: drivers/s390x: Add uvdevice tests
  drivers/s390/char: Add Ultravisor io device
  MAINTAINERS: Update KVM RISC-V entry to cover selftests support
  RISC-V: KVM: Introduce ISA extension register
  RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
  RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
  ...
2022-05-26 14:20:14 -07:00
Paolo Bonzini 186af6bb40 Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25 05:15:43 -04:00
Paolo Bonzini 1644e27059 KVM: s390: Fix and feature for 5.19
- ultravisor communication device driver
 - fix TEID on terminating storage key ops
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmKLWW0ACgkQEXu8gLWm
 HHyhmBAApRObtkLtQjctGs4lzGPvE019EFFdBlK5ayYrgFE0gcaX0adstnLWyzJ+
 J7L6UbxUzKKfev0BCDyPCTH+FUW5LHanpS0pBASLrl4VMcloWa7GZh5Ahbiq797x
 9QnMC72qUggg4FYj4X4WxYJhxqgqi2lmYrcz7QjCbW6X0RWilryPuzZcL326ghzz
 gH11gup0cy9HSpe6zr7efNT8UVahUr06ky1VnUBnDRR3ecuMQOUBET/McOXLYUQP
 Q0eFtdRXvrnDKbCXinORCCp6dbreibBpLAF5PWh5WxlTNluZQtfzYjBUHyMpoeEB
 akEc/gb/MY7bbR5V7aTG3Joi1soFFmZQ93P9Bn8c39wWfouOkm8gqioCd6erjczW
 5seFUNR72uWwUfNxBFvPbDFq7eS6qEVoIx14jLjGhUcTwE9xQhNYCgc5qmmNSqB/
 OMUKyKpaqkvPs+mx+/efFhVSScWs+AMimUYzYb2fdTJ7MXxnCRIF0BUkIlFjJVpc
 3y1tOi0mD+CEKDyVfTPmigagFgW79FK6rnScSorSRWXqE3xcSpJjv3Afo6II20mQ
 YJZKviciknzxZ8/uZbJl98DpHvW17oh08SBs9kLweHjLo3SPZtHWa2qzilGTBZMY
 75jPiwNMLdZdf/SRYdrIn6nlSNrvUt+16YcN8vqUwcqW+9Of0IQ=
 =93zj
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fix and feature for 5.19

- ultravisor communication device driver
- fix TEID on terminating storage key ops
2022-05-25 05:11:21 -04:00
Paolo Bonzini 47e8eec832 KVM/arm64 updates for 5.19
- Add support for the ARMv8.6 WFxT extension
 
 - Guard pages for the EL2 stacks
 
 - Trap and emulate AArch32 ID registers to hide unsupported features
 
 - Ability to select and save/restore the set of hypercalls exposed
   to the guest
 
 - Support for PSCI-initiated suspend in collaboration with userspace
 
 - GICv3 register-based LPI invalidation support
 
 - Move host PMU event merging into the vcpu data structure
 
 - GICv3 ITS save/restore fixes
 
 - The usual set of small-scale cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKGAGsPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDB/gQAMhyZ+wCG0OMEZhwFF6iDfxVEX2Kw8L41NtD
 a/e6LDWuIOGihItpRkYROc5myG74D7XckF2Bz3G7HJoU4vhwHOV/XulE26GFizoC
 O1GVRekeSUY81wgS1yfo0jojLupBkTjiq3SjTHoDP7GmCM0qDPBtA0QlMRzd2bMs
 Kx0+UUXZUHFSTXc7Lp4vqNH+tMp7se+yRx7hxm6PCM5zG+XYJjLxnsZ0qpchObgU
 7f6YFojsLUs1SexgiUqJ1RChVQ+FkgICh5HyzORvGtHNNzK6D2sIbsW6nqMGAMql
 Kr3A5O/VOkCztSYnLxaa76/HqD21mvUrXvr3grhabNc7rOmuzWV0dDgr6c6wHKHb
 uNCtH4d7Ra06gUrEOrfsgLOLn0Zqik89y6aIlMsnTudMg9gMNgFHy1jz4LM7vMkY
 FS5AVj059heg2uJcfgTvzzcqneyuBLBmF3dS4coowO6oaj8SycpaEmP5e89zkPMI
 1kk8d0e6RmXuCh/2AJ8GxxnKvBPgqp2mMKXOCJ8j4AmHEDX/CKpEBBqIWLKkplUU
 8DGiOWJUtRZJg398dUeIpiVLoXJthMODjAnkKkuhiFcQbXomlwgg7YSnNAz6TRED
 Z7KR2leC247kapHnnagf02q2wED8pBeyrxbQPNdrHtSJ9Usm4nTkY443HgVTJW3s
 aTwPZAQ7
 =mh7W
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.19

- Add support for the ARMv8.6 WFxT extension

- Guard pages for the EL2 stacks

- Trap and emulate AArch32 ID registers to hide unsupported features

- Ability to select and save/restore the set of hypercalls exposed
  to the guest

- Support for PSCI-initiated suspend in collaboration with userspace

- GICv3 register-based LPI invalidation support

- Move host PMU event merging into the vcpu data structure

- GICv3 ITS save/restore fixes

- The usual set of small-scale cleanups and fixes

[Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated
 from 4 to 6. - Paolo]
2022-05-25 05:09:23 -04:00
Linus Torvalds 143a6252e1 arm64 updates for 5.19:
- Initial support for the ARMv9 Scalable Matrix Extension (SME). SME
   takes the approach used for vectors in SVE and extends this to provide
   architectural support for matrix operations. No KVM support yet, SME
   is disabled in guests.
 
 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.
 
 - btrfs search_ioctl() fix for live-lock with sub-page faults.
 
 - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring
   coherent I/O traffic, support for Arm's CMN-650 and CMN-700
   interconnect PMUs, minor driver fixes, kerneldoc cleanup.
 
 - Kselftest updates for SME, BTI, MTE.
 
 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.
 
 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).
 
 - stacktrace cleanups.
 
 - ftrace cleanups.
 
 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKH19IACgkQa9axLQDI
 XvEFWg//bf0p6zjeNaOJmBbyVFsXsVyYiEaLUpFPUs3oB+81s2YZ+9i1rgMrNCft
 EIDQ9+/HgScKxJxnzWf68heMdcBDbk76VJtLALExbge6owFsjByQDyfb/b3v/bLd
 ezAcGzc6G5/FlI1IP7ct4Z9MnQry4v5AG8lMNAHjnf6GlBS/tYNAqpmj8HpQfgRQ
 ZbhfZ8Ayu3TRSLWL39NHVevpmxQm/bGcpP3Q9TtjUqg0r1FQ5sK/LCqOksueIAzT
 UOgUVYWSFwTpLEqbYitVqgERQp9LiLoK5RmNYCIEydfGM7+qmgoxofSq5e2hQtH2
 SZM1XilzsZctRbBbhMit1qDBqMlr/XAy/R5FO0GauETVKTaBhgtj6mZGyeC9nU/+
 RGDljaArbrOzRwMtSuXF+Fp6uVo5spyRn1m8UT/k19lUTdrV9z6EX5Fzuc4Mnhed
 oz4iokbl/n8pDObXKauQspPA46QpxUYhrAs10B/ELc3yyp/Qj3jOfzYHKDNFCUOq
 HC9mU+YiO9g2TbYgCrrFM6Dah2E8fU6/cR0ZPMeMgWK4tKa+6JMEINYEwak9e7M+
 8lZnvu3ntxiJLN+PrPkiPyG+XBh2sux1UfvNQ+nw4Oi9xaydeX7PCbQVWmzTFmHD
 q7UPQ8220e2JNCha9pULS8cxDLxiSksce06DQrGXwnHc1Ir7T04=
 =0DjE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Initial support for the ARMv9 Scalable Matrix Extension (SME).

   SME takes the approach used for vectors in SVE and extends this to
   provide architectural support for matrix operations. No KVM support
   yet, SME is disabled in guests.

 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.

 - btrfs search_ioctl() fix for live-lock with sub-page faults.

 - arm64 perf updates: support for the Hisilicon "CPA" PMU for
   monitoring coherent I/O traffic, support for Arm's CMN-650 and
   CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.

 - Kselftest updates for SME, BTI, MTE.

 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.

 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).

 - stacktrace cleanups.

 - ftrace cleanups.

 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Move sve_free() into SVE code section
  arm64: Kconfig.platforms: Add comments
  arm64: Kconfig: Fix indentation and add comments
  arm64: mm: avoid writable executable mappings in kexec/hibernate code
  arm64: lds: move special code sections out of kernel exec segment
  arm64/hugetlb: Implement arm64 specific huge_ptep_get()
  arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
  arm64: kdump: Do not allocate crash low memory if not needed
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  ...
2022-05-23 21:06:11 -07:00
Linus Torvalds eb39e37d5c AMD SEV-SNP support
Add to confidential guests the necessary memory integrity protection
 against malicious hypervisor-based attacks like data replay, memory
 remapping and others, thus achieving a stronger isolation from the
 hypervisor.
 
 At the core of the functionality is a new structure called a reverse
 map table (RMP) with which the guest has a say in which pages get
 assigned to it and gets notified when a page which it owns, gets
 accessed/modified under the covers so that the guest can take an
 appropriate action.
 
 In addition, add support for the whole machinery needed to launch a SNP
 guest, details of which is properly explained in each patch.
 
 And last but not least, the series refactors and improves parts of the
 previous SEV support so that the new code is accomodated properly and
 not just bolted on.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLU2AACgkQEsHwGGHe
 VUpb/Q//f4LGiJf4nw1flzpe90uIsHNwAafng3NOjeXmhI/EcOlqPf23WHPCgg3Z
 2umfa4sRZyj4aZubDd7tYAoq4qWrQ7pO7viWCNTh0InxBAILOoMPMuq2jSAbq0zV
 ASUJXeQ2bqjYxX4JV4N5f3HT2l+k68M0mpGLN0H+O+LV9pFS7dz7Jnsg+gW4ZP25
 PMPLf6FNzO/1tU1aoYu80YDP1ne4eReLrNzA7Y/rx+S2NAetNwPn21AALVgoD4Nu
 vFdKh4MHgtVbwaQuh0csb/+4vD+tDXAhc8lbIl+Abl9ZxJaDWtAJW5D9e2CnsHk1
 NOkHwnrzizzhtGK1g56YPUVRFAWhZYMOI1hR0zGPLQaVqBnN4b+iahPeRiV0XnGE
 PSbIHSfJdeiCkvLMCdIAmpE5mRshhRSUfl1CXTCdetMn8xV/qz/vG6bXssf8yhTV
 cfLGPHU7gfVmsbR9nk5a8KZ78PaytxOxfIDXvCy8JfQwlIWtieaCcjncrj+sdMJy
 0fdOuwvi4jma0cyYuPolKiS1Hn4ldeibvxXT7CZQlIx6jZShMbpfpTTJs11XdtHm
 PdDAc1TY3AqI33mpy9DhDQmx/+EhOGxY3HNLT7evRhv4CfdQeK3cPVUWgo4bGNVv
 ZnFz7nvmwpyufltW9K8mhEZV267174jXGl6/idxybnlVE7ESr2Y=
 =Y8kW
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull AMD SEV-SNP support from Borislav Petkov:
 "The third AMD confidential computing feature called Secure Nested
  Paging.

  Add to confidential guests the necessary memory integrity protection
  against malicious hypervisor-based attacks like data replay, memory
  remapping and others, thus achieving a stronger isolation from the
  hypervisor.

  At the core of the functionality is a new structure called a reverse
  map table (RMP) with which the guest has a say in which pages get
  assigned to it and gets notified when a page which it owns, gets
  accessed/modified under the covers so that the guest can take an
  appropriate action.

  In addition, add support for the whole machinery needed to launch a
  SNP guest, details of which is properly explained in each patch.

  And last but not least, the series refactors and improves parts of the
  previous SEV support so that the new code is accomodated properly and
  not just bolted on"

* tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  x86/entry: Fixup objtool/ibt validation
  x86/sev: Mark the code returning to user space as syscall gap
  x86/sev: Annotate stack change in the #VC handler
  x86/sev: Remove duplicated assignment to variable info
  x86/sev: Fix address space sparse warning
  x86/sev: Get the AP jump table address from secrets page
  x86/sev: Add missing __init annotations to SEV init routines
  virt: sevguest: Rename the sevguest dir and files to sev-guest
  virt: sevguest: Change driver name to reflect generic SEV support
  x86/boot: Put globals that are accessed early into the .data section
  x86/boot: Add an efi.h header for the decompressor
  virt: sevguest: Fix bool function returning negative value
  virt: sevguest: Fix return value check in alloc_shared_pages()
  x86/sev-es: Replace open-coded hlt-loop with sev_es_terminate()
  virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement
  virt: sevguest: Add support to get extended report
  virt: sevguest: Add support to derive key
  virt: Add SEV-SNP guest driver
  x86/sev: Register SEV-SNP guest request platform device
  x86/sev: Provide support for SNP guest request NAEs
  ...
2022-05-23 17:38:01 -07:00
Janis Schoetterl-Glausch c783631b0b KVM: s390: Don't indicate suppression on dirtying, failing memop
If user space uses a memop to emulate an instruction and that
memop fails, the execution of the instruction ends.
Instruction execution can end in different ways, one of which is
suppression, which requires that the instruction execute like a no-op.
A writing memop that spans multiple pages and fails due to key
protection may have modified guest memory, as a result, the likely
correct ending is termination. Therefore, do not indicate a
suppressing instruction ending in this case.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220512131019.2594948-2-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-05-20 16:38:42 +02:00
Marc Zyngier 3b8e21e3c3 Merge branch kvm-arm64/psci-suspend into kvmarm-master/next
* kvm-arm64/psci-suspend:
  : .
  : Add support for PSCI SYSTEM_SUSPEND and allow userspace to
  : filter the wake-up events.
  :
  : Patches courtesy of Oliver.
  : .
  Documentation: KVM: Fix title level for PSCI_SUSPEND
  selftests: KVM: Test SYSTEM_SUSPEND PSCI call
  selftests: KVM: Refactor psci_test to make it amenable to new tests
  selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  KVM: arm64: Add support for userspace to suspend a vCPU
  KVM: arm64: Return a value from check_vcpu_requests()
  KVM: arm64: Rename the KVM_REQ_SLEEP handler
  KVM: arm64: Track vCPU power state using MP state values
  KVM: arm64: Dedupe vCPU power off helpers
  KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:48:20 +01:00
Marc Zyngier 0586e28aaa Merge branch kvm-arm64/hcall-selection into kvmarm-master/next
* kvm-arm64/hcall-selection:
  : .
  : Introduce a new set of virtual sysregs for userspace to
  : select the hypercalls it wants to see exposed to the guest.
  :
  : Patches courtesy of Raghavendra and Oliver.
  : .
  KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
  KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
  Documentation: Fix index.rst after psci.rst renaming
  selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
  selftests: KVM: aarch64: Introduce hypercall ABI test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  tools: Import ARM SMCCC definitions
  Docs: KVM: Add doc for the bitmap firmware registers
  Docs: KVM: Rename psci.rst to hypercalls.rst
  KVM: arm64: Add vendor hypervisor firmware register
  KVM: arm64: Add standard hypervisor firmware register
  KVM: arm64: Setup a framework for hypercall bitmap firmware registers
  KVM: arm64: Factor out firmware register handling from psci.c

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:47:03 +01:00
Stephen Rothwell 582eb04e05 Documentation: KVM: Fix title level for PSCI_SUSPEND
The htmldoc build breaks in a funny way with:

<quote>
Sphinx parallel build error:
docutils.utils.SystemMessage: /home/sfr/next/next/Documentation/virt/kvm/api.rst:6175: (SEVERE/4) Title level inconsistent:

For arm/arm64:
^^^^^^^^^^^^^^
</quote>

Swap the ^^s for a bunch of --s...

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-05 12:24:21 +01:00
Marc Zyngier c36820b04c Documentation: Fix index.rst after psci.rst renaming
Fix the TOC in index.rst after psci.rst has been renamed to
hypercalls.rst.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20220504205627.18f46380@canb.auug.org.au
2022-05-04 13:13:55 +01:00
Oliver Upton bfbab44568 KVM: arm64: Implement PSCI SYSTEM_SUSPEND
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM.

Unfortunately, there really is no good way to implement a system-wide
PSCI call in KVM. Any precondition checks done in the kernel will need
to be repeated by userspace since there is no good way to protect a
critical section that spans an exit to userspace. SYSTEM_RESET and
SYSTEM_OFF are equally plagued by this issue, although no users have
seemingly cared for the relatively long time these calls have been
supported.

The solution is to just make the whole implementation userspace's
problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
Additionally, add a CAP to get buy-in from userspace for this new exit
type.

Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
explicit documentation of userspace's responsibilites for the exit and
point to the PSCI specification to describe the actual PSCI call.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
2022-05-04 09:28:45 +01:00
Oliver Upton 7b33a09d03 KVM: arm64: Add support for userspace to suspend a vCPU
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.

Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
2022-05-04 09:28:45 +01:00
Raghavendra Rao Ananta fa246c68a0 Docs: KVM: Add doc for the bitmap firmware registers
Add the documentation for the bitmap firmware registers in
hypercalls.rst and api.rst. This includes the details for
KVM_REG_ARM_STD_BMAP, KVM_REG_ARM_STD_HYP_BMAP, and
KVM_REG_ARM_VENDOR_HYP_BMAP registers.

Since the document is growing to carry other hypercall related
information, make necessary adjustments to present the document
in a generic sense, rather than being PSCI focused.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: small scale reformat, move things about, random typo fixes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-7-rananta@google.com
2022-05-03 21:30:19 +01:00
Raghavendra Rao Ananta f1ced23a9b Docs: KVM: Rename psci.rst to hypercalls.rst
Since the doc also covers general hypercalls' details,
rather than just PSCI, and the fact that the bitmap firmware
registers' details will be added to this doc, rename the file
to a more appropriate name- hypercalls.rst.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-6-rananta@google.com
2022-05-03 21:30:19 +01:00
Alexandru Elisei 18f3976fdb KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
When userspace is debugging a VM, the kvm_debug_exit_arch part of the
kvm_run struct contains arm64 specific debug information: the ESR_EL2
value, encoded in the field "hsr", and the address of the instruction
that caused the exception, encoded in the field "far".

Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately
kvm_debug_exit_arch.hsr cannot be changed because that would change the
memory layout of the struct on big endian machines:

Current layout:			| Layout with "hsr" extended to 64 bits:
				|
offset 0: ESR_EL2[31:0] (hsr)   | offset 0: ESR_EL2[61:32] (hsr[61:32])
offset 4: padding		| offset 4: ESR_EL2[31:0]  (hsr[31:0])
offset 8: FAR_EL2[61:0] (far)	| offset 8: FAR_EL2[61:0]  (far)

which breaks existing code.

The padding is inserted by the compiler because the "far" field must be
aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1],
page 18), and the struct itself must be aligned to 8 bytes (the struct must
be aligned to the maximum alignment of its fields - aapcs64, page 18),
which means that "hsr" must be aligned to 8 bytes as it is the first field
in the struct.

To avoid changing the struct size and layout for the existing fields, add a
new field, "hsr_high", which replaces the existing padding. "hsr_high" will
be used to hold the ESR_EL2[61:32] bits of the register. The memory layout,
both on big and little endian machine, becomes:

offset 0: ESR_EL2[31:0]  (hsr)
offset 4: ESR_EL2[61:32] (hsr_high)
offset 8: FAR_EL2[61:0]  (far)

The padding that the compiler inserts for the current struct layout is
unitialized. To prevent an updated userspace running on an old kernel
mistaking the padding for a valid "hsr_high" value, add a new flag,
KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that
"hsr_high" holds a valid ESR_EL2[61:32] value.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Lai Jiangshan 84e5ffd045 KVM: X86/MMU: Fix shadowing 5-level NPT for 4-level NPT L1 guest
When shadowing 5-level NPT for 4-level NPT L1 guest, the root_sp is
allocated with role.level = 5 and the guest pagetable's root gfn.

And root_sp->spt[0] is also allocated with the same gfn and the same
role except role.level = 4.  Luckily that they are different shadow
pages, but only root_sp->spt[0] is the real translation of the guest
pagetable.

Here comes a problem:

If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice verse)
and uses the same gfn as the root page for nested NPT before and after
switching gCR4_LA57.  The host (hCR4_LA57=1) might use the same root_sp
for the guest even the guest switches gCR4_LA57.  The guest will see
unexpected page mapped and L2 may exploit the bug and hurt L1.  It is
lucky that the problem can't hurt L0.

And three special cases need to be handled:

The root_sp should be like role.direct=1 sometimes: its contents are
not backed by gptes, root_sp->gfns is meaningless.  (For a normal high
level sp in shadow paging, sp->gfns is often unused and kept zero, but
it could be relevant and meaningful if sp->gfns is used because they
are backed by concrete gptes.)

For such root_sp in the case, root_sp is just a portal to contribute
root_sp->spt[0], and root_sp->gfns should not be used and
root_sp->spt[0] should not be dropped if gpte[0] of the guest root
pagetable is changed.

Such root_sp should not be accounted too.

So add role.passthrough to distinguish the shadow pages in the hash
when gCR4_LA57 is toggled and fix above special cases by using it in
kvm_mmu_page_{get|set}_gfn() and sp_has_gptes().

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220420131204.2850-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:50:00 -04:00
Paolo Bonzini 71d7c575a6 Merge branch 'kvm-fixes-for-5.18-rc5' into HEAD
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees.

The merge reconciles the ABI fixes for KVM_EXIT_SYSTEM_EVENT between
5.18 and commit c24a950ec7 ("KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata
for SEV-ES", 2022-04-13).
2022-04-29 12:47:59 -04:00
Paolo Bonzini d495f942f4 KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused.  Unfortunately this extensibility
mechanism has several issues:

- x86 is not writing the member, so it would not be possible to use it
  on x86 except for new events

- the member is not aligned to 64 bits, so the definition of the
  uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
  problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
  usage of flags was only introduced in 5.18.

Since padding has to be introduced, place a new field in there
that tells if the flags field is valid.  To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid.  The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.

To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29 12:38:22 -04:00
Tom Lendacky d63670d23e virt: sevguest: Rename the sevguest dir and files to sev-guest
Rename the drivers/virt/coco/sevguest directory and files to sev-guest
so as to match the driver name.

  [ bp: Rename Documentation/virt/coco/sevguest.rst too, as reported by sfr:
    https://lore.kernel.org/r/20220427101059.3bf55262@canb.auug.org.au ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/2f5c9cb16e3a67599c8e3170f6c72c8712c47d53.1650464054.git.thomas.lendacky@amd.com
2022-04-27 13:29:56 +02:00
Peter Gonda c24a950ec7 KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
If an SEV-ES guest requests termination, exit to userspace with
KVM_EXIT_SYSTEM_EVENT and a dedicated SEV_TERM type instead of -EINVAL
so that userspace can take appropriate action.

See AMD's GHCB spec section '4.1.13 Termination Request' for more details.

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Gonda <pgonda@google.com>

Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220407210233.782250-1-pgonda@google.com>
[Add documentatino. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-13 13:37:46 -04:00
Paolo Bonzini a4cfff3f0f Merge branch 'kvm-older-features' into HEAD
Merge branch for features that did not make it into 5.18:

* New ioctls to get/set TSC frequency for a whole VM

* Allow userspace to opt out of hypercall patching

Nested virtualization improvements for AMD:

* Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
  nested vGIF)

* Allow AVIC to co-exist with a nested guest running

* Fixes for LBR virtualizations when a nested guest is running,
  and nested LBR virtualization support

* PAUSE filtering for nested hypervisors

Guest support:

* Decoupling of vcpu_is_preempted from PV spinlocks

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-13 13:37:17 -04:00
Like Xu af105c9cc9 Documentation: KVM: Add SPDX-License-Identifier tag
+new file mode 100644
+WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
+#27: FILE: Documentation/virt/kvm/x86/errata.rst:1:

Opportunistically update all other non-added KVM documents and
remove a new extra blank line at EOF for x86/errata.rst.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220406063715.55625-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-11 13:28:56 -04:00
Michael Roth 92a99584d9 virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement
Update the documentation with information regarding SEV-SNP CPUID
Enforcement details and what sort of assurances it provides to guests.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-47-brijesh.singh@amd.com
2022-04-07 16:47:12 +02:00
Brijesh Singh d80b494f71 virt: sevguest: Add support to get extended report
Version 2 of GHCB specification defines Non-Automatic-Exit (NAE) to get
extended guest report which is similar to the SNP_GET_REPORT ioctl. The
main difference is related to the additional data that will be returned.

That additional data returned is a certificate blob that can be used by
the SNP guest user. The certificate blob layout is defined in the GHCB
specification. The driver simply treats the blob as a opaque data and
copies it to userspace.

  [ bp: Massage commit message, cast 1st arg of access_ok() ]

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-46-brijesh.singh@amd.com
2022-04-07 16:47:12 +02:00
Brijesh Singh 68de0b2f93 virt: sevguest: Add support to derive key
The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to
ask the firmware to provide a key derived from a root key. The derived
key may be used by the guest for any purposes it chooses, such as a
sealing key or communicating with the external entities.

See SEV-SNP firmware spec for more information.

  [ bp: No need to memset "req" - it will get overwritten. ]

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://lore.kernel.org/r/20220307213356.2797205-45-brijesh.singh@amd.com
2022-04-07 16:47:12 +02:00
Brijesh Singh fce96cf044 virt: Add SEV-SNP guest driver
The SEV-SNP specification provides the guest a mechanism to communicate
with the PSP without risk from a malicious hypervisor who wishes to
read, alter, drop or replay the messages sent. The driver uses
snp_issue_guest_request() to issue GHCB SNP_GUEST_REQUEST or
SNP_EXT_GUEST_REQUEST NAE events to submit the request to PSP.

The PSP requires that all communication should be encrypted using key
specified through a struct snp_guest_platform_data descriptor.

Userspace can use SNP_GET_REPORT ioctl() to query the guest attestation
report.

See SEV-SNP spec section Guest Messages for more details.

  [ bp: Remove the "what" from the commit message, massage. ]

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-44-brijesh.singh@amd.com
2022-04-07 16:47:12 +02:00
Bagas Sanjaya c1be1ef1b4 Documentation: kvm: Add missing line break in api.rst
Add missing line break separator between literal block and description
of KVM_EXIT_RISCV_SBI.

This fixes:
</path/to/linux>/Documentation/virt/kvm/api.rst:6118: WARNING: Literal block ends without a blank line; unexpected unindent.

Fixes: da40d85805 (RISC-V: KVM: Document RISC-V specific parts of KVM API, 2021-09-27)
Cc: Anup Patel <anup.patel@wdc.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220403065735.23859-1-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-05 08:11:37 -04:00
Linus Torvalds 38904911e8 * Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
* Documentation improvements
 
 * Prevent module exit until all VMs are freed
 
 * PMU Virtualization fixes
 
 * Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
 
 * Other miscellaneous bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJIGV8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO5FQgAhls4+Nu+NqId/yvvyNxr3vXq0dHI
 hLlHtvzgGzZisZ7y2bNeyIpJVBDT5LCbrptPD/5eTvchVswDh0+kCVC0Uni5ugGT
 tLT/Pv9Oq9e0X7aGdHRyuHIivIFDC20zIZO2DV48Lrj/+r6DafB2Fghq2XQLlBxN
 p8KislvuqAAos543BPC1+Lk3dhOLuZ8qcFD8wGRlcCwjNwYaitrQ16rO04cLfUur
 OwIks1I6TdI2JpLBhm6oWYVG/YnRsoo4bQE8cjdQ6yNSbwWtRpV33q7X6onw8x8K
 BEeESoTnMqfaxIF/6mPl6bnDblVHFp6Xhld/vJcgeWQTdajFtuFE/K4sCA==
 =xnQ6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

 - Documentation improvements

 - Prevent module exit until all VMs are freed

 - PMU Virtualization fixes

 - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences

 - Other miscellaneous bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: fix sending PV IPI
  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
  KVM: x86: Remove redundant vm_entry_controls_clearbit() call
  KVM: x86: cleanup enter_rmode()
  KVM: x86: SVM: fix tsc scaling when the host doesn't support it
  kvm: x86: SVM: remove unused defines
  KVM: x86: SVM: move tsc ratio definitions to svm.h
  KVM: x86: SVM: fix avic spec based definitions again
  KVM: MIPS: remove reference to trap&emulate virtualization
  KVM: x86: document limitations of MSR filtering
  KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
  KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
  KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  KVM: x86: Trace all APICv inhibit changes and capture overall status
  KVM: x86: Add wrappers for setting/clearing APICv inhibits
  KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
  KVM: X86: Handle implicit supervisor access with SMAP
  KVM: X86: Rename variable smap to not_smap in permission_fault()
  ...
2022-04-02 12:09:02 -07:00
David Woodhouse ffbb61d09f KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.
This sets the default TSC frequency for subsequently created vCPUs.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:41:19 -04:00
David Woodhouse 661a20fab7 KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND
At the end of the patch series adding this batch of event channel
acceleration features, finally add the feature bit which advertises
them and document it all.

For SCHEDOP_poll we need to wake a polling vCPU when a given port
is triggered, even when it's masked — and we want to implement that
in the kernel, for efficiency. So we want the kernel to know that it
has sole ownership of event channel delivery. Thus, we allow
userspace to make the 'promise' by setting the corresponding feature
bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll
bypass later, we will do so only if that promise has been made by
userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-16-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:41:17 -04:00
Oliver Upton f1a9761fbb KVM: x86: Allow userspace to opt out of hypercall patching
KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
both of these instructions really should #UD when executed on the wrong
vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
guest's instruction with the appropriate instruction for the vendor.
Nonetheless, older guest kernels without commit c1118b3602 ("x86: kvm:
use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
do not patch in the appropriate instruction using alternatives, likely
motivating KVM's intervention.

Add a quirk allowing userspace to opt out of hypercall patching. If the
quirk is disabled, KVM synthesizes a #UD in the guest.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220316005538.2282772-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:41:10 -04:00
Paolo Bonzini fe5f691413 KVM: MIPS: remove reference to trap&emulate virtualization
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:47 -04:00
Paolo Bonzini ce2f72e26c KVM: x86: document limitations of MSR filtering
MSR filtering requires an exit to userspace that is hard to implement and
would be very slow in the case of nested VMX vmexit and vmentry MSR
accesses.  Document the limitation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:47 -04:00
David Woodhouse cf1d88b36b KVM: Remove dirty handling from gfn_to_pfn_cache completely
It isn't OK to cache the dirty status of a page in internal structures
for an indefinite period of time.

Any time a vCPU exits the run loop to userspace might be its last; the
VMM might do its final check of the dirty log, flush the last remaining
dirty pages to the destination and complete a live migration. If we
have internal 'dirty' state which doesn't get flushed until the vCPU
is finally destroyed on the source after migration is complete, then
we have lost data because that will escape the final copy.

This problem already exists with the use of kvm_vcpu_unmap() to mark
pages dirty in e.g. VMX nesting.

Note that the actual Linux MM already considers the page to be dirty
since we have a writeable mapping of it. This is just about the KVM
dirty logging.

For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
track which gfn_to_pfn_caches have been used and explicitly mark the
corresponding pages dirty before returning to userspace. But we would
have needed external tracking of that anyway, rather than walking the
full list of GPCs to find those belonging to this vCPU which are dirty.

So let's rely *solely* on that external tracking, and keep it simple
rather than laying a tempting trap for callers to fall into.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:41 -04:00
Sean Christopherson df06dae3f2 KVM: Don't actually set a request when evicting vCPUs for GFN cache invd
Don't actually set a request bit in vcpu->requests when making a request
purely to force a vCPU to exit the guest.  Logging a request but not
actually consuming it would cause the vCPU to get stuck in an infinite
loop during KVM_RUN because KVM would see the pending request and bail
from VM-Enter to service the request.

Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
without implementing handling of the request, especially since getting
test coverage of MMU notifier interaction with specific KVM features
usually requires a directed test.

Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
mode, it's supposed to _avoid_ waking vCPUs that are blocking.

Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
what it wants to accomplish, and to genericize the name so that it can
used for similar but unrelated scenarios, should they arise in the future.
Add a comment and documentation to explain why the "no action" request
exists.

Add compile-time assertions to help detect improper usage.  Use the inner
assertless helper in the one s390 path that makes requests without a
hardcoded request.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220223165302.3205276-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02 05:34:39 -04:00
Linus Torvalds e8b767f5e0 This pull request contains the following changes for UML:
- Devicetree support (for testing)
 - Various cleanups and fixes: UBD, port_user, uml_mconsole
 - Maintainer update
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmJFwUMWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQqBD/9gLyeiVp2eu1YFVir64IASgVjK
 lNdlAfUwfebtEsw65JcfY8K64910ahw6TvkjTT2A+QGeJIYaVwmw69bLXJUvQq31
 C7ZDsMHptuNiZrHDL9SoA0DfwqRdJx3tgGzDnSkhX+2T7Zs5n1nLRMBmn/NJV9Qy
 CmxG9fLH1VsU0p6RI76WST3GPLOqWa3jCeHK1vMGZNXI+eo5prHc59lkOcT7lEy7
 M4vJRaAV6pCDDYMQdDOYr1PDEeG7/h49EqdKylkOhonDyYB649rL6Lc9nRBvSts3
 NXX/qYy1Sj1AlOSR5IOon6QCyk1hap9kr85QoCtz3VMabD/yLlBovZzLOLaF+0S6
 dQWgKg806g8QYQGxN03Ph0Pb5cA6hAjr8nVmAuICJDWgmY6Oo74pEvhI8toofFzk
 NJzwa6G99xNhfggeTcGdG0ddQDT8N3enKspDPkzpN127GzU5cgvI1Z8wnZXB7JDM
 zLMCxzwehocCSrFlh9aQDFK1XJfEWuP66xEPl5cX46//IMKqsrXEOjNlCTRUmA5F
 OhU4qqb01OW3K4HPaAkBcGPZ0HhFn6JREUFyNW07dg6s73IWzf0CaNKeYJS7abln
 tdvfPg3OPNXCjHd3aCW22EzuB9R/K8BNMkva3QQZxtUa+tOjBdBd9JBJ+vHGA1MN
 7/k60wl1dt8/N9yHFg==
 =YsK8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - Devicetree support (for testing)

 - Various cleanups and fixes: UBD, port_user, uml_mconsole

 - Maintainer update

* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: run_helper: Write error message to kernel log on exec failure on host
  um: port_user: Improve error handling when port-helper is not found
  um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar
  um: port_user: Search for in.telnetd in PATH
  um: clang: Strip out -mno-global-merge from USER_CFLAGS
  docs: UML: Mention telnetd for port channel
  um: Remove unused timeval_to_ns() function
  um: Fix uml_mconsole stop/go
  um: Cleanup syscall_handler_t definition/cast, fix warning
  uml: net: vector: fix const issue
  um: Fix WRITE_ZEROES in the UBD Driver
  um: Migrate vector drivers to NAPI
  um: Fix order of dtb unflatten/early init
  um: fix and optimize xor select template for CONFIG64 and timetravel mode
  um: Document dtb command line option
  lib/logic_iomem: correct fallback config references
  um: Remove duplicated include in syscalls_64.c
  MAINTAINERS: Update UserModeLinux entry
2022-03-31 16:16:58 -07:00
Paolo Bonzini cde363ab7c Documentation: KVM: add API issues section
Add a section to document all the different ways in which the KVM API sucks.

I am sure there are way more, give people a place to vent so that userspace
authors are aware.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110712.222449-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:21:20 -04:00
Paolo Bonzini 45016721de Documentation: KVM: add virtual CPU errata documentation
Add a file to document all the different ways in which the virtual CPU
emulation is imperfect.  Include an example to show how to document
such errata.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20220322110712.222449-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:21:20 -04:00
Paolo Bonzini daec8d4083 Documentation: KVM: add separate directories for architecture-specific documentation
ARM already has an arm/ subdirectory, but s390 and x86 do not even though
they have a relatively large number of files specific to them.  Create
new directories in Documentation/virt/kvm for these two architectures
as well.

While at it, group the API documentation and the developer documentation
in the table of contents.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110712.222449-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:21:20 -04:00
Paolo Bonzini 99a17b7770 Documentation: kvm: include new locks
kvm->mn_invalidate_lock and kvm->slots_arch_lock were not included in the
documentation, add them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110720.222499-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:21:19 -04:00
Paolo Bonzini e9611bf9d2 Documentation: kvm: fixes for locking.rst
Separate the various locks clearly, and include the new names of blocked_vcpu_on_cpu_lock
and blocked_vcpu_on_cpu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110720.222499-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29 13:21:19 -04:00
Linus Torvalds 1ebdbeb03e ARM:
- Proper emulation of the OSLock feature of the debug architecture
 
 - Scalibility improvements for the MMU lock when dirty logging is on
 
 - New VMID allocator, which will eventually help with SVA in VMs
 
 - Better support for PMUs in heterogenous systems
 
 - PSCI 1.1 support, enabling support for SYSTEM_RESET2
 
 - Implement CONFIG_DEBUG_LIST at EL2
 
 - Make CONFIG_ARM64_ERRATUM_2077057 default y
 
 - Reduce the overhead of VM exit when no interrupt is pending
 
 - Remove traces of 32bit ARM host support from the documentation
 
 - Updated vgic selftests
 
 - Various cleanups, doc updates and spelling fixes
 
 RISC-V:
 
 - Prevent KVM_COMPAT from being selected
 
 - Optimize __kvm_riscv_switch_to() implementation
 
 - RISC-V SBI v0.3 support
 
 s390:
 
 - memop selftest
 
 - fix SCK locking
 
 - adapter interruptions virtualization for secure guests
 
 - add Claudio Imbrenda as maintainer
 
 - first step to do proper storage key checking
 
 x86:
 
 - Continue switching kvm_x86_ops to static_call(); introduce
   static_call_cond() and __static_call_ret0 when applicable.
 
 - Cleanup unused arguments in several functions
 
 - Synthesize AMD 0x80000021 leaf
 
 - Fixes and optimization for Hyper-V sparse-bank hypercalls
 
 - Implement Hyper-V's enlightened MSR bitmap for nested SVM
 
 - Remove MMU auditing
 
 - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
   page tracking is enabled
 
 - Cleanup the implementation of the guest PGD cache
 
 - Preparation for the implementation of Intel IPI virtualization
 
 - Fix some segment descriptor checks in the emulator
 
 - Allow AMD AVIC support on systems with physical APIC ID above 255
 
 - Better API to disable virtualization quirks
 
 - Fixes and optimizations for the zapping of page tables:
 
   - Zap roots in two passes, avoiding RCU read-side critical sections
     that last too long for very large guests backed by 4 KiB SPTEs.
 
   - Zap invalid and defunct roots asynchronously via concurrency-managed
     work queue.
 
   - Allowing yielding when zapping TDP MMU roots in response to the root's
     last reference being put.
 
   - Batch more TLB flushes with an RCU trick.  Whoever frees the paging
     structure now holds RCU as a proxy for all vCPUs running in the guest,
     i.e. to prolongs the grace period on their behalf.  It then kicks the
     the vCPUs out of guest mode before doing rcu_read_unlock().
 
 Generic:
 
 - Introduce __vcalloc and use it for very large allocations that
   need memcg accounting
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
 wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
 pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
 dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
 RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
 BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
 =keox
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - Proper emulation of the OSLock feature of the debug architecture

   - Scalibility improvements for the MMU lock when dirty logging is on

   - New VMID allocator, which will eventually help with SVA in VMs

   - Better support for PMUs in heterogenous systems

   - PSCI 1.1 support, enabling support for SYSTEM_RESET2

   - Implement CONFIG_DEBUG_LIST at EL2

   - Make CONFIG_ARM64_ERRATUM_2077057 default y

   - Reduce the overhead of VM exit when no interrupt is pending

   - Remove traces of 32bit ARM host support from the documentation

   - Updated vgic selftests

   - Various cleanups, doc updates and spelling fixes

  RISC-V:
   - Prevent KVM_COMPAT from being selected

   - Optimize __kvm_riscv_switch_to() implementation

   - RISC-V SBI v0.3 support

  s390:
   - memop selftest

   - fix SCK locking

   - adapter interruptions virtualization for secure guests

   - add Claudio Imbrenda as maintainer

   - first step to do proper storage key checking

  x86:
   - Continue switching kvm_x86_ops to static_call(); introduce
     static_call_cond() and __static_call_ret0 when applicable.

   - Cleanup unused arguments in several functions

   - Synthesize AMD 0x80000021 leaf

   - Fixes and optimization for Hyper-V sparse-bank hypercalls

   - Implement Hyper-V's enlightened MSR bitmap for nested SVM

   - Remove MMU auditing

   - Eager splitting of page tables (new aka "TDP" MMU only) when dirty
     page tracking is enabled

   - Cleanup the implementation of the guest PGD cache

   - Preparation for the implementation of Intel IPI virtualization

   - Fix some segment descriptor checks in the emulator

   - Allow AMD AVIC support on systems with physical APIC ID above 255

   - Better API to disable virtualization quirks

   - Fixes and optimizations for the zapping of page tables:

      - Zap roots in two passes, avoiding RCU read-side critical
        sections that last too long for very large guests backed by 4
        KiB SPTEs.

      - Zap invalid and defunct roots asynchronously via
        concurrency-managed work queue.

      - Allowing yielding when zapping TDP MMU roots in response to the
        root's last reference being put.

      - Batch more TLB flushes with an RCU trick. Whoever frees the
        paging structure now holds RCU as a proxy for all vCPUs running
        in the guest, i.e. to prolongs the grace period on their behalf.
        It then kicks the the vCPUs out of guest mode before doing
        rcu_read_unlock().

  Generic:
   - Introduce __vcalloc and use it for very large allocations that need
     memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
  KVM: use kvcalloc for array allocations
  KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
  kvm: x86: Require const tsc for RT
  KVM: x86: synthesize CPUID leaf 0x80000021h if useful
  KVM: x86: add support for CPUID leaf 0x80000021
  KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
  Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
  kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
  KVM: arm64: fix typos in comments
  KVM: arm64: Generalise VM features into a set of flags
  KVM: s390: selftests: Add error memop tests
  KVM: s390: selftests: Add more copy memop tests
  KVM: s390: selftests: Add named stages for memop test
  KVM: s390: selftests: Add macro as abstraction for MEM_OP
  KVM: s390: selftests: Split memop tests
  KVM: s390x: fix SCK locking
  RISC-V: KVM: Implement SBI HSM suspend call
  RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: KVM: Implement SBI v0.3 SRST extension
  ...
2022-03-24 11:58:57 -07:00
Linus Torvalds 346658a5e1 It has been a moderately busy cycle for documentation; some of the
highlights are:
 
 - Numerous PDF-generation improvements
 
 - Kees's new document with guidelines for researchers studying the
   development community.
 
 - The ongoing stream of Chinese translations
 
 - Thorsten's new document on regression handling
 
 - A major reworking of the internal documentation for the kernel-doc
   script.
 
 Plus the usual stream of typo fixes and such.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmI4puIPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YUnoH/RubGxsXMBcpVk0szSN/c8VEhp+QxnL6Q8NV
 BtOySou5lu204awb285m4HPvuVSViHKoDObFK3fYZUjCHP8rrSymnBV8N0E0pCYm
 QAgUtNcGqFk41uEkr1v4wmGCj3hIvklycOtBAite4NulHoUzpMsssf6YbajZRIt9
 /PyX30jC3dVPDCZ33lYIzJRdilhoKlS5r/x2Fk/c9uOLGCsJDufHlI6PB+RA7Gpf
 6kDMriIKmEU9Pq2P+Gl+tVPnrQYSSxVP8QUweObQQll2Wq/tQR/1YtecAg0RvG+g
 qc3ciEpVnNHRzSP1XY6Um1FyE338cBtZckdUMgVZ+5vY0300ooA=
 =wRYm
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.18' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a moderately busy cycle for documentation; some of the
  highlights are:

   - Numerous PDF-generation improvements

   - Kees's new document with guidelines for researchers studying the
     development community.

   - The ongoing stream of Chinese translations

   - Thorsten's new document on regression handling

   - A major reworking of the internal documentation for the kernel-doc
     script.

  Plus the usual stream of typo fixes and such"

* tag 'docs-5.18' of git://git.lwn.net/linux: (80 commits)
  docs/kernel-parameters: update description of mem=
  docs/zh_CN: Add sched-nice-design Chinese translation
  docs: scheduler: Convert schedutil.txt to ReST
  Docs: ktap: add code-block type
  docs: serial: fix a reference file name in driver.rst
  docs: UML: Mention telnetd for port channel
  docs/zh_CN: add damon reclaim translation
  docs/zh_CN: add damon usage translation
  docs/zh_CN: add admin-guide damon start translation
  docs/zh_CN: add admin-guide damon index translation
  docs/zh_CN: Refactoring the admin-guide directory index
  zh_CN: Add translation for admin-guide/mm/index.rst
  zh_CN: Add translations for admin-guide/mm/ksm.rst
  Add Chinese translation for vm/ksm.rst
  docs/zh_CN: Add sched-stats Chinese translation
  docs/zh_CN: add devicetree of_unittest translation
  docs/zh_CN: add devicetree usage-model translation
  docs/zh_CN: add devicetree index translation
  Documentation: describe how to apply incremental stable patches
  docs/zh_CN: add peci subsystem translation
  ...
2022-03-21 14:13:25 -07:00
Oliver Upton 6d8491910f KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
advertise the set of quirks which may be disabled to userspace, so it is
impossible to predict the behavior of KVM. Worse yet,
KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
it fails to reject attempts to set invalid quirk bits.

The only valid workaround for the quirky quirks API is to add a new CAP.
Actually advertise the set of quirks that can be disabled to userspace
so it can predict KVM's behavior. Reject values for cap->args[0] that
contain invalid bits.

Finally, add documentation for the new capability and describe the
existing quirks.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220301060351.442881-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21 09:28:41 -04:00
Paolo Bonzini 714797c98e KVM/arm64 updates for 5.18
- Proper emulation of the OSLock feature of the debug architecture
 
 - Scalibility improvements for the MMU lock when dirty logging is on
 
 - New VMID allocator, which will eventually help with SVA in VMs
 
 - Better support for PMUs in heterogenous systems
 
 - PSCI 1.1 support, enabling support for SYSTEM_RESET2
 
 - Implement CONFIG_DEBUG_LIST at EL2
 
 - Make CONFIG_ARM64_ERRATUM_2077057 default y
 
 - Reduce the overhead of VM exit when no interrupt is pending
 
 - Remove traces of 32bit ARM host support from the documentation
 
 - Updated vgic selftests
 
 - Various cleanups, doc updates and spelling fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmI0lrQPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDy0YQAIX2bWcPFMqHqn3CAYhTSTiOK5s+OWx9im5f
 5yTPRj+SJ88SWv030r8a5dxWh2dEK2IetM9KifZ0dvmcCs8lYW/9/IUkHYY9lAYJ
 9VLH4iPgs9dOD9wtfovfb+vcM8bso9Ndi3aCFJUj+bcNwYU3kBIJ+8AxA5DZoLty
 5LPF38eoxrSEv9N0VwqvhGxdgqDp8Zahykr693r+8Wd3Rj6yRoqoEvqWhHdVWlWJ
 3quRNkYN4LzjN3x1T9CLaZUqMofbUjfYCAvbZorALJy6In1FfgoyocFe6/JvsmzZ
 xOlrWWbJz/1NNI6Hoy5aZtQavTFrHu4XbCkjBDL7RhRxj636KWelVoXAbV05XX2r
 hQYMnN0bwlnAljTefguIZ7frnQyjg5OV8GMu3CTIPMqu//fA+61z+bXoyVy6pzaV
 gcXHtDgIdiRaT6BJiHST8ctxZWDTr2GUgTGfdlCde7hgmJ7DjManLXvgYx101/Nz
 VfvKzz3oSvVTelNa/6ZWxuUlwvly0eKONSkwjp0uq5TZ9G8NLaKitA8nKDSkoegx
 41iIUEztivuu9KQvQkl8wdcCPwEk8K2sOTH7ikINS/wJ0khiUztndxCAlEPbQo50
 567OiSaj5+vqFPZsxWBVTIbmkdBVKCzrG+4B1H4didMb1Q1n2lHhgj1keHTmZyVP
 jlFofZxf
 =J1mn
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.18

- Proper emulation of the OSLock feature of the debug architecture

- Scalibility improvements for the MMU lock when dirty logging is on

- New VMID allocator, which will eventually help with SVA in VMs

- Better support for PMUs in heterogenous systems

- PSCI 1.1 support, enabling support for SYSTEM_RESET2

- Implement CONFIG_DEBUG_LIST at EL2

- Make CONFIG_ARM64_ERRATUM_2077057 default y

- Reduce the overhead of VM exit when no interrupt is pending

- Remove traces of 32bit ARM host support from the documentation

- Updated vgic selftests

- Various cleanups, doc updates and spelling fixes
2022-03-18 12:43:24 -04:00
Vincent Whitchurch 89ee9301ac docs: UML: Mention telnetd for port channel
It is not obvious from the documentation that using the "port" channel
for the console requires telnetd to be installed (see port_connection()
in arch/um/drivers/port_user.c).  Mention this, and the fact that UML
will not boot until a client connects.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20220310124230.3069354-1-vincent.whitchurch@axis.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11 13:51:03 -07:00
Vincent Whitchurch 4ef5a0b2e1 docs: UML: Mention telnetd for port channel
It is not obvious from the documentation that using the "port" channel
for the console requires telnetd to be installed (see port_connection()
in arch/um/drivers/port_user.c).  Mention this, and the fact that UML
will not boot until a client connects.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-03-11 10:50:42 +01:00
Anton Ivanov 6427c16527 um: Document dtb command line option
Add documentation for the dtb command line option and the
ability to load/parse device trees.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Reviewed-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-03-11 10:43:32 +01:00
Marc Zyngier 7297a8bcc0 Merge branch kvm-arm64/misc-5.18 into kvmarm-master/next
* kvm-arm64/misc-5.18:
  : .
  : Misc fixes for KVM/arm64 5.18:
  :
  : - Drop unused kvm parameter to kvm_psci_version()
  :
  : - Implement CONFIG_DEBUG_LIST at EL2
  :
  : - Make CONFIG_ARM64_ERRATUM_2077057 default y
  :
  : - Only do the interrupt dance if we have exited because of an interrupt
  :
  : - Remove traces of 32bit ARM host support from the documentation
  : .
  Documentation: KVM: Update documentation to indicate KVM is arm64-only
  KVM: arm64: Only open the interrupt window on exit due to an interrupt
  KVM: arm64: Enable Cortex-A510 erratum 2077057 by default

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-09 11:16:48 +00:00
Oliver Upton 3fbf4207dc Documentation: KVM: Update documentation to indicate KVM is arm64-only
KVM support for 32-bit ARM hosts (KVM/arm) has been removed from the
kernel since commit 541ad0150c ("arm: Remove 32bit KVM host
support"). There still exists some remnants of the old architecture in
the KVM documentation.

Remove all traces of 32-bit host support from the documentation. Note
that AArch32 guests are still supported.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220308172856.2997250-1-oupton@google.com
2022-03-09 11:15:24 +00:00
Paolo Bonzini 0564eeb71b Merge branch 'kvm-bugfixes' into HEAD
Merge bugfixes from 5.17 before merging more tricky work.
2022-03-04 18:39:29 -05:00
Sean Christopherson e65a3b46b5 KVM: Drop KVM_REQ_MMU_RELOAD and update vcpu-requests.rst documentation
Remove the now unused KVM_REQ_MMU_RELOAD, shift KVM_REQ_VM_DEAD into the
unoccupied space, and update vcpu-requests.rst, which was missing an
entry for KVM_REQ_VM_DEAD.  Switching KVM_REQ_VM_DEAD to entry '1' also
fixes the stale comment about bits 4-7 being reserved.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220225182248.3812651-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01 08:58:26 -05:00