Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a rw_semaphore instead of a mutex to coordinate APICv updates so that
vCPUs responding to requests can take the lock for read and run in
parallel. Using a mutex forces serialization of vCPUs even though
kvm_vcpu_update_apicv() only touches data local to that vCPU or is
protected by a different lock, e.g. SVM's ir_list_lock.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022004927.1448382-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paul pointed out the error messages when KVM fails to load are unhelpful
in understanding exactly what went wrong if userspace probes the "wrong"
module.
Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel
and kvm_amd, and use the name for relevant error message when KVM fails
to load so that the user knows which module failed to load.
Opportunistically tweak the "disabled by bios" error message to clarify
that _support_ was disabled, not that the module itself was magically
disabled by BIOS.
Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SDM section 18.2.3 mentioned that:
"IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
any general-purpose or fixed-function counters via a single WRMSR."
It is R/W mentioned by SDM, we read this msr on bare-metal during perf testing,
the value is always 0 for ICX/SKX boxes on hands. Let's fill get_msr
MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 as hardware behavior and drop
global_ovf_ctrl variable.
Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unify the flags for rmaps and page tracking data, using a
single flag in struct kvm_arch and a single loop to go
over all the address spaces and memslots. This avoids
code duplication between alloc_all_memslots_rmaps and
kvm_page_track_enable_mmu_write_tracking.
Signed-off-by: David Stevens <stevensd@chromium.org>
[This patch is the delta between David's v2 and v3, with conflicts
fixed and my own commit message. - Paolo]
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.
A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.
Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Protect the reference point for kvmclock with a seqcount, so that
kvmclock updates for all vCPUs can proceed in parallel. Xen runstate
updates will also run in parallel and not bounce the kvmclock cacheline.
Of the variables that were protected by pvclock_gtod_sync_lock,
nr_vcpus_matched_tsc is different because it is updated outside
pvclock_update_vm_gtod_copy and read inside it. Therefore, we
need to keep it protected by a spinlock. In fact it must now
be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
write-side of a seqcount, is non-preemptible. Since we already
have tsc_write_lock which is a raw spinlock, we can just use
tsc_write_lock as the lock that protects the write-side of the
seqcount.
Co-developed-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.
Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid allocating the gfn_track arrays if nothing needs them. If there
are no external to KVM users of the API (i.e. no GVT-g), then page
tracking is only needed for shadow page tables. This means that when tdp
is enabled and there are no external users, then the gfn_track arrays
can be lazily allocated when the shadow MMU is actually used. This avoid
allocations equal to .05% of guest memory when nested virtualization is
not used, if the kernel is compiled without GVT-g.
Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210922045859.2011227-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
use the common variant. This can be accomplished by adding another
macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.
Further simplification can be achieved by adding __kvm_arch_free_vm()
doing the common part.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-Id: <20210903130808.30142-5-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
mmu_try_to_unsync_pages checks if page tracking is active for the given
gfn, which requires knowing the memslot. We can pass down the memslot
via make_spte to avoid this lookup.
The memslot is also handy for make_spte's marking of the gfn as dirty:
we can test whether dirty page tracking is enabled, and if so ensure that
pages are mapped as writable with 4K granularity. Apart from the warning,
no functional change is intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that kvm_page_fault has a pointer to the memslot it can be passed
down to the page tracking code to avoid a redundant slot lookup.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pass struct kvm_page_fault to mmu->page_fault() instead of
extracting the arguments from the struct. FNAME(page_fault) can use
the precomputed bools from the error code.
Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Updates to the kvmclock parameters needs to do a complicated dance of
KVM_REQ_MCLOCK_INPROGRESS and KVM_REQ_CLOCK_UPDATE in addition to taking
pvclock_gtod_sync_lock. Place that in two functions that can be called
on all of master clock update, KVM_SET_CLOCK, and Hyper-V reenlightenment.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Open code the call to mmu->translate_gpa() when loading nested PDPTRs and
kill off the existing helper, kvm_read_guest_page_mmu(), to discourage
incorrect use. Reading guest memory straight from an L2 GPA is extremely
rare (as evidenced by the lack of users), as very few constructs in x86
specify physical addresses, even fewer are virtualized by KVM, and even
fewer yet require emulation of L2 by L0 KVM.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210831164224.1119728-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_MAX_VCPU_ID is not specifying the highest allowed vcpu-id, but the
number of allowed vcpu-ids. This has already led to confusion, so
rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS to make its semantics more
clear
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210913135745.13944-3-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_make_vcpus_request_mask() already disables preemption so just like
kvm_make_all_cpus_request_except() it can be switched to using
pre-allocated per-cpu cpumasks. This allows for improvements for both
users of the function: in Hyper-V emulation code 'tlb_flush' can now be
dropped from 'struct kvm_vcpu_hv' and kvm_make_scan_ioapic_request_mask()
gets rid of dynamic allocation.
cpumask_available() checks in kvm_make_vcpu_request() and
kvm_kick_many_cpus() can now be dropped as they checks for an impossible
condition: kvm_init() makes sure per-cpu masks are allocated.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210903075141.403071-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
so move it into kvmclock.h and export the symbol to make it visiable to
other modules.
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check the return of init_srcu_struct(), which can fail due to OOM, when
initializing the page track mechanism. Lack of checking leads to a NULL
pointer deref found by a modified syzkaller.
Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com>
[Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 865c50e1d2 ("x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT")
added an optimised version of __get_user_asm() for x86 using 'asm goto'.
Like the non-optimised code, the 32-bit implementation of 64-bit
get_user() expands to a pair of 32-bit accesses. Unlike the
non-optimised code, the _original_ pointer is incremented to copy the
high word instead of loading through a new pointer explicitly
constructed to point at a 32-bit type. Consequently, if the pointer
points at a 64-bit type then we end up loading the wrong data for the
upper 32-bits.
This was observed as a mount() failure in Android targeting i686 after
b0cfcdd9b9 ("d_path: make 'prepend()' fill up the buffer exactly on
overflow") because the call to copy_from_kernel_nofault() from
prepend_copy() ends up in __get_kernel_nofault() and casts the source
pointer to a 'u64 __user *'. An attempt to mount at "/debug_ramdisk"
therefore ends up failing trying to mount "/debumdismdisk".
Use the existing '__gu_ptr' source pointer to unsigned int for 32-bit
__get_user_asm_u64() instead of the original pointer.
Cc: Bill Wendling <morbo@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 865c50e1d2 ("x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All users of compat_alloc_user_space() and copy_in_user() have been
removed from the kernel, only a few functions in sparc remain that can be
changed to calling arch_copy_in_user() instead.
Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups
s390:
- enable interpretation of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup
x86:
- fast (lockless) page fault support for the new MMU
- new MMU now the default
- increased maximum allowed VCPU count
- allow inhibit IRQs on KVM_RUN while debugging guests
- let Hyper-V-enabled guests run with virtualized LAPIC as long as they
do not enable the Hyper-V "AutoEOI" feature
- fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC)
- tuning for the case when two-dimensional paging (EPT/NPT) is disabled
- bugfixes and cleanups, especially with respect to 1) vCPU reset and
2) choosing a paging mode based on CR0/CR4/EFER
- support for 5-level page table on AMD processors
Generic:
- MMU notifier invalidation callbacks do not take mmu_lock unless necessary
- improved caching of LRU kvm_memory_slot
- support for histogram statistics
- add statistics for halt polling and remote TLB flush requests
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmE2CIAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMyqwf+Ky2WoThuQ9Ra0r/m8pUTAx5+gsAf
MmG24rNLE+26X0xuBT9Q5+etYYRLrRTWJvo5cgHooz7muAYW6scR+ho5xzvLTAxi
DAuoijkXsSdGoFCp0OMUHiwG3cgY5N7feTEwLPAb2i6xr/l6SZyCP4zcwiiQbJ2s
UUD0i3rEoNQ02/hOEveud/ENxzUli9cmmgHKXR3kNgsJClSf1fcuLnhg+7EGMhK9
+c2V+hde5y0gmEairQWm22MLMRolNZ5NL4kjykiNh2M5q9YvbHe5+f/JmENlNZMT
bsUQT6Ry1ukuJ0V59rZvUw71KknPFzZ3d6HgW4pwytMq6EJKiISHzRbVnQ==
=FCAB
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual
PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups
s390:
- enable interpretation of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup
x86:
- fast (lockless) page fault support for the new MMU
- new MMU now the default
- increased maximum allowed VCPU count
- allow inhibit IRQs on KVM_RUN while debugging guests
- let Hyper-V-enabled guests run with virtualized LAPIC as long as
they do not enable the Hyper-V "AutoEOI" feature
- fixes and optimizations for the toggling of AMD AVIC (virtualized
LAPIC)
- tuning for the case when two-dimensional paging (EPT/NPT) is
disabled
- bugfixes and cleanups, especially with respect to vCPU reset and
choosing a paging mode based on CR0/CR4/EFER
- support for 5-level page table on AMD processors
Generic:
- MMU notifier invalidation callbacks do not take mmu_lock unless
necessary
- improved caching of LRU kvm_memory_slot
- support for histogram statistics
- add statistics for halt polling and remote TLB flush requests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
KVM: Drop unused kvm_dirty_gfn_invalid()
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
KVM: MMU: mark role_regs and role accessors as maybe unused
KVM: MIPS: Remove a "set but not used" variable
x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
KVM: stats: Add VM stat for remote tlb flush requests
KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
kvm: x86: Increase MAX_VCPUS to 1024
kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
KVM: s390: Enable specification exception interpretation
KVM: arm64: Trim guest debug exception handling
KVM: SVM: Add 5-level page table support for SVM
...
Support for 710 VCPUs was tested by Red Hat since RHEL-8.4,
so increase KVM_SOFT_MAX_VCPUS to 710.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.
I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
might involve complicated questions around the meaning of
"supported" and "recommended" in the upstream tree.
KVM_SOFT_MAX_VCPUS will be changed in a separate patch.
For reference, visible effects of this change are:
- KVM_CAP_MAX_VCPUS will now return 1024 (of course)
- Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
will now be 1024
- KVM_MAX_VCPU_ID will change from 1151 to 4096
- Size of struct kvm will increase from 19328 to 22272 bytes
(in x86_64)
- Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
(in x86_64)
- Bitmap stack variables that will grow:
- At kvm_hv_flush_tlb() kvm_hv_send_ipi(),
vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
- vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long
once patch "KVM: x86: Fix stack-out-of-bounds memory access
from ioapic_write_indirect()" is applied
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of requiring KVM_MAX_VCPU_ID to be manually increased
every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
This should be enough for CPU topologies where Cores-per-Package
and Packages-per-Socket are not powers of 2.
In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
The only side effect of this change is making some fields in
struct kvm_ioapic larger, increasing the struct size from 1628 to
1780 bytes (in x86_64).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmEuJwwTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXp0ICACsx9NtQh1f9xGMClYrbobJfGmiwHVV
uKut/44Vg39tSyZB4mQt3A8YcaQj1Nibo6HVmxJtbKbrKwlTGAiQh5fiOmBOd7Re
/rII/S+CGtAyChI1adHTSL2xdk6WY0c7XQw+IPaERBikG5rO81Y6NLjFZNOv494k
JnG9uGGjAcJWFYylPcLxt4sR/hEfE4KDzsWjWOb5azYgo/RwOan6zYDdkUgocp4A
J+zmgCiME8LLmEV19gn7p4gpX7X9m5mcNgn53eICYPhrBqI0PTWocm6DepCEnrQ+
pEobIagWIMx5Dr7euEJwLxFSN7bdzleVOa4FSfM0zUsEjdbiPH47VQFM
=Vae6
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- make Hyper-V code arch-agnostic (Michael Kelley)
- fix sched_clock behaviour on Hyper-V (Ani Sinha)
- fix a fault when Linux runs as the root partition on MSHV (Praveen
Kumar)
- fix VSS driver (Vitaly Kuznetsov)
- cleanup (Sonia Sharma)
* tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv_utils: Set the maximum packet size for VSS driver to the length of the receive buffer
Drivers: hv: Enable Hyper-V code to be built on ARM64
arm64: efi: Export screen_info
arm64: hyperv: Initialize hypervisor on boot
arm64: hyperv: Add panic handler
arm64: hyperv: Add Hyper-V hypercall and register access utilities
x86/hyperv: fix root partition faults when writing to VP assist page MSR
hv: hyperv.h: Remove unused inline functions
drivers: hv: Decouple Hyper-V clock/timer code from VMbus drivers
x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0
Drivers: hv: Move Hyper-V misc functionality to arch-neutral code
Drivers: hv: Add arch independent default functions for some Hyper-V handlers
Drivers: hv: Make portions of Hyper-V init code be arch neutral
x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
asm-generic/hyperv: Add missing #include of nmi.h
core:
- extract i915 eDP backlight into core
- DP aux bus support
- drm_device.irq_enabled removed
- port drivers to native irq interfaces
- export gem shadow plane handling for vgem
- print proper driver name in framebuffer registration
- driver fixes for implicit fencing rules
- ARM fixed rate compression modifier added
- updated fb damage handling
- rmfb ioctl logging/docs
- drop drm_gem_object_put_locked
- define DRM_FORMAT_MAX_PLANES
- add gem fb vmap/vunmap helpers
- add lockdep_assert(once) helpers
- mark drm irq midlayer as legacy
- use offset adjusted bo mapping conversion
vgaarb:
- cleanups
fbdev:
- extend efifb handling to all arches
- div by 0 fixes for multiple drivers
udmabuf:
- add hugepage mapping support
dma-buf:
- non-dynamic exporter fixups
- document implicit fencing rules
amdgpu:
- Initial Cyan Skillfish support
- switch virtual DCE over to vkms based atomic
- VCN/JPEG power down fixes
- NAVI PCIE link handling fixes
- AMD HDMI freesync fixes
- Yellow Carp + Beige Goby fixes
- Clockgating/S0ix/SMU/EEPROM fixes
- embed hw fence in job
- rework dma-resv handling
- ensure eviction to system ram
amdkfd:
- uapi: SVM address range query added
- sysfs leak fix
- GPUVM TLB optimizations
- vmfault/migration counters
i915:
- Enable JSL and EHL by default
- preliminary XeHP/DG2 support
- remove all CNL support (never shipped)
- move to TTM for discrete memory support
- allow mixed object mmap handling
- GEM uAPI spring cleaning
- add I915_MMAP_OBJECT_FIXED
- reinstate ADL-P mmap ioctls
- drop a bunch of unused by userspace features
- disable and remove GPU relocations
- revert some i915 misfeatures
- major refactoring of GuC for Gen11+
- execbuffer object locking separate step
- reject caching/set-domain on discrete
- Enable pipe DMC loading on XE-LPD and ADL-P
- add PSF GV point support
- Refactor and fix DDI buffer translations
- Clean up FBC CFB allocation code
- Finish INTEL_GEN() and friends macro conversions
nouveau:
- add eDP backlight support
- implicit fence fix
msm:
- a680/7c3 support
- drm/scheduler conversion
panfrost:
- rework GPU reset
virtio:
- fix fencing for planes
ast:
- add detect support
bochs:
- move to tiny GPU driver
vc4:
- use hotplug irqs
- HDMI codec support
vmwgfx:
- use internal vmware device headers
ingenic:
- demidlayering irq
rcar-du:
- shutdown fixes
- convert to bridge connector helpers
zynqmp-dsub:
- misc fixes
mgag200:
- convert PLL handling to atomic
mediatek:
- MT8133 AAL support
- gem mmap object support
- MT8167 support
etnaviv:
- NXP Layerscape LS1028A SoC support
- GEM mmap cleanups
tegra:
- new user API
exynos:
- missing unlock fix
- build warning fix
- use refcount_t
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmEtvn8ACgkQDHTzWXnE
hr7aqw//WfcIyGdPLjAz59cW8jm+FgihD5colHtOUYRHRO4GeX/bNNufquR8+N3y
HESsyZdpihFHms/wURMq41ibmHg0EuHA01HZzjZuGBesG4F9I8sP/HnDOxDuYuAx
N7Lg4PlUNlfFHmw7Y84owQ6s/XWmNp5iZ8e/mTK5hcraJFQKS4QO74n9RbG/F1vC
Hc3P6AnpqGac2AEGXt0NjIRxVVCTUIBGx+XOhj+1AMyAGzt9VcO1DS9PVCS0zsEy
zKMj9tZAPNg0wYsXAi4kA1lK7uVY8KoXSVDYLpsI5Or2/e7mfq2b4EWrezbtp6UA
H+w86axuwJq7NaYHYH6HqyrLTOmvcHgIl2LoZN91KaNt61xfJT3XZkyQoYViGIrJ
oZy6X/+s+WPoW98bHZrr6vbcxtWKfEeQyUFEAaDMmraKNJwROjtwgFC9DP8MDctq
PUSM+XkwbGRRxQfv9dNKufeWfV5blVfzEJO8EfTU1YET3WTDaUHe/FoIcLZt2DZG
JAJgZkIlU8egthPdakUjQz/KoyLMyovcN5zcjgzgjA9PyNEq74uElN9l446kSSxu
jEVErOdd+aG3Zzk7/ZZL/RmpNQpPfpQ2RaPUkgeUsW01myNzUNuU3KUDaSlVa+Oi
1n7eKoaQ2to/+LjhYApVriri4hIZckNNn5FnnhkgwGi8mpHQIVQ=
=vZkA
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"Highlights:
- i915 has seen a lot of refactoring and uAPI cleanups due to a
change in the upstream direction going forward
This has all been audited with known userspace, but there may be
some pitfalls that were missed.
- i915 now uses common TTM to enable discrete memory on DG1/2 GPUs
- i915 enables Jasper and Elkhart Lake by default and has preliminary
XeHP/DG2 support
- amdgpu adds support for Cyan Skillfish
- lots of implicit fencing rules documented and fixed up in drivers
- msm now uses the core scheduler
- the irq midlayer has been removed for non-legacy drivers
- the sysfb code now works on more than x86.
Otherwise the usual smattering of stuff everywhere, panels, bridges,
refactorings.
Detailed summary:
core:
- extract i915 eDP backlight into core
- DP aux bus support
- drm_device.irq_enabled removed
- port drivers to native irq interfaces
- export gem shadow plane handling for vgem
- print proper driver name in framebuffer registration
- driver fixes for implicit fencing rules
- ARM fixed rate compression modifier added
- updated fb damage handling
- rmfb ioctl logging/docs
- drop drm_gem_object_put_locked
- define DRM_FORMAT_MAX_PLANES
- add gem fb vmap/vunmap helpers
- add lockdep_assert(once) helpers
- mark drm irq midlayer as legacy
- use offset adjusted bo mapping conversion
vgaarb:
- cleanups
fbdev:
- extend efifb handling to all arches
- div by 0 fixes for multiple drivers
udmabuf:
- add hugepage mapping support
dma-buf:
- non-dynamic exporter fixups
- document implicit fencing rules
amdgpu:
- Initial Cyan Skillfish support
- switch virtual DCE over to vkms based atomic
- VCN/JPEG power down fixes
- NAVI PCIE link handling fixes
- AMD HDMI freesync fixes
- Yellow Carp + Beige Goby fixes
- Clockgating/S0ix/SMU/EEPROM fixes
- embed hw fence in job
- rework dma-resv handling
- ensure eviction to system ram
amdkfd:
- uapi: SVM address range query added
- sysfs leak fix
- GPUVM TLB optimizations
- vmfault/migration counters
i915:
- Enable JSL and EHL by default
- preliminary XeHP/DG2 support
- remove all CNL support (never shipped)
- move to TTM for discrete memory support
- allow mixed object mmap handling
- GEM uAPI spring cleaning
- add I915_MMAP_OBJECT_FIXED
- reinstate ADL-P mmap ioctls
- drop a bunch of unused by userspace features
- disable and remove GPU relocations
- revert some i915 misfeatures
- major refactoring of GuC for Gen11+
- execbuffer object locking separate step
- reject caching/set-domain on discrete
- Enable pipe DMC loading on XE-LPD and ADL-P
- add PSF GV point support
- Refactor and fix DDI buffer translations
- Clean up FBC CFB allocation code
- Finish INTEL_GEN() and friends macro conversions
nouveau:
- add eDP backlight support
- implicit fence fix
msm:
- a680/7c3 support
- drm/scheduler conversion
panfrost:
- rework GPU reset
virtio:
- fix fencing for planes
ast:
- add detect support
bochs:
- move to tiny GPU driver
vc4:
- use hotplug irqs
- HDMI codec support
vmwgfx:
- use internal vmware device headers
ingenic:
- demidlayering irq
rcar-du:
- shutdown fixes
- convert to bridge connector helpers
zynqmp-dsub:
- misc fixes
mgag200:
- convert PLL handling to atomic
mediatek:
- MT8133 AAL support
- gem mmap object support
- MT8167 support
etnaviv:
- NXP Layerscape LS1028A SoC support
- GEM mmap cleanups
tegra:
- new user API
exynos:
- missing unlock fix
- build warning fix
- use refcount_t"
* tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits)
drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box
drm/amd/display: Remove duplicate dml init
drm/amd/display: Update bounding box states (v2)
drm/amd/display: Update number of DCN3 clock states
drm/amdgpu: disable GFX CGCG in aldebaran
drm/amdgpu: Clear RAS interrupt status on aldebaran
drm/amdgpu: Add support for RAS XGMI err query
drm/amdkfd: Account for SH/SE count when setting up cu masks.
drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain
drm/amdgpu: drop redundant cancel_delayed_work_sync call
drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend
drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend
drm/amdkfd: map SVM range with correct access permission
drm/amdkfd: check access permisson to restore retry fault
drm/amdgpu: Update RAS XGMI Error Query
drm/amdgpu: Add driver infrastructure for MCA RAS
drm/amd/display: Add Logging for HDMI color depth information
drm/amd/amdgpu: consolidate PSP TA init shared buf functions
drm/amd/amdgpu: add name field back to ras_common_if
drm/amdgpu: Fix build with missing pm_suspend_target_state module export
...
- Enable memcg accounting for various networking objects.
BPF:
- Introduce bpf timers.
- Add perf link and opaque bpf_cookie which the program can read
out again, to be used in libbpf-based USDT library.
- Add bpf_task_pt_regs() helper to access user space pt_regs
in kprobes, to help user space stack unwinding.
- Add support for UNIX sockets for BPF sockmap.
- Extend BPF iterator support for UNIX domain sockets.
- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.
Protocols:
- Support IOAM Pre-allocated Trace with IPv6.
- Support Management Component Transport Protocol.
- bridge: multicast: add vlan support.
- netfilter: add hooks for the SRv6 lightweight tunnel driver.
- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP
- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing
- af_unix: add OOB notification support.
- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by
the router.
- mac80211: Target Wake Time support in AP mode.
- can: j1939: extend UAPI to notify about RX status.
Driver APIs:
- Add page frag support in page pool API.
- Many improvements to the DSA (distributed switch) APIs.
- ethtool: extend IRQ coalesce uAPI with timer reset modes.
- devlink: control which auxiliary devices are created.
- Support CAN PHYs via the generic PHY subsystem.
- Proper cross-chip support for tag_8021q.
- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.
Drivers:
- veth: more flexible channels number configuration.
- openvswitch: introduce per-cpu upcall dispatch.
- Add internet mix (IMIX) mode to pktgen.
- Transparently handle XDP operations in the bonding driver.
- Add LiteETH network driver.
- Renesas (ravb):
- support Gigabit Ethernet IP
- NXP Ethernet switch (sja1105)
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge
- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation
- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings
- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels
- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state
- Netronome Ethernet (nfp):
- add conntrack offload support
- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device
- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)
- Xen pv driver:
- harden netfront against malicious backends
- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces
Refactor:
- Ambient BPF run context and cgroup storage cleanup.
- Compat rework for ndo_ioctl.
Old code removal:
- prism54 remove the obsoleted driver, deprecated by the p54 driver.
- wan: remove sbni/granch driver.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEukBYACgkQMUZtbf5S
IrsyHA//TO8dw18NYts4n9LmlJT2naJ7yBUUSSXK/M+DtW0MQ9nnHhqzPm5uJdRl
IgQTNJrW3dYzRwgqaWZqEwO1t5/FI+f87ND1Nsekg7x9tF66a6ov5WxU26TwwSba
U+si/inQ/4chuQ+LxMQobqCDxaLE46I2dIoRl+YfndJ24DRzYSwAEYIPPbSdfyU+
+/l+3s4GaxO4k/hLciPAiOniyxLoUNiGUTNh+2yqRBXelSRJRKVnl+V22ANFrxRW
nTEiplfVKhlPU1e4iLuRtaxDDiePHhw9I3j/lMHhfeFU2P/gKJIvz4QpGV0CAZg2
1VvDU32WEx1GQLXJbKm0KwoNRUq1QSjOyyFti+BO7ugGaYAR4gKhShOqlSYLzUtB
tbtzQhSNLWOGqgmSJOztZb5kFDm2EdRSll5/lP2uyFlPkIsIp0QbscJVzNTnS74b
Xz15ZOw41Z4TfWPEMWgfrx6Zkm7pPWkly+7WfUkPcHa1gftNz6tzXXxSXcXIBPdi
yQ5JCzzxrM5573YHuk5YedwZpn6PiAt4A/muFGk9C6aXP60TQAOS/ppaUzZdnk4D
NfOk9mj06WEULjYjPcKEuT3GGWE6kmjb8Pu0QZWKOchv7vr6oZly1EkVZqYlXELP
AfhcrFeuufie8mqm0jdb4LnYaAnqyLzlb1J4Zxh9F+/IX7G3yoc=
=JDGD
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Enable memcg accounting for various networking objects.
BPF:
- Introduce bpf timers.
- Add perf link and opaque bpf_cookie which the program can read out
again, to be used in libbpf-based USDT library.
- Add bpf_task_pt_regs() helper to access user space pt_regs in
kprobes, to help user space stack unwinding.
- Add support for UNIX sockets for BPF sockmap.
- Extend BPF iterator support for UNIX domain sockets.
- Allow BPF TCP congestion control progs and bpf iterators to call
bpf_setsockopt(), e.g. to switch to another congestion control
algorithm.
Protocols:
- Support IOAM Pre-allocated Trace with IPv6.
- Support Management Component Transport Protocol.
- bridge: multicast: add vlan support.
- netfilter: add hooks for the SRv6 lightweight tunnel driver.
- tcp:
- enable mid-stream window clamping (by user space or BPF)
- allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
- more accurate DSACK processing for RACK-TLP
- mptcp:
- add full mesh path manager option
- add partial support for MP_FAIL
- improve use of backup subflows
- optimize option processing
- af_unix: add OOB notification support.
- ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
router.
- mac80211: Target Wake Time support in AP mode.
- can: j1939: extend UAPI to notify about RX status.
Driver APIs:
- Add page frag support in page pool API.
- Many improvements to the DSA (distributed switch) APIs.
- ethtool: extend IRQ coalesce uAPI with timer reset modes.
- devlink: control which auxiliary devices are created.
- Support CAN PHYs via the generic PHY subsystem.
- Proper cross-chip support for tag_8021q.
- Allow TX forwarding for the software bridge data path to be
offloaded to capable devices.
Drivers:
- veth: more flexible channels number configuration.
- openvswitch: introduce per-cpu upcall dispatch.
- Add internet mix (IMIX) mode to pktgen.
- Transparently handle XDP operations in the bonding driver.
- Add LiteETH network driver.
- Renesas (ravb):
- support Gigabit Ethernet IP
- NXP Ethernet switch (sja1105):
- fast aging support
- support for "H" switch topologies
- traffic termination for ports under VLAN-aware bridge
- Intel 1G Ethernet
- support getcrosststamp() with PCIe PTM (Precision Time
Measurement) for better time sync
- support Credit-Based Shaper (CBS) offload, enabling HW traffic
prioritization and bandwidth reservation
- Broadcom Ethernet (bnxt)
- support pulse-per-second output
- support larger Rx rings
- Mellanox Ethernet (mlx5)
- support ethtool RSS contexts and MQPRIO channel mode
- support LAG offload with bridging
- support devlink rate limit API
- support packet sampling on tunnels
- Huawei Ethernet (hns3):
- basic devlink support
- add extended IRQ coalescing support
- report extended link state
- Netronome Ethernet (nfp):
- add conntrack offload support
- Broadcom WiFi (brcmfmac):
- add WPA3 Personal with FT to supported cipher suites
- support 43752 SDIO device
- Intel WiFi (iwlwifi):
- support scanning hidden 6GHz networks
- support for a new hardware family (Bz)
- Xen pv driver:
- harden netfront against malicious backends
- Qualcomm mobile
- ipa: refactor power management and enable automatic suspend
- mhi: move MBIM to WWAN subsystem interfaces
Refactor:
- Ambient BPF run context and cgroup storage cleanup.
- Compat rework for ndo_ioctl.
Old code removal:
- prism54 remove the obsoleted driver, deprecated by the p54 driver.
- wan: remove sbni/granch driver"
* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
net: Add depends on OF_NET for LiteX's LiteETH
ipv6: seg6: remove duplicated include
net: hns3: remove unnecessary spaces
net: hns3: add some required spaces
net: hns3: clean up a type mismatch warning
net: hns3: refine function hns3_set_default_feature()
ipv6: remove duplicated 'net/lwtunnel.h' include
net: w5100: check return value after calling platform_get_resource()
net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
fou: remove sparse errors
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
octeontx2-af: Set proper errorcode for IPv4 checksum errors
octeontx2-af: Fix static code analyzer reported issues
octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
octeontx2-af: Fix loop in free and unmap counter
af_unix: fix potential NULL deref in unix_dgram_connect()
dpaa2-eth: Replace strlcpy with strscpy
octeontx2-af: Use NDC TX for transmit packet data
...
which can be found on various ALi chipsets and is also available on older
Intel systems which expose a PIRQ router. While the Intel support is more
or less nostalgia, the ALi chips are still in use on popular embedded
boards used for routers.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEsn2QTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQT1EACIvzRbycwclASIV6rBK5FMcVa2VuXR
GVqrfERPCUHQxnLshUJxnLk0NvZcQrLHjYl/QMCHBFOeEh3XrzU7JkKDW0Q8Dnov
QGFRtandKDwY4TwnCPKVdz/HeWMxNRT7OF4d08Q3iKCN5l39RLxraMixSrFL8soO
wgGcRTjbTa6HaMlqacFN7DwwiHxbIGJNepi0yqLZBV2dQOnZPd+ujV1FRSNXkv9p
vFPfuazk/psiSXy3x/+YVPUw+6h8DRDkflc9+wvSR+1cVl8eyrjkLgLH43ihddEN
Dl1SG5vKyCOtvQm+TEYdB5qjb/Zd4BjlbvKPJ+94OTtsjIIwxzInizkeTXiLHXnl
SDHX9Sc8L4sYP5+tAew1WMj8K2/p6FzdHm+sBJHd2JFSsMpeErI7p0y0Nz58E7pG
0cRqeWlq7rbGFPq544A8cgx/LjPkZT4LgutGpJ6f3NTZeLfj09xbFRqxNOHqAx+h
fp+36RNb1/j70Yz+4r7lLeDOVswbK+YxPIZGdnNfINTHeGllthDI5vaUL0L2jZnI
CnnKjss2a1WkDC8gczr/3QYcQRKrKDHL0hn0nUh+9laAaTSwNv3oRrkUWvMqwaT8
qSMMm5Eb84B4fZLyvPIcAwyC++JU/cVCgWEP37EzhYcvp6tq8GmR1cdi1lo2/K4O
qhg1d7loNh0eCg==
=R+c1
-----END PGP SIGNATURE-----
Merge tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PIRQ updates from Thomas Gleixner:
"A set of updates to support port 0x22/0x23 based PCI configuration
space which can be found on various ALi chipsets and is also available
on older Intel systems which expose a PIRQ router.
While the Intel support is more or less nostalgia, the ALi chips are
still in use on popular embedded boards used for routers"
* tag 'x86-irq-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix typo s/ECLR/ELCR/ for the PIC register
x86: Avoid magic number with ELCR register accesses
x86/PCI: Add support for the Intel 82426EX PIRQ router
x86/PCI: Add support for the Intel 82374EB/82374SB (ESC) PIRQ router
x86/PCI: Add support for the ALi M1487 (IBC) PIRQ router
x86: Add support for 0x22/0x23 port I/O configuration space
A stop gap for potential future speculation related hardware
vulnerabilities and a mechanism for truly security paranoid
applications.
It allows a task to request that the L1D cache is flushed when the kernel
switches to a different mm. This can be requested via prctl().
Changes vs. the previous versions:
- Get rid of the software flush fallback
- Make the handling consistent with other mitigations
- Kill the task when it ends up on a SMT enabled core which defeats the
purpose of L1D flushing obviously
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEsn0oTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoa5fD/47vHGtjAtDr/DaXR1C6F9AvVbKEl8p
oNHn8IukE6ts6G4dFH9wUvo/Ut0K3kxX54I+BATew0LTy6tsQeUYh/xjwXMupgNV
oKOc9waoqdFvju3ayLFWJmuACLdXpyrGC1j35Aji61zSbR/GdtZ4oDxbuN2YJDAT
BTcgKrBM5nQm94JNa083RQSCU5LJxbC7ETkIh6NR73RSPCjUC1Wpxy1sAQAa2MPD
8EzcJ/DjVGaHCI7adX10sz3xdUcyOz7qYz16HpoMGx+oSiq7pGEBtUiK97EYMcrB
s+ADFUjYmx/pbEWv2r4c9zxNh7ZV3aLBsWwi7bScHIsv8GjrsA/mYLWskuwOV6BB
22qZjfd0c4raiJwd+nmSx+D2Szv6lZ20gP+krtP2VNC6hUv7ft0VPLySiaFMmUHj
quooDZis/W5n+4C9Q8Rk9uUtKzzJOngqW+duftiixHiNQ/ECP/QCAHhZYck/NOkL
tZkNj6lJj9+2iR7mhbYROZ+wrYQzRvqNb2pJJQoi/wA0q7wPSKBi3m+51lPsht5W
tn94CpaDDZ4IB7Fe1NtcA0UpYJSWpDQGlau4qp92HMCCIcRFfQEm+m9x8axwcj7m
ECblHJYBPHuNcCHvPA8kHvr1nd6UUXrGPIo8TK8YhUUbK6pO0OjdNzZX496ia/2g
pLzaW2ENTPLbXg==
=27wH
-----END PGP SIGNATURE-----
Merge tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache flush updates from Thomas Gleixner:
"A reworked version of the opt-in L1D flush mechanism.
This is a stop gap for potential future speculation related hardware
vulnerabilities and a mechanism for truly security paranoid
applications.
It allows a task to request that the L1D cache is flushed when the
kernel switches to a different mm. This can be requested via prctl().
Changes vs the previous versions:
- Get rid of the software flush fallback
- Make the handling consistent with other mitigations
- Kill the task when it ends up on a SMT enabled core which defeats
the purpose of L1D flushing obviously"
* tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: Add L1D flushing Documentation
x86, prctl: Hook L1D flushing in via prctl
x86/mm: Prepare for opt-in based L1D flush in switch_mm()
x86/process: Make room for TIF_SPEC_L1D_FLUSH
sched: Add task_work callback for paranoid L1D flush
x86/mm: Refactor cond_ibpb() to support other use cases
x86/smp: Add a per-cpu view of SMT state
- Add support for Intel Sapphire Rapids server CPU uncore events
- Allow the AMD uncore driver to be built as a module
- Misc cleanups and fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmEsrWgRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iPbg/+LTi1ki2kswWF5Fmoo0IWZ1GiCkWD0rSm
SqCJR3u/MHd7QxsfPXFuhABt7mqaC50epfim0MNC1J11wRoNy6GzM/9nICdez9CK
c2c3jAwoEfWE7o5PTSjXolH0FFXVYwQ9WBRJTBCaZjuXUW7TOk7h9o8fXoF8SssU
VkP9z0YKanhN480v4k7/KR20ktY6GPFFu5cxhC0wyygZXIoG6ku+nmDjvN1ipo54
JQYQbCTz+dj/C0uZehbEoTZWx7cajAxlbq+Iyor3ND30YHLeRxWNEcq/Jzn9Szqa
LXnIi3mg4/mHc8mtZjDHXazpMxGYI02hkBACftPb2gizR4DOwtWlw1A2oHjbfmvM
oF29kcZmFU/Na2O5JTf0IpV4LLkt/OlrTsTcd5ROYWgmri3UV18MhaKz0R0vQNhI
8Xx6TJVA9fFHCIONg+4yxzDkYlxHJ9Tg8yFb06M6NWOnud03xYv4FGOzi0laoums
8XbYZUnMh8aVkXz05CYUadknu/ajMOSqAZZAstng3unazrutSCMkZ+Pzs4X+yqq5
Zz2Tb26oi6KepLD0eQNTmo1pRwuWC/IBqVF5aKH4e4qgyta/VxWob3Qd+I7ONaCl
6HpbaWz01Nw8U2wE4dQB0wKwIGlLE8bRQTS2QFuqaEHu0yZsQGt8zMwVWB+f93Gb
viglPyeA/y8=
=QrPT
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event updates from Ingo Molnar:
- Add support for Intel Sapphire Rapids server CPU uncore events
- Allow the AMD uncore driver to be built as a module
- Misc cleanups and fixes
* tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header
perf/amd/uncore: Allow the driver to be built as a module
x86/cpu: Add get_llc_id() helper function
perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/
perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL
perf/hw_breakpoint: Replace deprecated CPU-hotplug functions
perf/x86/intel: Replace deprecated CPU-hotplug functions
perf/x86: Remove unused assignment to pointer 'e'
perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX
perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server
perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server
perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()
perf/x86/intel/uncore: Add alias PMU name
perf/x86/intel/uncore: Add Sapphire Rapids server MDF support
perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support
perf/x86/intel/uncore: Add Sapphire Rapids server UPI support
perf/x86/intel/uncore: Add Sapphire Rapids server M2M support
perf/x86/intel/uncore: Add Sapphire Rapids server IMC support
perf/x86/intel/uncore: Add Sapphire Rapids server PCU support
perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support
...
is not up yet - delay that processing until everything is ready
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmEsp9MACgkQEsHwGGHe
VUp1ARAAtlYWtIQXUV+YiGGnueO3WZlbME/O/I6uz3KSIgZg59qaXPpscL7vCjS1
pFav1F3GKpplvkWAPBQf290yLYn934oLDanof1ENcAQxE46RD/CTr4V1J3xSzb31
+XEcwHoDFMdSHp219pinNWEDFGlYvb9/Q8AxCMFlUomRrPE/1tjo2l8gmZ7Wi1r5
uUwImmiYKnKoin3HpaY5n0NW+3madWhVfPhwv7Tsh0Hp0Qh5KRra+0OoLSGeKNDk
EDpKE/XKAjCNjcBLNIAWtLCHvPC1bWh2jC+Qmu8a9UGn4e5KmWw+GCUoOuBRuA6X
KOAoXtZXARsDDXadQKjNnQ4P3CvFcmNpCzjoFAarsflA3vKSHttDamk14bxE/mRx
AkNdw7oQYfOLAP2PVpPGgBEiU5WIoPwXdudDi6R8Tu107Z0mhiJIgbjPajJOvFpj
EnTO4k/A+Lk4uhPYYFhlhphDT/B8KdVlnEv0VJds0nPEE2uC3JiXxu7dHbBPYcXm
KtvJ5A+jDEMvFoPzoqwDBr7EzeHUUjyyI1Gym5EsjCNoH4WG8SyXYngCRrFOaIFe
e047UnJ4a7ibsFHAPrB9LtWLxvhPl13D4D+Ejb1GvnzDwSaAJAP9gVBGC0VpicId
NGEYAjsRsLGP2D3I4+Bhh1MN6mL/wlcmzO4izXXhMyc9vuTrXws=
=1ROV
-----END PGP SIGNATURE-----
Merge tag 'ras_core_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS update from Borislav Petkov:
"A single RAS change for 5.15:
- Do not start processing MCEs logged early because the decoding
chain is not up yet - delay that processing until everything is
ready"
* tag 'ras_core_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Defer processing of early errors
- Improve ftrace code patching so that stop_machine is not required anymore.
This requires a small common code patch acked by Steven Rostedt:
https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/
- Enable KCSAN for s390. This comes with a small common code change to fix a
compile warning. Acked by Marco Elver:
https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com
- Add KFENCE support for s390. This also comes with a minimal x86 patch from
Marco Elver who said also this can be carried via the s390 tree:
https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/
- More changes to prepare the decompressor for relocation.
- Enable DAT also for CPU restart path.
- Final set of register asm removal patches; leaving only three locations where
needed and sane.
- Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO support to
hwcaps flags.
- Cleanup hwcaps implementation.
- Add new instructions to in-kernel disassembler.
- Various QDIO cleanups.
- Add SCLP debug feature.
- Various other cleanups and improvements all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmEs1GsACgkQIg7DeRsp
bsJ9+A/9FApCECNPgu6jOX4Ee+no+LxpCPUF8rvt56TFTLv7+Dhm7fJl0xQ9utsZ
FyLMDAr1/FKdm2wBW23QZH4vEIt1bd6e/03DwwK+6IjHKZHRIfB8eGJMsLj/TDzm
K6/+FI7qXjvpNXxgkCqXf5yESi/y5Dgr+16kTBhPZj5awRiwe5puPamji3uiQ45V
r4MdGCCC9BnTZvtPpUrr8ImnUqHJ4/TMo1YYdykLbZFuAvvYUyZ5YG5kh0pMa8JZ
DGJpfLQfy7ZNscIzdVhZtfzzESVtS6/AOeBzDMO1pbM1CGXtvpJJP0Wjlr/PGwoW
fvuMHpqTlDi+TfNZiPP5lwsFC89xSd6gtZH7vAuI8kFCXgW3RMjABF6h/mzpH1WO
jXyaSEZROc/83gxPMYyOYiqrKyAFPbpZ/Rnav2bvGQGneqx7RvmpF3GgA9WEo1PW
rMDoEbLstJuHk0E2uEV+OnQd5F7MHNonzpYfp/7pyQ+PL8w2GExV9yngVc/f3TqG
HYLC9rc3K6DkxZappcJm0qTb7lDTMFI7xK3g9RiqPQBJE1v1MYE/rai48nW69ypE
bRNL76AjyXKo+zKR2wlhJVMY1I1+DarMopHhZj6fzQT5te1LLsv8OuTU2gkt6dIq
YoSYOYvModf3HbKnJul2tszQG9yl+vpE9MiCyBQSsxIYXCriq/c=
=WDRh
-----END PGP SIGNATURE-----
Merge tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Improve ftrace code patching so that stop_machine is not required
anymore. This requires a small common code patch acked by Steven
Rostedt:
https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/
- Enable KCSAN for s390. This comes with a small common code change to
fix a compile warning. Acked by Marco Elver:
https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com
- Add KFENCE support for s390. This also comes with a minimal x86 patch
from Marco Elver who said also this can be carried via the s390 tree:
https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/
- More changes to prepare the decompressor for relocation.
- Enable DAT also for CPU restart path.
- Final set of register asm removal patches; leaving only three
locations where needed and sane.
- Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO
support to hwcaps flags.
- Cleanup hwcaps implementation.
- Add new instructions to in-kernel disassembler.
- Various QDIO cleanups.
- Add SCLP debug feature.
- Various other cleanups and improvements all over the place.
* tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits)
s390: remove SCHED_CORE from defconfigs
s390/smp: do not use nodat_stack for secondary CPU start
s390/smp: enable DAT before CPU restart callback is called
s390: update defconfigs
s390/ap: fix state machine hang after failure to enable irq
KVM: s390: generate kvm hypercall functions
s390/sclp: add tracing of SCLP interactions
s390/debug: add early tracing support
s390/debug: fix debug area life cycle
s390/debug: keep debug data on resize
s390/diag: make restart_part2 a local label
s390/mm,pageattr: fix walk_pte_level() early exit
s390: fix typo in linker script
s390: remove do_signal() prototype and do_notify_resume() function
s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c
s390/pci: improve DMA translation init and exit
s390/pci: simplify CLP List PCI handling
s390/pci: handle FH state mismatch only on disable
s390/pci: fix misleading rc in clp_set_pci_fn()
s390/boot: factor out offset_vmlinux_info() function
...
Add <asm/amd-ibs.h> with bitfield definitions for IBS MSRs,
and demonstrate usage within the driver.
Also move 'struct perf_ibs_data' where it can be shared with
the perf tool that will soon be using it.
No functional changes.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-9-kim.phillips@amd.com
Factor out a helper function rather than export cpu_llc_id, which is
needed in order to be able to build the AMD uncore driver as a module.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-7-kim.phillips@amd.com
When a fatal machine check results in a system reset, Linux does not
clear the error(s) from machine check bank(s) - hardware preserves the
machine check banks across a warm reset.
During initialization of the kernel after the reboot, Linux reads, logs,
and clears all machine check banks.
But there is a problem. In:
5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
the call to mce_register_decode_chain() moved later in the boot
sequence. This means that /dev/mcelog doesn't see those early error
logs.
This was partially fixed by:
cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")
which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users of
/dev/mcelog should expect to find any early errors logged to standard
places.
Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early
machine check initialization to indicate that any errors found should
just be queued to genpool. When mcheck_late_init() is called it will
call mce_schedule_work() to actually log and flush any errors queued in
the genpool.
[ Based on an original patch, commit message by and completely
productized by Tony Luck. ]
Fixes: 5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
When the 5-level page table CPU flag is set in the host, but the guest
has CR4.LA57=0 (including the case of a 32-bit guest), the top level of
the shadow NPT page tables will be fixed, consisting of one pointer to
a lower-level table and 511 non-present entries. Extend the existing
code that creates the fixed PML4 or PDP table, to provide a fixed PML5
table if needed.
This is not needed on EPT because the number of layers in the tables
is specified in the EPTP instead of depending on the host CR4.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20210818165549.3771014-3-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level
on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force
a fixed TDP level.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts
while running.
This change is mostly intended for more robust single stepping
of the guest and it has the following benefits when enabled:
* Resuming from a breakpoint is much more reliable.
When resuming execution from a breakpoint, with interrupts enabled,
more often than not, KVM would inject an interrupt and make the CPU
jump immediately to the interrupt handler and eventually return to
the breakpoint, to trigger it again.
From the user point of view it looks like the CPU never executed a
single instruction and in some cases that can even prevent forward
progress, for example, when the breakpoint is placed by an automated
script (e.g lx-symbols), which does something in response to the
breakpoint and then continues the guest automatically.
If the script execution takes enough time for another interrupt to
arrive, the guest will be stuck on the same breakpoint RIP forever.
* Normal single stepping is much more predictable, since it won't
land the debugger into an interrupt handler.
* RFLAGS.TF has less chance to be leaked to the guest:
We set that flag behind the guest's back to do single stepping
but if single step lands us into an interrupt/exception handler
it will be leaked to the guest in the form of being pushed
to the stack.
This doesn't completely eliminate this problem as exceptions
can still happen, but at least this reduces the chances
of this happening.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Existing KVM code tracks the number of large pages regardless of their
sizes. Therefore, when large page of 1GB (or larger) is adopted, the
information becomes less useful because lpages counts a mix of 1G and 2M
pages.
So remove the lpages since it is easy for user space to aggregate the info.
Instead, provide a comprehensive page stats of all sizes from 4K to 512G.
Suggested-by: Ben Gardon <bgardon@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Message-Id: <20210803044607.599629-4-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
however, possible to track whether the feature was actually used by the
guest and only inhibit APICv/AVIC when needed.
TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let
Windows know that AutoEOI feature should be avoided. While it's up to
KVM userspace to set the flag, KVM can help a bit by exposing global
APICv/AVIC enablement.
Maxim:
- always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid,
since this feature can be used regardless of AVIC
Paolo:
- use arch.apicv_update_lock to protect the hv->synic_auto_eoi_used
instead of atomic ops
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-12-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently on SVM, the kvm_request_apicv_update toggles the APICv
memslot without doing any synchronization.
If there is a mismatch between that memslot state and the AVIC state,
on one of the vCPUs, an APIC mmio access can be lost:
For example:
VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
VCPU1: access an APIC mmio register.
Since AVIC is still disabled on VCPU1, the access will not be intercepted
by it, and neither will it cause MMIO fault, but rather it will just be
read/written from/to the dummy page mapped into the
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.
Fix that by adding a lock guarding the AVIC state changes, and carefully
order the operations of kvm_request_apicv_update to avoid this race:
1. Take the lock
2. Send KVM_REQ_APICV_UPDATE
3. Update the apic inhibit reason
4. Release the lock
This ensures that at (2) all vCPUs are kicked out of the guest mode,
but don't yet see the new avic state.
Then only after (4) all other vCPUs can update their AVIC state and resume.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks to the former patches, it is now possible to keep the APICv
memslot always enabled, and it will be invisible to the guest
when it is inhibited
This code is based on a suggestion from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce kvm_mmu_slot_lpages() to calculcate lpage_info and rmap array size.
The other __kvm_mmu_slot_lpages() can take an extra parameter of npages rather
than fetching from the memslot pointer. Start to use the latter one in
kvm_alloc_memslot_metadata().
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210730220455.26054-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Invert the mask of bits that we pick from L2 in
nested_vmcb02_prepare_control
* Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr
This fixes a security issue that allowed a malicious L1 to run L2 with
AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled
AVIC to read/write the host physical memory at some offsets.
Fixes: 3d6368ef58 ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the declaration of kvm_spurious_fault() to KVM's "private" x86.h,
it should never be called by anything other than low level KVM code.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
[sean: rebased to a series without __ex()/__kvm_handle_fault_on_reboot()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210809173955.1710866-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the __kvm_handle_fault_on_reboot() and __ex() macros now that all
VMX and SVM instructions use asm goto to handle the fault (or in the
case of VMREAD, completely custom logic). Drop kvm_spurious_fault()'s
asmlinkage annotation as __kvm_handle_fault_on_reboot() was the only
flow that invoked it from assembly code.
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210809173955.1710866-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit ae561edeb4 ("KVM: x86: DR0-DR3 are not clear on reset") added code to
ensure eff_db are updated when they're modified through non-standard paths.
But there is no reason to also update hardware DRs unless hardware breakpoints
are active or DR exiting is disabled, and in those cases updating hardware is
handled by KVM_DEBUGREG_WONT_EXIT and KVM_DEBUGREG_BP_ENABLED.
KVM_DEBUGREG_RELOAD just causes unnecesarry load of hardware DRs and is better
to be removed.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210809174307.145263-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>