Commit Graph

34394 Commits

Author SHA1 Message Date
Linus Torvalds ec181b7f30 Two fixes for x86:
- Map EFI runtime service data as encrypted when SEV is enabled otherwise
     e.g. SMBIOS data cannot be properly decoded by dmidecode.
 
   - Remove the warning in the vector management code which triggered when a
     managed interrupt affinity changed outside of a CPU hotplug
     operation. The warning was correct until the recent core code change
     that introduced a CPU isolation feature which needs to migrate managed
     interrupts away from online CPUs under certain conditions to achieve the
     isolation.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uRi8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoSH9EACToDM3iADmLZnP4dookJpPWvxazCio
 UclqaIUE7k2Wg/EPmE0oNTQCxqh42rTX6Ifo5WaiCJbxIFZKGMhe02BwmQffilaS
 dOlxuEEeLQq3S4Ai10Mq7wcp5uVHCE/+IhaphwFrdPn/w99O0SZf/bpZMveh6xgR
 Qw3vMLav9FXpWqvnDTw0Vcrcd9sEnZ/iaLrXVDFAnwZggrUqq26Ia4DqUlOaiHGC
 DHESmYFlHcFqfzd6BOJXbsJqedL56Qav0n7zsIqz6B34cLyc8QOqnSn2HxzncP22
 BLPVLvdLi7yqrWIoVgSefcAJq1wcE+Vl9V6mvjxMK4GieYZ91WdLKIbvqUPRZvhU
 viDzZ7NCsg6TmQBD6ilvYrMNB9ds+GNl/1dZ9c854zuvnTcnKqRq9CE6djnlqaLw
 AfHQQJ+kPjrnVyyPnyYBqrWgfsVJ3ueE8BEPtTfruL2CDQLrwiScwCNZ3qQmZ6Bx
 r00wbx+QtATHiZ97pwR1FJr1gyuZE6q3tY3gnb5ORIY19DfkwzRprKpE+Z++3N1H
 Z5Vc7A67CcQe6uCwyViJZuamNgBaXvFmbDDjt3d8N4KKnLK647WyW0XutabQppWa
 Jueq9XJX2V752y81i2Gf2+/U7xGOK0C4QajRMbqiizRBHKiG1JXpi9yCrdqNldEP
 ocz5HASe634nng==
 =KeLM
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two fixes for x86:

   - Map EFI runtime service data as encrypted when SEV is enabled.

     Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode.

   - Remove the warning in the vector management code which triggered
     when a managed interrupt affinity changed outside of a CPU hotplug
     operation.

     The warning was correct until the recent core code change that
     introduced a CPU isolation feature which needs to migrate managed
     interrupts away from online CPUs under certain conditions to
     achieve the isolation"

* tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vector: Remove warning on managed interrupt migration
  x86/ioremap: Map EFI runtime services data as encrypted for SEV
2020-03-15 12:52:56 -07:00
Linus Torvalds e99bc917fe A pile of perf fixes:
- AMD uncore driver:
 
     Replace the open coded sanity check with the core variant, which
     provides the correct error code and also leaves a hint in dmesg
 
   - tools:
 
     - Fix the stdio input handling with glibc versions >= 2.28
 
     - Unbreak the futex-wake benchmark which was reduced to 0 test threads
       due to the conversion to cpumaps
 
     - Initialize sigaction structs before invoking sys_sigactio()
 
     - Plug the mapfile memory leak in perf jevents
 
     - Fix off by one relative directory includes
 
     - Fix an undefined string comparison in perf diff
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uQuETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVeLEAC3lJ8jRzGfETQJFyS4C+vj1r+Jglvq
 Hi7Zd8hLDAd+F/aO2/DMgHkKLqpq+sj9qjnPv0Mu/eAS2AbOC3Q4Nz1vm0mxfmyB
 D6+/t3O2t01hyCJ70g8z7HgJclYyLc+JU72F37UcMCBJNHKFUx6ZrgMOPFRwebc6
 aUgyObX5YJ7h35Bl0kYLB0z4q1Znvus3YlFxrEOF78Xldx7zjTJOBsXoDdBjcWVP
 axtvhOnI3aR8E08a+1nbOmE79qSkscneXY7pg0FVDs9/Zq+38BEOVlzDC5aRG3Rm
 4fmty+NO3zOe663kNAGTJ/UQu1fIXGn+6rZ+5lH2pdtgkdeZN6zoVNQFVZrCarhC
 9Skrgz2dZ7DQe6/VwM7Z20oChh5V9q/207Rr2w/6+hmtQ/mnriWpXODZxPevc8kN
 KYHj3Lmo63MrSWIp4Qm4U6wMC9LOGZDUojPs0zbd3prhPoRGVlivTbkQ497Rht00
 BW8TCFhKhIqQJyE72KPI1zlmb0piihCHmMUi1XtuRi+3LpGFPQGXHBAxVrT9HJuF
 1zGr9VeiY8XtHWBdYoD176aOD8wO36mABHkDo2DY7AmkyI8OefGj5EFwtnr+e1aF
 F1LRYw+IGn4kMn35NZVNiJUisGzVWGIrWGVCGlTdoKgm3hhVyoRuPKCCzV2GVXd+
 3hjvmSY9aFmrMw==
 =uJcr
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A pile of perf fixes:

  Kernel side:

   - AMD uncore driver: Replace the open coded sanity check with the
     core variant, which provides the correct error code and also leaves
     a hint in dmesg

  Tooling:

   - Fix the stdio input handling with glibc versions >= 2.28

   - Unbreak the futex-wake benchmark which was reduced to 0 test
     threads due to the conversion to cpumaps

   - Initialize sigaction structs before invoking sys_sigactio()

   - Plug the mapfile memory leak in perf jevents

   - Fix off by one relative directory includes

   - Fix an undefined string comparison in perf diff"

* tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
  tools: Fix off-by 1 relative directory includes
  perf jevents: Fix leak of mapfile memory
  perf bench: Clear struct sigaction before sigaction() syscall
  perf bench futex-wake: Restore thread count default to online CPU count
  perf top: Fix stdio interface input handling with glibc 2.28+
  perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
  perf symbols: Don't try to find a vmlinux file when looking for kernel modules
  perf bench: Share some global variables to fix build with gcc 10
  perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
  perf env: Do not return pointers to local variables
  perf tests bp_account: Make global variable static
2020-03-15 12:50:15 -07:00
Linus Torvalds 52ac3777fc Two RAS related fixes:
- Shut down the per CPU thermal throttling poll work properly when a CPU
     goes offline. The missing shutdown caused the poll work to be migrated
     to a unbound worker which triggered warnings about the usage of
     smp_processor_id() in preemptible context
 
   - Fix the PPIN feature initialization which missed to enable the
     functionality when PPIN_CTL was enabled but the MSR locked against
     updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uRHUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVVGD/0WEjZoB8yhwez6u0YNFhUkjfP8JFC1
 mGdWMoevyH3Tb+DQNX3cW95t2O7IxP0N6OUNnYYQ9Tlqwt6r0ptJpNnXO7CV2+Jh
 5lxpw/Uv2kQv69BNDK9qPDhiIBPzZQCg/utDTVdIyG0y+XU0q/IZqXh+XedAJsVr
 P3U7KC//NwTYnlpPWjDsG26GHSguV4kj+Lwi88nfh1DJ7eawb8AF4k965pLmOoF9
 g13EFxv2FW1/uq+QJq5ophQIH/pPI/T67rhIyLWxFsCByBzVKjm4BBgXH4gb+QIn
 OofVQcaWCpZCOq2ZTNfHWdPvJK2ziig9w+twbArb7Cb9aOgp3Oe1zbp2VD4nKu4+
 0G5E2Vdv6qRrEIk5LUTqlyOIogd5xPSufaCGF/HC/qXqBxqwWD0tUvjtYyRwwy+Y
 u90bo90zlMjUoDirgtZrjYe0bXuy3xJ+FxZ5OxovGRxLn4qqBqEJGrXYvB0LIlpd
 3x+YeHB4T2pwC6Ya5Odi6RKhwMKpro24dDMJ9jIR1u/NwIgJ2elSO9bsw6SZ823e
 /Mwns7CC/7xtjOCJXPlyj4Uw0TzwTbp1W9Kb0OqJo6q+ntvxbAhoMf32FDxg0OKC
 h4trc3FZt+e2a0l8R8e3nNeAvnS0fM1P4vtg18EcX8SqlSoALJkS3XUO4WeCFLBh
 F9jOt/LSf+4okQ==
 =9W7B
 -----END PGP SIGNATURE-----

Merge tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fixes from Thomas Gleixner:
 "Two RAS related fixes:

   - Shut down the per CPU thermal throttling poll work properly when a
     CPU goes offline.

     The missing shutdown caused the poll work to be migrated to a
     unbound worker which triggered warnings about the usage of
     smp_processor_id() in preemptible context

   - Fix the PPIN feature initialization which missed to enable the
     functionality when PPIN_CTL was enabled but the MSR locked against
     updates"

* tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix logic and comments around MSR_PPIN_CTL
  x86/mce/therm_throt: Undo thermal polling properly on CPU offline
2020-03-15 12:44:23 -07:00
Linus Torvalds 6693075e0f Bugfixes, x86+s390.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJebMXnAAoJEL/70l94x66D3fYIAJ1r+o2qgzadwEqoXTvlihjB
 ujX1jOs20EJJ56VhTtXF/wZQc+7VeKCjpIqNv4WaeSYPUhzFGyL9t5tw1YdRDCwY
 u6gklxruIzZodgp+vCoTkPyyUylVmY50sY/yBIJ4F8qOaMxhTEE1aXzGuaOrYqVO
 MmIlAltEKQzdXPO1SVPD7triGPgUTj+DRxrlyRrGt2ItiMUincCz9K6TDyXFib0r
 SSCVFNYtYmzu/bV/E4/Sphi2BxCQEem5DIFWLcngzN8Wy5oCoRVzPGugT4Q9eXWt
 ZtWIDh473JGiXBLYmDq4REJsRSca+7s/YiiLSiQwYfByhIPJpVEoy54fcdaZflo=
 =T4AD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes for x86 and s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
  KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
  KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
  KVM: s390: Also reset registers in sync regs for initial cpu reset
  KVM: fix Kconfig menu text for -Werror
  KVM: x86: remove stale comment from struct x86_emulate_ctxt
  KVM: x86: clear stale x86_emulate_ctxt->intercept value
  KVM: SVM: Fix the svm vmexit code for WRMSR
  KVM: X86: Fix dereference null cpufreq policy
2020-03-14 15:45:26 -07:00
Paolo Bonzini 018cabb694 Merge branch 'kvm-null-pointer-fix' into kvm-master 2020-03-14 12:49:37 +01:00
Vitaly Kuznetsov 95fa10103d KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter
with EVMCS GPA = 0 the host crashes because the

evmcs_gpa != vmx->nested.hv_evmcs_vmptr

condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to
false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will
happen on vmx->nested.hv_evmcs pointer dereference.

Another problematic EVMCS ptr value is '-1' but it only causes host crash
after nested_release_evmcs() invocation. The problem is exactly the same as
with '0', we mistakenly think that the EVMCS pointer hasn't changed and
thus nested.hv_evmcs_vmptr is valid.

Resolve the issue by adding an additional !vmx->nested.hv_evmcs
check to nested_vmx_handle_enlightened_vmptrld(), this way we will
always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL
and this is supposed to catch all invalid EVMCS GPAs.

Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs()
to be consistent with initialization where we don't currently
set hv_evmcs_vmptr to '-1'.

Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 12:49:27 +01:00
Nitesh Narayan Lal 0c22056f8c KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
Previously all fields of structure kvm_lapic_irq were not initialized
before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause
an issue when any of those fields are used for processing a request.
For example not initializing the msi_redir_hint field before passing
to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of
kvm_apic_map_get_dest_lapic(). This will specifically happen when the
kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage
value of msi_redir_hint, which should not happen as the request belongs
to APIC fixed delivery mode and we do not want to deliver the
interrupt only to the lowest priority candidate.

This patch initializes all the fields of kvm_lapic_irq based on the
values of ioapic redirect_entry object before passing it on to
kvm_bitmap_or_dest_vcpus().

Fixes: 7ee30bc132 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Set level to false since the value doesn't really matter. Suggested
 by Vitaly Kuznetsov. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 10:46:01 +01:00
Sean Christopherson 7a57c09bb1 KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
the CPU supports SGX1.  Per Intel's SDM, all ENCLS leafs #UD if SGX1
is not supported[*], i.e. intercepting ENCLS to inject a #UD is
unnecessary.

Avoiding ENCLS-exiting even when it is reported as supported by the CPU
works around a reported issue where SGX is "hard" disabled after an S3
suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
are enumerated as unsupported.  While the root cause of the S3 issue is
unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
workaround for what is most likely a hardware or firmware issue.  As a
bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
vmcs02.

Note, SGX must be disabled in BIOS to take advantage of this workaround

[*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
    globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
    cleared.  Soft disabled meaning disabling SGX without clearing the
    primary CPUID bit (in leaf 0x7) and without poking into non-SGX
    CPU paths, e.g. for the VMCS controls.

Fixes: 0b665d3040 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
Reported-by: Toni Spets <toni.spets@iki.fi>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14 10:34:51 +01:00
Peter Xu 469ff207b4 x86/vector: Remove warning on managed interrupt migration
The vector management code assumes that managed interrupts cannot be
migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE()
which triggers when a managed interrupt vector association on a online CPU
is cleared. The CPU offline code uses a different mechanism which cannot
trigger this.

This assumption is not longer correct because the new CPU isolation feature
which affects the placement of managed interrupts must be able to move a
managed interrupt away from an online CPU.

There are two reasons why this can happen:

  1) When the interrupt is activated the affinity mask which was
     established in irq_create_affinity_masks() is handed in to
     the vector allocation code. This mask contains all CPUs to which
     the interrupt can be made affine to, but this does not take the
     CPU isolation 'managed_irq' mask into account.

     When the interrupt is finally requested by the device driver then the
     affinity is checked again and the CPU isolation 'managed_irq' mask is
     taken into account, which moves the interrupt to a non-isolated CPU if
     possible.

  2) The interrupt can be affine to an isolated CPU because the
     non-isolated CPUs in the calculated affinity mask are not online.

     Once a non-isolated CPU which is in the mask comes online the
     interrupt is migrated to this non-isolated CPU

In both cases the regular online migration mechanism is used which triggers
the WARN_ON_ONCE() in free_moved_vector().

Case #1 could have been addressed by taking the isolation mask into
account, but that would require a massive code change in the activation
logic and the eventual migration event was accepted as a reasonable
tradeoff when the isolation feature was developed. But even if #1 would be
addressed, #2 would still trigger it.

Of course the warning in free_moved_vector() was overlooked at that time
and the above two cases which have been discussed during patch review have
obviously never been tested before the final submission.

So keep it simple and remove the warning.

[ tglx: Rewrote changelog and added a comment to free_moved_vector() ]

Fixes: 11ea68f553 ("genirq, sched/isolation: Isolate from handling managed interrupts")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>                                                                                                                                                                       
Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
2020-03-13 15:29:26 +01:00
Linus Torvalds 2644bc8569 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Fix a build problem with x86/curve25519"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/curve25519 - support assemblers with no adx support
2020-03-12 09:25:55 -07:00
Kim Phillips f967140dfb perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
Enable the sampling check in kernel/events/core.c::perf_event_open(),
which returns the more appropriate -EOPNOTSUPP.

BEFORE:

  $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses).
  /bin/dmesg | grep -i perf may provide additional information.

With nothing relevant in dmesg.

AFTER:

  $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
  Error:
  l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Fixes: c43ca5091a ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com
2020-03-12 14:08:50 +01:00
Tom Lendacky 985e537a40 x86/ioremap: Map EFI runtime services data as encrypted for SEV
The dmidecode program fails to properly decode the SMBIOS data supplied
by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is
encrypted and resides in reserved memory that is marked as EFI runtime
services data.

As a result, when memremap() is attempted for the SMBIOS data, it
can't be mapped as regular RAM (through try_ram_remap()) and, since
the address isn't part of the iomem resources list, it isn't mapped
encrypted through the fallback ioremap().

Add a new __ioremap_check_other() to deal with memory types like
EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges.

This allows any runtime services data which has been created encrypted,
to be mapped encrypted too.

 [ bp: Move functionality to a separate function. ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: <stable@vger.kernel.org> # 5.3
Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com
2020-03-11 15:54:54 +01:00
Jason A. Donenfeld a754acc3e4 KVM: fix Kconfig menu text for -Werror
This was evidently copy and pasted from the i915 driver, but the text
wasn't updated.

Fixes: 4f337faf1c ("KVM: allow disabling -Werror")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-05 15:27:43 +01:00
Jason A. Donenfeld 1579f1bc3b crypto: x86/curve25519 - support assemblers with no adx support
Some older version of GAS do not support the ADX instructions, similarly
to how they also don't support AVX and such. This commit adds the same
build-time detection mechanisms we use for AVX and others for ADX, and
then makes sure that the curve25519 library dispatcher calls the right
functions.

Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-05 18:28:09 +11:00
Vitaly Kuznetsov d718fdc3e7 KVM: x86: remove stale comment from struct x86_emulate_ctxt
Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") did some field shuffling and instead of
[opcode_len, _regs) started clearing [has_seg_override, modrm).
The comment about clearing fields altogether is not true anymore.

Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03 17:38:22 +01:00
Vitaly Kuznetsov 342993f96a KVM: x86: clear stale x86_emulate_ctxt->intercept value
After commit 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest
mode") Hyper-V guests on KVM stopped booting with:

 kvm_nested_vmexit:    rip fffff802987d6169 reason EPT_VIOLATION info1 181
    info2 0 int_info 0 int_info_err 0
 kvm_page_fault:       address febd0000 error_code 181
 kvm_emulate_insn:     0:fffff802987d6169: f3 a5
 kvm_emulate_insn:     0:fffff802987d6169: f3 a5 FAIL
 kvm_inj_exception:    #UD (0x0)

"f3 a5" is a "rep movsw" instruction, which should not be intercepted
at all.  Commit c44b4c6ab8 ("KVM: emulate: clean up initializations in
init_decode_cache") reduced the number of fields cleared by
init_decode_cache() claiming that they are being cleared elsewhere,
'intercept', however, is left uncleared if the instruction does not have
any of the "slow path" flags (NotImpl, Stack, Op3264, Sse, Mmx, CheckPerm,
NearBranch, No16 and of course Intercept itself).

Fixes: c44b4c6ab8 ("KVM: emulate: clean up initializations in init_decode_cache")
Fixes: 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest mode")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-03 17:38:16 +01:00
Haiwei Li aaca21007b KVM: SVM: Fix the svm vmexit code for WRMSR
In svm, exit_code for MSR writes is not EXIT_REASON_MSR_WRITE which
belongs to vmx.

According to amd manual, SVM_EXIT_MSR(7ch) is the exit_code of VMEXIT_MSR
due to RDMSR or WRMSR access to protected MSR. Additionally, the processor
indicates in the VMCB's EXITINFO1 whether a RDMSR(EXITINFO1=0) or
WRMSR(EXITINFO1=1) was intercepted.

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Fixes: 1e9e2622a1 ("KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath", 2019-11-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-02 17:06:52 +01:00
Wanpeng Li 9a11997e75 KVM: X86: Fix dereference null cpufreq policy
Naresh Kamboju reported:

   Linux version 5.6.0-rc4 (oe-user@oe-host) (gcc version
  (GCC)) #1 SMP Sun Mar 1 22:59:08 UTC 2020
   kvm: no hardware support
   BUG: kernel NULL pointer dereference, address: 000000000000028c
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] SMP NOPTI
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4 #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
  04/01/2014
   RIP: 0010:kobject_put+0x12/0x1c0
   Call Trace:
    cpufreq_cpu_put+0x15/0x20
    kvm_arch_init+0x1f6/0x2b0
    kvm_init+0x31/0x290
    ? svm_check_processor_compat+0xd/0xd
    ? svm_check_processor_compat+0xd/0xd
    svm_init+0x21/0x23
    do_one_initcall+0x61/0x2f0
    ? rdinit_setup+0x30/0x30
    ? rcu_read_lock_sched_held+0x4f/0x80
    kernel_init_freeable+0x219/0x279
    ? rest_init+0x250/0x250
    kernel_init+0xe/0x110
    ret_from_fork+0x27/0x50
   Modules linked in:
   CR2: 000000000000028c
   ---[ end trace 239abf40c55c409b ]---
   RIP: 0010:kobject_put+0x12/0x1c0

cpufreq policy which is get by cpufreq_cpu_get() can be NULL if it is failure,
this patch takes care of it.

Fixes: aaec7c03de (KVM: x86: avoid useless copy of cpufreq policy)
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-02 17:06:52 +01:00
Linus Torvalds 2873dc2547 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: a pkeys fix for a bug that triggers with weird BIOS
  settings, and two Xen PV fixes: a paravirt interface fix, and
  pagetable dumping fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix dump_pagetables with Xen PV
  x86/ioperm: Add new paravirt function update_io_bitmap()
  x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
2020-03-02 06:54:54 -06:00
Linus Torvalds e130a920f6 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Three fixes to EFI mixed boot mode, mostly related to x86-64 vmap
  stacks activated years ago, bug-fixed recently for EFI, which had
  knock-on effects of various 1:1 mapping assumptions in mixed mode.

  There's also a READ_ONCE() fix for reading an mmap-ed EFI firmware
  data field only once, out of caution"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: READ_ONCE rng seed size before munmap
  efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
  efi/x86: Remove support for EFI time and counter services in mixed mode
  efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
2020-03-02 06:41:41 -06:00
Linus Torvalds f853ed90e2 More bugfixes, including a few remaining "make W=1" issues such
as too large frame sizes on some configurations.  On the
 ARM side, the compiler was messing up shadow stacks between
 EL1 and EL2 code, which is easily fixed with __always_inline.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeXAT4AAoJEL/70l94x66DWywH/1kv4MmeGo6PI0Nxk/yvA7X8
 78iqIBchtxZX0v/9kqpTB7bYmHyTgmZHM+IkwtIUANDSaOvWqJwU+TLUfduOiuXF
 NxBHcZDyuMoftX5CSQ+bJ5PwxKijAdJsIkCZ13CnsTCkwcfamSGypFUCK8LacPeq
 WHvV5Ws5pFc51xrP3CH1DrRhLoulaBmt5xxqK9fxWtslrlsnm1uNza5vs8As8CzM
 apnmdRIf5p4v91Zic3PFH7/GXES0m1tjIBKdtZ4YHb8yrXV/kBsEVhhTjqE9mrUq
 qtRRl5waOFoP4yc9ey52PAbMm1x1Ho/pyunpM0xh40Yq8OPFwqXBPTnWfobSoiM=
 =LNQc
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More bugfixes, including a few remaining "make W=1" issues such as too
  large frame sizes on some configurations.

  On the ARM side, the compiler was messing up shadow stacks between EL1
  and EL2 code, which is easily fixed with __always_inline"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: check descriptor table exits on instruction emulation
  kvm: x86: Limit the number of "kvm: disabled by bios" messages
  KVM: x86: avoid useless copy of cpufreq policy
  KVM: allow disabling -Werror
  KVM: x86: allow compiling as non-module with W=1
  KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
  KVM: Introduce pv check helpers
  KVM: let declaration of kvm_get_running_vcpus match implementation
  KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
  arm64: Ask the compiler to __always_inline functions used by KVM at HYP
  KVM: arm64: Define our own swab32() to avoid a uapi static inline
  KVM: arm64: Ask the compiler to __always_inline functions used at HYP
  kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
  KVM: arm/arm64: Fix up includes for trace.h
2020-03-01 15:16:35 -06:00
Oliver Upton 86f7e90ce8 KVM: VMX: check descriptor table exits on instruction emulation
KVM emulates UMIP on hardware that doesn't support it by setting the
'descriptor table exiting' VM-execution control and performing
instruction emulation. When running nested, this emulation is broken as
KVM refuses to emulate L2 instructions by default.

Correct this regression by allowing the emulation of descriptor table
instructions if L1 hasn't requested 'descriptor table exiting'.

Fixes: 07721feee4 ("KVM: nVMX: Don't emulate instructions in guest mode")
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-01 19:26:31 +01:00
Juergen Gross bba42affa7 x86/mm: Fix dump_pagetables with Xen PV
Commit 2ae27137b2 ("x86: mm: convert dump_pagetables to use
walk_page_range") broke Xen PV guests as the hypervisor reserved hole in
the memory map was not taken into account.

Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR.

Fixes: 2ae27137b2 ("x86: mm: convert dump_pagetables to use walk_page_range")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Julien Grall <julien@xen.org>
Link: https://lkml.kernel.org/r/20200221103851.7855-1-jgross@suse.com
2020-02-29 12:43:10 +01:00
Juergen Gross 99bcd4a6e5 x86/ioperm: Add new paravirt function update_io_bitmap()
Commit 111e7b15cf ("x86/ioperm: Extend IOPL config to control ioperm()
as well") reworked the iopl syscall to use I/O bitmaps.

Unfortunately this broke Xen PV domains using that syscall as there is
currently no I/O bitmap support in PV domains.

Add I/O bitmap support via a new paravirt function update_io_bitmap which
Xen PV domains can use to update their I/O bitmaps via a hypercall.

Fixes: 111e7b15cf ("x86/ioperm: Extend IOPL config to control ioperm() as well")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@vger.kernel.org> # 5.5
Link: https://lkml.kernel.org/r/20200218154712.25490-1-jgross@suse.com
2020-02-29 12:43:09 +01:00
Erwan Velu ef935c25fd kvm: x86: Limit the number of "kvm: disabled by bios" messages
In older version of systemd(219), at boot time, udevadm is called with :
	/usr/bin/udevadm trigger --type=devices --action=add"

This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.

On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.

This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.

This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.

Note that recent versions of systemd (>239) do not have trigger this behavior.

This patch will be useful at least for some using older systemd with recent Kernels.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 11:37:20 +01:00
Paolo Bonzini aaec7c03de KVM: x86: avoid useless copy of cpufreq policy
struct cpufreq_policy is quite big and it is not a good idea
to allocate one on the stack.  Just use cpufreq_cpu_get and
cpufreq_cpu_put which is even simpler.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:54:50 +01:00
Paolo Bonzini 4f337faf1c KVM: allow disabling -Werror
Restrict -Werror to well-tested configurations and allow disabling it
via Kconfig.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:45:28 +01:00
Valdis Klētnieks 575b255c16 KVM: x86: allow compiling as non-module with W=1
Compile error with CONFIG_KVM_INTEL=y and W=1:

  CC      arch/x86/kvm/vmx/vmx.o
arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=]
   68 | static const struct x86_cpu_id vmx_cpu_id[] = {
      |                                ^~~~~~~~~~
cc1: all warnings being treated as errors

When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a
reference to the structure (or any code at all).  This makes W=1 compiles
unhappy.

Wrap both in a #ifdef to avoid the issue.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
[Do the same for CONFIG_KVM_AMD. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:35:37 +01:00
Wanpeng Li 8a9442f49c KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
Nick Desaulniers Reported:

  When building with:
  $ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000
  The following warning is observed:
  arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in
  function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=]
  static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int
  vector)
              ^
  Debugging with:
  https://github.com/ClangBuiltLinux/frame-larger-than
  via:
  $ python3 frame_larger_than.py arch/x86/kernel/kvm.o \
    kvm_send_ipi_mask_allbutself
  points to the stack allocated `struct cpumask newmask` in
  `kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is
  potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for
  the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as
  8192, making a single instance of a `struct cpumask` 1024 B.

This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for
both pv tlb and pv ipis..

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:34:25 +01:00
Wanpeng Li a262bca3ab KVM: Introduce pv check helpers
Introduce some pv check helpers for consistency.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:34:19 +01:00
Paolo Bonzini 7943f4acea KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
Even if APICv is disabled at startup, the backing page and ir_list need
to be initialized in case they are needed later.  The only case in
which this can be skipped is for userspace irqchip, and that must be
done because avic_init_backing_page dereferences vcpu->arch.apic
(which is NULL for userspace irqchip).

Tested-by: rmuncrief@humanavance.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:33:17 +01:00
Tony Luck 59b5809655 x86/mce: Fix logic and comments around MSR_PPIN_CTL
There are two implemented bits in the PPIN_CTL MSR:

Bit 0: LockOut (R/WO)
      Set 1 to prevent further writes to MSR_PPIN_CTL.

Bit 1: Enable_PPIN (R/W)
       If 1, enables MSR_PPIN to be accessible using RDMSR.
       If 0, an attempt to read MSR_PPIN will cause #GP.

So there are four defined values:
	0: PPIN is disabled, PPIN_CTL may be updated
	1: PPIN is disabled. PPIN_CTL is locked against updates
	2: PPIN is enabled. PPIN_CTL may be updated
	3: PPIN is enabled. PPIN_CTL is locked against updates

Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2".
When it should have done so for both case "2" and case "3".

Fix the final test to just check for the enable bit. Also fix some of
the other comments in this function.

Fixes: 3f5a7896a5 ("x86/mce: Include the PPIN in MCE records when available")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200226011737.9958-1-tony.luck@intel.com
2020-02-27 21:36:42 +01:00
Sean Christopherson 735a6dd022 x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
Explicitly set X86_FEATURE_OSPKE via set_cpu_cap() instead of calling
get_cpu_cap() to pull the feature bit from CPUID after enabling CR4.PKE.
Invoking get_cpu_cap() effectively wipes out any {set,clear}_cpu_cap()
changes that were made between this_cpu->c_init() and setup_pku(), as
all non-synthetic feature words are reinitialized from the CPU's CPUID
values.

Blasting away capability updates manifests most visibility when running
on a VMX capable CPU, but with VMX disabled by BIOS.  To indicate that
VMX is disabled, init_ia32_feat_ctl() clears X86_FEATURE_VMX, using
clear_cpu_cap() instead of setup_clear_cpu_cap() so that KVM can report
which CPU is misconfigured (KVM needs to probe every CPU anyways).
Restoring X86_FEATURE_VMX from CPUID causes KVM to think VMX is enabled,
ultimately leading to an unexpected #GP when KVM attempts to do VMXON.

Arguably, init_ia32_feat_ctl() should use setup_clear_cpu_cap() and let
KVM figure out a different way to report the misconfigured CPU, but VMX
is not the only feature bit that is affected, i.e. there is precedent
that tweaking feature bits via {set,clear}_cpu_cap() after ->c_init()
is expected to work.  Most notably, x86_init_rdrand()'s clearing of
X86_FEATURE_RDRAND when RDRAND malfunctions is also overwritten.

Fixes: 0697694564 ("x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU")
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
2020-02-27 19:02:45 +01:00
Christoph Hellwig cfe2ce49b9 Revert "KVM: x86: enable -Werror"
This reverts commit ead68df94d.

Using the -Werror flag breaks the build for me due to mostly harmless
KASAN or similar warnings:

  arch/x86/kvm/x86.c: In function ‘kvm_timer_init’:
  arch/x86/kvm/x86.c:7209:1: error: the frame size of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Feel free to add a CONFIG_WERROR if you care strong enough, but don't
break peoples builds for absolutely no good reason.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-26 09:59:58 -08:00
Ard Biesheuvel 8319e9d5ad efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
The mixed mode runtime wrappers are fragile when it comes to how the
memory referred to by its pointer arguments are laid out in memory, due
to the fact that it translates these addresses to physical addresses that
the runtime services can dereference when running in 1:1 mode. Since
vmalloc'ed pages (including the vmap'ed stack) are not contiguous in the
physical address space, this scheme only works if the referenced memory
objects do not cross page boundaries.

Currently, the mixed mode runtime service wrappers require that all by-ref
arguments that live in the vmalloc space have a size that is a power of 2,
and are aligned to that same value. While this is a sensible way to
construct an object that is guaranteed not to cross a page boundary, it is
overly strict when it comes to checking whether a given object violates
this requirement, as we can simply take the physical address of the first
and the last byte, and verify that they point into the same physical page.

When this check fails, we emit a WARN(), but then simply proceed with the
call, which could cause data corruption if the next physical page belongs
to a mapping that is entirely unrelated.

Given that with vmap'ed stacks, this condition is much more likely to
trigger, let's relax the condition a bit, but fail the runtime service
call if it does trigger.

Fixes: f6697df36b ("x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200221084849.26878-4-ardb@kernel.org
2020-02-26 15:31:42 +01:00
Ard Biesheuvel f80c9f6476 efi/x86: Remove support for EFI time and counter services in mixed mode
Mixed mode calls at runtime are rather tricky with vmap'ed stacks,
as we can no longer assume that data passed in by the callers of the
EFI runtime wrapper routines is contiguous in physical memory.

We need to fix this, but before we do, let's drop the implementations
of routines that we know are never used on x86, i.e., the RTC related
ones. Given that UEFI rev2.8 permits any runtime service to return
EFI_UNSUPPORTED at runtime, let's return that instead.

As get_next_high_mono_count() is never used at all, even on other
architectures, let's make that return EFI_UNSUPPORTED too.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200221084849.26878-3-ardb@kernel.org
2020-02-26 15:31:42 +01:00
Ard Biesheuvel 63056e8b5e efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
Hans reports that his mixed mode systems running v5.6-rc1 kernels hit
the WARN_ON() in virt_to_phys_or_null_size(), caused by the fact that
efi_guid_t objects on the vmap'ed stack happen to be misaligned with
respect to their sizes. As a quick (i.e., backportable) fix, copy GUID
pointer arguments to the local stack into a buffer that is naturally
aligned to its size, so that it is guaranteed to cover only one
physical page.

Note that on x86, we cannot rely on the stack pointer being aligned
the way the compiler expects, so we need to allocate an 8-byte aligned
buffer of sufficient size, and copy the GUID into that buffer at an
offset that is aligned to 16 bytes.

Fixes: f6697df36b ("x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y")
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200221084849.26878-2-ardb@kernel.org
2020-02-26 15:31:41 +01:00
Thomas Gleixner d364847eed x86/mce/therm_throt: Undo thermal polling properly on CPU offline
Chris Wilson reported splats from running the thermal throttling
workqueue callback on offlined CPUs. The problem is that that callback
should not even run on offlined CPUs but it happens nevertheless because
the offlining callback thermal_throttle_offline() does not symmetrically
undo the setup work done in its onlining counterpart. IOW,

 1. The thermal interrupt vector should be masked out before ...

 2. ... cancelling any pending work synchronously so that no new work is
 enqueued anymore.

Do those things and fix the issue properly.

 [ bp: Write commit message. ]

Fixes: f6656208f0 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/158120068234.18291.7938335950259651295@skylake-alporthouse-com
2020-02-25 21:21:44 +01:00
Linus Torvalds 63623fd449 Bugfixes, including the fix for CVE-2020-2732 and a few
issues found by "make W=1".
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJeVBwcAAoJEL/70l94x66DB9AH/AxWhmtf6YVXMNyZjXydxa1f
 hYVm9wg9GCsZS+7cktMhq0/uDEu5IjaCv7d+bzIcYZdFAOcs5nBUUjn1LtVl9w1y
 48vobyOa8pXpORerBtZtaO1kt4sfFR63zm7uau32DzXrz3qpHlMUjPdL08A1e35V
 cSSPAHHsl9S1TbDryc/VUNCOgauJes6LHbd3CdeAXU6lzMBW8JWbF2b/MAkvHG6n
 Hw5LpicWSeTxoPjR4Oi0Yx3VKvWfS9608netSJmuCNsv36wrhzKR1iuyb3kNCkAy
 AIlALn4PZq1Y5i1INi/XIkpC8d9yTqt5heRxYwp+yHadWO6E7ZMlITfxLZii+mM=
 =7EpO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, including the fix for CVE-2020-2732 and a few issues found
  by 'make W=1'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: rstify new ioctls in api.rst
  KVM: nVMX: Check IO instruction VM-exit conditions
  KVM: nVMX: Refactor IO bitmap checks into helper function
  KVM: nVMX: Don't emulate instructions in guest mode
  KVM: nVMX: Emulate MTF when performing instruction emulation
  KVM: fix error handling in svm_hardware_setup
  KVM: SVM: Fix potential memory leak in svm_cpu_init()
  KVM: apic: avoid calculating pending eoi from an uninitialized val
  KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled
  KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
  kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled
  KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE
  KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow
  KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
  kvm/emulate: fix a -Werror=cast-function-type
  KVM: x86: fix incorrect comparison in trace event
  KVM: nVMX: Fix some obsolete comments and grammar error
  KVM: x86: fix missing prototypes
  KVM: x86: enable -Werror
2020-02-24 11:48:17 -08:00
Oliver Upton 35a571346a KVM: nVMX: Check IO instruction VM-exit conditions
Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution
controls when checking instruction interception. If the 'use IO bitmaps'
VM-execution control is 1, check the instruction access against the IO
bitmaps to determine if the instruction causes a VM-exit.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 10:16:32 +01:00
Oliver Upton e71237d3ff KVM: nVMX: Refactor IO bitmap checks into helper function
Checks against the IO bitmap are useful for both instruction emulation
and VM-exit reflection. Refactor the IO bitmap checks into a helper
function.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 10:14:40 +01:00
Paolo Bonzini 07721feee4 KVM: nVMX: Don't emulate instructions in guest mode
vmx_check_intercept is not yet fully implemented. To avoid emulating
instructions disallowed by the L1 hypervisor, refuse to emulate
instructions by default.

Cc: stable@vger.kernel.org
[Made commit, added commit msg - Oliver]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 10:11:55 +01:00
Oliver Upton 5ef8acbdd6 KVM: nVMX: Emulate MTF when performing instruction emulation
Since commit 5f3d45e7f2 ("kvm/x86: add support for
MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap
flag processor-based execution control for its L2 guest. KVM simply
forwards any MTF VM-exits to the L1 guest, which works for normal
instruction execution.

However, when KVM needs to emulate an instruction on the behalf of an L2
guest, the monitor trap flag is not emulated. Add the necessary logic to
kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
instruction emulation for L2.

Fixes: 5f3d45e7f2 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 09:36:23 +01:00
Li RongQing dd58f3c95c KVM: fix error handling in svm_hardware_setup
rename svm_hardware_unsetup as svm_hardware_teardown, move
it before svm_hardware_setup, and call it to free all memory
if fail to setup in svm_hardware_setup, otherwise memory will
be leaked

remove __exit attribute for it since it is called in __init
function

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 09:34:26 +01:00
Linus Torvalds dca132a60f Two fixes for the AMD MCE driver:
- Populate the per CPU MCA bank descriptor pointer only after it has been
     completely set up to prevent a use-after-free in case that one of the
     subsequent initialization step fails
 
   - Implement a proper release function for the sysfs entries of MCA
     threshold controls instead of freeing the memory right in the CPU
     teardown code, which leads to another use-after-free when the
     associated sysfs file is opened and accessed.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5RkwATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoY6ED/4oafN9DmeY18oUv1QpoMQQMa6iduz6
 udemhpjqVXO7R1Ste4ccM/fIQ8sMjf58isuxUClcDwrX3fxv+lusK2iVPEw99Vpo
 w1xSjdXNW0KSSiQko9oMHVu+xcXIt8vxpL4YyjEuR81rcoecFaq2c2KhLMAW4o0p
 3mEv7/QYPfpKc4ydcbcHo2JF6U1sfUpsWpoe/SxpXRpxeoy64baCWZGbcsUXqjB6
 3MRxxy+ypKKKPPUM1py4D/ViDXwkhhP+gMD4ljWXCprpul/KuXAMEgvW39MtVsBJ
 uMF3PMXqjKx+WY492tpxtdZjWej+X13ID/cTc2w1EBHz30Qxmc6RieTKi6FzsJYB
 PKsTWdGarzORioaBg51Riq27C3+fjHbe6WqkhIQzmenSIwiV1o6o4IyuOs5sdlxX
 rjIk/ssNeAxRpCy308i6Vaq98PBZqAY1/iUZN50vAzldH3bwKxobowjn+AYStA0c
 9BF5zw7/3oXB4WaByuBwJ3DzWjqiXM4EUPu7LYF9DVSvj+A2xOmhwN+uz3SK6hBk
 vkxiFE50Lo2qoDaATJozY8+nxgUKRNiDdz+udhVsoQxNKWUMxirsH18TFu8yBl2r
 HGKsfCBY4CnV64WRy5IKQsqt3EhAgAUUoD0jSy7P3xf4HwSKAn/9OZ1cWQAo1wzQ
 xnXUtRDFc7ScHg==
 =2f84
 -----END PGP SIGNATURE-----

Merge tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fixes from Thomas Gleixner:
 "Two fixes for the AMD MCE driver:

   - Populate the per CPU MCA bank descriptor pointer only after it has
     been completely set up to prevent a use-after-free in case that one
     of the subsequent initialization step fails

   - Implement a proper release function for the sysfs entries of MCA
     threshold controls instead of freeing the memory right in the CPU
     teardown code, which leads to another use-after-free when the
     associated sysfs file is opened and accessed"

* tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/amd: Fix kobject lifetime
  x86/mce/amd: Publish the bank pointer only after setup has succeeded
2020-02-22 18:02:10 -08:00
Linus Torvalds fca1037864 Two fixes for x86:
- Remove the __force_oder definiton from the kaslr boot code as it is
     already defined in the page table code which makes GCC 10 builds fail
     because it changed the default to -fno-common.
 
   - Address the AMD erratum 1054 concerning the IRPERF capability and
     enable the Instructions Retired fixed counter on machines which are not
     affected by the erratum.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5RlXkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoV6vEACsB5d8TC+OYYn1UsRJZszQ4ItRoT2Q
 t++G1RjY+hIiEVb4BufhWi3DBsS2XETwO7LIma8tj+Vt/hhAs+3PyBiumFIz3HEN
 pJzPR7CszD6EiO0qRw5Mrj2n+EC8I1Ts/hKuzir0kQr0h+jxg3OAOWMnUNfXiqS2
 mGh3baMeNYvLvI/MUDBcFP0ZDMcgsYPb3qt4Qodg9bS31+d7xlTPwK6Lua5R8eih
 ZTaVOR2JMYXIYDQA5eAqB2P/GiFBDERQHrJUQ44mY9A14w3T7qjthfMiCAvWlVd7
 +ibxYA3/xujQumhyCFXmdxYEzyVzp8kLSlF7ERGVCdDZ20ZV/FA/c6uyUiW+tmUi
 NR915G8632qKF7TXRPITZaWl8rC0KEcm5W+K0uf8ThJKUdq5vigXURLV9t9udeKY
 HqQtyuNtesmKycF9oXG6OfFeKuveZR6XSlhLK2fMs/mxa9yyvyRyXNmwwATgTSI4
 RPwrpAB52snexARBR/kZ9p/kgB47FceVYOYuQMvcp/n+1KXNesmAIeT29vNSzYUK
 vL0M5XVBsz9pvTkQlhxW36sO8uLZG6SPZ+e0ypDt9YDz+YTXbBM91buxpYE9xk36
 2j0aPrexC6FwCEny9uEckHRuLUip2mpld4QOxH8j3itYme3LfPa1poajAoKBwFkg
 gu42lzbWqVArZA==
 =Pgxk
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two fixes for x86:

   - Remove the __force_oder definiton from the kaslr boot code as it is
     already defined in the page table code which makes GCC 10 builds
     fail because it changed the default to -fno-common.

   - Address the AMD erratum 1054 concerning the IRPERF capability and
     enable the Instructions Retired fixed counter on machines which are
     not affected by the erratum"

* tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
  x86/boot/compressed: Don't declare __force_order in kaslr_64.c
2020-02-22 17:08:16 -08:00
Linus Torvalds 54dedb5b57 xen: branch for v5.6-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXlAvJgAKCRCAXGG7T9hj
 voVrAPsEsWQB5qtd+mWJCzE8VeR+mZ5SzQwJ12FhDA+4wUFuHgEAofvP7t8H3Bkr
 SrSGMB2hHlJW78ZLoSSpnhAWm4nANg8=
 =skec
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two small fixes for Xen:

   - a fix to avoid warnings with new gcc

   - a fix for incorrectly disabled interrupts when calling
     _cond_resched()"

* tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Enable interrupts when calling _cond_resched()
  x86/xen: Distribute switch variables for initialization
2020-02-21 16:10:10 -08:00
Miaohe Lin d80b64ff29 KVM: SVM: Fix potential memory leak in svm_cpu_init()
When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page
held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually
the only possible outcome here.

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:06:04 +01:00
Miaohe Lin 23520b2def KVM: apic: avoid calculating pending eoi from an uninitialized val
When pv_eoi_get_user() fails, 'val' may remain uninitialized and the return
value of pv_eoi_get_pending() becomes random. Fix the issue by initializing
the variable.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:05:44 +01:00
Vitaly Kuznetsov a444326780 KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled
When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
nothing happens to VMX MSRs on the already existing vCPUs, however, all new
ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
confusing and results in the following picture inside the guest:

$ rdmsr -ax 0x48d
ff00000016
7f00000016
7f00000016
7f00000016

This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.

L1 hypervisor may only check CPU0's controls to find out what features
are available and it will be very confused later. Switch to setting
PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:05:35 +01:00