Commit Graph

38114 Commits

Author SHA1 Message Date
Kan Liang c4c55e362a perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks
The discovery table provides the generic uncore block information
for the MMIO type of uncore blocks, which is good enough to provide
basic uncore support.

The box control field is composed of the BAR address and box control
offset. When initializing the uncore blocks, perf should ioremap the
address from the box control field.

Implement the generic support for the MMIO type of uncore block.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-6-git-send-email-kan.liang@linux.intel.com
2021-04-02 10:04:55 +02:00
Kan Liang 42839ef4a2 perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks
The discovery table provides the generic uncore block information
for the PCI type of uncore blocks, which is good enough to provide
basic uncore support.

The PCI BUS and DEVFN information can be retrieved from the box control
field. Introduce the uncore_pci_pmus_register() to register all the
PCICFG type of uncore blocks. The old PCI probe/remove way is dropped.

The PCI BUS and DEVFN information are different among dies. Add box_ctls
to store the box control field of each die.

Add a new BUS notifier for the PCI type of uncore block to support the
hotplug. If the device is "hot remove", the corresponding registered PMU
has to be unregistered. Perf cannot locate the PMU by searching a const
pci_device_id table, because the discovery tables don't provide such
information. Introduce uncore_pci_find_dev_pmu_from_types() to search
the whole uncore_pci_uncores for the PMU.

Implement generic support for the PCI type of uncore block.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-5-git-send-email-kan.liang@linux.intel.com
2021-04-02 10:04:55 +02:00
Kan Liang 6477dc3934 perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier
Perf will use a similar method to the PCI sub driver to register
the PMUs for the PCI type of uncore blocks. The method requires a BUS
notifier to support hotplug. The current BUS notifier cannot be reused,
because it searches a const id_table for the corresponding registered
PMU. The PCI type of uncore blocks in the discovery tables doesn't
provide an id_table.

Factor out uncore_bus_notify() and add the pointer of an id_table as a
parameter. The uncore_bus_notify() will be reused in the following
patch.

The current BUS notifier is only used by the PCI sub driver. Its name is
too generic. Rename it to uncore_pci_sub_notifier, which is specific for
the PCI sub driver.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-4-git-send-email-kan.liang@linux.intel.com
2021-04-02 10:04:54 +02:00
Kan Liang d6c7541304 perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks
The discovery table provides the generic uncore block information for
the MSR type of uncore blocks, e.g., the counter width, the number of
counters, the location of control/counter registers, which is good
enough to provide basic uncore support. It can be used as a fallback
solution when the kernel doesn't support a platform.

The name of the uncore box cannot be retrieved from the discovery table.
uncore_type_&typeID_&boxID will be used as its name. Save the type ID
and the box ID information in the struct intel_uncore_type.
Factor out uncore_get_pmu_name() to handle different naming methods.

Implement generic support for the MSR type of uncore block.

Some advanced features, such as filters and constraints, cannot be
retrieved from discovery tables. Features that rely on that
information are not be supported here.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-3-git-send-email-kan.liang@linux.intel.com
2021-04-02 10:04:54 +02:00
Kan Liang edae1f06c2 perf/x86/intel/uncore: Parse uncore discovery tables
A self-describing mechanism for the uncore PerfMon hardware has been
introduced with the latest Intel platforms. By reading through an MMIO
page worth of information, perf can 'discover' all the standard uncore
PerfMon registers in a machine.

The discovery mechanism relies on BIOS's support. With a proper BIOS,
a PCI device with the unique capability ID 0x23 can be found on each
die. Perf can retrieve the information of all available uncore PerfMons
from the device via MMIO. The information is composed of one global
discovery table and several unit discovery tables.
- The global discovery table includes global uncore information of the
  die, e.g., the address of the global control register, the offset of
  the global status register, the number of uncore units, the offset of
  unit discovery tables, etc.
- The unit discovery table includes generic uncore unit information,
  e.g., the access type, the counter width, the address of counters,
  the address of the counter control, the unit ID, the unit type, etc.
  The unit is also called "box" in the code.
Perf can provide basic uncore support based on this information
with the following patches.

To locate the PCI device with the discovery tables, check the generic
PCI ID first. If it doesn't match, go through the entire PCI device tree
and locate the device with the unique capability ID.

The uncore information is similar among dies. To save parsing time and
space, only completely parse and store the discovery tables on the first
die and the first box of each die. The parsed information is stored in
an
RB tree structure, intel_uncore_discovery_type. The size of the stored
discovery tables varies among platforms. It's around 4KB for a Sapphire
Rapids server.

If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
will exit with -ENODEV. There is nothing changed.

Add a module parameter to disable the discovery feature. If a BIOS gets
the discovery tables wrong, users can have an option to disable the
feature. For the current patchset, the uncore driver will exit with
-ENODEV. In the future, it may fall back to the hardcode uncore driver
on a known platform.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-2-git-send-email-kan.liang@linux.intel.com
2021-04-02 10:04:54 +02:00
Arnd Bergmann 8d195e7a8a crypto: poly1305 - fix poly1305_core_setkey() declaration
gcc-11 points out a mismatch between the declaration and the definition
of poly1305_core_setkey():

lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
   13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
      |                                                          ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
   21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);

This is harmless in principle, as the calling conventions are the same,
but the more specific prototype allows better type checking in the
caller.

Change the declaration to match the actual function definition.
The poly1305_simd_init() is a bit suspicious here, as it previously
had a 32-byte argument type, but looks like it needs to take the
16-byte POLY1305_BLOCK_SIZE array instead.

Fixes: 1c08a10436 ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-04-02 18:28:12 +11:00
Vitaly Kuznetsov 8cdddd182b ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
Commit 496121c021 ("ACPI: processor: idle: Allow probing on platforms
with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g.
I'm observing the following on AWS Nitro (e.g r5b.xlarge but other
instance types are affected as well):

 # echo 0 > /sys/devices/system/cpu/cpu0/online
 # echo 1 > /sys/devices/system/cpu/cpu0/online
 <10 seconds delay>
 -bash: echo: write error: Input/output error

In fact, the above mentioned commit only revealed the problem and did
not introduce it. On x86, to wakeup CPU an NMI is being used and
hlt_play_dead()/mwait_play_dead() loops are prepared to handle it:

	/*
	 * If NMI wants to wake up CPU0, start CPU0.
	 */
	if (wakeup_cpu0())
		start_cpu0();

cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on
systems where it wasn't called before the above mentioned commit) serves
the same purpose but it doesn't have a path for CPU0. What happens now on
wakeup is:
 - NMI is sent to CPU0
 - wakeup_cpu0_nmi() works as expected
 - we get back to while (1) loop in acpi_idle_play_dead()
 - safe_halt() puts CPU0 to sleep again.

The straightforward/minimal fix is add the special handling for CPU0 on x86
and that's what the patch is doing.

Fixes: 496121c021 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-01 13:37:55 +02:00
Vitaly Kuznetsov 77fcbe823f KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()
When guest time is reset with KVM_SET_CLOCK(0), it is possible for
'hv_clock->system_time' to become a small negative number. This happens
because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based
on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
kvm_guest_time_update() does (masterclock in use case):

hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;

And 'master_kernel_ns' represents the last time when masterclock
got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
problem, the difference is very small, e.g. I'm observing
hv_clock.system_time = -70 ns. The issue comes from the fact that
'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
compute_tsc_page_parameters() becomes a very big number.

Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in
use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from
going negative.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210331124130.337992-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:14:19 -04:00
Paolo Bonzini a83829f56c KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken
pvclock_gtod_sync_lock can be taken with interrupts disabled if the
preempt notifier calls get_kvmclock_ns to update the Xen
runstate information:

   spin_lock include/linux/spinlock.h:354 [inline]
   get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
   kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
   kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
   kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
   kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062

So change the users of the spinlock to spin_lock_irqsave and
spin_unlock_irqrestore.

Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851af ("KVM: x86/xen: Add support for vCPU runstate information")
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:14:19 -04:00
Paolo Bonzini c2c647f91a KVM: x86: reduce pvclock_gtod_sync_lock critical sections
There is no need to include changes to vcpu->requests into
the pvclock_gtod_sync_lock critical section.  The changes to
the shared data structures (in pvclock_update_vm_gtod_copy)
already occur under the lock.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:14:19 -04:00
Paolo Bonzini 6ebae23c07 Merge branch 'kvm-fix-svm-races' into kvm-master 2021-04-01 05:14:05 -04:00
Paolo Bonzini 3c346c0c60 KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared.  This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.

Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).

Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:11:35 -04:00
Paolo Bonzini a58d9166a7 KVM: SVM: load control fields from VMCB12 before checking them
Avoid races between check and use of the nested VMCB controls.  This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host.  Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.

This bug is CVE-2021-29657.

Reported-by: Felix Wilhelm <fwilhelm@google.com>
Cc: stable@vger.kernel.org
Fixes: 2fcf4876ad ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-01 05:09:31 -04:00
Borislav Petkov f2ac256b9a Merge 'x86/alternatives'
Pick up dependent changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-31 18:04:19 +02:00
Peter Zijlstra 52fa82c21f x86: Add insn_decode_kernel()
Add a helper to decode kernel instructions; there's no point in
endlessly repeating those last two arguments.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210326151259.379242587@infradead.org
2021-03-31 16:20:22 +02:00
Paolo Bonzini 825e34d3c9 Merge commit 'kvm-tdp-fix-flushes' into kvm-master 2021-03-31 07:45:41 -04:00
Sean Christopherson 33a3164161 KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
Prevent the TDP MMU from yielding when zapping a gfn range during NX
page recovery.  If a flush is pending from a previous invocation of the
zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
has not accumulated a flush for the current invocation, then yielding
will release mmu_lock with stale TLB entries.

That being said, this isn't technically a bug fix in the current code, as
the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
will yield if and only if it has made forward progress, as defined by the
current gfn vs. the last yielded (or starting) gfn.  Because zapping a
single shadow page is guaranteed to (a) find that page and (b) step
sideways at the level of the shadow page, the TDP iter will break its loop
before getting a chance to yield.

But that is all very, very subtle, and will break at the slightest sneeze,
e.g. zapping while holding mmu_lock for read would break as the TDP MMU
wouldn't be guaranteed to see the present shadow page, and thus could step
sideways at a lower level.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210325200119.1359384-4-seanjc@google.com>
[Add lockdep assertion. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:19:56 -04:00
Sean Christopherson 048f49809c KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
does the flush itself if and only if it yields (which it will never do in
this particular scenario), and otherwise expects the caller to do the
flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
no flush will occur.

Fixes: 29cf0f5007 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210325200119.1359384-3-seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:19:55 -04:00
Sean Christopherson a835429cda KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
When flushing a range of GFNs across multiple roots, ensure any pending
flush from a previous root is honored before yielding while walking the
tables of the current root.

Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
"flush" with the result to avoid redundant flushes.  zap_gfn_range()
preserves and return the incoming "flush", unless of course the flush was
performed prior to yielding and no new flush was triggered.

Fixes: 1af4a96025 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210325200119.1359384-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:19:55 -04:00
Siddharth Chandrasekaran 6fb3084ab5 KVM: make: Fix out-of-source module builds
Building kvm module out-of-source with,

    make -C $SRC O=$BIN M=arch/x86/kvm

fails to find "irq.h" as the include dir passed to cflags-y does not
prefix the source dir. Fix this by prefixing $(srctree) to the include
dir path.

Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <20210324124347.18336-1-sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:10 -04:00
Vitaly Kuznetsov 1973cadd4c KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
MSR_F15H_PERF_CTL0-5, MSR_F15H_PERF_CTR0-5 MSRs are only available when
X86_FEATURE_PERFCTR_CORE CPUID bit was exposed to the guest. KVM, however,
allows these MSRs unconditionally because kvm_pmu_is_valid_msr() ->
amd_msr_idx_to_pmc() check always passes and because kvm_pmu_set_msr() ->
amd_pmu_set_msr() doesn't fail.

In case of a counter (CTRn), no big harm is done as we only increase
internal PMC's value but in case of an eventsel (CTLn), we go deep into
perf internals with a non-existing counter.

Note, kvm_get_msr_common() just returns '0' when these MSRs don't exist
and this also seems to contradict architectural behavior which is #GP
(I did check one old Opteron host) but changing this status quo is a bit
scarier.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210323084515.1346540-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:10 -04:00
Dongli Zhang ecaf088f53 KVM: x86: remove unused declaration of kvm_write_tsc()
kvm_write_tsc() was renamed and made static since commit 0c899c25d7
("KVM: x86: do not attempt TSC synchronization on guest writes"). Remove
its unused declaration.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210326070334.12310-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:09 -04:00
Haiwei Li d632826f26 KVM: clean up the unused argument
kvm_msr_ignored_check function never uses vcpu argument. Clean up the
function and invokers.

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <20210313051032.4171-1-lihaiwei.kernel@gmail.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-30 13:07:09 -04:00
Thomas Gleixner 9a98bc2cf0 x86/vector: Add a sanity check to prevent IRQ2 allocations
To prevent another incidental removal of the IRQ2 ignore logic in the
IO/APIC code going unnoticed add a sanity check. Add some commentry at the
other place which ignores IRQ2 while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210318192819.795280387@linutronix.de
2021-03-30 00:39:12 +02:00
Rafael J. Wysocki 1a1c130ab7 ACPI: tables: x86: Reserve memory occupied by ACPI tables
The following problem has been reported by George Kennedy:

 Since commit 7fef431be9 ("mm/page_alloc: place pages to tail
 in __free_pages_core()") the following use after free occurs
 intermittently when ACPI tables are accessed.

 BUG: KASAN: use-after-free in ibft_init+0x134/0xc49
 Read of size 4 at addr ffff8880be453004 by task swapper/0/1
 CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0d #1
 Call Trace:
  dump_stack+0xf6/0x158
  print_address_description.constprop.9+0x41/0x60
  kasan_report.cold.14+0x7b/0xd4
  __asan_report_load_n_noabort+0xf/0x20
  ibft_init+0x134/0xc49
  do_one_initcall+0xc4/0x3e0
  kernel_init_freeable+0x5af/0x66b
  kernel_init+0x16/0x1d0
  ret_from_fork+0x22/0x30

 ACPI tables mapped via kmap() do not have their mapped pages
 reserved and the pages can be "stolen" by the buddy allocator.

Apparently, on the affected system, the ACPI table in question is
not located in "reserved" memory, like ACPI NVS or ACPI Data, that
will not be used by the buddy allocator, so the memory occupied by
that table has to be explicitly reserved to prevent the buddy
allocator from using it.

In order to address this problem, rearrange the initialization of the
ACPI tables on x86 to locate the initial tables earlier and reserve
the memory occupied by them.

The other architectures using ACPI should not be affected by this
change.

Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.kennedy@oracle.com/
Reported-by: George Kennedy <george.kennedy@oracle.com>
Tested-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
2021-03-29 19:26:04 +02:00
Ingo Molnar feecb81732 Linux 5.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFRBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBhB7AeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGCPUH+KKkSoOlN2YNu1oc
 iy2nznwZoSQTk5ZLz7PypO/WWmmtgzudkObG7yqIURdrncsAkHR17Wu2P7rdBr1j
 Ma+VhF9MQ+xx+r86upH7c3gYfhyfdUMvzuLy0rwLQ1Yrzrb7xFcVkj3BHk54TAQA
 w05sRPuVJ3/c/HPYV2iXkkdnnMbXSTCebeDDwjFb9D3qagr4vcd/PjDHmGbfNF8R
 o6gLpbK5Ly6ww1nth9gGGUjzrW95yVItvcroP6vQWljxhuy+NE1lXRm8LsGhxqtW
 foFFptJup5nhSNJXWtQt/U3huVD6mZ3W3y9cOThPjXZRy2wva3I1IpBKoEFReUpG
 /Tq8EA==
 =tPUY
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-29 15:56:48 +02:00
Fenghua Yu ebb1064e7c x86/traps: Handle #DB for bus lock
Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.

#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.

split_lock_detect=
		#AC for split lock		#DB for bus lock

off		Do nothing			Do nothing

warn		Kernel OOPs			Warn once per task and
		Warn once per task and		and continues to run.
		disable future checking
	 	When both features are
		supported, warn in #AC

fatal		Kernel OOPs			Send SIGBUS to user.
		Send SIGBUS to user
		When both features are
		supported, fatal in #AC

ratelimit:N	Do nothing			Limit bus lock rate to
						N per second in the
						current non-root user.

Default option is "warn".

Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So no need to handle #DB for bus lock detected in the kernel.

#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.

Both breakpoint and bus lock in the same instruction can trigger one #DB.
The bus lock is handled before the breakpoint in the #DB handler.

Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
the #DB handler right after reading DR6.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
2021-03-28 22:52:15 +02:00
Fenghua Yu f21d4d3b97 x86/cpufeatures: Enumerate #DB for bus lock detection
A bus lock is acquired through either a split locked access to writeback
(WB) memory or any locked access to non-WB memory. This is typically >1000
cycles slower than an atomic operation within a cache line. It also
disrupts performance on other cores.

Some CPUs have the ability to notify the kernel by a #DB trap after a user
instruction acquires a bus lock and is executed. This allows the kernel to
enforce user application throttling or mitigation. Both breakpoint and bus
lock can trigger the #DB trap in the same instruction and the ordering of
handling them is the kernel #DB handler's choice.

The CPU feature flag to be shown in /proc/cpuinfo will be "bus_lock_detect".

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-2-fenghua.yu@intel.com
2021-03-28 22:52:14 +02:00
Lai Jiangshan 1591584e2e x86/process/64: Move cpu_current_top_of_stack out of TSS
cpu_current_top_of_stack is currently stored in TSS.sp1. TSS is exposed
through the cpu_entry_area which is visible with user CR3 when PTI is
enabled and active.

This makes it a coveted fruit for attackers.  An attacker can fetch the
kernel stack top from it and continue next steps of actions based on the
kernel stack.

But it is actualy not necessary to be stored in the TSS.  It is only
accessed after the entry code switched to kernel CR3 and kernel GS_BASE
which means it can be in any regular percpu variable.

The reason why it is in TSS is historical (pre PTI) because TSS is also
used as scratch space in SYSCALL_64 and therefore cache hot.

A syscall also needs the per CPU variable current_task and eventually
__preempt_count, so placing cpu_current_top_of_stack next to them makes it
likely that they end up in the same cache line which should avoid
performance regressions. This is not enforced as the compiler is free to
place these variables, so these entry relevant variables should move into
a data structure to make this enforceable.

The seccomp_benchmark doesn't show any performance loss in the "getpid
native" test result.  Actually, the result changes from 93ns before to 92ns
with this change when KPTI is disabled. The test is very stable and
although the test doesn't show a higher degree of precision it gives enough
confidence that moving cpu_current_top_of_stack does not cause a
regression.

[ tglx: Removed unneeded export. Massaged changelog ]

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210125173444.22696-2-jiangshanlai@gmail.com
2021-03-28 22:40:10 +02:00
Linus Torvalds 36a14638f7 Two fixes:
- Fix build failure on Ubuntu with new GCC packages that turn on -fcf-protection
 
  - Fix SME memory encryption PTE encoding bug - AFAICT the code worked on
    4K page sizes (level 1) but had the wrong shift at higher page level orders
    (level 2 and higher).
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmBgXdERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gJWxAAgAOWwGY3yq3kUEtIExosXZPlHCFjal3N
 UXoVpQde4aBeZ9A4flMjkSZmTF5PVEN2npMz8ltnxU8NUJg4QR68UYiIE8BReARg
 +JuyNXGdAu1XyT+dWdTFqL9xgA9t8dG13o4WbBqGDZagnLNuvjYhzJtsgw9FbNWZ
 a1abBbcxpoZvSyQBHyqtuwoiWeeeFJiQZ02wZwxtonYHWVbBXEN5WhFL9Tc2kDJc
 /Ic09O+FDhpe3I/PvCiMrkpVJuBnaDdve5zDPDzR+FRMeAj4AhNLIJiMFj17bJWD
 eR6vCDoFz3EsbSdJz0XvHIZOSZnaiiC0ybTEv5nJTiRgDk+s6JDXWwDcJG+3yKJR
 Fm5TLlnaU++E9lYLpyCbgrWkrv0F2u3GmnieFnOOyzRv8NlkZqrThApf3xGsavy+
 qJZnXe5ftWp+mmIDw4DZDBVsJ8rBIflvURQxfG3SHkUc0iVsyUCrAK2eKYewk/dN
 eC6FVPkCdx4Ys50wb+OR9Enhq3yKFyRuZ2zIeguUX30sqoapJL85M1vglS5DFoX/
 pHcigRzBzFQOZhOh8Kq3VREOx0F+ioUfcZzmYdzjWSfXfpvqWFcLAIFgOv1hDfms
 XQ60X/voG0tWd0ODKXqyx6oa0rqamigPjLJp/gtDKpQHORFaabvnTJTLwN6n8N1Q
 syTWRiHMhi0=
 =tM9n
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Two fixes:

   - Fix build failure on Ubuntu with new GCC packages that turn
     on -fcf-protection

   - Fix SME memory encryption PTE encoding bug - AFAICT the code
     worked on 4K page sizes (level 1) but had the wrong shift at
     higher page level orders (level 2 and higher)"

* tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Turn off -fcf-protection for realmode targets
  x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
2021-03-28 12:19:16 -07:00
Alexey Makhalov 0b4a285e2c x86/vmware: Avoid TSC recalibration when frequency is known
When the TSC frequency is known because it is retrieved from the
hypervisor, skip TSC refined calibration by setting X86_FEATURE_TSC_KNOWN_FREQ.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210105004752.131069-1-amakhalov@vmware.com
2021-03-28 21:11:43 +02:00
Martin KaFai Lau 797b84f727 bpf: Support kernel function call in x86-32
This patch adds kernel function call support to the x86-32 bpf jit.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015149.1545267-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Martin KaFai Lau e6ac2450d6 bpf: Support bpf program calling kernel function
This patch adds support to BPF verifier to allow bpf program calling
kernel function directly.

The use case included in this set is to allow bpf-tcp-cc to directly
call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
functions have already been used by some kernel tcp-cc implementations.

This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation,  For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.

The tcp-cc kernel functions mentioned above will be white listed
for the struct_ops bpf-tcp-cc programs to use in a later patch.
The white listed functions are not bounded to a fixed ABI contract.
Those functions have already been used by the existing kernel tcp-cc.
If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
implementations have to be changed.  The same goes for the struct_ops
bpf-tcp-cc programs which have to be adjusted accordingly.

This patch is to make the required changes in the bpf verifier.

First change is in btf.c, it adds a case in "btf_check_func_arg_match()".
When the passed in "btf->kernel_btf == true", it means matching the
verifier regs' states with a kernel function.  This will handle the
PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
and PTR_TO_TCP_SOCK to its kernel's btf_id.

In the later libbpf patch, the insn calling a kernel function will
look like:

insn->code == (BPF_JMP | BPF_CALL)
insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
insn->imm == func_btf_id /* btf_id of the running kernel */

[ For the future calling function-in-kernel-module support, an array
  of module btf_fds can be passed at the load time and insn->off
  can be used to index into this array. ]

At the early stage of verifier, the verifier will collect all kernel
function calls into "struct bpf_kfunc_desc".  Those
descriptors are stored in "prog->aux->kfunc_tab" and will
be available to the JIT.  Since this "add" operation is similar
to the current "add_subprog()" and looking for the same insn->code,
they are done together in the new "add_subprog_and_kfunc()".

In the "do_check()" stage, the new "check_kfunc_call()" is added
to verify the kernel function call instruction:
1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
   A new bpf_verifier_ops "check_kfunc_call" is added to do that.
   The bpf-tcp-cc struct_ops program will implement this function in
   a later patch.
2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
   used as the args of a kernel function.
3. Mark the regs' type, subreg_def, and zext_dst.

At the later do_misc_fixups() stage, the new fixup_kfunc_call()
will replace the insn->imm with the function address (relative
to __bpf_call_base).  If needed, the jit can find the btf_func_model
by calling the new bpf_jit_find_kfunc_model(prog, insn).
With the imm set to the function address, "bpftool prog dump xlated"
will be able to display the kernel function calls the same way as
it displays other bpf helper calls.

gpl_compatible program is required to call kernel function.

This feature currently requires JIT.

The verifier selftests are adjusted because of the changes in
the verbose log in add_subprog_and_kfunc().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
2021-03-26 20:41:51 -07:00
Sean Christopherson 231d3dbdda x86/sgx: Add SGX_CHILD_PRESENT hardware error code
SGX driver can accurately track how enclave pages are used.  This
enables SECS to be specifically targeted and EREMOVE'd only after all
child pages have been EREMOVE'd.  This ensures that SGX driver will
never encounter SGX_CHILD_PRESENT in normal operation.

Virtual EPC is different.  The host does not track how EPC pages are
used by the guest, so it cannot guarantee EREMOVE success.  It might,
for instance, encounter a SECS with a non-zero child count.

Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
the SGX virtualization driver to handle recoverable EREMOVE errors when
saniziting EPC pages after they are freed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/050b198e882afde7e6eba8e6a0d4da39161dbb5a.1616136308.git.kai.huang@intel.com
2021-03-26 22:51:36 +01:00
Kai Huang b0c7459be0 x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()
EREMOVE takes a page and removes any association between that page and
an enclave. It must be run on a page before it can be added into another
enclave. Currently, EREMOVE is run as part of pages being freed into the
SGX page allocator. It is not expected to fail, as it would indicate a
use-after-free of EPC pages. Rather than add the page back to the pool
of available EPC pages, the kernel intentionally leaks the page to avoid
additional errors in the future.

However, KVM does not track how guest pages are used, which means that
SGX virtualization use of EREMOVE might fail. Specifically, it is
legitimate that EREMOVE returns SGX_CHILD_PRESENT for EPC assigned to
KVM guest, because KVM/kernel doesn't track SECS pages.

To allow SGX/KVM to introduce a more permissive EREMOVE helper and
to let the SGX virtualization code use the allocator directly, break
out the EREMOVE call from the SGX page allocator. Rename the original
sgx_free_epc_page() to sgx_encl_free_epc_page(), indicating that
it is used to free an EPC page assigned to a host enclave. Replace
sgx_free_epc_page() with sgx_encl_free_epc_page() in all call sites so
there's no functional change.

At the same time, improve the error message when EREMOVE fails, and
add documentation to explain to the user what that failure means and
to suggest to the user what to do when this bug happens in the case it
happens.

 [ bp: Massage commit message, fix typos and sanitize text, simplify. ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20210325093057.122834-1-kai.huang@intel.com
2021-03-26 22:51:23 +01:00
Linus Torvalds 6c20f6df61 xen: branch for v5.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYF37OgAKCRCAXGG7T9hj
 vp8hAP4h7mvjfkntbFXagrJK9pi2xVC9d/YO5nfa8/K3LcGVnQD/fKcU9ggPN9vI
 GLnhyprGLcCA4aTL6Ogb37o9fDd4Yws=
 =joIg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.12b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "This contains a small series with a more elegant fix of a problem
  which was originally fixed in rc2"

* tag 'for-linus-5.12b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
  xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
2021-03-26 11:15:25 -07:00
Nathan Chancellor d5cbd80e30 x86/boot: Add $(CLANG_FLAGS) to compressed KBUILD_CFLAGS
When cross compiling x86 on an ARM machine with clang, there are several
errors along the lines of:

  arch/x86/include/asm/string_64.h:27:10: error: invalid output constraint '=&c' in asm

This happens because the compressed boot Makefile reassigns KBUILD_CFLAGS
and drops the clang flags that set the target architecture ('--target=')
and the path to the GNU cross tools ('--prefix='), meaning that the host
architecture is targeted.

These flags are available as $(CLANG_FLAGS) from the main Makefile so
add them to the compressed boot folder's KBUILD_CFLAGS so that cross
compiling works as expected.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20210326000435.4785-3-nathan@kernel.org
2021-03-26 11:32:55 +01:00
John Millikin 8abe7fc26a x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)
When cross-compiling with Clang, the `$(CLANG_FLAGS)' variable
contains additional flags needed to build C and assembly sources
for the target platform. Normally this variable is automatically
included in `$(KBUILD_CFLAGS)' via the top-level Makefile.

The x86 real-mode makefile builds `$(REALMODE_CFLAGS)' from a
plain assignment and therefore drops the Clang flags. This causes
Clang to not recognize x86-specific assembler directives:

  arch/x86/realmode/rm/header.S:36:1: error: unknown directive
  .type real_mode_header STT_OBJECT ; .size real_mode_header, .-real_mode_header
  ^

Explicit propagation of `$(CLANG_FLAGS)' to `$(REALMODE_CFLAGS)',
which is inherited by real-mode make rules, fixes cross-compilation
with Clang for x86 targets.

Relevant flags:

* `--target' sets the target architecture when cross-compiling. This
  flag must be set for both compilation and assembly (`KBUILD_AFLAGS')
  to support architecture-specific assembler directives.

* `-no-integrated-as' tells clang to assemble with GNU Assembler
  instead of its built-in LLVM assembler. This flag is set by default
  unless `LLVM_IAS=1' is set, because the LLVM assembler can't yet
  parse certain GNU extensions.

Signed-off-by: John Millikin <john@john-millikin.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lkml.kernel.org/r/20210326000435.4785-2-nathan@kernel.org
2021-03-26 11:32:47 +01:00
Sean Christopherson b8921dccf3 x86/cpufeatures: Add SGX1 and SGX2 sub-features
Add SGX1 and SGX2 feature flags, via CPUID.0x12.0x0.EAX, as scattered
features, since adding a new leaf for only two bits would be wasteful.
As part of virtualizing SGX, KVM will expose the SGX CPUID leafs to its
guest, and to do so correctly needs to query hardware and kernel support
for SGX1 and SGX2.

Suppress both SGX1 and SGX2 from /proc/cpuinfo. SGX1 basically means
SGX, and for SGX2 there is no concrete use case of using it in
/proc/cpuinfo.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/d787827dbfca6b3210ac3e432e3ac1202727e786.1616136308.git.kai.huang@intel.com
2021-03-25 17:33:11 +01:00
Kai Huang e9a15a40e8 x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit
Move SGX_LC feature bit to CPUID dependency table to make clearing all
SGX feature bits easier. Also remove clear_sgx_caps() since it is just
a wrapper of setup_clear_cpu_cap(X86_FEATURE_SGX) now.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/5d4220fd0a39f52af024d3fa166231c1d498dd10.1616136308.git.kai.huang@intel.com
2021-03-25 17:33:11 +01:00
Masahiro Yamada 7dfe553aff x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
Building kernel/sys_ni.c with W=1 emits tons of -Wmissing-prototypes warnings:

  $ make W=1 kernel/sys_ni.o
    [ snip ]
    CC      kernel/sys_ni.o
     ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype for '__ia32_sys_io_setup' [-Wmissing-prototypes]
     ...

The problem is in __COND_SYSCALL(), the __SYS_STUB0() and __SYS_STUBx() macros
defined a few lines above already have forward declarations.

Let's do likewise for __COND_SYSCALL() to fix the warnings.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210301131533.64671-2-masahiroy@kernel.org
2021-03-25 16:20:41 +01:00
Wei Yongjun 2304d14db6 x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration
Address this GCC warning:

  arch/x86/kernel/kprobes/core.c:940:1:
   warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
    940 | static int nokprobe_inline kprobe_is_ss(struct kprobe_ctlblk *kcb)
        | ^~~~~~

[ mingo: Tidied up the changelog. ]

Fixes: 6256e668b7af: ("x86/kprobes: Use int3 instead of debug trap for single-step")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210324144502.1154883-1-weiyongjun1@huawei.com
2021-03-25 13:07:58 +01:00
Masami Hiramatsu 2f706e0e5e x86/kprobes: Fix to identify indirect jmp and others using range case
Fix can_boost() to identify indirect jmp and others using range case
correctly.

Since the condition in switch statement is opcode & 0xf0, it can not
evaluate to 0xff case. This should be under the 0xf0 case. However,
there is no reason to use the conbinations of the bit-masked condition
and lower bit checking.

Use range case to clean up the switch statement too.

Fixes: 6256e668b7 ("x86/kprobes: Use int3 instead of debug trap for single-step")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/161666692308.1120877.4675552834049546493.stgit@devnote2
2021-03-25 11:37:22 +01:00
Masami Hiramatsu 6dd3b8c9f5 x86/kprobes: Fix to check non boostable prefixes correctly
There are 2 bugs in the can_boost() function because of using
x86 insn decoder. Since the insn->opcode never has a prefix byte,
it can not find CS override prefix in it. And the insn->attr is
the attribute of the opcode, thus inat_is_address_size_prefix(
insn->attr) always returns false.

Fix those by checking each prefix bytes with for_each_insn_prefix
loop and getting the correct attribute for each prefix byte.
Also, this removes unlikely, because this is a slow path.

Fixes: a8d11cd071 ("kprobes/x86: Consolidate insn decoder users for copying code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/161666691162.1120877.2808435205294352583.stgit@devnote2
2021-03-25 11:37:21 +01:00
Ira Weiny 633b0616cf x86/sgx: Remove unnecessary kmap() from sgx_ioc_enclave_init()
kmap() is inefficient and is being replaced by kmap_local_page(), if
possible. There is no readily apparent reason why initp_page needs to be
allocated and kmap'ed() except that 'sigstruct' needs to be page-aligned
and 'token' 512 byte-aligned.

Rather than change it to kmap_local_page(), use kmalloc() instead
because kmalloc() can give this alignment when allocating PAGE_SIZE
bytes.

Remove the alloc_page()/kmap() and replace with kmalloc(PAGE_SIZE, ...)
to get a page aligned kernel address.

In addition, add a comment to document the alignment requirements so that
others don't attempt to 'fix' this again.

 [ bp: Massage commit message. ]

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210324182246.2484875-1-ira.weiny@intel.com
2021-03-25 09:50:32 +01:00
Linus Torvalds e138138003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Various fixes, all over:

   1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.

   2) Always store the rx queue mapping in veth, from Maciej
      Fijalkowski.

   3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.

   4) Fix memory leak in octeontx2-af from Colin Ian King.

   5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
      Yonghong Song.

   6) Fix tx ptp stats in mlx5, from Aya Levin.

   7) Check correct ip version in tun decap, fropm Roi Dayan.

   8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.

   9) Work item memork leak in mlx5, from Shay Drory.

  10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.

  11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.

  12) Fix data race in pxa168_eth, from Pavel Andrianov.

  13) Range validate stab in red_check_params(), from Eric Dumazet.

  14) Inherit vlan filtering setting properly in b53 driver, from
      Florian Fainelli.

  15) Fix rtnl locking in igc driver, from Sasha Neftin.

  16) Pause handling fixes in igc driver, from Muhammad Husaini
      Zulkifli.

  17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.

  18) Use after free in qlcnic, from Lv Yunlong.

  19) fix crash in fritzpci mISDN, from Tong Zhang.

  20) Premature rx buffer reuse in igb, from Li RongQing.

  21) Missing termination of ip[a driver message handler arrays, from
      Alex Elder.

  22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
      driver, from Xie He.

  23) Use after free in c_can_pci_remove(), from Tong Zhang.

  24) Uninitialized variable use in nl80211, from Jarod Wilson.

  25) Off by one size calc in bpf verifier, from Piotr Krysiuk.

  26) Use delayed work instead of deferrable for flowtable GC, from
      Yinjun Zhang.

  27) Fix infinite loop in NPC unmap of octeontx2 driver, from
      Hariprasad Kelam.

  28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
      of fifo sizes, from Corentin Labbe.

  29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.

  30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.

  31) Fix psample UAPI breakage, from Ido Schimmel"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
  psample: Fix user API breakage
  math: Export mul_u64_u64_div_u64
  ch_ktls: fix enum-conversion warning
  octeontx2-af: Fix memory leak of object buf
  ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
  net: bridge: don't notify switchdev for local FDB addresses
  net/sched: act_ct: clear post_ct if doing ct_clear
  net: dsa: don't assign an error value to tag_ops
  isdn: capi: fix mismatched prototypes
  net/mlx5: SF, do not use ecpu bit for vhca state processing
  net/mlx5e: Fix division by 0 in mlx5e_select_queue
  net/mlx5e: Fix error path for ethtool set-priv-flag
  net/mlx5e: Offload tuple rewrite for non-CT flows
  net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
  net/mlx5: Add back multicast stats for uplink representor
  net: ipconfig: ic_dev can be NULL in ic_close_devs
  MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
  docs: networking: Fix a typo
  r8169: fix DMA being used after buffer free if WoL is enabled
  net: ipa: fix init header command validation
  ...
2021-03-24 18:16:04 -07:00
Roger Pau Monne af44a387e7 Revert "xen: fix p2m size in dom0 for disabled memory hotplug case"
This partially reverts commit 882213990d ("xen: fix p2m size in dom0
for disabled memory hotplug case")

There's no need to special case XEN_UNPOPULATED_ALLOC anymore in order
to correctly size the p2m. The generic memory hotplug option has
already been tied together with the Xen hotplug limit, so enabling
memory hotplug should already trigger a properly sized p2m on Xen PV.

Note that XEN_UNPOPULATED_ALLOC depends on ZONE_DEVICE which pulls in
MEMORY_HOTPLUG.

Leave the check added to __set_phys_to_machine and the adjusted
comment about EXTRA_MEM_RATIO.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210324122424.58685-3-roger.pau@citrix.com

[boris: fixed formatting issues]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-24 18:33:36 -05:00
Roger Pau Monne 2b514ec727 xen/x86: make XEN_BALLOON_MEMORY_HOTPLUG_LIMIT depend on MEMORY_HOTPLUG
The Xen memory hotplug limit should depend on the memory hotplug
generic option, rather than the Xen balloon configuration. It's
possible to have a kernel with generic memory hotplug enabled, but
without Xen balloon enabled, at which point memory hotplug won't work
correctly due to the size limitation of the p2m.

Rename the option to XEN_MEMORY_HOTPLUG_LIMIT since it's no longer
tied to ballooning.

Fixes: 9e2369c06c ("xen: add helpers to allocate unpopulated memory")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210324122424.58685-2-roger.pau@citrix.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-24 18:33:11 -05:00
Sunil Muthuswamy 6dc2a774cb x86/Hyper-V: Support for free page reporting
Linux has support for free page reporting now (36e66c554b) for
virtualized environment. On Hyper-V when virtually backed VMs are
configured, Hyper-V will advertise cold memory discard capability,
when supported. This patch adds the support to hook into the free
page reporting infrastructure and leverage the Hyper-V cold memory
discard hint hypercall to report/free these pages back to the host.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Tested-by: Matheus Castello <matheus@castello.eng.br>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/SN4PR2101MB0880121FA4E2FEC67F35C1DCC0649@SN4PR2101MB0880.namprd21.prod.outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-24 11:35:24 +00:00
Xu Yihang 1b60280834 x86/hyperv: Fix unused variable 'hi' warning in hv_apic_read
Fixes the following W=1 kernel build warning(s):
arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used [-Wunused-but-set-variable]

Compiled with CONFIG_HYPERV enabled:
make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-
make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-

HV_X64_MSR_EOI occupies bit 31:0 and HV_X64_MSR_TPR occupies bit 7:0,
which means the higher 32 bits are not really used. Cast the variable hi
to void to silence this warning.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Yihang <xuyihang@huawei.com>
Link: https://lore.kernel.org/r/20210323025013.191533-1-xuyihang@huawei.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-24 11:32:56 +00:00
Xu Yihang 13c4d4626a x86/hyperv: Fix unused variable 'msr_val' warning in hv_qlock_wait
Fixes the following W=1 kernel build warning(s):
arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not used [-Wunused-but-set-variable]
  unsigned long msr_val;

As Hypervisor Top-Level Functional Specification states in chapter 7.5
Virtual Processor Idle Sleep State, "A partition which possesses the
AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry
into the virtual processor idle sleep state through a read to the
hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE".

That means only a read of the MSR is necessary. The returned value
msr_val is not used. Cast it to void to silence this warning.

Reference:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Yihang <xuyihang@huawei.com>
Link: https://lore.kernel.org/r/20210323024302.174434-1-xuyihang@huawei.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-24 11:31:04 +00:00
Borislav Petkov 2ffdc2c344 x86/mce/inject: Add IPID for injection too
Add an injection file in order to specify the IPID too when injecting
an error. One use case example is using the machinery to decode MCEs
collected from other machines.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210314201806.12798-1-bp@alien8.de
2021-03-24 00:04:45 +01:00
Mike Rapoport 4c674481dc x86/setup: Merge several reservations of start of memory
Currently, the first several pages are reserved both to avoid leaking
their contents on systems with L1TF and to avoid corrupting BIOS memory.

Merge the two memory reservations.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210302100406.22059-3-rppt@kernel.org
2021-03-23 17:17:36 +01:00
Mike Rapoport a799c2bd29 x86/setup: Consolidate early memory reservations
The early reservations of memory areas used by the firmware, bootloader,
kernel text and data are spread over setup_arch(). Moreover, some of them
happen *after* memblock allocations, e.g trim_platform_memory_ranges() and
trim_low_memory_range() are called after reserve_real_mode() that allocates
memory.

There was no corruption of these memory regions because memblock always
allocates memory either from the end of memory (in top-down mode) or above
the kernel image (in bottom-up mode). However, the bottom up mode is going
to be updated to span the entire memory [1] to avoid limitations caused by
KASLR.

Consolidate early memory reservations in a dedicated function to improve
robustness against future changes. Having the early reservations in one
place also makes it clearer what memory must be reserved before memblock
allocations are allowed.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Link: [1] https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
Link: https://lkml.kernel.org/r/20210302100406.22059-2-rppt@kernel.org
2021-03-23 17:13:17 +01:00
Arnd Bergmann 9fcb51c14d x86/build: Turn off -fcf-protection for realmode targets
The new Ubuntu GCC packages turn on -fcf-protection globally,
which causes a build failure in the x86 realmode code:

  cc1: error: ‘-fcf-protection’ is not compatible with this target

Turn it off explicitly on compilers that understand this option.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210323124846.1584944-1-arnd@kernel.org
2021-03-23 16:36:01 +01:00
Masami Hiramatsu 6256e668b7 x86/kprobes: Use int3 instead of debug trap for single-step
Use int3 instead of debug trap exception for single-stepping the
probed instructions. Some instructions which change the ip
registers or modify IF flags are emulated because those are not
able to be single-stepped by int3 or may allow the interrupt
while single-stepping.

This actually changes the kprobes behavior.

- kprobes can not probe following instructions; int3, iret,
  far jmp/call which get absolute address as immediate,
  indirect far jmp/call, indirect near jmp/call with addressing
  by memory (register-based indirect jmp/call are OK), and
  vmcall/vmlaunch/vmresume/vmxoff.

- If the kprobe post_handler doesn't set before registering,
  it may not be called in some case even if you set it afterwards.
  (IOW, kprobe booster is enabled at registration, user can not
   change it)

But both are rare issue, unsupported instructions will not be
used in the kernel (or rarely used), and post_handlers are
rarely used (I don't see it except for the test code).

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/161469874601.49483.11985325887166921076.stgit@devnote2
2021-03-23 16:07:56 +01:00
Masami Hiramatsu a194acd316 x86/kprobes: Identify far indirect JMP correctly
Since Grp5 far indirect JMP is FF "mod 101 r/m", it should be
(modrm & 0x38) == 0x28, and near indirect JMP is also 0x38 == 0x20.
So we can mask modrm with 0x30 and check 0x20.
This is actually what the original code does, it also doesn't care
the last bit. So the result code is same.

Thus, I think this is just a cosmetic cleanup.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/161469873475.49483.13257083019966335137.stgit@devnote2
2021-03-23 16:07:56 +01:00
Masami Hiramatsu d60ad3d46f x86/kprobes: Retrieve correct opcode for group instruction
Since the opcodes start from 0xff are group5 instruction group which is
not 2 bytes opcode but the extended opcode determined by the MOD/RM byte.

The commit abd82e533d ("x86/kprobes: Do not decode opcode in resume_execution()")
used insn->opcode.bytes[1], but that is not correct. We have to refer
the insn->modrm.bytes[1] instead.

Fixes: abd82e533d ("x86/kprobes: Do not decode opcode in resume_execution()")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/161469872400.49483.18214724458034233166.stgit@devnote2
2021-03-23 16:07:55 +01:00
Isaku Yamahata 8249d17d31 x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
The pfn variable contains the page frame number as returned by the
pXX_pfn() functions, shifted to the right by PAGE_SHIFT to remove the
page bits. After page protection computations are done to it, it gets
shifted back to the physical address using page_level_shift().

That is wrong, of course, because that function determines the shift
length based on the level of the page in the page table but in all the
cases, it was shifted by PAGE_SHIFT before.

Therefore, shift it back using PAGE_SHIFT to get the correct physical
address.

 [ bp: Rewrite commit message. ]

Fixes: dfaaec9033 ("x86: Add support for changing memory encryption attribute in early boot")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/81abbae1657053eccc535c16151f63cd049dcb97.1616098294.git.isaku.yamahata@intel.com
2021-03-23 11:59:45 +01:00
Arnd Bergmann e14cfb3bdd x86/boot/compressed: Avoid gcc-11 -Wstringop-overread warning
GCC gets confused by the comparison of a pointer to an integer literal,
with the assumption that this is an offset from a NULL pointer and that
dereferencing it is invalid:

  In file included from arch/x86/boot/compressed/misc.c:18:
  In function ‘parse_elf’,
      inlined from ‘extract_kernel’ at arch/x86/boot/compressed/misc.c:442:2:
  arch/x86/boot/compressed/../string.h:15:23: error: ‘__builtin_memcpy’ reading 64 bytes from a region of size 0 [-Werror=stringop-overread]
     15 | #define memcpy(d,s,l) __builtin_memcpy(d,s,l)
        |                       ^~~~~~~~~~~~~~~~~~~~~~~
  arch/x86/boot/compressed/misc.c:283:9: note: in expansion of macro ‘memcpy’
    283 |         memcpy(&ehdr, output, sizeof(ehdr));
        |         ^~~~~~

I could not find any good workaround for this, but as this is only
a warning for a failure during early boot, removing the line entirely
works around the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Sebor <msebor@gmail.com>
Link: https://lore.kernel.org/r/20210322160253.4032422-2-arnd@kernel.org
2021-03-23 00:16:25 +01:00
Arnd Bergmann cdc34cb8f2 x86/boot/tboot: Avoid Wstringop-overread-warning
gcc-11 warns about using string operations on pointers that are
defined at compile time as offsets from a NULL pointer. Unfortunately
that also happens on the result of fix_to_virt(), which is a
compile-time constant for a constant input:

  arch/x86/kernel/tboot.c: In function 'tboot_probe':
  arch/x86/kernel/tboot.c:70:13: error: '__builtin_memcmp_eq' specified bound 16 exceeds source size 0 [-Werror=stringop-overread]
     70 |         if (memcmp(&tboot_uuid, &tboot->uuid, sizeof(tboot->uuid))) {
        |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I hope this can get addressed in gcc-11 before the release.

As a workaround, split up the tboot_probe() function in two halves
to separate the pointer generation from the usage. This is a bit
ugly, and hopefully gcc understands that the code is actually correct
before it learns to peek into the noinline function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Sebor <msebor@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
Link: https://lore.kernel.org/r/20210322160253.4032422-3-arnd@kernel.org
2021-03-23 00:16:13 +01:00
Arnd Bergmann 279d56abc6 x86/fpu/math-emu: Fix function cast warning
Building with 'make W=1', gcc points out that casting between
incompatible function types can be dangerous:

  arch/x86/math-emu/fpu_trig.c:1638:60: error: cast between incompatible function types from ‘int (*)(FPU_REG *, u_char)’ {aka ‘int (*)(struct fpu__reg *, unsigned char)’} to ‘void (*)(FPU_REG *, u_char)’ {aka ‘void (*)(struct fpu__reg *, unsigned char)’} [-Werror=cast-function-type]
   1638 |         fprem, fyl2xp1, fsqrt_, fsincos, frndint_, fscale, (FUNC_ST0) fsin, fcos
        |                                                            ^

This one seems harmless, but it is easy enough to work around it by
adding an intermediate function that adjusts the return type.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322214824.974323-1-arnd@kernel.org
2021-03-23 00:08:02 +01:00
Otavio Pontes 7189b3c119 x86/microcode: Check for offline CPUs before requesting new microcode
Currently, the late microcode loading mechanism checks whether any CPUs
are offlined, and, in such a case, aborts the load attempt.

However, this must be done before the kernel caches new microcode from
the filesystem. Otherwise, when offlined CPUs are onlined later, those
cores are going to be updated through the CPU hotplug notifier callback
with the new microcode, while CPUs previously onine will continue to run
with the older microcode.

For example:

Turn off one core (2 threads):

  echo 0 > /sys/devices/system/cpu/cpu3/online
  echo 0 > /sys/devices/system/cpu/cpu1/online

Install the ucode fails because a primary SMT thread is offline:

  cp intel-ucode/06-8e-09 /lib/firmware/intel-ucode/
  echo 1 > /sys/devices/system/cpu/microcode/reload
  bash: echo: write error: Invalid argument

Turn the core back on

  echo 1 > /sys/devices/system/cpu/cpu3/online
  echo 1 > /sys/devices/system/cpu/cpu1/online
  cat /proc/cpuinfo |grep microcode
  microcode : 0x30
  microcode : 0xde
  microcode : 0x30
  microcode : 0xde

The rationale for why the update is aborted when at least one primary
thread is offline is because even if that thread is soft-offlined
and idle, it will still have to participate in broadcasted MCE's
synchronization dance or enter SMM, and in both examples it will execute
instructions so it better have the same microcode revision as the other
cores.

 [ bp: Heavily edit and extend commit message with the reasoning behind all
   this. ]

Fixes: 30ec26da99 ("x86/microcode: Do not upload microcode if CPUs are offline")
Signed-off-by: Otavio Pontes <otavio.pontes@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/20210319165515.9240-2-otavio.pontes@intel.com
2021-03-22 22:29:40 +01:00
Arnd Bergmann 396a66aa11 x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
gcc-11 warns about mismatched prototypes here:

  arch/x86/lib/msr-smp.c:255:51: error: argument 2 of type ‘u32 *’ {aka ‘unsigned int *’} declared as a pointer [-Werror=array-parameter=]
    255 | int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 *regs)
        |                                              ~~~~~^~~~
  arch/x86/include/asm/msr.h:347:50: note: previously declared as an array ‘u32[8]’ {aka ‘unsigned int[8]’}

GCC is right here - fix up the types.

[ mingo: Twiddled the changelog. ]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322164541.912261-1-arnd@kernel.org
2021-03-22 21:37:03 +01:00
Ingo Molnar 163b099146 x86: Fix various typos in comments, take #2
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-21 23:50:28 +01:00
Ingo Molnar c681df88dc x86: Remove unusual Unicode characters from comments
We've accumulated a few unusual Unicode characters in arch/x86/
over the years, substitute them with their proper ASCII equivalents.

A few of them were a whitespace equivalent: ' ' - the use was harmless.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
2021-03-21 23:50:07 +01:00
Ingo Molnar ca8778c45e Merge branch 'linus' into x86/cleanups, to resolve conflict
Conflicts:
	arch/x86/kernel/kprobes/ftrace.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-21 22:16:08 +01:00
Linus Torvalds 1c74516c2d Boundary condition fixes for bugs unearthed by the perf fuzzer.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmBXJeQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iE/BAAsawZZH9GFsnwv7HraBl0jKvftp3/xPh6
 WL/RKGhGfu3f9MOrcM+dflggJEfnvz6/Tfm7/XKWHlIW3nHrQcn+lQtddoTwb2wp
 CmpAUYGtGWr7tr/B5vQcIg+yyYsVGtfyEmVro+TfzYCl/e21zATqEKtgSGclCcXg
 g0u5ZJsL8AOPSk2cR/ABrpI0MUlKHjUSJJ3V9j69OqSLhfc+GCn6ifTC1XK05MyR
 JX1kNaVTpVSGk650+oCUOP2rNaSk/G2wVZtp/LB9O1N0b9Zot2hQYbx1cEGFRNOy
 Q2FeMcw3V2t26Xk2q9AFGlOS0IeasO/NKK/urotRS2/rXdcr8QMUHTZdmr85UVQJ
 oohM+/DqoCAY5TeC4+d+tL5i+DLVGkrdbHX8IKkzYmejhE9DMQ5+a16O7ZcGoVv4
 oFG8RYHsUHPjEqPgC9vxS8Iy3n2yk34TIKQg/DJBdNhkQPnNup/zAInCEs6WqWN7
 OZulpWGK2yEV3mJpX2ayAMxym3hGAk/pBGAEcFI1DTXVBlGlOTvr6J0S3O54efTH
 +hrx+V+bYKHZPk3gK9mjN8rzC/u2pFbFZpf0cC2+G9XhFctmx8sIiWZ8kHrftp8b
 OaKXxo9lhfZhDuBj0zl3Yz59bFzS5VKPrHCGJ43pZRsxZsv8PXJnhmAsKT197fv+
 xQOrdN+L7Cc=
 =bCUA
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Boundary condition fixes for bugs unearthed by the perf fuzzer"

* tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
  perf/x86/intel: Fix a crash caused by zero PEBS status
2021-03-21 11:26:21 -07:00
Linus Torvalds 5e3ddf96e7 - Add the arch-specific mapping between physical and logical CPUs to fix
devicetree-node lookups.
 
 - Restore the IRQ2 ignore logic
 
 - Fix get_nr_restart_syscall() to return the correct restart syscall number.
 Split in a 4-patches set to avoid kABI breakage when backporting to dead
 kernels.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmBXJu0ACgkQEsHwGGHe
 VUrCkQ/9Et5W76HMQfHccluks2i2yNXgd7nROhIt0iMS1Ph86AWYJZmMZ2dbaqW8
 nORU20ziHme+9PScmcJb2LdJxIRDtYNs1J811IYeKNpvj8KHXtV2VYCVG9UcL21E
 FmUlZf5oINiDMzu3q4SuqHw9t7X6RCItolQIRmQHDXqPraFhBxji2VOFXDIg+qhf
 a4sBz6UfxA4a/b7d/KxHxNvuQE5Cluc9gninhtaYh1b7OQZJX4+vTa3W5V4kK0df
 ohOH5pnJp9V7qH2CmB3UcGWJTxHeLbm4E0KYkyasnKG9M0KmIvJ6jNARlRAo3hAF
 hn9D4xLtsnIWjtO6xEVdF7kSizkYZRPay5kX88quvlSa0FkkPnsUvFtW79Yi3ZNy
 vL2NAu2biqNQyo7ZWVffJns2DrJwYZ6KOGA6oUBwTUBfieF9KMdDew8IXRUMYNdO
 LzW87Irf9eZj9c+b7Rtr0VofmKgRYwy1Lo8eVT+VGkV+nOTOB9rlAll2lYBq3aNA
 W6ei0S5/1zaRF5aU6Qmnap4eb1X/tp845q6CPYa9kIsZwVyGFOa7iLeYcNn9qHdB
 G6RW6CUh97A7wwxUYt5VGUscjYV2V9Ycv9HvIwrG/T7aezWnhI9ODtggzDgCnbls
 og6N/+heLZ9G/DyxAEmHuazV2ItDPJq69gag/POHhXJaSUGbdbA=
 =WfC4
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "The freshest pile of shiny x86 fixes for 5.12:

   - Add the arch-specific mapping between physical and logical CPUs to
     fix devicetree-node lookups

   - Restore the IRQ2 ignore logic

   - Fix get_nr_restart_syscall() to return the correct restart syscall
     number. Split in a 4-patches set to avoid kABI breakage when
     backporting to dead kernels"

* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/of: Fix CPU devicetree-node lookups
  x86/ioapic: Ignore IRQ2 again
  x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
  x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
  x86: Move TS_COMPAT back to asm/thread_info.h
  kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-21 11:04:20 -07:00
Linus Torvalds 812da4d394 RISC-V Fixes for 5.12-rc4
I have handful of fixes for 5.12:
 
 * A fix to the SBI remote fence numbers for hypervisor fences, which had
   been transcribed in the wrong order in Linux.  These fences are only
   used with the KVM patches applied.
 * A whole host of build warnings have been fixed, these should have no
   functional change.
 * A fix to init_resources() that prevents an off-by-one error from
   causing an out-of-bounds array reference.  This is manifesting during
   boot on vexriscv.
 * A fix to ensure the KASAN mappings are visible before proceeding to
   use them.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmBVgV4THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiTOWD/4l+uRCwTelZqm/G0yKSSAevAv5Crsc
 Nzsa1uq7dOC+JLZ5y96SUng825WdGX+HiIf7QyUFPzpnqyYc4+ROwNb80ObPWQZU
 dctatP2g9Jk2ImmJbGQVeDXKAiqrMM3hf1bOF3N3VV9DpqID0z/S8l8H9mz7x9yl
 opd6kXxCPFKLgmAbMxcsytUduxZrJEcCpy3jPpIvjJ3BrzaGZlgjytqc2tYvbv/L
 9i//evmGTCNXfQPrWEcMpBPbMf+aSzb/9Im8THB42jpJVQ7kx3txVg6d+wb73oGf
 XHkm5mwrESAcnVGfxY5xRaaSK/L2k5Lg98J1K/BIHIKskjCTg5FdyrgeGwdtLg6T
 FuXEvK29FJgfMb7k2Mf25l/Lglzi4q4LxBO4wcAUb1OpaVeK2kgYJr1eniSKrE/v
 NF5/bD9h7sD1qbZLfk+lsTggBGfMBmthwp59jNb7V4cLkIFXwopgx2h/73jm6kn8
 8fMCTlwOoktewbv0DdWCy0Sfaa0iCXMSJy+Y13GWlcEMvQn1VLtX7RbQzZq9X+tV
 C/qkp1SdXfPG3vJbkNnZh/eS12F6vDauYJ814s3VAeJKOoMJWABB6Jm2SoBwFM6v
 kpIRNzDyJ1oKhF4PxIrmGkv6PvRM/j5akspOwy/zdHB3FBVCGmyuoB9GE8Bg1Rw7
 xyfdZthPDdvGyQ==
 =XhDE
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "A handful of fixes for 5.12:

   - fix the SBI remote fence numbers for hypervisor fences, which had
     been transcribed in the wrong order in Linux. These fences are only
     used with the KVM patches applied.

   - fix a whole host of build warnings, these should have no functional
     change.

   - fix init_resources() to prevent an off-by-one error from causing an
     out-of-bounds array reference. This was manifesting during boot on
     vexriscv.

   - ensure the KASAN mappings are visible before proceeding to use
     them"

* tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Correct SPARSEMEM configuration
  RISC-V: kasan: Declare kasan_shallow_populate() static
  riscv: Ensure page table writes are flushed when initializing KASAN vmalloc
  RISC-V: Fix out-of-bounds accesses in init_resources()
  riscv: Fix compilation error with Canaan SoC
  ftrace: Fix spelling mistake "disabed" -> "disabled"
  riscv: fix bugon.cocci warnings
  riscv: process: Fix no prototype for arch_dup_task_struct
  riscv: ftrace: Use ftrace_get_regs helper
  riscv: process: Fix no prototype for show_regs
  riscv: syscall_table: Reduce W=1 compilation warnings noise
  riscv: time: Fix no prototype for time_init
  riscv: ptrace: Fix no prototype warnings
  riscv: sbi: Fix comment of __sbi_set_timer_v01
  riscv: irq: Fix no prototype warning
  riscv: traps: Fix no prototype warnings
  RISC-V: correct enum sbi_ext_rfence_fid
2021-03-20 11:01:54 -07:00
Tony Luck a331f5fdd3 x86/mce: Add Xeon Sapphire Rapids to list of CPUs that support PPIN
New CPU model, same MSRs to control and read the inventory number.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210319173919.291428-1-tony.luck@intel.com
2021-03-20 12:12:10 +01:00
Stanislav Fomichev b908297047 bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG
__bpf_arch_text_poke does rewrite only for atomic nop5, emit_nops(xxx, 5)
emits non-atomic one which breaks fentry/fexit with k8 atomics:

P6_NOP5 == P6_NOP5_ATOMIC (0f1f440000 == 0f1f440000)
K8_NOP5 != K8_NOP5_ATOMIC (6666906690 != 6666666690)

Can be reproduced by doing "ideal_nops = k8_nops" in "arch_init_ideal_nops()
and running fexit_bpf2bpf selftest.

Fixes: e21aa34178 ("bpf: Fix fexit trampoline.")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210320000001.915366-1-sdf@google.com
2021-03-19 19:25:39 -07:00
Johan Hovold dd926880da x86/apic/of: Fix CPU devicetree-node lookups
Architectures that describe the CPU topology in devicetree and do not have
an identity mapping between physical and logical CPU ids must override the
default implementation of arch_match_cpu_phys_id().

Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node()
and of_cpu_device_node_get() which several drivers rely on. It also causes
the CPU struct devices exported through sysfs to point to the wrong
devicetree nodes.

On x86, CPUs are described in devicetree using their APIC ids and those
do not generally coincide with the logical ids, even if CPU0 typically
uses APIC id 0.

Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node
lookups work also with SMP.

Apart from fixing the broken sysfs devicetree-node links this likely does
not affect current users of mainline kernels on x86.

Fixes: 4e07db9c8d ("x86/devicetree: Use CPU description from Device Tree")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org
2021-03-19 23:01:49 +01:00
Linus Torvalds ecd8ee7f9c x86:
* new selftests
 * fixes for migration with HyperV re-enlightenment enabled
 * fix RCU/SRCU usage
 * fixes for local_irq_restore misuse false positive
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmBUpO8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPj6Af+LSkDniR08Eh/x4GHdX+ZSA9EhNuP
 PMqL+nDYvLXqc0XaErbZQpQbSP4aK7Tjly0LguZmNkBk17pnbjLb5Vv9hqJ30pM/
 pI8bGgdh+KDO9LClfrgsaYgC+B4R+fwqqTIvtBYMilVZ96JwixFiODB4ntRQmZgd
 xJS99jwjD8TO9pTYskKPf8y8yv5W9RH+wVQGXwc+T/sSzK/rcL4Jwt/ibO2FLcJK
 gBRXJDVjMIlpxPrqqoejVB2FHQQe36Bns85QU3dz0QuXfDuuEvbShY/f4R1z32fT
 RaccrvdMQtvgwS0l9Ij06PT0BdiG0EdZv/gOBUq5gVgx4XZyJTleJaVURw==
 =WZP4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Fixes for kvm on x86:

   - new selftests

   - fixes for migration with HyperV re-enlightenment enabled

   - fix RCU/SRCU usage

   - fixes for local_irq_restore misuse false positive"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  documentation/kvm: additional explanations on KVM_SET_BOOT_CPU_ID
  x86/kvm: Fix broken irq restoration in kvm_wait
  KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs
  KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
  selftests: kvm: add set_boot_cpu_id test
  selftests: kvm: add _vm_ioctl
  selftests: kvm: add get_msr_index_features
  selftests: kvm: Add basic Hyper-V clocksources tests
  KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment
  KVM: x86: hyper-v: Track Hyper-V TSC page status
  KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS
  KVM: x86/mmu: Store the address space ID in the TDP iterator
  KVM: x86/mmu: Factor out tdp_iter_return_to_root
  KVM: x86/mmu: Fix RCU usage when atomically zapping SPTEs
  KVM: x86/mmu: Fix RCU usage in handle_removed_tdp_mmu_page
2021-03-19 14:10:07 -07:00
Jarkko Sakkinen 901ddbb9ec x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
Background
==========

SGX enclave memory is enumerated by the processor in contiguous physical
ranges called Enclave Page Cache (EPC) sections.  Currently, there is a
free list per section, but allocations simply target the lowest-numbered
sections.  This is functional, but has no NUMA awareness.

Fortunately, EPC sections are covered by entries in the ACPI SRAT table.
These entries allow each EPC section to be associated with a NUMA node,
just like normal RAM.

Solution
========

Implement a NUMA-aware enclave page allocator.  Mirror the buddy allocator
and maintain a list of enclave pages for each NUMA node.  Attempt to
allocate enclave memory first from local nodes, then fall back to other
nodes.

Note that the fallback is not as sophisticated as the buddy allocator
and is itself not aware of NUMA distances.  When a node's free list is
empty, it searches for the next-highest node with enclave pages (and
will wrap if necessary).  This could be improved in the future.

Other
=====

NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node().

 [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because
   callers do not expect that and that leads to a NULL ptr deref. ]

 [ dhansen: Fix an uninitialized 'nid' variable in
   __sgx_alloc_epc_page() as

   Reported-by: kernel test robot <lkp@intel.com>

   to avoid any potential allocations from the wrong NUMA node or even
   premature allocation failures. ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/
Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com
Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com
Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
2021-03-19 19:16:51 +01:00
Joerg Roedel 799de1baaf x86/sev-es: Optimize __sev_es_ist_enter() for better readability
Reorganize the code and improve the comments to make the function more
readable and easier to understand.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210303141716.29223-4-joro@8bytes.org
2021-03-19 13:37:22 +01:00
Jiapeng Chong 21d6a7dcbf x86/kaslr: Return boolean values from a function returning bool
Fix the following coccicheck warnings:

  ./arch/x86/boot/compressed/kaslr.c:642:10-11: WARNING: return of 0/1 in
  function 'process_mem_region' with return type bool.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1615283963-67277-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-03-19 13:25:07 +01:00
Thomas Gleixner a501b048a9 x86/ioapic: Ignore IRQ2 again
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
the matrix allocator claimed to be out of vectors. He analyzed it down to
the point that IRQ2, the PIC cascade interrupt, which is supposed to be not
ever routed to the IO/APIC ended up having an interrupt vector assigned
which got moved during unplug of CPU0.

The underlying issue is that IRQ2 for various reasons (see commit
af174783b9 ("x86: I/O APIC: Never configure IRQ2" for details) is treated
as a reserved system vector by the vector core code and is not accounted as
a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2
which causes the IO/APIC setup to claim that interrupt which is granted by
the vector domain because there is no sanity check. As a consequence the
allocation counter of CPU0 underflows which causes a subsequent unplug to
fail with:

  [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU

There is another sanity check missing in the matrix allocator, but the
underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
during the conversion to irqdomains.

For almost 6 years nobody complained about this wreckage, which might
indicate that this requirement could be lifted, but for any system which
actually has a PIC IRQ2 is unusable by design so any routing entry has no
effect and the interrupt cannot be connected to a device anyway.

Due to that and due to history biased paranoia reasons restore the IRQ2
ignore logic and treat it as non existent despite a routing entry claiming
otherwise.

Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de
2021-03-19 12:43:41 +01:00
Ingo Molnar 01438749e3 Merge branch 'locking/urgent' into locking/core, to pick up dependent commits
We are applying further, lower-prio fixes on top of two ww_mutex fixes in locking/urgent.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-19 12:10:49 +01:00
Joerg Roedel f15a0a732a x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()
There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-9-joro@8bytes.org
2021-03-18 23:04:12 +01:00
Joerg Roedel fef81c8626 x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path
Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-8-joro@8bytes.org
2021-03-18 23:04:12 +01:00
Joerg Roedel e927e62d8e x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path
The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-7-joro@8bytes.org
2021-03-18 23:04:12 +01:00
Joerg Roedel 1ccdbf748d x86/boot/compressed/64: Add 32-bit boot #VC handler
Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-6-joro@8bytes.org
2021-03-18 23:03:43 +01:00
Wanpeng Li f4e61f0c9a x86/kvm: Fix broken irq restoration in kvm_wait
After commit 997acaf6b4 (lockdep: report broken irq restoration), the guest
splatting below during boot:

 raw_local_irq_restore() called with IRQs enabled
 WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30
 Modules linked in: hid_generic usbhid hid
 CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25
 RIP: 0010:warn_bogus_irq_restore+0x26/0x30
 Call Trace:
  kvm_wait+0x76/0x90
  __pv_queued_spin_lock_slowpath+0x285/0x2e0
  do_raw_spin_lock+0xc9/0xd0
  _raw_spin_lock+0x59/0x70
  lockref_get_not_dead+0xf/0x50
  __legitimize_path+0x31/0x60
  legitimize_root+0x37/0x50
  try_to_unlazy_next+0x7f/0x1d0
  lookup_fast+0xb0/0x170
  path_openat+0x165/0x9b0
  do_filp_open+0x99/0x110
  do_sys_openat2+0x1f1/0x2e0
  do_sys_open+0x5c/0x80
  __x64_sys_open+0x21/0x30
  do_syscall_64+0x32/0x50
  entry_SYSCALL_64_after_hwframe+0x44/0xae

The new consistency checking,  expects local_irq_save() and
local_irq_restore() to be paired and sanely nested, and therefore expects
local_irq_restore() to be called with irqs disabled.
The irqflags handling in kvm_wait() which ends up doing:

	local_irq_save(flags);
	safe_halt();
	local_irq_restore(flags);

instead triggers it.  This patch fixes it by using
local_irq_disable()/enable() directly.

Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18 13:58:14 -04:00
Wanpeng Li c2162e13d6 KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs
In order to deal with noncoherent DMA, we should execute wbinvd on
all dirty pCPUs when guest wbinvd exits to maintain data consistency.
smp_call_function_many() does not execute the provided function on the
local core, therefore replace it by on_each_cpu_mask().

Reported-by: Nadav Amit <namit@vmware.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18 13:55:34 -04:00
Sean Christopherson b318e8decf KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.

Similar to the approach used for modifying memslots, make arch.msr_filter
a SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created.  That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.

Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
TOCTOU bug, but that's just the tip of the iceberg...

  - Nothing is __rcu annotated, making it nigh impossible to audit the
    code for correctness.
  - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
    coding style aside, the lack of a smb_rmb() anywhere casts all code
    into doubt.
  - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
    count before taking the lock.
  - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.

The entire approach of updating the live filter is also flawed.  While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters.  An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.

[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com

Fixes: 1a155254ff ("KVM: x86: Introduce MSR filtering")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210316184436.2544875-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18 13:55:14 -04:00
Joerg Roedel 79419e13e8 x86/boot/compressed/64: Setup IDT in startup_32 boot path
This boot path needs exception handling when it is used with SEV-ES.
Setup an IDT and provide a helper function to write IDT entries for
use in 32-bit protected mode.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-5-joro@8bytes.org
2021-03-18 16:44:46 +01:00
Joerg Roedel 0c289ff81c x86/boot/compressed/64: Reload CS in startup_32
Exception handling in the startup_32 boot path requires the CS
selector to be correctly set up. Reload it from the current GDT.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-4-joro@8bytes.org
2021-03-18 16:44:43 +01:00
Joerg Roedel eab696d8e8 x86/sev: Do not require Hypervisor CPUID bit for SEV guests
A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.

Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.

This handles the following cases:

	1) SEV(-ES) guest where CPUID intercept is disabled. The guest
	   will still see leaf 0x8000001f and the SEV bit. It can
	   retrieve the C-bit and boot normally.

	2) Non-encrypted guests with intercepted CPUID will check
	   the SEV_STATUS MSR and find it 0 and will try to enable SME.
	   This will fail when the guest finds MSR_K8_SYSCFG to be zero,
	   as it is emulated by KVM. But we can't rely on that, as there
	   might be other hypervisors which return this MSR with bit
	   23 set. The Hypervisor bit check will prevent that the guest
	   tries to enable SME in this case.

	3) Non-encrypted guests on SEV capable hosts with CPUID intercept
	   disabled (by a malicious hypervisor) will try to boot into
	   the SME path. This will fail, but it is also not considered
	   a problem because non-encrypted guests have no protection
	   against the hypervisor anyway.

 [ bp: s/non-SEV/non-encrypted/g ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20210312123824.306-3-joro@8bytes.org
2021-03-18 16:44:40 +01:00
Joerg Roedel b099155e2d x86/boot/compressed/64: Cleanup exception handling before booting kernel
Disable the exception handling before booting the kernel to make sure
any exceptions that happen during early kernel boot are not directed to
the pre-decompression code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-2-joro@8bytes.org
2021-03-18 16:44:36 +01:00
Borislav Petkov afb4a37778 Linux 5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBOgu4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGUd0H/3Ey8aWjVAig9Pe+
 VQVZKwG+LXWH6UmUx5qyaTxophhmGnWLvkigJMn63qIg4eQtfp2gNFHK+T4OJNIP
 ybnkjFZ337x4J9zD6m8mt4Wmelq9iW2wNOS+3YZAyYiGlXfMGM7SlYRCQRQznTED
 2O/JCMsOoP+Z8tr5ah/bzs0dANsXmTZ3QqRP2uzb6irKTgFR3/weOhj+Ht1oJ4Aq
 V+bgdcwhtk20hJhlvVeqws+o74LR789tTDCknlz/YNMv9e6VPfyIQ5vJAcFmZATE
 Ezj9yzkZ4IU+Ux6ikAyaFyBU8d1a4Wqye3eHCZBsEo6tcSAhbTZ90eoU86vh6ajS
 LZjwkNw=
 =6y1u
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc3' into x86/seves

Pick up dependent SEV-ES urgent changes which went into -rc3 to base new
work ontop.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-18 16:43:11 +01:00
Jarkko Sakkinen 51ab30eb2a x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list
During normal runtime, the "ksgxd" daemon behaves like a version of
kswapd just for SGX. But, before it starts acting like kswapd, its first
job is to initialize enclave memory.

Currently, the SGX boot code places each enclave page on a
epc_section->init_laundry_list. Once it starts up, the ksgxd code walks
over that list and populates the actual SGX page allocator.

However, the per-section structures are going away to make way for the
SGX NUMA allocator. There's also little need to have a per-section
structure; the enclave pages are all treated identically, and they can
be placed on the correct allocator list from metadata stored in the
enclave page (struct sgx_epc_page) itself.

Modify sgx_sanitize_section() to take a single page list instead of
taking a section and deriving the list from there.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20210317235332.362001-1-jarkko.sakkinen@intel.com
2021-03-18 16:17:26 +01:00
Ingo Molnar d9f6e12fb0 x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
2021-03-18 15:31:53 +01:00
Ingo Molnar 14ff3ed86e Linux 5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBOgu4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGUd0H/3Ey8aWjVAig9Pe+
 VQVZKwG+LXWH6UmUx5qyaTxophhmGnWLvkigJMn63qIg4eQtfp2gNFHK+T4OJNIP
 ybnkjFZ337x4J9zD6m8mt4Wmelq9iW2wNOS+3YZAyYiGlXfMGM7SlYRCQRQznTED
 2O/JCMsOoP+Z8tr5ah/bzs0dANsXmTZ3QqRP2uzb6irKTgFR3/weOhj+Ht1oJ4Aq
 V+bgdcwhtk20hJhlvVeqws+o74LR789tTDCknlz/YNMv9e6VPfyIQ5vJAcFmZATE
 Ezj9yzkZ4IU+Ux6ikAyaFyBU8d1a4Wqye3eHCZBsEo6tcSAhbTZ90eoU86vh6ajS
 LZjwkNw=
 =6y1u
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc3' into x86/cleanups, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-18 15:27:03 +01:00
Vitaly Kuznetsov 0469f2f7ab KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment
When guest opts for re-enlightenment notifications upon migration, it is
in its right to assume that TSC page values never change (as they're only
supposed to change upon migration and the host has to keep things as they
are before it receives confirmation from the guest). This is mostly true
until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will
trigger masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by
calling KVM_SET_CLOCK,... and as TSC value and kvmclock reading drift
apart (even slightly), the update causes TSC page values to change.

The issue at hand is that when Hyper-V is migrated, it uses stale (cached)
TSC page values to compute the difference between its own clocksource
(provided by KVM) and its guests' TSC pages to program synthetic timers
and in some cases, when TSC page is updated, this puts all stimer
expirations in the past. This, in its turn, causes an interrupt storm
and L2 guests not making much forward progress.

Note, KVM doesn't fully implement re-enlightenment notification. Basically,
the support for reenlightenment MSRs is just a stub and userspace is only
expected to expose the feature when TSC scaling on the expected destination
hosts is available. With TSC scaling, no real re-enlightenment is needed
as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it
likely makes little sense to fully implement re-enlightenment in KVM.

Prevent TSC page from being updated after migration. In case it's not the
guest who's initiating the change and when TSC page is already enabled,
just keep it as it is: TSC value is supposed to be preserved across
migration and TSC frequency can't change with re-enlightenment enabled.
The guest is doomed anyway if any of this is not true.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18 08:02:46 -04:00
Vitaly Kuznetsov cc9cfddb04 KVM: x86: hyper-v: Track Hyper-V TSC page status
Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
was updated from guest/host side or if we've failed to set it up (because
e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
need to retry.

Also, in a hypothetical situation when we are in 'always catchup' mode for
TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
returns false.

Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
get_time_ref_counter() to properly handle the situation when we failed to
write the updated TSC page values to the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18 08:02:46 -04:00
Alexei Starovoitov e21aa34178 bpf: Fix fexit trampoline.
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.

Fixes: fec56f5890 ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
2021-03-18 00:22:51 +01:00
Dave Airlie 06debd6e1b Merge tag 'drm-intel-next-2021-03-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Highlights:
- Alderlake S enabling, via topic branch (Aditya, Anusha, Caz, José, Lucas, Matt, Tejas)
- Refactor display code to shrink intel_display.c etc. (Dave)
- Support more gen 9 and Tigerlake PCH combinations (Lyude, Tejas)
- Add eDP MSO support (Jani)

Display:
- Refactor to support multiple PSR instances (Gwan-gyeong)
- Link training debug logging updates (Sean)
- Updates to eDP fixed mode handling (Jani)
- Disable PSR2 on JSL/EHL (Edmund)
- Support DDR5 and LPDDR5 for bandwidth computation (Clint, José)
- Update VBT DP max link rate table (Shawn)
- Disable the QSES check for HDCP2.2 over MST (Juston)
- PSR updates, refactoring, selective fetch (José, Gwan-gyeong)
- Display init sequence refactoring (Lucas)
- Limit LSPCON to gen 9 and 10 platforms (Ankit)
- Fix DDI lane polarity per VBT info (Uma)
- Fix HDMI vswing programming location in mode set (Ville)
- Various display improvements and refactorings and cleanups (Ville)
- Clean up DDI clock routing and readout (Ville)
- Workaround async flip + VT-d corruption on HSW/BDW (Ville)
- SAGV watermark fixes and cleanups (Ville)
- Silence pipe tracepoint WARNs (Ville)

Other:
- Remove require_force_probe protection from RKL, may need to be revisited (Tejas)
- Detect loss of MMIO access (Matt)
- GVT display improvements
- drm/i915: Disable runtime power management during shutdown (Imre)
- Perf/OA updates (Umesh)
- Remove references to struct drm_device.pdev, via topic branch (Thomas)
- Backmerge (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87v99rnk1g.fsf@intel.com
2021-03-18 08:06:34 +10:00
Leon Romanovsky 6417f03132 module: remove never implemented MODULE_SUPPORTED_DEVICE
MODULE_SUPPORTED_DEVICE was added in pre-git era and never was
implemented. We can safely remove it, because the kernel has grown
to have many more reliable mechanisms to determine if device is
supported or not.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-17 13:16:18 -07:00
Sascha Hauer fa8b90070a quota: wire up quotactl_path
Wire up the quotactl_path syscall added in the previous patch.

Link: https://lore.kernel.org/r/20210304123541.30749-3-s.hauer@pengutronix.de
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-03-17 15:51:17 +01:00
Vitaly Kuznetsov e880c6ea55 KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
When KVM_REQ_MASTERCLOCK_UPDATE request is issued (e.g. after migration)
we need to make sure no vCPU sees stale values in PV clock structures and
thus all vCPUs are kicked with KVM_REQ_CLOCK_UPDATE. Hyper-V TSC page
clocksource is global and kvm_guest_time_update() only updates in on vCPU0
but this is not entirely correct: nothing blocks some other vCPU from
entering the guest before we finish the update on CPU0 and it can read
stale values from the page.

Invalidate TSC page in kvm_gen_update_masterclock() to switch all vCPUs
to using MSR based clocksource (HV_X64_MSR_TIME_REF_COUNT).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-17 04:03:22 -04:00
Vitaly Kuznetsov d2547cf597 KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS
HV_X64_MSR_TSC_EMULATION_STATUS indicates whether TSC accesses are emulated
after migration (to accommodate for a different host TSC frequency when TSC
scaling is not supported; we don't implement this in KVM). Guest can use
the same MSR to stop TSC access emulation by writing zero. Writing anything
else is forbidden.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-17 04:02:49 -04:00
Colin Ian King bab1770a2c
ftrace: Fix spelling mistake "disabed" -> "disabled"
There is a spelling mistake in a comment, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-16 21:19:40 -07:00
Oleg Nesterov b2e9df850c x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
Save the current_thread_info()->status of X86 in the new
restart_block->arch_data field so TS_COMPAT_RESTART can be removed again.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210201174716.GA17898@redhat.com
2021-03-16 22:13:11 +01:00
Oleg Nesterov 8c150ba2fb x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
The comment in get_nr_restart_syscall() says:

	 * The problem is that we can get here when ptrace pokes
	 * syscall-like values into regs even if we're not in a syscall
	 * at all.

Yes, but if not in a syscall then the

	status & (TS_COMPAT|TS_I386_REGS_POKED)

check below can't really help:

	- TS_COMPAT can't be set

	- TS_I386_REGS_POKED is only set if regs->orig_ax was changed by
	  32bit debugger; and even in this case get_nr_restart_syscall()
	  is only correct if the tracee is 32bit too.

Suppose that a 64bit debugger plays with a 32bit tracee and

	* Tracee calls sleep(2)	// TS_COMPAT is set
	* User interrupts the tracee by CTRL-C after 1 sec and does
	  "(gdb) call func()"
	* gdb saves the regs by PTRACE_GETREGS
	* does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1
	* PTRACE_CONT		// TS_COMPAT is cleared
	* func() hits int3.
	* Debugger catches SIGTRAP.
	* Restore original regs by PTRACE_SETREGS.
	* PTRACE_CONT

get_nr_restart_syscall() wrongly returns __NR_restart_syscall==219, the
tracee calls ia32_sys_call_table[219] == sys_madvise.

Add the sticky TS_COMPAT_RESTART flag which survives after return to user
mode. It's going to be removed in the next step again by storing the
information in the restart block. As a further cleanup it might be possible
to remove also TS_I386_REGS_POKED with that.

Test-case:

  $ cvs -d :pserver:anoncvs:anoncvs@sourceware.org:/cvs/systemtap co ptrace-tests
  $ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32
  $ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil
  $ ./erestartsys-trap-debugger
  Unexpected: retval 1, errno 22
  erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421

Fixes: 609c19a385 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174709.GA17895@redhat.com
2021-03-16 22:13:11 +01:00
Oleg Nesterov 66c1b6d74c x86: Move TS_COMPAT back to asm/thread_info.h
Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED.

It was moved to asm/processor.h by b9d989c721 ("x86/asm: Move the
thread_info::status field to thread_struct"), then later 37a8f7c383
("x86/asm: Move 'status' from thread_struct to thread_info") moved the
'status' field back but TS_COMPAT was forgotten.

Preparatory patch to fix the COMPAT case for get_nr_restart_syscall()

Fixes: 609c19a385 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com
2021-03-16 22:13:11 +01:00
Kan Liang 2dc0572f2c perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
On a Haswell machine, the perf_fuzzer managed to trigger this message:

[117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to
write 0x0400000000000000) at rIP: 0xffffffff8106e4f4
(native_write_msr+0x4/0x20)
[117248.089957] Call Trace:
[117248.092685]  intel_pmu_pebs_enable_all+0x31/0x40
[117248.097737]  intel_pmu_enable_all+0xa/0x10
[117248.102210]  __perf_event_task_sched_in+0x2df/0x2f0
[117248.107511]  finish_task_switch.isra.0+0x15f/0x280
[117248.112765]  schedule_tail+0xc/0x40
[117248.116562]  ret_from_fork+0x8/0x30

A fake event called VLBR_EVENT may use the bit 58 of the PEBS_ENABLE, if
the precise_ip is set. The bit 58 is reserved by the HW. Accessing the
bit causes the unchecked MSR access error.

The fake event doesn't support PEBS. The case should be rejected.

Fixes: 097e4311cd ("perf/x86: Add constraint to create guest LBR event without hw counter")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1615555298-140216-2-git-send-email-kan.liang@linux.intel.com
2021-03-16 21:44:39 +01:00
Kan Liang d88d05a9e0 perf/x86/intel: Fix a crash caused by zero PEBS status
A repeatable crash can be triggered by the perf_fuzzer on some Haswell
system.
https://lore.kernel.org/lkml/7170d3b-c17f-1ded-52aa-cc6d9ae999f4@maine.edu/

For some old CPUs (HSW and earlier), the PEBS status in a PEBS record
may be mistakenly set to 0. To minimize the impact of the defect, the
commit was introduced to try to avoid dropping the PEBS record for some
cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates
the local pebs_status accordingly. However, it doesn't correct the PEBS
status in the PEBS record, which may trigger the crash, especially for
the large PEBS.

It's possible that all the PEBS records in a large PEBS have the PEBS
status 0. If so, the first get_next_pebs_record_by_bit() in the
__intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large
PEBS, the 'count' parameter must > 1. The second
get_next_pebs_record_by_bit() will crash.

Besides the local pebs_status, correct the PEBS status in the PEBS
record as well.

Fixes: 01330d7288 ("perf/x86: Allow zero PEBS status with only single active event")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1615555298-140216-1-git-send-email-kan.liang@linux.intel.com
2021-03-16 21:44:39 +01:00
Sean Christopherson 08889894cc KVM: x86/mmu: Store the address space ID in the TDP iterator
Store the address space ID in the TDP iterator so that it can be
retrieved without having to bounce through the root shadow page.  This
streamlines the code and fixes a Sparse warning about not properly using
rcu_dereference() when grabbing the ID from the root on the fly.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-5-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-16 14:16:34 -04:00
Ben Gardon b601c3bc9d KVM: x86/mmu: Factor out tdp_iter_return_to_root
In tdp_mmu_iter_cond_resched there is a call to tdp_iter_start which
causes the iterator to continue its walk over the paging structure from
the root. This is needed after a yield as paging structure could have
been freed in the interim.

The tdp_iter_start call is not very clear and something of a hack. It
requires exposing tdp_iter fields not used elsewhere in tdp_mmu.c and
the effect is not obvious from the function name. Factor a more aptly
named function out of tdp_iter_start and call it from
tdp_mmu_iter_cond_resched and tdp_iter_start.

No functional change intended.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-4-bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-16 14:16:17 -04:00
Ben Gardon 14f6fec2e8 KVM: x86/mmu: Fix RCU usage when atomically zapping SPTEs
Fix a missing rcu_dereference in tdp_mmu_zap_spte_atomic.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-3-bgardon@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-16 14:15:24 -04:00
Ben Gardon 70fb3e41a9 KVM: x86/mmu: Fix RCU usage in handle_removed_tdp_mmu_page
The pt passed into handle_removed_tdp_mmu_page does not need RCU
protection, as it is not at any risk of being freed by another thread at
that point. However, the implicit cast from tdp_sptep_t to u64 * dropped
the __rcu annotation without a proper rcu_derefrence. Fix this by
passing the pt as a tdp_ptep_t and then rcu_dereferencing it in
the function.

Suggested-by: Sean Christopherson <seanjc@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210315233803.2706477-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-16 14:14:59 -04:00
Peter Zijlstra a89dfde3dc x86: Remove dynamic NOP selection
This ensures that a NOP is a NOP and not a random other instruction that
is also a NOP. It allows simplification of dynamic code patching that
wants to verify existing code before writing new instructions (ftrace,
jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS).
32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this, FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI.

 [ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge
   which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> # bpf
Acked-by: Linus Torvalds <torvalds@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20210312115749.065275711@infradead.org
2021-03-15 16:24:59 +01:00
Borislav Petkov f935178b5c x86/insn: Make insn_complete() static
... and move it above the only place it is used.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-22-bp@alien8.de
2021-03-15 13:03:46 +01:00
Borislav Petkov 404b639e51 x86/insn: Remove kernel_insn_init()
Now that it is not needed anymore, drop it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-21-bp@alien8.de
2021-03-15 12:58:36 +01:00
Borislav Petkov a277ce601c x86/tools/insn_sanity: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-19-bp@alien8.de
2021-03-15 12:21:11 +01:00
Borislav Petkov 0c925c61da x86/tools/insn_decoder_test: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-17-bp@alien8.de
2021-03-15 12:09:09 +01:00
Borislav Petkov 88afc23922 x86/uprobes: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-16-bp@alien8.de
2021-03-15 12:05:03 +01:00
Borislav Petkov 0be7f42d6f x86/traps: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-15-bp@alien8.de
2021-03-15 12:00:18 +01:00
Borislav Petkov 5e32c64bb6 x86/sev-es: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-14-bp@alien8.de
2021-03-15 11:46:40 +01:00
Borislav Petkov 99e4b0de4d x86/sev-es: Split vc_decode_insn()
Split it into two helpers - a user- and a kernel-mode one for
readability. Yes, the original function body is not that convoluted but
splitting it makes following through that code trivial than having to
pay attention to each little difference when in user or in kernel mode.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-13-bp@alien8.de
2021-03-15 11:40:05 +01:00
Borislav Petkov 77e768ec13 x86/kprobes: Convert to insn_decode()
Simplify code, improve decoding error checking.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210304174237.31945-12-bp@alien8.de
2021-03-15 11:28:20 +01:00
Borislav Petkov 1580f488ea x86/mce: Convert to insn_decode()
Simplify code, no functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-11-bp@alien8.de
2021-03-15 11:26:40 +01:00
Borislav Petkov 63c66cde7b x86/alternative: Use insn_decode()
No functional changes, just simplification.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-10-bp@alien8.de
2021-03-15 11:25:38 +01:00
Borislav Petkov 8c98a60554 perf/x86/intel/ds: Check return values of insn decoder functions
branch_type() doesn't need to call the full insn_decode() because it
doesn't need it in all cases thus leave the calls separate.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-9-bp@alien8.de
2021-03-15 11:23:48 +01:00
Borislav Petkov 2ff49881d6 perf/x86/intel/ds: Check insn_get_length() retval
intel_pmu_pebs_fixup_ip() needs only the insn length so use the
appropriate helper instead of a full decode. A full decode differs only
in running insn_complete() on the decoded insn but that is not needed
here.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-8-bp@alien8.de
2021-03-15 11:20:29 +01:00
Borislav Petkov 514ef77607 x86/boot/compressed/sev-es: Convert to insn_decode()
Other than simplifying the code there should be no functional changes
resulting from this.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-7-bp@alien8.de
2021-03-15 11:18:35 +01:00
Borislav Petkov 6e8c83d2a3 x86/insn-eval: Handle return values from the decoder
Now that the different instruction-inspecting functions return a value,
test that and return early from callers if error has been encountered.

While at it, do not call insn_get_modrm() when calling
insn_get_displacement() because latter will make sure to call
insn_get_modrm() if ModRM hasn't been parsed yet.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-6-bp@alien8.de
2021-03-15 11:12:15 +01:00
Borislav Petkov 93281c4a96 x86/insn: Add an insn_decode() API
Users of the instruction decoder should use this to decode instruction
bytes. For that, have insn*() helpers return an int value to denote
success/failure. When there's an error fetching the next insn byte and
the insn falls short, return -ENODATA to denote that.

While at it, make insn_get_opcode() more stricter as to whether what has
seen so far is a valid insn and if not.

Copy linux/kconfig.h for the tools-version of the decoder so that it can
use IS_ENABLED().

Also, cast the INSN_MODE_KERN dummy define value to (enum insn_mode)
for tools use of the decoder because perf tool builds with -Werror and
errors out with -Werror=sign-compare otherwise.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210304174237.31945-5-bp@alien8.de
2021-03-15 11:05:47 +01:00
Borislav Petkov d30c7b820b x86/insn: Add a __ignore_sync_check__ marker
Add an explicit __ignore_sync_check__ marker which will be used to mark
lines which are supposed to be ignored by file synchronization check
scripts, its advantage being that it explicitly denotes such lines in
the code.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210304174237.31945-4-bp@alien8.de
2021-03-15 11:00:57 +01:00
Borislav Petkov 508ef28674 x86/insn: Add @buf_len param to insn_init() kernel-doc comment
It wasn't documented so add it. No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210304174237.31945-3-bp@alien8.de
2021-03-15 11:00:19 +01:00
Borislav Petkov 9e761296c5 x86/insn: Rename insn_decode() to insn_decode_from_regs()
Rename insn_decode() to insn_decode_from_regs() to denote that it
receives regs as param and uses registers from there during decoding.
Free the former name for a more generic version of the function.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-2-bp@alien8.de
2021-03-15 10:53:10 +01:00
Borislav Petkov 0d39131980 Merge 'x86/seves' into x86/core
Pick up dependent changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-15 10:49:59 +01:00
Borislav Petkov aa7680f6fe Linux 5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBOgu4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGUd0H/3Ey8aWjVAig9Pe+
 VQVZKwG+LXWH6UmUx5qyaTxophhmGnWLvkigJMn63qIg4eQtfp2gNFHK+T4OJNIP
 ybnkjFZ337x4J9zD6m8mt4Wmelq9iW2wNOS+3YZAyYiGlXfMGM7SlYRCQRQznTED
 2O/JCMsOoP+Z8tr5ah/bzs0dANsXmTZ3QqRP2uzb6irKTgFR3/weOhj+Ht1oJ4Aq
 V+bgdcwhtk20hJhlvVeqws+o74LR789tTDCknlz/YNMv9e6VPfyIQ5vJAcFmZATE
 Ezj9yzkZ4IU+Ux6ikAyaFyBU8d1a4Wqye3eHCZBsEo6tcSAhbTZ90eoU86vh6ajS
 LZjwkNw=
 =6y1u
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc3' into x86/core

Pick up dependent SEV-ES urgent changes to base new work ontop.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-15 10:49:00 +01:00
Linus Torvalds 19469d2ada A single objtool fix to handle the PUSHF/POPF validation correctly for the
paravirt changes which modified arch_local_irq_restore not to use popf.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmBOLF8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoSfOD/940XqIrDp/cXuqKL1r4zE5n4DF/nBy
 cHp8KOfo+T302crNOvylpSuL7kCCcfDM/E2BBBZ7JubN4d1VA0HDF0tV6PApmpWx
 4uGT/9ZXB7Hl2Gu5M+VvOSBQIytPYyQCGdbiWeHYfvO5HTqC1G8Pfbg2Otw+6Wgy
 jUJuuDS0xwmlo56WTDWz1aB/f/oOHUEaS3XDeaqZ86oqvD0di+tODUJoDvtYGkam
 K6nXRhFfEa4bI7Ynsa4RyMhjNOxNiFDimYnZjgGba4+8X6KGSG4N83rOr6tjHGL+
 AsBM1o5TRfBpudi5rbDAOEIhy0V3FyefIbeQeL6DZoNMS4ey8qRkYkqCLp+lOxTm
 F9T5ORZuWV43gs4c2GODGy5MHDKzcPA15OBRU2BECKrILnNG5MPMcNt3iTJbO8kY
 YNZs2svGw8/MVl928idjYPecEsTNzLi3z3MdV6QfJLCbGpIBzeX83PbvK0dKgxwL
 yeuJXBOz3sYbcxxLbueGv2Z+xH0wneHXUqPJT/YI8KFdxknZkwSnf4MA5bqVu2Mn
 q4etZxtAokvyl79NZQXvLgRxCwNj4PeXli1k11t4WhJxDLpKIm8N7QMNSKu4Z/tw
 GxAbe85Wut1ywU6srGKEnpibCFAmFyZ5HN+awKrt5BkphdGwphYb88Ldk3859o0B
 ZIKBRlhIz870ag==
 =IIWb
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Thomas Gleixner:
 "A single objtool fix to handle the PUSHF/POPF validation correctly for
  the paravirt changes which modified arch_local_irq_restore not to use
  popf"

* tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool,x86: Fix uaccess PUSHF/POPF validation
2021-03-14 13:15:55 -07:00
Linus Torvalds 75013c6c52 - Make sure PMU internal buffers are flushed for per-CPU events too and
properly handle PID/TID for large PEBS.
 
 - Handle the case properly when there's no PMU and therefore return an
   empty list of perf MSRs for VMX to switch instead of reading random
   garbage from the stack.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmBOBHoACgkQEsHwGGHe
 VUoYHBAAmSY3P4Q91ZS+Sz1orGGX0LufQ0ZVWxnNUD9sFibz5Y2MxyJpQPm6Ae4U
 1nO0+QyzbQPwuWKcQxlLHOJXkypkFSdRyR3cpAE5BOIXvqna07xBg/zuTFaOoDek
 qn42RHLs5TQB1MNKY+0dyJAfjEHBFrm0CsO27L99TRv5nEsdECM/vUswvasc+QMC
 dTS9sMHoiDVwJ8DFn6qmJ8AqkNxmcZgvNOD62TAt8Ac6u6zTGqq1oN+BMpQFRo9a
 j/Fu+5PZS4bH/pMlpL0OR6AlmR1PPJ8e1Ik+1Dk0brCrSNdiXtS3DSTllbGxNFi6
 4d5oSoStAyDNrihwPm2dw+VofFC03PEVZN095WVq7Iqn9cK/nxFgBEpaIe6fiwa2
 MrZ2YiDxrOAin0hxUSu8oLwgOwxmedaSQwo1tyzZswVtXSqc7p4JawzBiIo93RAJ
 UHpXI9zwgEyOGUJ95qcbezJVgILJqExjN+SOxaNjoqkAX8Hfgrf4aKDIMrcMC02Z
 ZFW86MXL2Rwk+WspAKlWtPgAGuU5sljXeyDK0MRcHwAom8cX+Fod80ocI+xjX8JB
 R73cd9dE2iWzIADikCItixzka+HuUBgWDqVT85yTzBt/KqwbIeE7kn6VCJmoJBbw
 c9aRcyqEBky8FO6EpD0vIP2jcnlbvUnoq5wG0KV9KXaQDhxtZfk=
 =djiL
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure PMU internal buffers are flushed for per-CPU events too and
   properly handle PID/TID for large PEBS.

 - Handle the case properly when there's no PMU and therefore return an
   empty list of perf MSRs for VMX to switch instead of reading random
   garbage from the stack.

* tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
  perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
  perf/core: Flush PMU internal buffers for per-CPU events
2021-03-14 12:57:17 -07:00
Linus Torvalds 0a7c10df49 - A couple of SEV-ES fixes and robustifications: verify usermode stack
pointer in NMI is not coming from the syscall gap, correctly track IRQ
 states in the #VC handler and access user insn bytes atomically in same
 handler as latter cannot sleep.
 
 - Balance 32-bit fast syscall exit path to do the proper work on exit
 and thus not confuse audit and ptrace frameworks.
 
 - Two fixes for the ORC unwinder going "off the rails" into KASAN
 redzones and when ORC data is missing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmBN+ksACgkQEsHwGGHe
 VUpYtBAAj199n50ipP2x+jjgCueIytMqwCCRozrgZ8JF0Al83piVfjhuAYQpfvD8
 cKCxN/jSEF0YoUg/grBTPLG6f0J4B2GoekSlSc3ljnuhBby4iJ9B4YgE7qym6tuT
 G/mBOuAo2HBzvB70i1BYPN6mrA+6SG1d4tIhRLGKHCz+hQm8yYnJYVbiOkLBECeP
 0QOOpX6hR5ytOOCRqwD/O5YIdZD8NvlA4sQE522Mrw/4PWz9XcS2kwpOQFHoRsFL
 if3t2yLMiGMfV0dyUCMoGZl0NqpnIZynoNdVPq/bllTW5obnmh6z8Eir44PzJmVJ
 ftVZTcReRqm5ObgwZh0g1H7CRjKe0xU9FyJHRmQl3Xb5g3wRAm3OkMJ2hQcOUPy9
 VOB4vp7kbDg3MmGJe2xOtsEeAyVHGzTaWlmZ0moxjJXiLTjUy69eelmvLepypO3P
 Bo/xpjn9hN7L9ptKv1exsSatQRN7KWTCxtz+NBJgC4pEpkdtDBkaWunIKeauPTZ2
 CAJJrp2sn7i5/CKPOuhjbQ+nSTMptpuZQxTDrjVUO0/6qs4ffQT3O+WXRV1bQ07v
 ObRqi0hIYgm4vSiBfVRfxOU+Zrx0j3kny4/xUs6CIjMjrjIp4RgBbqvZ95ZMooMi
 yeyZdVfzQ7PRaam5J2V3IHxKz7554hvMl5Zf4zAdl0qcQw3YZ0o=
 =rw8S
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - A couple of SEV-ES fixes and robustifications: verify usermode stack
   pointer in NMI is not coming from the syscall gap, correctly track
   IRQ states in the #VC handler and access user insn bytes atomically
   in same handler as latter cannot sleep.

 - Balance 32-bit fast syscall exit path to do the proper work on exit
   and thus not confuse audit and ptrace frameworks.

 - Two fixes for the ORC unwinder going "off the rails" into KASAN
   redzones and when ORC data is missing.

* tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Use __copy_from_user_inatomic()
  x86/sev-es: Correctly track IRQ states in runtime #VC handler
  x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
  x86/sev-es: Introduce ip_within_syscall_gap() helper
  x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
  x86/unwind/orc: Silence warnings caused by missing ORC data
  x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
2021-03-14 12:48:10 -07:00
Linus Torvalds 9d0c8e793f More fixes for ARM and x86.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmBLsyoUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMpYgf/Zu1Byif+XZVdwm52wJN38ppUUVmn
 4u8HvQ8Ht+P0cGg1IaNx9D5QXGRgdn72qEpWUF5aH03ahTANAuf6zXw+evKmiub/
 RtJfxZWEcWeLdugLVHUSrR4MOox7uvFmCdcdht4sEPdjFdH/9JeceC3A1pZ/DYTR
 +eS+E3dMWQjXnd2Omo/5f5H1LTZjNLEditnkcHT5unwKKukc008V/avgs8xOAKJB
 xf3oqJF960IO+NYf8rRQb8WtyGeo0grrWjgeqvZ37gwGUaFB9ldVxchsVLsL66OR
 bJRIoSiTgL+TUYSMQ5mKG4tmmBnPHUHfgfNoOXlWMoJHIjFeQ9oM6eTHhA==
 =QTFW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More fixes for ARM and x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Advancing the timer expiration on guest initiated write
  KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
  KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
  kvm: x86: annotate RCU pointers
  KVM: arm64: Fix exclusive limit for IPA size
  KVM: arm64: Reject VM creation when the default IPA size is unsupported
  KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  KVM: arm64: Don't use cbz/adr with external symbols
  KVM: arm64: Fix range alignment when walking page tables
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Fix nVHE hyp panic host context restore
  KVM: arm64: Avoid corrupting vCPU context register in guest exit
  KVM: arm64: nvhe: Save the SPE context early
  kvm: x86: use NULL instead of using plain integer as pointer
  KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
  KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
2021-03-14 12:35:02 -07:00
Linus Torvalds 6bf8819fed xen: branch for v5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYEsmRgAKCRCAXGG7T9hj
 vsQ9AP9oN1PKbTGn9U6FR/yJtMuD2XuX8a86PnMI8iM/bnox5QEA/kLIOBknM/nF
 bPDfBcb72BERKX+83qtd5153zcbhww4=
 =a/rf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.12b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Two fix series and a single cleanup:

   - a small cleanup patch to remove unneeded symbol exports

   - a series to cleanup Xen grant handling (avoiding allocations in
     some cases, and using common defines for "invalid" values)

   - a series to address a race issue in Xen event channel handling"

* tag 'for-linus-5.12b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Xen/gntdev: don't needlessly use kvcalloc()
  Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF}
  Xen/gntdev: don't needlessly allocate k{,un}map_ops[]
  Xen: drop exports of {set,clear}_foreign_p2m_mapping()
  xen/events: avoid handling the same event on two cpus at the same time
  xen/events: don't unmask an event channel when an eoi is pending
  xen/events: reset affinity of 2-level event when tearing it down
2021-03-12 11:34:36 -08:00
Wanpeng Li 35737d2db2 KVM: LAPIC: Advancing the timer expiration on guest initiated write
Advancing the timer expiration should only be necessary on guest initiated
writes. When we cancel the timer and clear .pending during state restore,
clear expired_tscdeadline as well.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:18:52 -05:00
Sean Christopherson 8df9f1af2e KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
removal.  The concurrent write protections provided by REMOVED_SPTE are
not needed, there are no backing page side effects to record, and MMIO
SPTEs can be left as is since they are protected by the memslot
generation, not by ensuring that the MMIO SPTE is unreachable (which
is racy with respect to lockless walks regardless of zapping behavior).

Skipping !PRESENT drastically reduces the number of updates needed to
tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that
didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could
be skipped.

Avoiding the write itself is likely close to a wash, but avoiding
__handle_changed_spte() is a clear-cut win as that involves saving and
restoring all non-volatile GPRs (it's a subtly big function), as well as
several conditional branches before bailing out.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210310003029.1250571-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:18:52 -05:00
Wanpeng Li d7eb79c629 KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                88
On-line CPU(s) list:   0-63
Off-line CPU(s) list:  64-87

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro
rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall

# echo 1 > /sys/devices/system/cpu/cpu76/online
-bash: echo: write error: Cannot allocate memory

The per-cpu vsyscall pvclock data pointer assigns either an element of the
static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory
hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if
kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in
kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall
and sometimes you end up with vsyscall disabled if the host does something
strange. This patch fixes it by allocating this dynamically memory
unconditionally even if vsyscall is disabled.

Fixes: 6a1cac56f4 ("x86/kvm: Use __bss_decrypted attribute in shared variables")
Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org#v4.19-rc5+
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:18:16 -05:00
Muhammad Usama Anjum 6fcd9cbc6a kvm: x86: annotate RCU pointers
This patch adds the annotation to fix the following sparse errors:
arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:8147:15:    struct kvm_apic_map *
arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:10628:16:    struct kvm_apic_map *
arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//x86.c:10629:15:    struct kvm_pmu_event_filter *
arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:267:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:269:9:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:637:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:994:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1036:15:    struct kvm_apic_map *
arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1173:15:    struct kvm_apic_map *
arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:190:18:    struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:251:18:    struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18:    struct kvm_pmu_event_filter *

Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305191123.GA497469@LEGION>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-12 13:17:41 -05:00
Peter Zijlstra ba08abca66 objtool,x86: Fix uaccess PUSHF/POPF validation
Commit ab234a260b ("x86/pv: Rework arch_local_irq_restore() to not
use popf") replaced "push %reg; popf" with something like: "test
$0x200, %reg; jz 1f; sti; 1:", which breaks the pushf/popf symmetry
that commit ea24213d80 ("objtool: Add UACCESS validation") relies
on.

The result is:

  drivers/gpu/drm/amd/amdgpu/si.o: warning: objtool: si_common_hw_init()+0xf36: PUSHF stack exhausted

Meanwhile, commit c9c324dc22 ("objtool: Support stack layout changes
in alternatives") makes that we can actually use stack-ops in
alternatives, which means we can revert 1ff865e343 ("x86,smap: Fix
smap_{save,restore}() alternatives").

That in turn means we can limit the PUSHF/POPF handling of
ea24213d80 to those instructions that are in alternatives.

Fixes: ab234a260b ("x86/pv: Rework arch_local_irq_restore() to not use popf")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/YEY4rIbQYa5fnnEp@hirez.programming.kicks-ass.net
2021-03-12 09:15:49 +01:00
Juergen Gross 054ac8ad5e x86/paravirt: Have only one paravirt patch function
There is no need any longer to have different paravirt patch functions
for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-15-jgross@suse.com
2021-03-11 20:11:09 +01:00
Juergen Gross fafe5e7422 x86/paravirt: Switch functions with custom code to ALTERNATIVE
Instead of using paravirt patching for custom code sequences use
ALTERNATIVE for the functions with custom code replacements.

Instead of patching an ud2 instruction for unpopulated vector entries
into the caller site, use a simple function just calling BUG() as a
replacement.

Simplify the register defines for assembler paravirt calling, as there
isn't much usage left.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-14-jgross@suse.com
2021-03-11 20:07:01 +01:00
Juergen Gross 00aa3193ab x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs
Instead of using paravirt patching for custom code sequences add
support for using ALTERNATIVE handling combined with paravirt call
patching.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-13-jgross@suse.com
2021-03-11 20:05:44 +01:00
Juergen Gross ae755b5a45 x86/paravirt: Switch iret pvops to ALTERNATIVE
The iret paravirt op is rather special as it is using a jmp instead
of a call instruction. Switch it to ALTERNATIVE.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-12-jgross@suse.com
2021-03-11 19:58:54 +01:00
Juergen Gross 0b8d366a94 x86/paravirt: Simplify paravirt macros
The central pvops call macros ____PVOP_CALL() and ____PVOP_VCALL() are
looking very similar now.

The main differences are using PVOP_VCALL_ARGS or PVOP_CALL_ARGS, which
are identical, and the return value handling.

So drop PVOP_VCALL_ARGS and instead of ____PVOP_VCALL() just use
(void)____PVOP_CALL(long, ...).

Note that it isn't easily possible to just redefine ____PVOP_VCALL()
to use ____PVOP_CALL() instead, as this would require further hiding of
commas in macro parameters.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-11-jgross@suse.com
2021-03-11 19:52:52 +01:00
Juergen Gross 33634e42e3 x86/paravirt: Remove no longer needed 32-bit pvops cruft
PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.

This allows to remove the 32-bit definitions of those macros leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another no longer needed case is special handling of return types
larger than unsigned long. Replace that with a BUILD_BUG_ON().

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

ENABLE_INTERRUPTS is used nowhere, so it can be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-10-jgross@suse.com
2021-03-11 19:51:55 +01:00
Juergen Gross 4e6292114c x86/paravirt: Add new features for paravirt patching
For being able to switch paravirt patching from special cased custom
code sequences to ALTERNATIVE handling some X86_FEATURE_* are needed
as new features. This enables to have the standard indirect pv call
as the default code and to patch that with the non-Xen custom code
sequence via ALTERNATIVE patching later.

Make sure paravirt patching is performed before alternatives patching.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-9-jgross@suse.com
2021-03-11 19:51:49 +01:00
Juergen Gross 2fe2a2c7a9 x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has()
_static_cpu_has() contains a completely open coded version of
ALTERNATIVE_TERNARY(). Replace that with the macro instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-8-jgross@suse.com
2021-03-11 19:33:43 +01:00
Juergen Gross e208b3c4a9 x86/alternative: Support ALTERNATIVE_TERNARY
Add ALTERNATIVE_TERNARY support for replacing an initial instruction
with either of two instructions depending on a feature:

  ALTERNATIVE_TERNARY "default_instr", FEATURE_NR,
                      "feature_on_instr", "feature_off_instr"

which will start with "default_instr" and at patch time will,
depending on FEATURE_NR being set or not, patch that with either
"feature_on_instr" or "feature_off_instr".

 [ bp: Add comment ontop. ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-7-jgross@suse.com
2021-03-11 16:57:31 +01:00
Juergen Gross dda7bb7648 x86/alternative: Support not-feature
Add support for alternative patching for the case a feature is not
present on the current CPU. For users of ALTERNATIVE() and friends, an
inverted feature is specified by applying the ALT_NOT() macro to it,
e.g.:

  ALTERNATIVE(old, new, ALT_NOT(feature));

Committer note:

The decision to encode the NOT-bit in the feature bit itself is because
a future change which would make objtool generate such alternative
calls, would keep the code in objtool itself fairly simple.

Also, this allows for the alternative macros to support the NOT feature
without having to change them.

Finally, the u16 cpuid member encoding the X86_FEATURE_ flags is not an
ABI so if more bits are needed, cpuid itself can be enlarged or a flags
field can be added to struct alt_instr after having considered the size
growth in either cases.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-6-jgross@suse.com
2021-03-11 16:44:01 +01:00
Juergen Gross a0e2bf7cb7 x86/paravirt: Switch time pvops functions to use static_call()
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com
2021-03-11 16:17:52 +01:00
Juergen Gross 5e21a3ecad x86/alternative: Merge include files
Merge arch/x86/include/asm/alternative-asm.h into
arch/x86/include/asm/alternative.h in order to make it easier to use
common definitions later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgross@suse.com
2021-03-11 15:58:02 +01:00
Cao jin 81519f7788 x86/setup: Remove unused RESERVE_BRK_ARRAY()
Since a13f2ef168 ("x86/xen: remove 32-bit Xen PV guest support"),
RESERVE_BRK_ARRAY() has no user anymore so drop it.

Update related comments too.

Signed-off-by: Cao jin <jojing64@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311083919.27530-1-jojing64@gmail.com
2021-03-11 11:47:37 +01:00
Jani Nikula 35bb28ece9 Merge drm/drm-next into drm-intel-next
Sync up with upstream.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-03-11 08:52:53 +02:00
David S. Miller 547fd08377 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-03-10

The following pull-request contains BPF updates for your *net* tree.

We've added 8 non-merge commits during the last 5 day(s) which contain
a total of 11 files changed, 136 insertions(+), 17 deletions(-).

The main changes are:

1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.

2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
   percpu cgroup storage from tracing programs when nested, from Yonghong Song.

3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.

4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
   L3 input length, from Jesper Dangaard Brouer.

5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.

6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:14:56 -08:00
Jan Beulich bce21a2b48 Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF}
It's not helpful if every driver has to cook its own. Generalize
xenbus'es INVALID_GRANT_HANDLE and pcifront's INVALID_GRANT_REF (which
shouldn't have expanded to zero to begin with). Use the constants in
p2m.c and gntdev.c right away, and update field types where necessary so
they would match with the constants' types (albeit without touching
struct ioctl_gntdev_grant_ref's ref field, as that's part of the public
interface of the kernel and would require introducing a dependency on
Xen's grant_table.h public header).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/db7c38a5-0d75-d5d1-19de-e5fe9f0b9c48@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Jan Beulich 0f9b05b9a0 Xen: drop exports of {set,clear}_foreign_p2m_mapping()
They're only used internally, and the layering violation they contain
(x86) or imply (Arm) of calling HYPERVISOR_grant_table_op() strongly
advise against any (uncontrolled) use from a module. The functions also
never had users except the ones from drivers/xen/grant-table.c forever
since their introduction in 3.15.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/746a5cd6-1446-eda4-8b23-03c1cac30b8d@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2021-03-10 16:39:29 -06:00
Sean Christopherson c8e2fe13d1 x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop"
case.  Patching in perf_guest_get_msrs_nop() during setup does not work
if there is no PMU, as setup bails before updating the static calls,
leaving x86_pmu.guest_get_msrs NULL and thus a complete nop.  Ultimately,
this causes VMX abort on VM-Exit due to KVM putting random garbage from
the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the
return value is non-NULL.

Fixes: abd562df94 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com
2021-03-10 16:45:09 +01:00
Mark Brown b18adee4ce stacktrace: Move documentation for arch_stack_walk_reliable() to header
Currently arch_stack_walk_reliable() is documented with an identical
comment in both x86 and S/390 implementations which is a bit redundant.
Move this to the header and convert to kerneldoc while we're at it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lkml.kernel.org/r/20210309194125.652-1-broonie@kernel.org
2021-03-10 15:52:31 +01:00
Linus Torvalds 05a59d7979 Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.

 2) TX skb error handling fix in mt76 driver, also from Felix.

 3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.

 4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
    From Cong Wang.

 5) Use correct printf format for dma_addr_t in ath11k, from Geert
    Uytterhoeven.

 6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
    Hsieh.

 7) Don't report truncated frames to mac80211 in mt76 driver, from
    Lorenzop Bianconi.

 8) Fix watcdog timeout on suspend/resume of stmmac, from Joakim Zhang.

 9) mscc ocelot needs NET_DEVLINK selct in Kconfig, from Arnd Bergmann.

10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
    Arjun Roy.

11) Ignore routes with deleted nexthop object in mlxsw, from Ido
    Schimmel.

12) Need to undo tcp early demux lookup sometimes in nf_nat, from
    Florian Westphal.

13) Fix gro aggregation for udp encaps with zero csum, from Daniel
    Borkmann.

14) Make sure to always use imp*_ndo_send when necessaey, from Jason A.
    Donenfeld.

15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.

16) prevent overly huge skb allocationsd in qrtr, from Pavel Skripkin.

17) Prevent rx ring copnsumer index loss of sync in enetc, from Vladimir
    Oltean.

18) Make sure textsearch copntrol block is large enough, from Wilem de
    Bruijn.

19) Revert MAC changes to r8152 leading to instability, from Hates Wang.

20) Advance iov in 9p even for empty reads, from Jissheng Zhang.

21) Double hook unregister in nftables, from PabloNeira Ayuso.

22) Fix memleak in ixgbe, fropm Dinghao Liu.

23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.

24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
    Tang.

25) Fix DOI refcount bugs in cipso, from Paul Moore.

26) One too many irqsave in ibmvnic, from Junlin Yang.

27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
    Balazs Nemeth.

* git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
  s390/qeth: fix notification for pending buffers during teardown
  s390/qeth: schedule TX NAPI on QAOB completion
  s390/qeth: improve completion of pending TX buffers
  s390/qeth: fix memory leak after failed TX Buffer allocation
  net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
  net: check if protocol extracted by virtio_net_hdr_set_proto is correct
  net: dsa: xrs700x: check if partner is same as port in hsr join
  net: lapbether: Remove netif_start_queue / netif_stop_queue
  atm: idt77252: fix null-ptr-dereference
  atm: uPD98402: fix incorrect allocation
  atm: fix a typo in the struct description
  net: qrtr: fix error return code of qrtr_sendmsg()
  mptcp: fix length of ADD_ADDR with port sub-option
  net: bonding: fix error return code of bond_neigh_init()
  net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
  net: enetc: set MAC RX FIFO to recommended value
  net: davicom: Use platform_get_irq_optional()
  net: davicom: Fix regulator not turned off on driver removal
  net: davicom: Fix regulator not turned off on failed probe
  net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
  ...
2021-03-09 17:15:56 -08:00
Yonghong Song de920fc64c bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp
x86 bpf_jit_comp.c used kmalloc_array to store jited addresses
for each bpf insn. With a large bpf program, we have see the
following allocation failures in our production server:

    page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
                             nodemask=(null),cpuset=/,mems_allowed=0"
    Call Trace:
    dump_stack+0x50/0x70
    warn_alloc.cold.120+0x72/0xd2
    ? __alloc_pages_direct_compact+0x157/0x160
    __alloc_pages_slowpath+0xcdb/0xd00
    ? get_page_from_freelist+0xe44/0x1600
    ? vunmap_page_range+0x1ba/0x340
    __alloc_pages_nodemask+0x2c9/0x320
    kmalloc_order+0x18/0x80
    kmalloc_order_trace+0x1d/0xa0
    bpf_int_jit_compile+0x1e2/0x484
    ? kmalloc_order_trace+0x1d/0xa0
    bpf_prog_select_runtime+0xc3/0x150
    bpf_prog_load+0x480/0x720
    ? __mod_memcg_lruvec_state+0x21/0x100
    __do_sys_bpf+0xc31/0x2040
    ? close_pdeo+0x86/0xe0
    do_syscall_64+0x42/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f2f300f7fa9
    Code: Bad RIP value.

Dumped assembly:

    ffffffff810b6d70 <bpf_int_jit_compile>:
    ; {
    ffffffff810b6d70: e8 eb a5 b4 00        callq   0xffffffff81c01360 <__fentry__>
    ffffffff810b6d75: 41 57                 pushq   %r15
    ...
    ffffffff810b6f39: e9 72 fe ff ff        jmp     0xffffffff810b6db0 <bpf_int_jit_compile+0x40>
    ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
    ffffffff810b6f3e: 8b 45 0c              movl    12(%rbp), %eax
    ;       return __kmalloc(bytes, flags);
    ffffffff810b6f41: be c0 0c 00 00        movl    $3264, %esi
    ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
    ffffffff810b6f46: 8d 78 01              leal    1(%rax), %edi
    ;       if (unlikely(check_mul_overflow(n, size, &bytes)))
    ffffffff810b6f49: 48 c1 e7 02           shlq    $2, %rdi
    ;       return __kmalloc(bytes, flags);
    ffffffff810b6f4d: e8 8e 0c 1d 00        callq   0xffffffff81287be0 <__kmalloc>
    ;       if (!addrs) {
    ffffffff810b6f52: 48 85 c0              testq   %rax, %rax

Change kmalloc_array() to kvmalloc_array() to avoid potential
allocation error for big bpf programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210309015647.3657852-1-yhs@fb.com
2021-03-10 00:10:23 +01:00
Juergen Gross db16e07269 x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT()
The macro ALTINSTR_REPLACEMENT() doesn't make use of the feature
parameter, so drop it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309134813.23912-4-jgross@suse.com
2021-03-09 20:08:28 +01:00
Joerg Roedel bffe30dd9f x86/sev-es: Use __copy_from_user_inatomic()
The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().

Introduce a insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.

Fixes: 5e3427a7bc ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-joro@8bytes.org
2021-03-09 12:37:54 +01:00
Joerg Roedel 62441a1fb5 x86/sev-es: Correctly track IRQ states in runtime #VC handler
Call irqentry_nmi_enter()/irqentry_nmi_exit() in the #VC handler to
correctly track the IRQ state during its execution.

Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-5-joro@8bytes.org
2021-03-09 12:33:46 +01:00
Joerg Roedel 545ac14c16 x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
The code in the NMI handler to adjust the #VC handler IST stack is
needed in case an NMI hits when the #VC handler is still using its IST
stack.

But the check for this condition also needs to look if the regs->sp
value is trusted, meaning it was not set by user-space. Extend the check
to not use regs->sp when the NMI interrupted user-space code or the
SYSCALL gap.

Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-3-joro@8bytes.org
2021-03-09 12:26:26 +01:00
Tom Lendacky 229164175f x86/virtio: Have SEV guests enforce restricted virtio memory access
An SEV guest requires that virtio devices use the DMA API to allow the
hypervisor to successfully access guest memory as needed.

The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio
to use the DMA API. Add arch_has_restricted_virtio_memory_access() for
x86, to fail the device probe if these features have not been set for the
device when running as an SEV guest.

 [ bp: Fix -Wmissing-prototypes warning
   Reported-by: kernel test robot <lkp@intel.com> ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/b46e0211f77ca1831f11132f969d470a6ffc9267.1614897610.git.thomas.lendacky@amd.com
2021-03-08 20:41:33 +01:00
Lakshmi Ramasubramanian 179350f00e x86: Use ELF fields defined in 'struct kimage'
ELF related fields elf_headers, elf_headers_sz, and elf_load_addr
have been moved from 'struct kimage_arch' to 'struct kimage'.

Use the ELF fields defined in 'struct kimage'.

Suggested-by: Rob Herring <robh@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-5-nramas@linux.microsoft.com
2021-03-08 12:06:29 -07:00
Michael Kelley ec866be6ec clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts
STIMER0 interrupts are most naturally modeled as per-cpu IRQs. But
because x86/x64 doesn't have per-cpu IRQs, the core STIMER0 interrupt
handling machinery is done in code under arch/x86 and Linux IRQs are
not used. Adding support for ARM64 means adding equivalent code
using per-cpu IRQs under arch/arm64.

A better model is to treat per-cpu IRQs as the normal path (which it is
for modern architectures), and the x86/x64 path as the exception. Do this
by incorporating standard Linux per-cpu IRQ allocation into the main
SITMER0 driver code, and bypass it in the x86/x64 exception case. For
x86/x64, special case code is retained under arch/x86, but no STIMER0
interrupt handling code is needed under arch/arm64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-11-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:33:00 +00:00
Michael Kelley eb3e1d370b clocksource/drivers/hyper-v: Handle sched_clock differences inline
While the Hyper-V Reference TSC code is architecture neutral, the
pv_ops.time.sched_clock() function is implemented for x86/x64, but not
for ARM64. Current code calls a utility function under arch/x86 (and
coming, under arch/arm64) to handle the difference.

Change this approach to handle the difference inline based on whether
GENERIC_SCHED_CLOCK is present.  The new approach removes code under
arch/* since the difference is tied more to the specifics of the Linux
implementation than to the architecture.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-9-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:33:00 +00:00
Michael Kelley e4ab4658f1 clocksource/drivers/hyper-v: Handle vDSO differences inline
While the driver for the Hyper-V Reference TSC and STIMERs is architecture
neutral, vDSO is implemented for x86/x64, but not for ARM64.  Current code
calls into utility functions under arch/x86 (and coming, under arch/arm64)
to handle the difference.

Change this approach to handle the difference inline based on whether
VDSO_CLOCK_MODE_HVCLOCK is present.  The new approach removes code under
arch/* since the difference is tied more to the specifics of the Linux
implementation than to the architecture.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-8-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:33:00 +00:00
Michael Kelley d608715d47 Drivers: hv: vmbus: Move handling of VMbus interrupts
VMbus interrupts are most naturally modelled as per-cpu IRQs.  But
because x86/x64 doesn't have per-cpu IRQs, the core VMbus interrupt
handling machinery is done in code under arch/x86 and Linux IRQs are
not used.  Adding support for ARM64 means adding equivalent code
using per-cpu IRQs under arch/arm64.

A better model is to treat per-cpu IRQs as the normal path (which it is
for modern architectures), and the x86/x64 path as the exception.  Do this
by incorporating standard Linux per-cpu IRQ allocation into the main VMbus
driver, and bypassing it in the x86/x64 exception case. For x86/x64,
special case code is retained under arch/x86, but no VMbus interrupt
handling code is needed under arch/arm64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-7-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:33:00 +00:00
Michael Kelley 946f4b8680 Drivers: hv: vmbus: Handle auto EOI quirk inline
On x86/x64, Hyper-V provides a flag to indicate auto EOI functionality,
but it doesn't on ARM64. Handle this quirk inline instead of calling
into code under arch/x86 (and coming, under arch/arm64).

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-6-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:32:59 +00:00
Michael Kelley b548a77427 Drivers: hv: vmbus: Move hyperv_report_panic_msg to arch neutral code
With the new Hyper-V MSR set function, hyperv_report_panic_msg() can be
architecture neutral, so move it out from under arch/x86 and merge into
hv_kmsg_dump(). This move also avoids needing a separate implementation
under arch/arm64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-5-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:32:59 +00:00
Michael Kelley f3c5e63c36 Drivers: hv: Redo Hyper-V synthetic MSR get/set functions
Current code defines a separate get and set macro for each Hyper-V
synthetic MSR used by the VMbus driver. Furthermore, the get macro
can't be converted to a standard function because the second argument
is modified in place, which is somewhat bad form.

Redo this by providing a single get and a single set function that
take a parameter specifying the MSR to be operated on. Fixup usage
of the get function. Calling locations are no more complex than before,
but the code under arch/x86 and the upcoming code under arch/arm64
is significantly simplified.

Also standardize the names of Hyper-V synthetic MSRs that are
architecture neutral. But keep the old x86-specific names as aliases
that can be removed later when all references (particularly in KVM
code) have been cleaned up in a separate patch series.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-4-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:32:59 +00:00
Michael Kelley 5e4e6ddf8d x86/hyper-v: Move hv_message_type to architecture neutral module
The definition of enum hv_message_type includes arch neutral and
x86/x64-specific values. Ideally there would be a way to put the
arch neutral values in an arch neutral module, and the arch
specific values in an arch specific module. But C doesn't provide
a way to extend enum types. As a compromise, move the entire
definition into an arch neutral module, to avoid duplicating the
arch neutral values for x86/x64 and for ARM64.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:32:59 +00:00
Michael Kelley ca48739e59 Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code
The Hyper-V page allocator functions are implemented in an architecture
neutral way.  Move them into the architecture neutral VMbus module so
a separate implementation for ARM64 is not needed.

No functional change.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-03-08 17:32:59 +00:00
Joerg Roedel 78a81d88f6 x86/sev-es: Introduce ip_within_syscall_gap() helper
Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.

Fixes: 315562c9af ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-joro@8bytes.org
2021-03-08 14:22:17 +01:00
Andy Lutomirski d0962f2b24 x86/entry/32: Remove leftover macros after stackprotector cleanups
Now that nonlazy-GS mode is gone, remove the macros from entry_32.S
that obfuscated^Wabstracted GS handling.  The assembled output is
identical before and after this patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/b1543116f0f0e68f1763d90d5f7fcec27885dff5.1613243844.git.luto@kernel.org
2021-03-08 13:27:31 +01:00
Andy Lutomirski 3fb0fdb3bb x86/stackprotector/32: Make the canary into a regular percpu variable
On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage.  It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work.  Supporting both variants is a
maintenance and testing mess.

Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable.  This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.

(That name is special.  We could use any symbol we want for the
 %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
 name other than __stack_chk_guard.)

Forcibly disable stackprotector on older compilers that don't support
the new options and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.

Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
2021-03-08 13:19:05 +01:00
Mike Travis 6840a150b9 x86/platform/uv: Set section block size for hubless architectures
Commit

  bbbd2b51a2 ("x86/platform/UV: Use new set memory block size function")

added a call to set the block size value that is needed by the kernel
to set the boundaries in the section list. This was done for UV Hubbed
systems but missed in the UV Hubless setup. Fix that mistake by adding
that same set call for hubless systems, which support the same NVRAMs
and Intel BIOS, thus the same problem occurs.

 [ bp: Massage commit message. ]

Fixes: bbbd2b51a2 ("x86/platform/UV: Use new set memory block size function")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Russ Anderson <rja@hpe.com>
Link: https://lkml.kernel.org/r/20210305162853.299892-1-mike.travis@hpe.com
2021-03-08 12:17:53 +01:00
Dave Hansen 09141ec0e4 x86: Remove duplicate TSC DEADLINE MSR definitions
There are two definitions for the TSC deadline MSR in msr-index.h,
one with an underscore and one without.  Axe one of them and move
all the references over to the other one.

 [ bp: Fixup the MSR define in handle_fastpath_set_msr_irqoff() too. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200305174706.0D6B8EE4@viggo.jf.intel.com
2021-03-08 11:05:20 +01:00
Andy Lutomirski 5d5675df79 x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall entry work.  This confuses audit and ptrace.  For example:

    $ ./tools/testing/selftests/x86/syscall_arg_fault_32
    ...
    strace: pid 264258: entering, ptrace_syscall_info.op == 2
    ...

This is a minimal fix intended for ease of backporting.  A more
complete cleanup is coming.

Fixes: 0b085e68f4 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
2021-03-06 13:10:06 +01:00
Josh Poimboeuf b59cc97674 x86/unwind/orc: Silence warnings caused by missing ORC data
The ORC unwinder attempts to fall back to frame pointers when ORC data
is missing for a given instruction.  It sets state->error, but then
tries to keep going as a best-effort type of thing.  That may result in
further warnings if the unwinder gets lost.

Until we have some way to register generated code with the unwinder,
missing ORC will be expected, and occasionally going off the rails will
also be expected.  So don't warn about it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.1612534649.git.jpoimboe@redhat.com
2021-03-06 13:09:45 +01:00
Josh Poimboeuf e504e74cc3 x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
KASAN reserves "redzone" areas between stack frames in order to detect
stack overruns.  A read or write to such an area triggers a KASAN
"stack-out-of-bounds" BUG.

Normally, the ORC unwinder stays in-bounds and doesn't access the
redzone.  But sometimes it can't find ORC metadata for a given
instruction.  This can happen for code which is missing ORC metadata, or
for generated code.  In such cases, the unwinder attempts to fall back
to frame pointers, as a best-effort type thing.

This fallback often works, but when it doesn't, the unwinder can get
confused and go off into the weeds into the KASAN redzone, triggering
the aforementioned KASAN BUG.

But in this case, the unwinder's confusion is actually harmless and
working as designed.  It already has checks in place to prevent
off-stack accesses, but those checks get short-circuited by the KASAN
BUG.  And a BUG is a lot more disruptive than a harmless unwinder
warning.

Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack
accesses.  This finishes the job started by commit 881125bfe6
("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only
partially fixed the issue.

Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com
2021-03-06 13:09:37 +01:00
Ingo Molnar a500fc918f Merge branch 'locking/core' into x86/mm, to resolve conflict
There's a non-trivial conflict between the parallel TLB flush
framework and the IPI flush debugging code - merge them
manually.

Conflicts:
	kernel/smp.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-06 13:00:58 +01:00
Nadav Amit 1608e4cf31 x86/mm/tlb: Remove unnecessary uses of the inline keyword
The compiler is smart enough without these hints.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-9-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit 09c5272e48 x86/mm/tlb: Do not make is_lazy dirty for no reason
Blindly writing to is_lazy for no reason, when the written value is
identical to the old value, makes the cacheline dirty for no reason.
Avoid making such writes to prevent cache coherency traffic for no
reason.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-7-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit 2f4305b19f x86/mm/tlb: Privatize cpu_tlbstate
cpu_tlbstate is mostly private and only the variable is_lazy is shared.
This causes some false-sharing when TLB flushes are performed.

Break cpu_tlbstate intro cpu_tlbstate and cpu_tlbstate_shared, and mark
each one accordingly.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-6-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit 4ce94eabac x86/mm/tlb: Flush remote and local TLBs concurrently
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).

While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, and
might make local TLB flushes slower than they were before the recent
changes.

Before calling the SMP infrastructure, check if only a local TLB flush
is needed to restore the lost performance in this common case. This
requires to check mm_cpumask() one more time, but unless this mask is
updated very frequently, this should impact performance negatively.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> # Hyper-v parts
Reviewed-by: Juergen Gross <jgross@suse.com> # Xen and paravirt parts
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-5-namit@vmware.com
2021-03-06 12:59:10 +01:00
Nadav Amit 6035152d8e x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
Open-code on_each_cpu_cond_mask() in native_flush_tlb_others() to
optimize the code. Open-coding eliminates the need for the indirect branch
that is used to call is_lazy(), and in CPUs that are vulnerable to
Spectre v2, it eliminates the retpoline. In addition, it allows to use a
preallocated cpumask to compute the CPUs that should be.

This would later allow us not to adapt on_each_cpu_cond_mask() to
support local and remote functions.

Note that calling tlb_is_not_lazy() for every CPU that needs to be
flushed, as done in native_flush_tlb_multi() might look ugly, but it is
equivalent to what is currently done in on_each_cpu_cond_mask().
Actually, native_flush_tlb_multi() does it more efficiently since it
avoids using an indirect branch for the matter.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-4-namit@vmware.com
2021-03-06 12:59:09 +01:00
Nadav Amit 4c1ba3923e x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
The unification of these two functions allows to use them in the updated
SMP infrastrucutre.

To do so, remove the reason argument from flush_tlb_func_local(), add
a member to struct tlb_flush_info that says which CPU initiated the
flush and act accordingly. Optimize the size of flush_tlb_info while we
are at it.

Unfortunately, this prevents us from using a constant tlb_flush_info for
arch_tlbbatch_flush(), but in a later stage we may be able to inline
tlb_flush_info into the IPI data, so it should not have an impact
eventually.

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-3-namit@vmware.com
2021-03-06 12:59:09 +01:00
Pu Wen 59eca2fa19 x86/cpu/hygon: Set __max_die_per_package on Hygon
Set the maximum DIE per package variable on Hygon using the
nodes_per_socket value in order to do per-DIE manipulations for drivers
such as powercap.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210302020217.1827-1-puwen@hygon.cn
2021-03-06 12:54:59 +01:00
Kan Liang afbef30149 perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
To supply a PID/TID for large PEBS, it requires flushing the PEBS buffer
in a context switch.

For normal LBRs, a context switch can flip the address space and LBR
entries are not tagged with an identifier, we need to wipe the LBR, even
for per-cpu events.

For LBR callstack, save/restore the stack is required during a context
switch.

Set PERF_ATTACH_SCHED_CB for the event with large PEBS & LBR.

Fixes: 9c964efa43 ("perf/x86/intel: Drain the PEBS buffer during context switches")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201130193842.10569-2-kan.liang@linux.intel.com
2021-03-06 12:52:44 +01:00
Jason Gerecke 864b435514 x86/jump_label: Mark arguments as const to satisfy asm constraints
When compiling an external kernel module with `-O0` or `-O1`, the following
compile error may be reported:

    ./arch/x86/include/asm/jump_label.h:25:2: error: impossible constraint in ‘asm’
       25 |  asm_volatile_goto("1:"
          |  ^~~~~~~~~~~~~~~~~

It appears that these lower optimization levels prevent GCC from detecting
that the key/branch arguments can be treated as constants and used as
immediate operands. To work around this, explicitly add the `const` label.

Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20210211214848.536626-1-jason.gerecke@wacom.com
2021-03-06 12:51:00 +01:00
Borislav Petkov f3db3365c0 x86/sev-es: Remove subtraction of res variable
vc_decode_insn() calls copy_from_kernel_nofault() by way of
vc_fetch_insn_kernel() to fetch 15 bytes max of opcodes to decode.

copy_from_kernel_nofault() returns negative on error and 0 on success.
The error case is handled by returning ES_EXCEPTION.

In the success case, the ret variable which contains the return value is
0 so there's no need to subtract it from MAX_INSN_SIZE when initializing
the insn buffer for further decoding. Remove it.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210223111130.16201-1-bp@alien8.de
2021-03-06 12:08:53 +01:00
Jiri Slaby 70c9d95922 x86/vdso: Use proper modifier for len's format specifier in extract()
Commit

  8382c668ce ("x86/vdso: Add support for exception fixup in vDSO functions")

prints length "len" which is size_t.

Compilers now complain when building on a 32-bit host:

  HOSTCC  arch/x86/entry/vdso/vdso2c
  ...
  In file included from arch/x86/entry/vdso/vdso2c.c:162:
  arch/x86/entry/vdso/vdso2c.h: In function 'extract64':
  arch/x86/entry/vdso/vdso2c.h:38:52: warning: format '%lu' expects argument of \
	type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'}

So use proper modifier (%zu) for size_t.

 [ bp: Massage commit message. ]

Fixes: 8382c668ce ("x86/vdso: Add support for exception fixup in vDSO functions")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20210303064357.17056-1-jslaby@suse.cz
2021-03-06 11:34:07 +01:00