Commit graph

45498 commits

Author SHA1 Message Date
Ilpo Järvinen
197e0da1f1 x86/pci: Use PCI_HEADER_TYPE_* instead of literals
Replace 0x7f and 0x80 literals with PCI_HEADER_TYPE_* defines.

Link: https://lore.kernel.org/r/20231124090919.23687-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-12-01 15:00:43 -06:00
Ashok Raj
1f693ef550 x86/microcode/intel: Remove redundant microcode late updated message
After successful update, the late loading routine prints an update
summary similar to:

  microcode: load: updated on 128 primary CPUs with 128 siblings
  microcode: revision: 0x21000170 -> 0x21000190

Remove the redundant message in the Intel side of the driver.

  [ bp: Massage commit message. ]

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ZWjYhedNfhAUmt0k@a4bf019067fa.jf.intel.com
2023-12-01 18:52:01 +01:00
Like Xu
ef8d89033c KVM: x86: Remove 'return void' expression for 'void function'
The requested info will be stored in 'guest_xsave->region' referenced by
the incoming pointer "struct kvm_xsave *guest_xsave", thus there is no need
to explicitly use return void expression for a void function "static void
kvm_vcpu_ioctl_x86_get_xsave(...)". The issue is caught with [-Wpedantic].

Fixes: 2d287ec65e79 ("x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer")
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20231007064019.17472-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 08:14:27 -08:00
Sean Christopherson
087e15206d KVM: Set file_operations.owner appropriately for all such structures
Set .owner for all KVM-owned filed types so that the KVM module is pinned
until any files with callbacks back into KVM are completely freed.  Using
"struct kvm" as a proxy for the module, i.e. keeping KVM-the-module alive
while there are active VMs, doesn't provide full protection.

Userspace can invoke delete_module() the instant the last reference to KVM
is put.  If KVM itself puts the last reference, e.g. via kvm_destroy_vm(),
then it's possible for KVM to be preempted and deleted/unloaded before KVM
fully exits, e.g. when the task running kvm_destroy_vm() is scheduled back
in, it will jump to a code page that is no longer mapped.

Note, file types that can call into sub-module code, e.g. kvm-intel.ko or
kvm-amd.ko on x86, must use the module pointer passed to kvm_init(), not
THIS_MODULE (which points at kvm.ko).  KVM assumes that if /dev/kvm is
reachable, e.g. VMs are active, then the vendor module is loaded.

To reduce the probability of forgetting to set .owner entirely, use
THIS_MODULE for stats files where KVM does not call back into vendor code.

This reverts commit 70375c2d8f, and fixes
several other file types that have been buggy since their introduction.

Fixes: 70375c2d8f ("Revert "KVM: set owner of cpu and vm file operations"")
Fixes: 3bcd0662d6 ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20231010003746.GN800259@ZenIV
Link: https://lore.kernel.org/r/20231018204624.1905300-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 08:12:17 -08:00
Paolo Bonzini
e59f75de4e KVM: x86/mmu: fix comment about mmu_unsync_pages_lock
Fix the comment about what can and cannot happen when mmu_unsync_pages_lock
is not help.  The comment correctly mentions "clearing sp->unsync", but then
it talks about unsync going from 0 to 1.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231125083400.1399197-5-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 07:52:09 -08:00
Paolo Bonzini
250ce1b4d2 KVM: x86/mmu: always take tdp_mmu_pages_lock
It is cheap to take tdp_mmu_pages_lock in all write-side critical sections.
We already do it all the time when zapping with read_lock(), so it is not
a problem to do it from the kvm_tdp_mmu_zap_all() path (aka
kvm_arch_flush_shadow_all(), aka VM destruction and MMU notifier release).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231125083400.1399197-4-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 07:52:08 -08:00
Paolo Bonzini
484dd27c06 KVM: x86/mmu: remove unnecessary "bool shared" argument from iterators
The "bool shared" argument is more or less unnecessary in the
for_each_*_tdp_mmu_root_yield_safe() macros.  Many users check for
the lock before calling it; all of them either call small functions
that do the check, or end up calling tdp_mmu_set_spte_atomic() and
tdp_mmu_iter_set_spte().  Add a few assertions to make up for the
lost check in for_each_*_tdp_mmu_root_yield_safe(), but even this
is probably overkill and mostly for documentation reasons.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231125083400.1399197-3-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 07:52:08 -08:00
Paolo Bonzini
5f3c8c9187 KVM: x86/mmu: remove unnecessary "bool shared" argument from functions
Neither tdp_mmu_next_root nor kvm_tdp_mmu_put_root need to know
if the lock is taken for read or write.  Either way, protection
is achieved via RCU and tdp_mmu_pages_lock.  Remove the argument
and just assert that the lock is taken.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231125083400.1399197-2-pbonzini@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 07:52:07 -08:00
David Matlack
45a61ebb22 KVM: x86/mmu: Check for leaf SPTE when clearing dirty bit in the TDP MMU
Re-check that the given SPTE is still a leaf and present SPTE after a
failed cmpxchg in clear_dirty_gfn_range(). clear_dirty_gfn_range()
intends to only operate on present leaf SPTEs, but that could change
after a failed cmpxchg.

A check for present was added in commit 3354ef5a59 ("KVM: x86/mmu:
Check for present SPTE when clearing dirty bit in TDP MMU") but the
check for leaf is still buried in tdp_root_for_each_leaf_pte() and does
not get rechecked on retry.

Fixes: a6a0b05da9 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20231027172640.2335197-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 07:52:07 -08:00
David Matlack
1aa4bb9168 KVM: x86/mmu: Fix off-by-1 when splitting huge pages during CLEAR
Fix an off-by-1 error when passing in the range of pages to
kvm_mmu_try_split_huge_pages() during CLEAR_DIRTY_LOG. Specifically, end
is the last page that needs to be split (inclusive) so pass in `end + 1`
since kvm_mmu_try_split_huge_pages() expects the `end` to be
non-inclusive.

At worst this will cause a huge page to be write-protected instead of
eagerly split, which is purely a performance issue, not a correctness
issue. But even that is unlikely as it would require userspace pass in a
bitmap where the last page is the only 4K page on a huge page that needs
to be split.

Reported-by: Vipin Sharma <vipinsh@google.com>
Fixes: cb00a70bd4 ("KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG")
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20231027172640.2335197-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 07:51:55 -08:00
Philipp Stanner
573cc0e5cf KVM: x86: Harden copying of userspace-array against overflow
cpuid.c utilizes vmemdup_user() and array_size() to copy two userspace
arrays. This, currently, does not check for an overflow.

Use the new wrapper vmemdup_array_user() to copy the arrays more safely,
as vmemdup_user() doesn't check for overflow.

Note, KVM explicitly checks the number of entries before duplicating the
array, i.e. adding the overflow check should be a glorified nop.

Suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Link: https://lore.kernel.org/r/20231102181526.43279-2-pstanner@redhat.com
[sean: call out that KVM pre-checks the number of entries]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 13:16:21 -08:00
Sean Christopherson
fd89499a51 KVM: x86/pmu: Track emulated counter events instead of previous counter
Explicitly track emulated counter events instead of using the common
counter value that's shared with the hardware counter owned by perf.
Bumping the common counter requires snapshotting the pre-increment value
in order to detect overflow from emulation, and the snapshot approach is
inherently flawed.

Snapshotting the previous counter at every increment assumes that there is
at most one emulated counter event per emulated instruction (or rather,
between checks for KVM_REQ_PMU).  That's mostly holds true today because
KVM only emulates (branch) instructions retired, but the approach will
fall apart if KVM ever supports event types that don't have a 1:1
relationship with instructions.

And KVM already has a relevant bug, as handle_invalid_guest_state()
emulates multiple instructions without checking KVM_REQ_PMU, i.e. could
miss an overflow event due to clobbering pmc->prev_counter.  Not checking
KVM_REQ_PMU is problematic in both cases, but at least with the emulated
counter approach, the resulting behavior is delayed overflow detection,
as opposed to completely lost detection.

Tracking the emulated count fixes another bug where the snapshot approach
can signal spurious overflow due to incorporating both the emulated count
and perf's count in the check, i.e. if overflow is detected by perf, then
KVM's emulation will also incorrectly signal overflow.  Add a comment in
the related code to call out the need to process emulated events *after*
pausing the perf event (big kudos to Mingwei for figuring out that
particular wrinkle).

Cc: Mingwei Zhang <mizhang@google.com>
Cc: Roman Kagan <rkagan@amazon.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20231103230541.352265-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:55 -08:00
Sean Christopherson
89acf1237b KVM: x86/pmu: Update sample period in pmc_write_counter()
Update a PMC's sample period in pmc_write_counter() to deduplicate code
across all callers of pmc_write_counter().  Opportunistically move
pmc_write_counter() into pmc.c now that it's doing more work.  WRMSR isn't
such a hot path that an extra CALL+RET pair will be problematic, and the
order of function definitions needs to be changed anyways, i.e. now is a
convenient time to eat the churn.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:55 -08:00
Sean Christopherson
ec61b2306d KVM: x86/pmu: Remove manual clearing of fields in kvm_pmu_init()
Remove code that unnecessarily clears event_count and need_cleanup in
kvm_pmu_init(), the entire kvm_pmu is zeroed just a few lines earlier.
Vendor code doesn't set event_count or need_cleanup during .init(), and
if either VMX or SVM did set those fields it would be a flagrant bug.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Sean Christopherson
f2f63f7ec6 KVM: x86/pmu: Stop calling kvm_pmu_reset() at RESET (it's redundant)
Drop kvm_vcpu_reset()'s call to kvm_pmu_reset(), the call is performed
only for RESET, which is really just the same thing as vCPU creation,
and kvm_arch_vcpu_create() *just* called kvm_pmu_init(), i.e. there can't
possibly be any work to do.

Unlike Intel, AMD's amd_pmu_refresh() does fill all_valid_pmc_idx even if
guest CPUID is empty, but everything that is at all dynamic is guaranteed
to be '0'/NULL, e.g. it should be impossible for KVM to have already
created a perf event.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Sean Christopherson
1647b52757 KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing
Stop all counters and release all perf events before refreshing the vPMU,
i.e. before reconfiguring the vPMU to respond to changes in the vCPU
model.

Clear need_cleanup in kvm_pmu_reset() as well so that KVM doesn't
prematurely stop counters, e.g. if KVM enters the guest and enables
counters before the vCPU is scheduled out.

Cc: stable@vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Sean Christopherson
cbb359d81a KVM: x86/pmu: Move PMU reset logic to common x86 code
Move the common (or at least "ignored") aspects of resetting the vPMU to
common x86 code, along with the stop/release helpers that are no used only
by the common pmu.c.

There is no need to manually handle fixed counters as all_valid_pmc_idx
tracks both fixed and general purpose counters, and resetting the vPMU is
far from a hot path, i.e. the extra bit of overhead to the PMC from the
index is a non-issue.

Zero fixed_ctr_ctrl in common code even though it's Intel specific.
Ensuring it's zero doesn't harm AMD/SVM in any way, and stopping the fixed
counters via all_valid_pmc_idx, but not clearing the associated control
bits, would be odd/confusing.

Make the .reset() hook optional as SVM no longer needs vendor specific
handling.

Cc: stable@vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:52:54 -08:00
Uros Bizjak
15223c4f97 KVM: SVM,VMX: Use %rip-relative addressing to access kvm_rebooting
Instruction with %rip-relative address operand is one byte shorter than
its absolute address counterpart and is also compatible with position
independent executable (-fpie) build.

No functional changes intended.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20231031075312.47525-1-ubizjak@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:51:54 -08:00
Sean Christopherson
72046d0a07 KVM: SVM: Don't intercept IRET when injecting NMI and vNMI is enabled
When vNMI is enabled, rely entirely on hardware to correctly handle NMI
blocking, i.e. don't intercept IRET to detect when NMIs are no longer
blocked.  KVM already correctly ignores svm->nmi_masked when vNMI is
enabled, so the effect of the bug is essentially an unnecessary VM-Exit.

KVM intercepts IRET for two reasons:
 - To track NMI masking to be able to know at any point of time if NMI
   is masked.
 - To track NMI windows (to inject another NMI after the guest executes
   IRET, i.e. unblocks NMIs)

When vNMI is enabled, both cases are handled by hardware:
- NMI masking state resides in int_ctl.V_NMI_BLOCKING and can be read by
  KVM at will.
- Hardware automatically "injects" pending virtual NMIs when virtual NMIs
  become unblocked.

However, even though pending a virtual NMI for hardware to handle is the
most common way to synthesize a guest NMI, KVM may still directly inject
an NMI via when KVM is handling two "simultaneous" NMIs (see comments in
process_nmi() for details on KVM's simultaneous NMI handling).  Per AMD's
APM, hardware sets the BLOCKING flag when software directly injects an NMI
as well, i.e. KVM doesn't need to manually mark vNMIs as blocked:

  If Event Injection is used to inject an NMI when NMI Virtualization is
  enabled, VMRUN sets V_NMI_MASK in the guest state.

Note, it's still possible that KVM could trigger a spurious IRET VM-Exit.
When running a nested guest, KVM disables vNMI for L2 and thus will enable
IRET interception (in both vmcb01 and vmcb02) while running L2 reason.  If
a nested VM-Exit happens before L2 executes IRET, KVM can end up running
L1 with vNMI enable and IRET intercepted.  This is also a benign bug, and
even less likely to happen, i.e. can be safely punted to a future fix.

Fixes: fa4c027a79 ("KVM: x86: Add support for SVM's Virtual NMI")
Link: https://lore.kernel.org/all/ZOdnuDZUd4mevCqe@google.como
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20231018192021.1893261-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:51:22 -08:00
Sean Christopherson
770d6aa2e4 KVM: SVM: Explicitly require FLUSHBYASID to enable SEV support
Add a sanity check that FLUSHBYASID is available if SEV is supported in
hardware, as SEV (and beyond) guests are bound to a single ASID, i.e. KVM
can't "flush" by assigning a new, fresh ASID to the guest.  If FLUSHBYASID
isn't supported for some bizarre reason, KVM would completely fail to do
TLB flushes for SEV+ guests (see pre_svm_run() and pre_sev_run()).

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20231018193617.1895752-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:51:14 -08:00
Sean Christopherson
176bfc5b17 KVM: nSVM: Advertise support for flush-by-ASID
Advertise support for FLUSHBYASID when nested SVM is enabled, as KVM can
always emulate flushing TLB entries for a vmcb12 ASID, e.g. by running L2
with a new, fresh ASID in vmcb02.  Some modern hypervisors, e.g. VMWare
Workstation 17, require FLUSHBYASID support and will refuse to run if it's
not present.

Punt on proper support, as "Honor L1's request to flush an ASID on nested
VMRUN" is one of the TODO items in the (incomplete) list of issues that
need to be addressed in order for KVM to NOT do a full TLB flush on every
nested SVM transition (see nested_svm_transition_tlb_flush()).

Reported-by: Stefan Sterz <s.sterz@proxmox.com>
Closes: https://lkml.kernel.org/r/b9915c9c-4cf6-051a-2d91-44cc6380f455%40proxmox.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20231018194104.1896415-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:50:25 -08:00
Sean Christopherson
a484755ab2 Revert "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB"
Revert KVM's made-up consistency check on SVM's TLB control.  The APM says
that unsupported encodings are reserved, but the APM doesn't state that
VMRUN checks for a supported encoding.  Unless something is called out
in "Canonicalization and Consistency Checks" or listed as MBZ (Must Be
Zero), AMD behavior is typically to let software shoot itself in the foot.

This reverts commit 174a921b69.

Fixes: 174a921b69 ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB")
Reported-by: Stefan Sterz <s.sterz@proxmox.com>
Closes: https://lkml.kernel.org/r/b9915c9c-4cf6-051a-2d91-44cc6380f455%40proxmox.com
Cc: stable@vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20231018194104.1896415-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:47:43 -08:00
Sean Christopherson
c52ffadc65 KVM: x86: Don't unnecessarily force masterclock update on vCPU hotplug
Don't force a masterclock update when a vCPU synchronizes to the current
TSC generation, e.g. when userspace hotplugs a pre-created vCPU into the
VM.  Unnecessarily updating the masterclock is undesirable as it can cause
kvmclock's time to jump, which is particularly painful on systems with a
stable TSC as kvmclock _should_ be fully reliable on such systems.

The unexpected time jumps are due to differences in the TSC=>nanoseconds
conversion algorithms between kvmclock and the host's CLOCK_MONOTONIC_RAW
(the pvclock algorithm is inherently lossy).  When updating the
masterclock, KVM refreshes the "base", i.e. moves the elapsed time since
the last update from the kvmclock/pvclock algorithm to the
CLOCK_MONOTONIC_RAW algorithm.  Synchronizing kvmclock with
CLOCK_MONOTONIC_RAW is the lesser of evils when the TSC is unstable, but
adds no real value when the TSC is stable.

Prior to commit 7f187922dd ("KVM: x86: update masterclock values on TSC
writes"), KVM did NOT force an update when synchronizing a vCPU to the
current generation.

  commit 7f187922dd
  Author: Marcelo Tosatti <mtosatti@redhat.com>
  Date:   Tue Nov 4 21:30:44 2014 -0200

    KVM: x86: update masterclock values on TSC writes

    When the guest writes to the TSC, the masterclock TSC copy must be
    updated as well along with the TSC_OFFSET update, otherwise a negative
    tsc_timestamp is calculated at kvm_guest_time_update.

    Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
    "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
    becomes redundant, so remove the do_request boolean and collapse
    everything into a single condition.

Before that, KVM only re-synced the masterclock if the masterclock was
enabled or disabled  Note, at the time of the above commit, VMX
synchronized TSC on *guest* writes to MSR_IA32_TSC:

        case MSR_IA32_TSC:
                kvm_write_tsc(vcpu, msr_info);
                break;

which is why the changelog specifically says "guest writes", but the bug
that was being fixed wasn't unique to guest write, i.e. a TSC write from
the host would suffer the same problem.

So even though KVM stopped synchronizing on guest writes as of commit
0c899c25d7 ("KVM: x86: do not attempt TSC synchronization on guest
writes"), simply reverting commit 7f187922dd is not an option.  Figuring
out how a negative tsc_timestamp could be computed requires a bit more
sleuthing.

In kvm_write_tsc() (at the time), except for KVM's "less than 1 second"
hack, KVM snapshotted the vCPU's current TSC *and* the current time in
nanoseconds, where kvm->arch.cur_tsc_nsec is the current host kernel time
in nanoseconds:

        ns = get_kernel_ns();

        ...

        if (usdiff < USEC_PER_SEC &&
            vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
                ...
        } else {
                /*
                 * We split periods of matched TSC writes into generations.
                 * For each generation, we track the original measured
                 * nanosecond time, offset, and write, so if TSCs are in
                 * sync, we can match exact offset, and if not, we can match
                 * exact software computation in compute_guest_tsc()
                 *
                 * These values are tracked in kvm->arch.cur_xxx variables.
                 */
                kvm->arch.cur_tsc_generation++;
                kvm->arch.cur_tsc_nsec = ns;
                kvm->arch.cur_tsc_write = data;
                kvm->arch.cur_tsc_offset = offset;
                matched = false;
                pr_debug("kvm: new tsc generation %llu, clock %llu\n",
                         kvm->arch.cur_tsc_generation, data);
        }

        ...

        /* Keep track of which generation this VCPU has synchronized to */
        vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
        vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
        vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;

Note that the above creates a new generation and sets "matched" to false!
But because kvm_track_tsc_matching() looks for matched+1, i.e. doesn't
require the vCPU that creates the new generation to match itself, KVM
would immediately compute vcpus_matched as true for VMs with a single vCPU.
As a result, KVM would skip the masterlock update, even though a new TSC
generation was created:

        vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
                         atomic_read(&vcpu->kvm->online_vcpus));

        if (vcpus_matched && gtod->clock.vclock_mode == VCLOCK_TSC)
                if (!ka->use_master_clock)
                        do_request = 1;

        if (!vcpus_matched && ka->use_master_clock)
                        do_request = 1;

        if (do_request)
                kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);

On hardware without TSC scaling support, vcpu->tsc_catchup is set to true
if the guest TSC frequency is faster than the host TSC frequency, even if
the TSC is otherwise stable.  And for that mode, kvm_guest_time_update(),
by way of compute_guest_tsc(), uses vcpu->arch.this_tsc_nsec, a.k.a. the
kernel time at the last TSC write, to compute the guest TSC relative to
kernel time:

  static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
  {
        u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.this_tsc_nsec,
                                      vcpu->arch.virtual_tsc_mult,
                                      vcpu->arch.virtual_tsc_shift);
        tsc += vcpu->arch.this_tsc_write;
        return tsc;
  }

Except the "kernel_ns" passed to compute_guest_tsc() isn't the current
kernel time, it's the masterclock snapshot!

        spin_lock(&ka->pvclock_gtod_sync_lock);
        use_master_clock = ka->use_master_clock;
        if (use_master_clock) {
                host_tsc = ka->master_cycle_now;
                kernel_ns = ka->master_kernel_ns;
        }
        spin_unlock(&ka->pvclock_gtod_sync_lock);

        if (vcpu->tsc_catchup) {
                u64 tsc = compute_guest_tsc(v, kernel_ns);
                if (tsc > tsc_timestamp) {
                        adjust_tsc_offset_guest(v, tsc - tsc_timestamp);
                        tsc_timestamp = tsc;
                }
        }

And so when KVM skips the masterclock update after a TSC write, i.e. after
a new TSC generation is started, the "kernel_ns-vcpu->arch.this_tsc_nsec"
is *guaranteed* to generate a negative value, because this_tsc_nsec was
captured after ka->master_kernel_ns.

Forcing a masterclock update essentially fudged around that problem, but
in a heavy handed way that introduced undesirable side effects, i.e.
unnecessarily forces a masterclock update when a new vCPU joins the party
via hotplug.

Note, KVM forces masterclock updates in other weird ways that are also
likely unnecessary, e.g. when establishing a new Xen shared info page and
when userspace creates a brand new vCPU.  But the Xen thing is firmly a
separate mess, and there are no known userspace VMMs that utilize kvmclock
*and* create new vCPUs after the VM is up and running.  I.e. the other
issues are future problems.

Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Closes: https://lore.kernel.org/all/20230926230649.67852-1-dongli.zhang@oracle.com
Fixes: 7f187922dd ("KVM: x86: update masterclock values on TSC writes")
Cc: David Woodhouse <dwmw2@infradead.org>
Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com>
Tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Link: https://lore.kernel.org/r/20231018195638.1898375-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:47:06 -08:00
Jim Mattson
80c883db87 KVM: x86: Use a switch statement and macros in __feature_translate()
Use a switch statement with macro-generated case statements to handle
translating feature flags in order to reduce the probability of runtime
errors due to copy+paste goofs, to make compile-time errors easier to
debug, and to make the code more readable.

E.g. the compiler won't directly generate an error for duplicate if
statements

	if (x86_feature == X86_FEATURE_SGX1)
		return KVM_X86_FEATURE_SGX1;
	else if (x86_feature == X86_FEATURE_SGX2)
		return KVM_X86_FEATURE_SGX1;

and so instead reverse_cpuid_check() will fail due to the untranslated
entry pointing at a Linux-defined leaf, which provides practically no
hint as to what is broken

  arch/x86/kvm/reverse_cpuid.h:108:2: error: call to __compiletime_assert_450 declared with 'error' attribute:
                                      BUILD_BUG_ON failed: x86_leaf == CPUID_LNX_4
          BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
          ^
whereas duplicate case statements very explicitly point at the offending
code:

  arch/x86/kvm/reverse_cpuid.h:125:2: error: duplicate case value '361'
          KVM_X86_TRANSLATE_FEATURE(SGX2);
          ^
  arch/x86/kvm/reverse_cpuid.h:124:2: error: duplicate case value '360'
          KVM_X86_TRANSLATE_FEATURE(SGX1);
          ^

And without macros, the opposite type of copy+paste goof doesn't generate
any error at compile-time, e.g. this yields no complaints:

        case X86_FEATURE_SGX1:
                return KVM_X86_FEATURE_SGX1;
        case X86_FEATURE_SGX2:
                return KVM_X86_FEATURE_SGX1;

Note, __feature_translate() is forcibly inlined and the feature is known
at compile-time, so the code generation between an if-elif sequence and a
switch statement should be identical.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20231024001636.890236-2-jmattson@google.com
[sean: use a macro, rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 12:27:02 -08:00
Jim Mattson
eefe5e6682 KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace
The low five bits {INTEL_PSFD, IPRED_CTRL, RRSBA_CTRL, DDPD_U, BHI_CTRL}
advertise the availability of specific bits in IA32_SPEC_CTRL. Since KVM
dynamically determines the legal IA32_SPEC_CTRL bits for the underlying
hardware, the hard work has already been done. Just let userspace know
that a guest can use these IA32_SPEC_CTRL bits.

The sixth bit (MCDT_NO) states that the processor does not exhibit MXCSR
Configuration Dependent Timing (MCDT) behavior. This is an inherent
property of the physical processor that is inherited by the virtual
CPU. Pass that information on to userspace.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20231024001636.890236-1-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 11:50:16 -08:00
Sean Christopherson
75bedc1ee9 KVM: x86: Turn off KVM_WERROR by default for all configs
Don't enable KVM_WERROR by default for x86-64 builds as KVM's one-off
-Werror enabling is *mostly* superseded by the kernel-wide WERROR, and
enabling KVM_WERROR by default can cause problems for developers working
on other subsystems.  E.g. subsystems that have a "zero W=1 regressions"
rule can inadvertently build KVM with -Werror and W=1, and end up with
build failures that are completely uninteresting to the developer (W=1 is
prone to false positives, especially on older compilers).

Keep KVM_WERROR as there are combinations where enabling WERROR isn't
feasible, e.g. the default FRAME_WARN=1024 on i386 builds generates a
non-zero number of warnings and thus errors, and there are far too many
warnings throughout the kernel to enable WERROR with W=1 (building KVM
with -Werror is desirable (with a sane compiler) as W=1 does generate
useful warnings).

Opportunistically drop the dependency on !COMPILE_TEST as it's completely
meaningless (it was copied from i195's -Werror Kconfig), as the kernel's
WERROR is explicitly *enabled* for COMPILE_TEST=y kernel's, i.e. enabling
-Werror is obviosly not dependent on COMPILE_TEST=n.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/all/20231006205415.3501535-1-kuba@kernel.org
Link: https://lore.kernel.org/r/20231018151906.1841689-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30 11:48:39 -08:00
Alexander Antonov
fdd041028f perf/x86/intel/uncore: Factor out topology_gidnid_map()
The same code is used for retrieving package ID procedure from GIDNIDMAP
register. Factor out topology_gidnid_map() to avoid code duplication.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20231127185246.2371939-3-alexander.antonov@linux.intel.com
2023-11-30 14:29:53 +01:00
Alexander Antonov
1692cf434b perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
Get logical socket id instead of physical id in discover_upi_topology()
to avoid out-of-bound access on 'upi = &type->topology[nid][idx];' line
that leads to NULL pointer dereference in upi_fill_topology()

Fixes: f680b6e606 ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
Reported-by: Kyle Meyer <kyle.meyer@hpe.com>
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Kyle Meyer <kyle.meyer@hpe.com>
Link: https://lore.kernel.org/r/20231127185246.2371939-2-alexander.antonov@linux.intel.com
2023-11-30 14:29:52 +01:00
Ashwin Dayanand Kamat
27d25348d4 x86/sev: Fix kernel crash due to late update to read-only ghcb_version
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to different CPU. setup_ghcb() is called again which
tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
is a read_only variable which is initialised during booting.

Trying to write it results in a pagefault:

  BUG: unable to handle page fault for address: ffffffffba556e70
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  [ ...]
  Call Trace:
   <TASK>
   ? __die_body.cold+0x1a/0x1f
   ? __die+0x2a/0x35
   ? page_fault_oops+0x10c/0x270
   ? setup_ghcb+0x71/0x100
   ? __x86_return_thunk+0x5/0x6
   ? search_exception_tables+0x60/0x70
   ? __x86_return_thunk+0x5/0x6
   ? fixup_exception+0x27/0x320
   ? kernelmode_fixup_or_oops+0xa2/0x120
   ? __bad_area_nosemaphore+0x16a/0x1b0
   ? kernel_exc_vmm_communication+0x60/0xb0
   ? bad_area_nosemaphore+0x16/0x20
   ? do_kern_addr_fault+0x7a/0x90
   ? exc_page_fault+0xbd/0x160
   ? asm_exc_page_fault+0x27/0x30
   ? setup_ghcb+0x71/0x100
   ? setup_ghcb+0xe/0x100
   cpu_init_exception_handling+0x1b9/0x1f0

The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.

[ mingo: Refined the changelog. ]

Fixes: 95d33bfaa3 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/1701254429-18250-1-git-send-email-kashwindayan@vmware.com
2023-11-30 10:23:12 +01:00
Jun'ichi Nomura
78a509fba9 x86/boot: Ignore NMIs during very early boot
When there are two racing NMIs on x86, the first NMI invokes NMI handler and
the 2nd NMI is latched until IRET is executed.

If panic on NMI and panic kexec are enabled, the first NMI triggers
panic and starts booting the next kernel via kexec. Note that the 2nd
NMI is still latched. During the early boot of the next kernel, once
an IRET is executed as a result of a page fault, then the 2nd NMI is
unlatched and invokes the NMI handler.

However, NMI handler is not set up at the early stage of boot, which
results in a boot failure.

Avoid such problems by setting up a NOP handler for early NMIs.

[ mingo: Refined the changelog. ]

Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Derek Barbosa <debarbos@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2023-11-30 09:55:40 +01:00
Nathan Chancellor
5225952d74 x86/tools: Remove chkobjdump.awk
This check is superfluous now that the minimum version of binutils to
build the kernel is 2.25. This also fixes an error seen with
llvm-objdump because it does not support '-v' prior to LLVM 13:

  llvm-objdump: error: unknown argument '-v'

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: dde24a87c5
Link: https://lore.kernel.org/r/20231129-objdump-reformat-llvm-v3-3-0d855e79314d@kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1362
2023-11-30 09:38:12 +01:00
Samuel Zeter
f4570ebd83 x86/tools: objdump_reformat.awk: Allow for spaces
GNU objdump and LLVM objdump have differing output formats.
Specifically, GNU objump will format its output as: address:<tab>hex,
whereas LLVM objdump displays its output as address:<space>hex.

objdump_reformat.awk incorrectly handles this discrepancy due to
the unexpected space and as a result insn_decoder_test fails, as
its input is garbled.

The instruction line being tokenized now handles a space and colon,
or tab delimiter.

Signed-off-by: Samuel Zeter <samuelzeter@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20231129-objdump-reformat-llvm-v3-2-0d855e79314d@kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1364
2023-11-30 09:38:10 +01:00
Samuel Zeter
60c2ea7c89 x86/tools: objdump_reformat.awk: Ensure regex matches fwait
If there is "wait" mnemonic in the line being parsed, it is incorrectly
handled by the script, and an extra line of "fwait" in
objdump_reformat's output is inserted. As insn_decoder_test relies upon
the formatted output, the test fails.

This is reproducible when disassembling with llvm-objdump:

Pre-processed lines:

  ffffffff81033e72: 9b                    wait
  ffffffff81033e73: 48 c7 c7 89 50 42 82  movq

After objdump_reformat.awk:

  ffffffff81033e72:       9b      fwait
  ffffffff81033e72:                               wait
  ffffffff81033e73:       48 c7 c7 89 50 42 82    movq

The regex match now accepts spaces or tabs, along with the "fwait"
instruction.

Signed-off-by: Samuel Zeter <samuelzeter@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20231129-objdump-reformat-llvm-v3-1-0d855e79314d@kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1364
2023-11-30 09:38:09 +01:00
Namhyung Kim
0f9e0d7928 perf/x86/amd: Reject branch stack for IBS events
The AMD IBS PMU doesn't handle branch stacks, so it should not accept
events with brstack.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231130062246.290-1-ravi.bangoria@amd.com
2023-11-30 09:34:40 +01:00
Sean Christopherson
0277022a77 KVM: x86/mmu: Declare flush_remote_tlbs{_range}() hooks iff HYPERV!=n
Declare the kvm_x86_ops hooks used to wire up paravirt TLB flushes when
running under Hyper-V if and only if CONFIG_HYPERV!=n.  Wrapping yet more
code with IS_ENABLED(CONFIG_HYPERV) eliminates a handful of conditional
branches, and makes it super obvious why the hooks *might* be valid.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20231018192325.1893896-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-29 14:44:47 -08:00
Like Xu
547c91929f KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode
When querying whether or not a vCPU "is" running in kernel mode, directly
get the CPL if the vCPU is the currently loaded vCPU.  In scenarios where
a guest is profiled via perf-kvm, querying vcpu->arch.preempted_in_kernel
from kvm_guest_state() is wrong if vCPU is actively running, i.e. isn't
scheduled out due to being preempted and so preempted_in_kernel is stale.

This affects perf/core's ability to accurately tag guest RIP with
PERF_RECORD_MISC_GUEST_{KERNEL|USER} and record it in the sample.  This
causes perf/tool to fail to connect the vCPU RIPs to the guest kernel
space symbols when parsing these samples due to incorrect PERF_RECORD_MISC
flags:

   Before (perf-report of a cpu-cycles sample):
      1.23%  :58945   [unknown]         [u] 0xffffffff818012e0

   After:
      1.35%  :60703   [kernel.vmlinux]  [g] asm_exc_page_fault

Note, checking preempted_in_kernel in kvm_arch_vcpu_in_kernel() is awful
as nothing in the API's suggests that it's safe to use if and only if the
vCPU was preempted.  That can be cleaned up in the future, for now just
fix the glaring correctness bug.

Note #2, checking vcpu->preempted is NOT safe, as getting the CPL on VMX
requires VMREAD, i.e. is correct if and only if the vCPU is loaded.  If
the target vCPU *was* preempted, then it can be scheduled back in after
the check on vcpu->preempted in kvm_vcpu_on_spin(), i.e. KVM could end up
trying to do VMREAD on a VMCS that isn't loaded on the current pCPU.

Signed-off-by: Like Xu <likexu@tencent.com>
Fixes: e1bfc24577 ("KVM: Move x86's perf guest info callbacks to generic KVM")
Link: https://lore.kernel.org/r/20231123075818.12521-1-likexu@tencent.com
[sean: massage changelong, add Fixes]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-29 10:22:55 -08:00
Peter Zijlstra
edc8fc01f6 x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
may fire before mwait() is eventually called. If such an interrupt queues
a timer, it may go unnoticed until mwait returns and the idle loop
handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
help because a local timer enqueue doesn't set that flag.

The issue is mitigated by the fact that this idle handler is only invoked
for shallow C-states when, presumably, the next tick is supposed to be
close enough. There may still be rare cases though when the next tick
is far away and the selected C-state is shallow, resulting in a timer
getting ignored for a while.

Fix this with using sti_mwait() whose IRQ-reenablement only triggers
upon calling mwait(), dealing with the race while keeping the interrupt
latency within acceptable bounds.

Fixes: c227233ad6 (intel_idle: enable interrupts before C1 on Xeons)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lkml.kernel.org/r/20231115151325.6262-3-frederic@kernel.org
2023-11-29 15:44:01 +01:00
Frederic Weisbecker
7d09a052a3 x86: Add a comment about the "magic" behind shadow sti before mwait
Add a note to make sure we never miss and break the requirements behind
it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lkml.kernel.org/r/20231115151325.6262-2-frederic@kernel.org
2023-11-29 15:44:01 +01:00
Borislav Petkov (AMD)
05f5f73936 x86/CPU/AMD: Drop now unused CPU erratum checking function
Bye bye.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-14-bp@alien8.de
2023-11-29 12:13:53 +01:00
Borislav Petkov (AMD)
794c68b204 x86/CPU/AMD: Get rid of amd_erratum_1485[]
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-13-bp@alien8.de
2023-11-29 12:13:31 +01:00
Borislav Petkov (AMD)
b3ffbbd282 x86/CPU/AMD: Get rid of amd_erratum_400[]
Setting X86_BUG_AMD_E400 in init_amd() is early enough.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-12-bp@alien8.de
2023-11-29 12:13:23 +01:00
Borislav Petkov (AMD)
1709528f73 x86/CPU/AMD: Get rid of amd_erratum_383[]
Set it in init_amd_gh() unconditionally as that is the F10h init
function.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-11-bp@alien8.de
2023-11-29 12:13:08 +01:00
Borislav Petkov (AMD)
54c33e23f7 x86/CPU/AMD: Get rid of amd_erratum_1054[]
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-10-bp@alien8.de
2023-11-29 12:12:55 +01:00
Borislav Petkov (AMD)
bfff3c6692 x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-9-bp@alien8.de
2023-11-29 12:12:42 +01:00
Borislav Petkov (AMD)
f69759be25 x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
Prefix it properly so that it is clear which generation it is dealing
with.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: http://lore.kernel.org/r/20231120104152.13740-8-bp@alien8.de
2023-11-29 12:12:34 +01:00
Borislav Petkov (AMD)
7c81ad8e8b x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
Call it from all Zen init functions.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-7-bp@alien8.de
2023-11-29 12:12:27 +01:00
Borislav Petkov (AMD)
cfbf4f992b x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
No functional change.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-6-bp@alien8.de
2023-11-29 12:12:20 +01:00
Borislav Petkov (AMD)
0da91912fc x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-5-bp@alien8.de
2023-11-29 12:11:59 +01:00
Borislav Petkov (AMD)
affc66cb96 x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-4-bp@alien8.de
2023-11-29 12:11:44 +01:00
Borislav Petkov (AMD)
a7c32a1ae9 x86/CPU/AMD: Carve out the erratum 1386 fix
Call it on the affected CPU generations.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: http://lore.kernel.org/r/20231120104152.13740-3-bp@alien8.de
2023-11-29 12:11:21 +01:00
Borislav Petkov (AMD)
30fa92832f x86/CPU/AMD: Add ZenX generations flags
Add X86_FEATURE flags for each Zen generation. They should be used from
now on instead of checking f/m/s.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/r/20231120104152.13740-2-bp@alien8.de
2023-11-29 12:11:01 +01:00
Binbin Wu
183bdd161c KVM: x86: Use KVM-governed feature framework to track "LAM enabled"
Use the governed feature framework to track if Linear Address Masking (LAM)
is "enabled", i.e. if LAM can be used by the guest.

Using the framework to avoid the relative expensive call guest_cpuid_has()
during cr3 and vmexit handling paths for LAM.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-14-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:09 -08:00
Robert Hoo
703d794cb8 KVM: x86: Advertise and enable LAM (user and supervisor)
LAM is enumerated by CPUID.7.1:EAX.LAM[bit 26]. Advertise the feature to
userspace and enable it as the final step after the LAM virtualization
support for supervisor and user pointers.

SGX LAM support is not advertised yet. SGX LAM support is enumerated in
SGX's own CPUID and there's no hard requirement that it must be supported
when LAM is reported in CPUID leaf 0x7.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-13-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:09 -08:00
Robert Hoo
3098e6eca8 KVM: x86: Virtualize LAM for user pointer
Add support to allow guests to set the new CR3 control bits for Linear
Address Masking (LAM) and add implementation to get untagged address for
user pointers.

LAM modifies the canonical check for 64-bit linear addresses, allowing
software to use the masked/ignored address bits for metadata.  Hardware
masks off the metadata bits before using the linear addresses to access
memory.  LAM uses two new CR3 non-address bits, LAM_U48 (bit 62) and
LAM_U57 (bit 61), to configure LAM for user pointers. LAM also changes
VMENTER to allow both bits to be set in VMCS's HOST_CR3 and GUEST_CR3 for
virtualization.

When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set
any of the two LAM control bits. However, when EPT is off, the actual CR3
used by the guest is generated from the shadow MMU root which is different
from the CR3 that is *set* by the guest, and KVM needs to manually apply
any active control bits to VMCS's GUEST_CR3 based on the cached CR3 *seen*
by the guest.

KVM manually checks guest's CR3 to make sure it points to a valid guest
physical address (i.e. to support smaller MAXPHYSADDR in the guest). Extend
this check to allow the two LAM control bits to be set. After check, LAM
bits of guest CR3 will be stripped off to extract guest physical address.

In case of nested, for a guest which supports LAM, both VMCS12's HOST_CR3
and GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when
L0 enters L1 to emulate a VMEXIT from L2 to L1 or when L0 enters L2
directly. KVM also manually checks VMCS12's HOST_CR3 and GUEST_CR3 being
valid physical address. Extend such check to allow the new LAM control bits
too.

Note, LAM doesn't have a global control bit to turn on/off LAM completely,
but purely depends on hardware's CPUID to determine it can be enabled or
not. That means, when EPT is on, even when KVM doesn't expose LAM to guest,
the guest can still set LAM control bits in CR3 w/o causing problem. This
is an unfortunate virtualization hole. KVM could choose to intercept CR3 in
this case and inject fault but this would hurt performance when running a
normal VM w/o LAM support.  This is undesirable. Just choose to let the
guest do such illegal thing as the worst case is guest being killed when
KVM eventually find out such illegal behaviour and that the guest is
misbehaving.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-12-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:08 -08:00
Robert Hoo
93d1c9f498 KVM: x86: Virtualize LAM for supervisor pointer
Add support to allow guests to set the new CR4 control bit for LAM and add
implementation to get untagged address for supervisor pointers.

LAM modifies the canonicality check applied to 64-bit linear addresses for
data accesses, allowing software to use of the untranslated address bits for
metadata and masks the metadata bits before using them as linear addresses
to access memory. LAM uses CR4.LAM_SUP (bit 28) to configure and enable LAM
for supervisor pointers. It also changes VMENTER to allow the bit to be set
in VMCS's HOST_CR4 and GUEST_CR4 to support virtualization. Note CR4.LAM_SUP
is allowed to be set even not in 64-bit mode, but it will not take effect
since LAM only applies to 64-bit linear addresses.

Move CR4.LAM_SUP out of CR4_RESERVED_BITS, its reservation depends on vcpu
supporting LAM or not. Leave it intercepted to prevent guest from setting
the bit if LAM is not exposed to guest as well as to avoid vmread every time
when KVM fetches its value, with the expectation that guest won't toggle the
bit frequently.

Set CR4.LAM_SUP bit in the emulated IA32_VMX_CR4_FIXED1 MSR for guests to
allow guests to enable LAM for supervisor pointers in nested VMX operation.

Hardware is not required to do TLB flush when CR4.LAM_SUP toggled, KVM
doesn't need to emulate TLB flush based on it.  There's no other features
or vmx_exec_controls connection, and no other code needed in
{kvm,vmx}_set_cr4().

Skip address untag for instruction fetches (which includes branch targets),
operand of INVLPG instructions, and implicit system accesses, all of which
are not subject to untagging.  Note, get_untagged_addr() isn't invoked for
implicit system accesses as there is no reason to do so, but check the
flag anyways for documentation purposes.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-11-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:07 -08:00
Binbin Wu
b39bd520a6 KVM: x86: Untag addresses for LAM emulation where applicable
Stub in vmx_get_untagged_addr() and wire up calls from the emulator (via
get_untagged_addr()) and "direct" calls from various VM-Exit handlers in
VMX where LAM untagging is supposed to be applied.  Defer implementing
the guts of vmx_get_untagged_addr() to future patches purely to make the
changes easier to consume.

LAM is active only for 64-bit linear addresses and several types of
accesses are exempted.

- Cases need to untag address (handled in get_vmx_mem_address())
  Operand(s) of VMX instructions and INVPCID.
  Operand(s) of SGX ENCLS.

- Cases LAM doesn't apply to (no change needed)
  Operand of INVLPG.
  Linear address in INVPCID descriptor.
  Linear address in INVVPID descriptor.
  BASEADDR specified in SECS of ECREATE.

Note:
  - LAM doesn't apply to write to control registers or MSRs
  - LAM masking is applied before walking page tables, i.e. the faulting
    linear address in CR2 doesn't contain the metadata.
  - The guest linear address saved in VMCS doesn't contain metadata.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-10-binbin.wu@linux.intel.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:07 -08:00
Binbin Wu
37a41847b7 KVM: x86: Introduce get_untagged_addr() in kvm_x86_ops and call it in emulator
Introduce a new interface get_untagged_addr() to kvm_x86_ops to untag
the metadata from linear address.  Call the interface in linearization
of instruction emulator for 64-bit mode.

When enabled feature like Intel Linear Address Masking (LAM) or AMD Upper
Address Ignore (UAI), linear addresses may be tagged with metadata that
needs to be dropped prior to canonicality checks, i.e. the metadata is
ignored.

Introduce get_untagged_addr() to kvm_x86_ops to hide the vendor specific
code, as sadly LAM and UAI have different semantics.  Pass the emulator
flags to allow vendor specific implementation to precisely identify the
access type (LAM doesn't untag certain accesses).

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-9-binbin.wu@linux.intel.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:06 -08:00
Binbin Wu
9c8021d4ae KVM: x86: Remove kvm_vcpu_is_illegal_gpa()
Remove kvm_vcpu_is_illegal_gpa() and use !kvm_vcpu_is_legal_gpa() instead.
The "illegal" helper actually predates the "legal" helper, the only reason
the "illegal" variant wasn't removed by commit 4bda0e9786 ("KVM: x86:
Add a helper to check for a legal GPA") was to avoid code churn.  Now that
CR3 has a dedicated helper, there are fewer callers, and so the code churn
isn't that much of a deterrent.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-8-binbin.wu@linux.intel.com
[sean: provide a bit of history in the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:05 -08:00
Binbin Wu
2c49db455e KVM: x86: Add & use kvm_vcpu_is_legal_cr3() to check CR3's legality
Add and use kvm_vcpu_is_legal_cr3() to check CR3's legality to provide
a clear distinction between CR3 and GPA checks.  This will allow exempting
bits from kvm_vcpu_is_legal_cr3() without affecting general GPA checks,
e.g. for upcoming features that will use high bits in CR3 for feature
enabling.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-7-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:04 -08:00
Binbin Wu
a130066f74 KVM: x86/mmu: Drop non-PA bits when getting GFN for guest's PGD
Drop non-PA bits when getting GFN for guest's PGD with the maximum theoretical
mask for guest MAXPHYADDR.

Do it unconditionally because it's harmless for 32-bit guests, querying 64-bit
mode would be more expensive, and for EPT the mask isn't tied to guest mode.
Using PT_BASE_ADDR_MASK would be technically wrong (PAE paging has 64-bit
elements _except_ for CR3, which has only 32 valid bits), it wouldn't matter
in practice though.

Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-6-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:04 -08:00
Binbin Wu
538ac9a92d KVM: x86: Add X86EMUL_F_INVLPG and pass it in em_invlpg()
Add an emulation flag X86EMUL_F_INVLPG, which is used to identify an
instruction that does TLB invalidation without true memory access.

Only invlpg & invlpga implemented in emulator belong to this kind.
invlpga doesn't need additional information for emulation. Just pass
the flag to em_invlpg().

Linear Address Masking (LAM) and Linear Address Space Separation (LASS)
don't apply to addresses that are inputs to TLB invalidation. The flag
will be consumed to support LAM/LASS virtualization.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-5-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:03 -08:00
Binbin Wu
3963c52df4 KVM: x86: Add an emulation flag for implicit system access
Add an emulation flag X86EMUL_F_IMPLICIT to identify implicit system access
in instruction emulation.  Don't bother wiring up any usage at this point,
as Linear Address Space Separation (LASS) will be the first "real" consumer
of the flag and LASS support will require dedicated hooks, i.e. there
aren't any existing calls where passing X86EMUL_F_IMPLICIT is meaningful.

Add the IMPLICIT flag even though there's no imminent usage so that
Linear Address Masking (LAM) support can reference the flag to document
that addresses for implicit accesses aren't untagged.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-4-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:02 -08:00
Binbin Wu
7b0dd9430c KVM: x86: Consolidate flags for __linearize()
Consolidate @write and @fetch of __linearize() into a set of flags so that
additional flags can be added without needing more/new boolean parameters,
to precisely identify the access type.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Link: https://lore.kernel.org/r/20230913124227.12574-2-binbin.wu@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-28 17:54:02 -08:00
Muralidhara M K
47b744ea5e x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
Add HWID and McaType values for new SMCA bank types.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231102114225.2006878-3-muralimk@amd.com
2023-11-28 16:26:55 +01:00
Christian Brauner
3652117f85 eventfd: simplify eventfd_signal()
Ever since the eventfd type was introduced back in 2007 in commit
e1ad7468c7 ("signal/timer/event: eventfd core") the eventfd_signal()
function only ever passed 1 as a value for @n. There's no point in
keeping that additional argument.

Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org
Acked-by: Xu Yilun <yilun.xu@intel.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl
Acked-by: Eric Farman <farman@linux.ibm.com>  # s390
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-28 14:08:38 +01:00
Lukas Bulwahn
c64545594d x86/Kconfig: Remove obsolete config X86_32_SMP
Commit

  0f08c3b229 ("x86/smp: Reduce code duplication")

removed the only use of CONFIG_X86_32_SMP.

Remove the now obsolete config X86_32_SMP too.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231128090016.29676-1-lukas.bulwahn@gmail.com
2023-11-28 13:31:34 +01:00
Juergen Gross
db2832309a x86/xen: fix percpu vcpu_info allocation
Today the percpu struct vcpu_info is allocated via DEFINE_PER_CPU(),
meaning that it could cross a page boundary. In this case registering
it with the hypervisor will fail, resulting in a panic().

This can easily be fixed by using DEFINE_PER_CPU_ALIGNED() instead,
as struct vcpu_info is guaranteed to have a size of 64 bytes, matching
the cache line size of x86 64-bit processors (Xen doesn't support
32-bit processors).

Fixes: 5ead97c84f ("xen: Core Xen implementation")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.con>
Link: https://lore.kernel.org/r/20231124074852.25161-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-28 12:47:11 +01:00
Yazen Ghannam
ff03ff328f x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
The long names of the SMCA banks are only used by the MCE decoder
module.

Move them out of the arch code and into the decoder module.

  [ bp: Name the long names array "smca_long_names", drop local ptr in
    decode_smca_error(), constify arrays. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
2023-11-27 12:16:51 +01:00
Linus Torvalds
4892711ace Fix/enhance x86 microcode version reporting: fix the bootup log spam,
and remove the driver version announcement to avoid version
 confusion when distros backport fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmVjFKgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h/7hAAs5hL1KvrfdW0VpAW91MbX6mtIe7Emc8T
 LCiBJtl9UngRdASUC9CGrcIZ5JIps0702gAq0qPVzk5zKxC22ySWsqMZybask+eF
 d6E7amMtF+KX0wiCZSuC66StCKA08JfrUXgxvYHnxDjNqERYFmVr1QabGL1IN5lZ
 KUrVUyvN8VOnzypOiQ98lXGWDJYwaV7t+IzMMh7mT5OUkoo09e6tFm7IF+NWg5xe
 NYCcvZqyo0Ipld7HOjlGHYG+blFkDxJpfTby5UevZybXsPd00cxBzDSR9zs1sYeG
 Kt6cgwDhfewfcM1QFVvNV/SDVsPp1BlVvMUa6Xa3vtsnWCit8zQqMbkYYWUaTcIh
 yUJvtzh/xZQtxaQ8Z8SbI7EhUBOFJXoHWV9JoEe3gWsWA5thu+4iOCh/P8C2ON3n
 6kLSgNQ4GAylH34MWoS84t2Jxv7XmNZljR/78ucRQrJ1JJIEA+r4sJ9hK9btxqf2
 0n86StHuwtNXSQwEhDcacqUpFPLZ65Za1Y9AXc69CDuiwj3DvTVBMwjEOBbnGTrZ
 dL9QOYG5gkklOx4o5ePj7RoLrzz/j6dj6idmu8FxZZ4q+QB9vvL2lRHusJnUEloE
 yxR7WWOB/kyUZT7FriLHRuEP7yQNRSvLs6U7b8uXCHiGcAb2mN0fFi2m/BcJGS4A
 hHA0t9WyNBM=
 =eYQ4
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode fixes from Ingo Molnar:
 "Fix/enhance x86 microcode version reporting: fix the bootup log spam,
  and remove the driver version announcement to avoid version confusion
  when distros backport fixes"

* tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Rework early revisions reporting
  x86/microcode: Remove the driver announcement and version
2023-11-26 08:42:42 -08:00
Linus Torvalds
e81fe50520 Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
code resulting in non-working events on those platforms.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmVjEt8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g54g/9EKugplQhVSfEXhzfBx38ICyodLxJuomL
 x96auVes+H3CEC6CYADzAY14GPDxymzOVSK/Sy8fWkzyGVWG23SZgB0i/hOTIsFL
 CVZLrVztidvkWWzJj5RnzNstZsUXp4QfNtcIw3VvCcsgDhk8WVen1xe5XbIuzamJ
 HbTrlrL2S3feAWaWZUh6c/3unHsLhCupLg8u0DTe97OFlRRlSj+wtmJCMCQoRRpL
 /LVItmmxfRbuMeyY8GGR6kb2qcb2+Zr6AUOXUPIoYOqARLuaV81F0kD2zwpPwB68
 rFPYBG2ld2IHkbNFgxfCfNqP2DuY5TnSpS8SJzcfTAp6saIA5Ua3QLwg8IumV/2X
 G3ZI/G54er2+hhddX7tUjJIiutwNHxGkiOKy3Bnxm8K8c2H0I8XZ97fB1OWQUnVy
 pnwMzz3tlUOfF9SV8K0zttu3Dar9dcoBybT8HL/Wxr4RWF+FuZBiQNjjd92k3L4u
 Ua8Q6nLD/IRXwQfiEWzDSsWIQqxuSDGxvGHBgYJm4d5DxblFAeKmKMkFnO6Z17Kh
 fm1M4nW/YawSQYlG7Jo5XG+dWESJPhM5IAVBw19o0oYkqHV2gfmUerG8hvJYR7DY
 CpLrw4S9UDOLzsYAtNoozB/GMQnXQvjRz8mILWiwNJxpbo1saeZUj43zHvzqr7ta
 Vj96qrQLuNU=
 =jOSq
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf event fix from Ingo Molnar:
 "Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
  code resulting in non-working events on those platforms"

* tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
2023-11-26 08:34:12 -08:00
Kan Liang
cb4a6ccf35 perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
The same as Granite Rapids, the Sierra Forest and Grand Ridge also
supports the discovery table feature and the same type of the uncore
units. The difference of the available units and counters can be
retrieved from the discovery table automatically.
Just add the CPU model ID.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-5-kan.liang@linux.intel.com
2023-11-24 20:25:03 +01:00
Kan Liang
388d76175b perf/x86/intel/uncore: Support IIO free-running counters on GNR
The free-running counters for IIO uncore blocks on Granite Rapids are
similar to Sapphire Rapids. The key difference is the offset of the
registers. The number of the IIO uncore blocks can also be retrieved
from the discovery table.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-4-kan.liang@linux.intel.com
2023-11-24 20:25:02 +01:00
Kan Liang
632c4bf6d0 perf/x86/intel/uncore: Support Granite Rapids
The same as Sapphire Rapids, Granite Rapids also supports the discovery
table feature. All the basic uncore PMON information can be retrieved
from the discovery table which resides in the BIOS.

There are 4 new units are added on Granite Rapids, b2cmi, b2cxl, ubox,
and mdf_sbo. The layout of the counters is exactly the same as the
generic uncore counters. Only add a name for the new units. All the
details can be retrieved from the discovery table.
The description of the new units can be found at
https://www.intel.com/content/www/us/en/secure/content-details/772943/content-details.html

The other units, e.g., cha, iio, irp, pcu, and imc, are the same as
Sapphire Rapids.

Ignore the upi and b2upi units in the discovery table, which are broken
for now.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-3-kan.liang@linux.intel.com
2023-11-24 20:25:01 +01:00
Kan Liang
b560e0cd88 perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
The current perf doesn't save the complete address of an uncore unit.
The complete address of each unit is calculated by the base address +
offset. The type of the base address is u64, while the type of offset is
unsigned.
In the old platforms (without the discovery table method), the base
address and offset are hard coded in the driver. Perf can always use the
lowest address as the base address. Everything works well.

In the new platforms (starting from SPR), the discovery table provides
a complete address for all uncore units. To follow the current
framework/codes, when parsing the discovery table, the complete address
of the first box is stored as a base address. The offset of the
following units is calculated by the complete address of the unit minus
the base address (the address of the first unit). On GNR, the latter
units may have a lower address compared to the first unit. So the offset
is a negative value. The upper 32 bits are lost when casting a negative
u64 to an unsigned type.

Use u64 to replace unsigned for the uncore offsets array to correct the
above case. There is no functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-2-kan.liang@linux.intel.com
2023-11-24 20:25:01 +01:00
Kan Liang
cf35791476 perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
Factor out SPR_UNCORE_MMIO_COMMON_FORMAT which can be reused by
Granite Rapids in the following patch.

Granite Rapids have more uncore units than Sapphire Rapids. Add new
parameters to support adjustable uncore units.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Ammy Yi <ammy.yi@intel.com>
Link: https://lore.kernel.org/r/20231117163939.2468007-1-kan.liang@linux.intel.com
2023-11-24 20:25:00 +01:00
James Morse
5bfa0e45e9 x86/cpu/intel_epb: Don't rely on link order
intel_epb_init() is called as a subsys_initcall() to register cpuhp
callbacks. The callbacks make use of get_cpu_device() which will return
NULL unless register_cpu() has been called. register_cpu() is called
from topology_init(), which is also a subsys_initcall().

This is fragile. Moving the register_cpu() to a different
subsys_initcall() leads to a NULL dereference during boot.

Make intel_epb_init() a late_initcall(), user-space can't provide a
policy before this point anyway.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-24 13:54:31 +01:00
Arnd Bergmann
42874e4eb3 arch: vdso: consolidate gettime prototypes
The VDSO functions are defined as globals in the kernel sources but intended
to be called from userspace, so there is no need to declare them in a kernel
side header.

Without a prototype, this now causes warnings such as

arch/mips/vdso/vgettimeofday.c:14:5: error: no previous prototype for '__vdso_clock_gettime' [-Werror=missing-prototypes]
arch/mips/vdso/vgettimeofday.c:28:5: error: no previous prototype for '__vdso_gettimeofday' [-Werror=missing-prototypes]
arch/mips/vdso/vgettimeofday.c:36:5: error: no previous prototype for '__vdso_clock_getres' [-Werror=missing-prototypes]
arch/mips/vdso/vgettimeofday.c:42:5: error: no previous prototype for '__vdso_clock_gettime64' [-Werror=missing-prototypes]
arch/sparc/vdso/vclock_gettime.c:254:1: error: no previous prototype for '__vdso_clock_gettime' [-Werror=missing-prototypes]
arch/sparc/vdso/vclock_gettime.c:282:1: error: no previous prototype for '__vdso_clock_gettime_stick' [-Werror=missing-prototypes]
arch/sparc/vdso/vclock_gettime.c:307:1: error: no previous prototype for '__vdso_gettimeofday' [-Werror=missing-prototypes]
arch/sparc/vdso/vclock_gettime.c:343:1: error: no previous prototype for '__vdso_gettimeofday_stick' [-Werror=missing-prototypes]

Most architectures have already added workarounds for these by adding
declarations somewhere, but since these are all compatible, we should
really just have one copy, with an #ifdef check for the 32-bit vs
64-bit variant and use that everywhere.

Unfortunately, the sparc an um versions are currently incompatible
since they never added support for __vdso_clock_gettime64() in 32-bit
userland. For the moment, I'm leaving this one out, as I can't
easily test it and it requires a larger rework.

Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-11-23 11:32:32 +01:00
Arnd Bergmann
f717a8d164 arch: include linux/cpu.h for trap_init() prototype
some architectures run into a -Wmissing-prototypes warning
for trap_init()

arch/microblaze/kernel/traps.c:21:6: warning: no previous prototype for 'trap_init' [-Wmissing-prototypes]

Include the right header to avoid this consistently, removing
the extra declarations on m68k and x86 that were added as local
workarounds already.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-11-23 11:32:31 +01:00
Arnd Bergmann
64bac5ea17 arch: consolidate arch_irq_work_raise prototypes
The prototype was hidden in an #ifdef on x86, which causes a warning:

kernel/irq_work.c:72:13: error: no previous prototype for 'arch_irq_work_raise' [-Werror=missing-prototypes]

Some architectures have a working prototype, while others don't.
Fix this by providing it in only one place that is always visible.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-11-23 11:32:29 +01:00
Adrian Huang
5e1c8a47fc x86/ioapic: Remove unfinished sentence from comment
[ mingo: Refine changelog. ]

Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-11-23 11:00:24 +01:00
Yuntao Wang
03f111710a x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
Commit 1e8f93e183 ("x86: Consolidate port I/O helpers") moved some
port I/O helpers to <asm/shared/io.h>, which caused the 'bw' parameter
in the BUILDIO() macro to become unused. Remove it.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231123034911.217791-1-ytcoode@gmail.com
2023-11-23 10:57:36 +01:00
Yazen Ghannam
6175b40775 x86/mce/inject: Clear test status value
AMD systems generally allow MCA "simulation" where MCA registers can be
written with valid data and the full MCA handling flow can be tested by
software.

However, the platform on Scalable MCA systems, can prevent software from
writing data to the MCA registers. There is no architectural way to
determine this configuration. Therefore, the MCE injection module will
check for this behavior by writing and reading back a test status value.
This is done during module init, and the check can run on any CPU with
any valid MCA bank.

If MCA_STATUS writes are ignored by the platform, then there are no side
effects on the hardware state.

If the writes are not ignored, then the test status value will remain in
the hardware MCA_STATUS register. It is likely that the value will not
be overwritten by hardware or software, since the tested CPU and bank
are arbitrary. Therefore, the user may see a spurious, synthetic MCA
error reported whenever MCA is polled for this CPU.

Clear the test value immediately after writing it. It is very unlikely
that a valid MCA error is logged by hardware during the test. Errors
that cause an #MC won't be affected.

Fixes: 891e465a1b ("x86/mce: Check whether writes to MCA_STATUS are getting ignored")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-2-yazen.ghannam@amd.com
2023-11-22 19:13:38 +01:00
Linus Torvalds
05c8c94ed4 hyperv-fixes for 6.7-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmVdgqYTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXhBsCACzUGLF3vOQdrmgTMymzaaOzfLJtvNW
 oQ34FwMJMOAyJ6FxM12IJPHA2j+azl9CPjQc5O6F2CBcF8hVj2mDIINQIi+4wpV5
 FQv445g2KFml/+AJr/1waz1GmhHtr1rfu7B7NX6tPUtOpxKN7AHAQXWYmHnwK8BJ
 5Mh2a/7Lphjin4M1FWCeBTj0JtqF1oVAW2L9jsjqogq1JV0a51DIFutROtaPSC/4
 ssTLM5Rqpnw8Z1GWVYD2PObIW4A+h1LV1tNGOIoGW6mX56mPU+KmVA7tTKr8Je/i
 z3Jk8bZXFyLvPW2+KNJacbldKNcfwAFpReffNz/FO3R16Stq9Ta1mcE2
 =wXju
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - One fix for the KVP daemon (Ani Sinha)

 - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)

 - Micro-optimization for hv_nmi_unknown() (Uros Bizjak)

* tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
  x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
  hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
2023-11-22 09:56:26 -08:00
Uros Bizjak
18286883e7 x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after CMPXCHG. The generated
asm code improves from:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	83 f8 ff             	cmp    $0xffffffff,%eax
  55:	0f 95 c0             	setne  %al

to:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	0f 95 c0             	setne  %al

No functional change intended.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20231114170038.381634-1-ubizjak@gmail.com>
2023-11-22 03:47:44 +00:00
Andrew Cooper
5a7d6d26af x86/apic: Drop struct local_apic
This type predates recorded history in tglx/history.git, making it older
than Feb 5th 2002.

This structure is literally old enough to drink in most juristictions in
the world, and has not been used once in that time.

Lay it to rest in /dev/null.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20231102-x86-apic-v1-3-bf049a2a0ed6@citrix.com
2023-11-21 17:23:35 +01:00
Andrew Cooper
855da7cdf9 x86/apic: Drop enum apic_delivery_modes
The type is not used any more.

Replace the constants with plain defines so they can live outside of an
__ASSEMBLY__ block, allowing for more cleanup in subsequent changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20231102-x86-apic-v1-2-bf049a2a0ed6@citrix.com
2023-11-21 17:05:06 +01:00
Andrew Cooper
07e8f88568 x86/apic: Drop apic::delivery_mode
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read
exactly once.  Fold the constant in uv_program_mmr() and drop the field.

Searching for the origin of the stale HyperV comment reveals commit
a31e58e129 ("x86/apic: Switch all APICs to Fixed delivery mode") which
notes:

  As a consequence of this change, the apic::irq_delivery_mode field is
  now pointless, but this needs to be cleaned up in a separate patch.

6 years is long enough for this technical debt to have survived.

  [ bp: Fold in
    https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com
  ]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
2023-11-21 16:58:54 +01:00
Borislav Petkov (AMD)
080990aa33 x86/microcode: Rework early revisions reporting
The AMD side of the loader issues the microcode revision for each
logical thread on the system, which can become really noisy on huge
machines. And doing that doesn't make a whole lot of sense - the
microcode revision is already in /proc/cpuinfo.

So in case one is interested in the theoretical support of mixed silicon
steppings on AMD, one can check there.

What is also missing on the AMD side - something which people have
requested before - is showing the microcode revision the CPU had
*before* the early update.

So abstract that up in the main code and have the BSP on each vendor
provide those revision numbers.

Then, dump them only once on driver init.

On Intel, do not dump the patch date - it is not needed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
2023-11-21 16:35:48 +01:00
Borislav Petkov (AMD)
2e569ada42 x86/microcode: Remove the driver announcement and version
First of all, the print is useless. The driver will either load and say
which microcode revision the machine has or issue an error.

Then, the version number is meaningless and actively confusing, as Yazen
mentioned recently: when a subset of patches are backported to a distro
kernel, one can't assume the driver version is the same as the upstream
one. And besides, the version number of the loader hasn't been used and
incremented for a long time. So drop it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231115210212.9981-2-bp@alien8.de
2023-11-21 16:20:49 +01:00
Peter Zijlstra
1e4d3001f5 x86/entry: Harden return-to-user
Make the CONFIG_DEBUG_ENTRY=y check that validates CS is a user segment
unconditional and move it nearer to IRET.

  PRE:
       140,026,608      cycles:k                                                      ( +-  0.01% )
       236,696,176      instructions:k            #    1.69  insn per cycle           ( +-  0.00% )

  POST:
       139,957,681      cycles:k                                                      ( +-  0.01% )
       236,681,819      instructions:k            #    1.69  insn per cycle           ( +-  0.00% )

(this is with --repeat 100 and the run-to-run variance is bigger than
the difference shown)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231120143626.753200755@infradead.org
2023-11-21 13:57:31 +01:00
Peter Zijlstra
c516213726 x86/entry: Optimize common_interrupt_return()
The code in common_interrupt_return() does a bunch of unconditional
work that is really only needed on PTI kernels. Specifically it
unconditionally copies the IRET frame back onto the entry stack,
swizzles onto the entry stack and does IRET from there.

However, without PTI we can simply IRET from whatever stack we're on.

  ivb-ep, mitigations=off, gettid-1m:

  PRE:
       140,118,538      cycles:k                                                      ( +-  0.01% )
       236,692,878      instructions:k            #    1.69  insn per cycle           ( +-  0.00% )

  POST:
       140,026,608      cycles:k                                                      ( +-  0.01% )
       236,696,176      instructions:k            #    1.69  insn per cycle           ( +-  0.00% )

(this is with --repeat 100 and the run-to-run variance is bigger than
the difference shown)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231120143626.638107480@infradead.org
2023-11-21 13:57:30 +01:00
Dapeng Mi
e8df9d9f42 perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
When running perf-stat command on Intel hybrid platform, perf-stat
reports the following errors:

  sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1

  Opening: cpu/cycles/:HG
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    config                           0xa00000000
    disabled                         1
  ------------------------------------------------------------
  sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -16

   Performance counter stats for 'sleep 1':

       <not counted>      cpu_atom/instructions/

It looks the cpu_atom/instructions/ event can't be enabled on atom PMU
even when the process is pinned on atom core. Investigation shows that
exclusive_event_init() helper always returns -EBUSY error in the perf
event creation. That's strange since the atom PMU should not be an
exclusive PMU.

Further investigation shows the issue was introduced by commit:

  97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")

The commit originally intents to clear the bit PERF_PMU_CAP_AUX_OUTPUT
from PMU capabilities if intel_cap.pebs_output_pt_available is not set,
but it incorrectly uses 'or' operation and leads to all PMU capabilities
bits are set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT.

Testing this fix on Intel hybrid platforms, the observed issues
disappear.

Fixes: 97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com
2023-11-21 13:44:36 +01:00
Borislav Petkov (AMD)
4e15b91c5b x86/mtrr: Document missing function parameters in kernel-doc
Add text explaining what they do.

No functional changes.

Closes: https://lore.kernel.org/oe-kbuild-all/202311130104.9xKAKzke-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/202311130104.9xKAKzke-lkp@intel.com
2023-11-20 10:19:27 +01:00
Linus Torvalds
cd557bc0a2 - Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
   use
 
 - A documentation fix
 
 - Add Kirill as TDX maintainer
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVaChkACgkQEsHwGGHe
 VUraNQ/+KyCyJgG6bdIB3tS9qKr0Z4REaXQ+UQ7DfAjlhrzw7C6f4VReNLp3ohEv
 RdxNjKLEueYFQAo+v8uKGkqYIT6H1ob9uW+RjtjN+OJqWN/3AfK7CTx8HI1bJsW5
 wKM+Ey81cID0iQDiNPAdzRnu7suKKjF5jLwztAw6EYOsTRfUnLZ8Ct84uHBWd58v
 kZ+WkEyeOyeJo+Vdx07d/LEcCJ+S9G6WfA0AnhHPOZxRZTn2RhqNsnJvqTeOvWUM
 PSN9NjxFk0ymidwnhR1urw1wHGgTT990vNsPIHLE72TwXrWEOM14Xkq1XNI4PfD1
 Bp74ySpF0YUQrvgBW4V3qXgBFls4DkKys1amd2kK5KQGEpcXZm7ZPnI5w2NKMsY4
 1Tk379W/1jPY8cyZjIqn92eFEkAjfID4eHICLj5IJhVMUusNEPmxgoycvKDqI8sK
 NihF1wUjyfRibh4ujYaurqKUBgxVHo2dyXPPo7UNzeaMfvqkFaxgwNJVF0gQ+MyI
 5BzeY71RCFb8ZKtCT6SVN6oUeWLg+QAZApoJVDDnhF9InG+wJj+D400T7pZnNHbo
 ag6L2gJFJ2+XsV8DJhiaII0gfbf9cUppn4G7RcvQfL2HivYnZV3q1dBKf6C35H44
 Kpz5w/eoJPOIcuZ48a6ph80zuRpuN6MSBigZ0G2Q7IwrmFx1Vcg=
 =PGYO
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Ignore invalid x2APIC entries in order to not waste per-CPU data

 - Fix a back-to-back signals handling scenario when shadow stack is in
   use

 - A documentation fix

 - Add Kirill as TDX maintainer

* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi: Ignore invalid x2APIC entries
  x86/shstk: Delay signal entry SSP write until after user accesses
  x86/Documentation: Indent 'note::' directive for protocol version number note
  MAINTAINERS: Add Intel TDX entry
2023-11-19 13:46:17 -08:00
Colin Ian King
a24d61c609 x86/lib: Fix overflow when counting digits
tl;dr: The num_digits() function has a theoretical overflow issue.
But it doesn't affect any actual in-tree users.  Fix it by using
a larger type for one of the local variables.

Long version:

There is an overflow in variable m in function num_digits when val
is >= 1410065408 which leads to the digit calculation loop to
iterate more times than required. This results in either more
digits being counted or in some cases (for example where val is
1932683193) the value of m eventually overflows to zero and the
while loop spins forever).

Currently the function num_digits is currently only being used for
small values of val in the SMP boot stage for digit counting on the
number of cpus and NUMA nodes, so the overflow is never encountered.
However it is useful to fix the overflow issue in case the function
is used for other purposes in the future. (The issue was discovered
while investigating the digit counting performance in various
kernel helper functions rather than any real-world use-case).

The simplest fix is to make m a long long, the overhead in
multiplication speed for a long long is very minor for small values
of val less than 10000 on modern processors. The alternative
fix is to replace the multiplication with a constant division
by 10 loop (this compiles down to an multiplication and shift)
without needing to make m a long long, but this is slightly slower
than the fix in this commit when measured on a range of x86
processors).

[ dhansen: subject and changelog tweaks ]

Fixes: 646e29a178 ("x86: Improve the printout of the SMP bootup CPU table")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231102174901.2590325-1-colin.i.king%40gmail.com
2023-11-17 06:26:14 -08:00
Eric Biggers
ba5a434d5a crypto: x86/sha256 - autoload if SHA-NI detected
The x86 SHA-256 module contains four implementations: SSSE3, AVX, AVX2,
and SHA-NI.  Commit 1c43c0f1f8 ("crypto: x86/sha - load modules based
on CPU features") made the module be autoloaded when SSSE3, AVX, or AVX2
is detected.  The omission of SHA-NI appears to be an oversight, perhaps
because of the outdated file-level comment.  This patch fixes this,
though in practice this makes no difference because SSSE3 is a subset of
the other three features anyway.  Indeed, sha256_ni_transform() executes
SSSE3 instructions such as pshufb.

Reviewed-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-11-17 19:16:29 +08:00
Eric Biggers
20342e3f64 crypto: x86/sha1 - autoload if SHA-NI detected
The x86 SHA-1 module contains four implementations: SSSE3, AVX, AVX2,
and SHA-NI.  Commit 1c43c0f1f8 ("crypto: x86/sha - load modules based
on CPU features") made the module be autoloaded when SSSE3, AVX, or AVX2
is detected.  The omission of SHA-NI appears to be an oversight, perhaps
because of the outdated file-level comment.  This patch fixes this,
though in practice this makes no difference because SSSE3 is a subset of
the other three features anyway.  Indeed, sha1_ni_transform() executes
SSSE3 instructions such as pshufb.

Reviewed-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-11-17 19:16:29 +08:00
Kan Liang
bbb968696d perf/x86/intel/cstate: Add Grand Ridge support
The same as the Sierra Forest, the Grand Ridge supports core C1/C6 and
module C6. But it doesn't support pkg C6 residency counter.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-4-kan.liang@linux.intel.com
2023-11-17 10:54:53 +01:00
Kan Liang
3877d55a0d perf/x86/intel/cstate: Add Sierra Forest support
A new module C6 Residency Counter is introduced in the Sierra Forest.
The scope of the new counter is module (A cluster of cores shared L2
cache). Create a brand new cstate_module PMU to profile the new counter.
The only differences between the new cstate_module PMU and the existing
cstate PMU are the scope and events.

Regarding the choice of the new cstate_module PMU name, the current
naming rule of a cstate PMU is "cstate_" + the scope of the PMU. The
scope of the PMU is the cores shared L2. On SRF, Intel calls it
"module", while the internal Linux sched code calls it "cluster". The
"cstate_module" is used as the new PMU name, because
- The Cstate PMU driver is a Intel specific driver. It doesn't impact
  other ARCHs. The name makes it consistent with the documentation.
- The "cluster" mainly be used by the scheduler developer, while the
  user of cstate PMU is more likely a researcher reading HW docs and
  optimizing power.
- In the Intel's SDM, the "cluster" has a different meaning/scope for
  topology. Using it will mislead the end users.

Besides the module C6, the core C1/C6 and pkg C6 residency counters are
supported in the Sierra Forest as well.

Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-3-kan.liang@linux.intel.com
2023-11-17 10:54:53 +01:00
Kan Liang
c3dd199562 x86/smp: Export symbol cpu_clustergroup_mask()
Intel cstate PMU driver will invoke the topology_cluster_cpumask() to
retrieve the CPU mask of a cluster. A modpost error is triggered since
the symbol cpu_clustergroup_mask is not exported.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-2-kan.liang@linux.intel.com
2023-11-17 10:54:52 +01:00
Kan Liang
243218ca93 perf/x86/intel/cstate: Cleanup duplicate attr_groups
The events of the cstate_core and cstate_pkg PMU have the same format.
They both need to create a "events" group (with empty attrs). The
attr_groups can be shared.

Remove the dedicated attr_groups for each cstate PMU. Use the shared
cstate_attr_groups to replace.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231116142245.1233485-1-kan.liang@linux.intel.com
2023-11-17 10:54:52 +01:00
Nikolay Borisov
612905e13b x86/mce: Remove redundant check from mce_device_create()
mce_device_create() is called only from mce_cpu_online() which in turn
will be called iff MCA support is available. That is, at the time of
mce_device_create() call it's guaranteed that MCA support is available.
No need to duplicate this check so remove it.

  [ bp: Massage commit message. ]

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231107165529.407349-1-nik.borisov@suse.com
2023-11-15 17:19:14 +01:00
Peter Zijlstra
5d2d4a9f60 Merge branch 'tip/perf/urgent'
Avoid conflicts, base on fixes.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2023-11-15 10:15:40 +01:00
Paolo Bonzini
6c370dc653 Merge branch 'kvm-guestmemfd' into HEAD
Introduce several new KVM uAPIs to ultimately create a guest-first memory
subsystem within KVM, a.k.a. guest_memfd.  Guest-first memory allows KVM
to provide features, enhancements, and optimizations that are kludgly
or outright impossible to implement in a generic memory subsystem.

The core KVM ioctl() for guest_memfd is KVM_CREATE_GUEST_MEMFD, which
similar to the generic memfd_create(), creates an anonymous file and
returns a file descriptor that refers to it.  Again like "regular"
memfd files, guest_memfd files live in RAM, have volatile storage,
and are automatically released when the last reference is dropped.
The key differences between memfd files (and every other memory subystem)
is that guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
convert a guest memory area between the shared and guest-private states.

A second KVM ioctl(), KVM_SET_MEMORY_ATTRIBUTES, allows userspace to
specify attributes for a given page of guest memory.  In the long term,
it will likely be extended to allow userspace to specify per-gfn RWX
protections, including allowing memory to be writable in the guest
without it also being writable in host userspace.

The immediate and driving use case for guest_memfd are Confidential
(CoCo) VMs, specifically AMD's SEV-SNP, Intel's TDX, and KVM's own pKVM.
For such use cases, being able to map memory into KVM guests without
requiring said memory to be mapped into the host is a hard requirement.
While SEV+ and TDX prevent untrusted software from reading guest private
data by encrypting guest memory, pKVM provides confidentiality and
integrity *without* relying on memory encryption.  In addition, with
SEV-SNP and especially TDX, accessing guest private memory can be fatal
to the host, i.e. KVM must be prevent host userspace from accessing
guest memory irrespective of hardware behavior.

Long term, guest_memfd may be useful for use cases beyond CoCo VMs,
for example hardening userspace against unintentional accesses to guest
memory.  As mentioned earlier, KVM's ABI uses userspace VMA protections to
define the allow guest protection (with an exception granted to mapping
guest memory executable), and similarly KVM currently requires the guest
mapping size to be a strict subset of the host userspace mapping size.
Decoupling the mappings sizes would allow userspace to precisely map
only what is needed and with the required permissions, without impacting
guest performance.

A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to DMA from or into guest memory).

guest_memfd is the result of 3+ years of development and exploration;
taking on memory management responsibilities in KVM was not the first,
second, or even third choice for supporting CoCo VMs.  But after many
failed attempts to avoid KVM-specific backing memory, and looking at
where things ended up, it is quite clear that of all approaches tried,
guest_memfd is the simplest, most robust, and most extensible, and the
right thing to do for KVM and the kernel at-large.

The "development cycle" for this version is going to be very short;
ideally, next week I will merge it as is in kvm/next, taking this through
the KVM tree for 6.8 immediately after the end of the merge window.
The series is still based on 6.6 (plus KVM changes for 6.7) so it
will require a small fixup for changes to get_file_rcu() introduced in
6.7 by commit 0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU").
The fixup will be done as part of the merge commit, and most of the text
above will become the commit message for the merge.

Pending post-merge work includes:
- hugepage support
- looking into using the restrictedmem framework for guest memory
- introducing a testing mechanism to poison memory, possibly using
  the same memory attributes introduced here
- SNP and TDX support

There are two non-KVM patches buried in the middle of this series:

  fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
  mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
2023-11-14 08:31:31 -05:00
Sean Christopherson
89ea60c2c7 KVM: x86: Add support for "protected VMs" that can utilize private memory
Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
and testing vehicle for Confidential (CoCo) VMs, and potentially to even
become a "real" product in the distant future, e.g. a la pKVM.

The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
Intel's TDX, but those technologies are extremely complex (understatement),
difficult to debug, don't support running as nested guests, and require
hardware that's isn't universally accessible.  I.e. relying SEV-SNP or TDX
for maintaining guest private memory isn't a realistic option.

At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
selftests for guest_memfd and private memory support without requiring
unique hardware.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231027182217.3615211-24-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:05 -05:00
Sean Christopherson
eed52e434b KVM: Allow arch code to track number of memslot address spaces per VM
Let x86 track the number of address spaces on a per-VM basis so that KVM
can disallow SMM memslots for confidential VMs.  Confidentials VMs are
fundamentally incompatible with emulating SMM, which as the name suggests
requires being able to read and write guest memory and register state.

Disallowing SMM will simplify support for guest private memory, as KVM
will not need to worry about tracking memory attributes for multiple
address spaces (SMM is the only "non-default" address space across all
architectures).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:05 -05:00
Sean Christopherson
2333afa17a KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro
Drop __KVM_VCPU_MULTIPLE_ADDRESS_SPACE and instead check the value of
KVM_ADDRESS_SPACE_NUM.

No functional change intended.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-22-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:04 -05:00
Chao Peng
8dd2eee9d5 KVM: x86/mmu: Handle page fault for private memory
Add support for resolving page faults on guest private memory for VMs
that differentiate between "shared" and "private" memory.  For such VMs,
KVM_MEM_GUEST_MEMFD memslots can include both fd-based private memory and
hva-based shared memory, and KVM needs to map in the "correct" variant,
i.e. KVM needs to map the gfn shared/private as appropriate based on the
current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.

For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
shared vs. private via a bit in the guest page tables, i.e. what the guest
wants may conflict with the current memory attributes.  To support such
"implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT
to forward the request to userspace.  Add a new flag for memory faults,
KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
map memory as shared vs. private.

Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to
exit on missing mappings when handling guest page fault VM-Exits.  In
that case, userspace will want to know RWX information in order to
correctly/precisely resolve the fault.

Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
always come from the host userspace page tables, and private mappings
always come from a guest_memfd instance.

Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:04 -05:00
Chao Peng
90b4fe1798 KVM: x86: Disallow hugepages when memory attributes are mixed
Disallow creating hugepages with mixed memory attributes, e.g. shared
versus private, as mapping a hugepage in this case would allow the guest
to access memory with the wrong attributes, e.g. overlaying private memory
with a shared hugepage.

Tracking whether or not attributes are mixed via the existing
disallow_lpage field, but use the most significant bit in 'disallow_lpage'
to indicate a hugepage has mixed attributes instead using the normal
refcounting.  Whether or not attributes are mixed is binary; either they
are or they aren't.  Attempting to squeeze that info into the refcount is
unnecessarily complex as it would require knowing the previous state of
the mixed count when updating attributes.  Using a flag means KVM just
needs to ensure the current status is reflected in the memslots.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:04 -05:00
Sean Christopherson
ee605e3156 KVM: x86: "Reset" vcpu->run->exit_reason early in KVM_RUN
Initialize run->exit_reason to KVM_EXIT_UNKNOWN early in KVM_RUN to reduce
the probability of exiting to userspace with a stale run->exit_reason that
*appears* to be valid.

To support fd-based guest memory (guest memory without a corresponding
userspace virtual address), KVM will exit to userspace for various memory
related errors, which userspace *may* be able to resolve, instead of using
e.g. BUS_MCEERR_AR.  And in the more distant future, KVM will also likely
utilize the same functionality to let userspace "intercept" and handle
memory faults when the userspace mapping is missing, i.e. when fast gup()
fails.

Because many of KVM's internal APIs related to guest memory use '0' to
indicate "success, continue on" and not "exit to userspace", reporting
memory faults/errors to userspace will set run->exit_reason and
corresponding fields in the run structure fields in conjunction with a
a non-zero, negative return code, e.g. -EFAULT or -EHWPOISON.  And because
KVM already returns  -EFAULT in many paths, there's a relatively high
probability that KVM could return -EFAULT without setting run->exit_reason,
in which case reporting KVM_EXIT_UNKNOWN is much better than reporting
whatever exit reason happened to be in the run structure.

Note, KVM must wait until after run->immediate_exit is serviced to
sanitize run->exit_reason as KVM's ABI is that run->exit_reason is
preserved across KVM_RUN when run->immediate_exit is true.

Link: https://lore.kernel.org/all/20230908222905.1321305-1-amoorthy@google.com
Link: https://lore.kernel.org/all/ZFFbwOXZ5uI%2Fgdaf@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-19-seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-14 08:01:03 -05:00
Hou Wenlong
fe22bc430c x86/paravirt: Make the struct paravirt_patch_site packed
Similar to struct alt_instr, make the struct paravirt_patch_site packed
and get rid of all the .align directives and save 2 bytes for one
PARA_SITE entry on X86_64.

  [ bp: Massage commit message. ]

Suggested-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/6dcb20159ded36586c5f7f2ae159e4e030256627.1686301237.git.houwenlong.hwl@antgroup.com
2023-11-13 12:43:50 +01:00
Hou Wenlong
5c22c4726e x86/paravirt: Use relative reference for the original instruction offset
Similar to the alternative patching, use a relative reference for original
instruction offset rather than absolute one, which saves 8 bytes for one
PARA_SITE entry on x86_64.  As a result, a R_X86_64_PC32 relocation is
generated instead of an R_X86_64_64 one, which also reduces relocation
metadata on relocatable builds. Hardcode the alignment to 4 now.

  [ bp: Massage commit message. ]

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/9e6053107fbaabc0d33e5d2865c5af2c67ec9925.1686301237.git.houwenlong.hwl@antgroup.com
2023-11-13 12:23:27 +01:00
Chao Peng
16f95f3b95 KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace
Add a new KVM exit type to allow userspace to handle memory faults that
KVM cannot resolve, but that userspace *may* be able to handle (without
terminating the guest).

KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
conversions between private and shared memory.  With guest private memory,
there will be two kind of memory conversions:

  - explicit conversion: happens when the guest explicitly calls into KVM
    to map a range (as private or shared)

  - implicit conversion: happens when the guest attempts to access a gfn
    that is configured in the "wrong" state (private vs. shared)

On x86 (first architecture to support guest private memory), explicit
conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesriable
as there is (obviously) no hypercall, and there is no guarantee that the
guest actually intends to convert between private and shared, i.e. what
KVM thinks is an implicit conversion "request" could actually be the
result of a guest code bug.

KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
be implicit conversions.

Note!  To allow for future possibilities where KVM reports
KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
perspective), not '0'!  Due to historical baggage within KVM, exiting to
userspace with '0' from deep callstacks, e.g. in emulation paths, is
infeasible as doing so would require a near-complete overhaul of KVM,
whereas KVM already propagates -errno return codes to userspace even when
the -errno originated in a low level helper.

Report the gpa+size instead of a single gfn even though the initial usage
is expected to always report single pages.  It's entirely possible, likely
even, that KVM will someday support sub-page granularity faults, e.g.
Intel's sub-page protection feature allows for additional protections at
128-byte granularity.

Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
Link: https://lore.kernel.org/all/ZQ3AmLO2SYv3DszH@google.com
Cc: Anish Moorthy <amoorthy@google.com>
Cc: David Matlack <dmatlack@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231027182217.3615211-10-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:31:11 -05:00
Sean Christopherson
bb58b90b1a KVM: Introduce KVM_SET_USER_MEMORY_REGION2
Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional
information can be supplied without setting userspace up to fail.  The
padding in the new kvm_userspace_memory_region2 structure will be used to
pass a file descriptor in addition to the userspace_addr, i.e. allow
userspace to point at a file descriptor and map memory into a guest that
is NOT mapped into host userspace.

Alternatively, KVM could simply add "struct kvm_userspace_memory_region2"
without a new ioctl(), but as Paolo pointed out, adding a new ioctl()
makes detection of bad flags a bit more robust, e.g. if the new fd field
is guarded only by a flag and not a new ioctl(), then a userspace bug
(setting a "bad" flag) would generate out-of-bounds access instead of an
-EINVAL error.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-9-seanjc@google.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:30:41 -05:00
Sean Christopherson
f128cf8cfb KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where
appropriate to effectively maintain existing behavior.  Using a proper
Kconfig will simplify building more functionality on top of KVM's
mmu_notifier infrastructure.

Add a forward declaration of kvm_gfn_range to kvm_types.h so that
including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't
generate warnings due to kvm_gfn_range being undeclared.  PPC defines
hooks for PR vs. HV without guarding them via #ifdeffery, e.g.

  bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);

Alternatively, PPC could forward declare kvm_gfn_range, but there's no
good reason not to define it in common KVM.

Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Message-Id: <20231027182217.3615211-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:29:09 -05:00
Chao Peng
8569992d64 KVM: Use gfn instead of hva for mmu_notifier_retry
Currently in mmu_notifier invalidate path, hva range is recorded and then
checked against by mmu_invalidate_retry_hva() in the page fault handling
path. However, for the soon-to-be-introduced private memory, a page fault
may not have a hva associated, checking gfn(gpa) makes more sense.

For existing hva based shared memory, gfn is expected to also work. The
only downside is when aliasing multiple gfns to a single hva, the
current algorithm of checking multiple ranges could result in a much
larger range being rejected. Such aliasing should be uncommon, so the
impact is expected small.

Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
[sean: convert vmx_set_apic_access_page_addr() to gfn-based API]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
Message-Id: <20231027182217.3615211-4-seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-13 05:28:53 -05:00
Borislav Petkov (AMD)
04c3024560 x86/barrier: Do not serialize MSR accesses on AMD
AMD does not have the requirement for a synchronization barrier when
acccessing a certain group of MSRs. Do not incur that unnecessary
penalty there.

There will be a CPUID bit which explicitly states that a MFENCE is not
needed. Once that bit is added to the APM, this will be extended with
it.

While at it, move to processor.h to avoid include hell. Untangling that
file properly is a matter for another day.

Some notes on the performance aspect of why this is relevant, courtesy
of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:

On a AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM
shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The
ipi-bench is modified so that the IPIs are sent between two vCPUs in the
same CCX. This also requires to pin the vCPU to a physical core to
prevent any latencies. This simulates the use case of pinning vCPUs to
the thread of a single CCX to avoid interrupt IPI latency.

In order to avoid run-to-run variance (for both x2AVIC and AVIC), the
below configurations are done:

  1) Disable Power States in BIOS (to prevent the system from going to
     lower power state)

  2) Run the system at fixed frequency 2500MHz (to prevent the system
     from increasing the frequency when the load is more)

With the above configuration:

*) Performance measured using ipi-bench for AVIC:
  Average Latency:  1124.98ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from
  				     48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency:  1172.42ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from
  				     48 vCPUs simultaneously]

From above, x2AVIC latency is ~4% more than AVIC. However, the expectation is
x2AVIC performance to be better or equivalent to AVIC. Upon analyzing
the perf captures, it is observed significant time is spent in
weak_wrmsr_fence() invoked by x2apic_send_IPI().

With the fix to skip weak_wrmsr_fence()

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency:  1117.44ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from
  				     48 vCPUs simultaneously]

Comparing the performance of x2AVIC with and without the fix, it can be seen
the performance improves by ~4%.

Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option
with and without weak_wrmsr_fence() on a Zen4 system also showed significant
performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores
CCX or CCD and just picks random vCPU.

  Average throughput (10 iterations) with weak_wrmsr_fence(),
        Cumulative throughput: 4933374 IPI/s

  Average throughput (10 iterations) without weak_wrmsr_fence(),
        Cumulative throughput: 6355156 IPI/s

[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
2023-11-13 10:09:45 +01:00
Zhiquan Li
9f3b130048 x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, that probability
increases with the amount of machines present.

When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.

However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.

Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.

  [ bp: Rewrite commit message and comment. ]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
2023-11-13 09:53:15 +01:00
Yuntao Wang
f7a25cf1d4 x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
After

  0b62f6cb07 ("x86/microcode/32: Move early loading after paging enable"),

the global variable relocated_ramdisk is no longer used anywhere except
for the relocate_initrd() function. Make it a local variable of that
function.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20231113034026.130679-1-ytcoode@gmail.com
2023-11-13 09:09:37 +01:00
Roger Pau Monne
bfa993b355 acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0
The Processor capability bits notify ACPI of the OS capabilities, and
so ACPI can adjust the return of other Processor methods taking the OS
capabilities into account.

When Linux is running as a Xen dom0, the hypervisor is the entity
in charge of processor power management, and hence Xen needs to make
sure the capabilities reported by _OSC/_PDC match the capabilities of
the driver in Xen.

Introduce a small helper to sanitize the buffer when running as Xen
dom0.

When Xen supports HWP, this serves as the equivalent of commit
a21211672c ("ACPI / processor: Request native thermal interrupt
handling via _OSC") to avoid SMM crashes.  Xen will set bit
ACPI_PROC_CAP_COLLAB_PROC_PERF (bit 12) in the capability bits and the
_OSC/_PDC call will apply it.

[ jandryuk: Mention Xen HWP's need.  Support _OSC & _PDC ]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20231108212517.72279-1-jandryuk@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 07:22:00 +01:00
Casey Schaufler
5f42375904 LSM: wireup Linux Security Module syscalls
Wireup lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules
system calls.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-api@vger.kernel.org
Reviewed-by: Mickaël Salaün <mic@digikod.net>
[PM: forward ported beyond v6.6 due merge window changes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-11-12 22:54:42 -05:00
Saurabh Sengar
7e8037b099 x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
A Gen2 VM doesn't support legacy PCI/PCIe, so both raw_pci_ops and
raw_pci_ext_ops are NULL, and pci_subsys_init() -> pcibios_init()
doesn't call pcibios_resource_survey() -> e820__reserve_resources_late();
as a result, any emulated persistent memory of E820_TYPE_PRAM (12) via
the kernel parameter memmap=nn[KMG]!ss is not added into iomem_resource
and hence can't be detected by register_e820_pmem().

Fix this by directly calling e820__reserve_resources_late() in
hv_pci_init(), which is called from arch_initcall(pci_arch_init).

It's ok to move a Gen2 VM's e820__reserve_resources_late() from
subsys_initcall(pci_subsys_init) to arch_initcall(pci_arch_init) because
the code in-between doesn't depend on the E820 resources.
e820__reserve_resources_late() depends on e820__reserve_resources(),
which has been called earlier from setup_arch().

For a Gen-2 VM, the new hv_pci_init() also adds any memory of
E820_TYPE_PMEM (7) into iomem_resource, and acpi_nfit_register_region() ->
acpi_nfit_insert_resource() -> region_intersects() returns
REGION_INTERSECTS, so the memory of E820_TYPE_PMEM won't get added twice.

Changed the local variable "int gen2vm" to "bool gen2vm".

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1699691867-9827-1-git-send-email-ssengar@linux.microsoft.com>
2023-11-12 22:50:30 +00:00
Arnd Bergmann
abc28463c8 kprobes: unify kprobes_exceptions_nofify() prototypes
Most architectures that support kprobes declare this function in their
own asm/kprobes.h header and provide an override, but some are missing
the prototype, which causes a warning for the __weak stub implementation:

kernel/kprobes.c:1865:12: error: no previous prototype for 'kprobe_exceptions_notify' [-Werror=missing-prototypes]
 1865 | int __weak kprobe_exceptions_notify(struct notifier_block *self,

Move the prototype into linux/kprobes.h so it is visible to all
the definitions.

Link: https://lore.kernel.org/all/20231108125843.3806765-4-arnd@kernel.org/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-11-10 19:59:05 +09:00
Zhang Rui
ec9aedb2aa x86/acpi: Ignore invalid x2APIC entries
Currently, the kernel enumerates the possible CPUs by parsing both ACPI
MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
even if they have duplicated APIC IDs in Local APIC and x2APIC, are always
enumerated.

Below is what ACPI MADT Local APIC and x2APIC describes on an
Ivebridge-EP system,

[02Ch 0044   1]                Subtable Type : 00 [Processor Local APIC]
[02Fh 0047   1]                Local Apic ID : 00
...
[164h 0356   1]                Subtable Type : 00 [Processor Local APIC]
[167h 0359   1]                Local Apic ID : 39
[16Ch 0364   1]                Subtable Type : 00 [Processor Local APIC]
[16Fh 0367   1]                Local Apic ID : FF
...
[3ECh 1004   1]                Subtable Type : 09 [Processor Local x2APIC]
[3F0h 1008   4]                Processor x2Apic ID : 00000000
...
[B5Ch 2908   1]                Subtable Type : 09 [Processor Local x2APIC]
[B60h 2912   4]                Processor x2Apic ID : 00000077

As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
And this wastes significant amount of memory for the per-cpu data.
Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
because __max_logical_packages is over-estimated by the APIC IDs in
the x2APIC entries.

According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure:

  "[Compatibility note] On some legacy OSes, Logical processors with APIC
   ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
   Processor Local APIC structure to convey their APIC information to OSPM,
   and those processors must be declared in the DSDT using the Processor()
   keyword. Logical processors with APIC ID values 255 and greater must use
   the Processor Local x2APIC structure and be declared using the Device()
   keyword."

Therefore prevent the registration of x2APIC entries with an APIC ID less
than 255 if the local APIC table enumerates valid APIC IDs.

[ tglx: Simplify the logic ]

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
2023-11-09 14:33:30 +01:00
Rick Edgecombe
31255e072b x86/shstk: Delay signal entry SSP write until after user accesses
When a signal is being delivered, the kernel needs to make accesses to
userspace. These accesses could encounter an access error, in which case
the signal delivery itself will trigger a segfault. Usually this would
result in the kernel killing the process. But in the case of a SEGV signal
handler being configured, the failure of the first signal delivery will
result in *another* signal getting delivered. The second signal may
succeed if another thread has resolved the issue that triggered the
segfault (i.e. a well timed mprotect()/mmap()), or the second signal is
being delivered to another stack (i.e. an alt stack).

On x86, in the non-shadow stack case, all the accesses to userspace are
done before changes to the registers (in pt_regs). The operation is
aborted when an access error occurs, so although there may be writes done
for the first signal, control flow changes for the signal (regs->ip,
regs->sp, etc) are not committed until all the accesses have already
completed successfully. This means that the second signal will be
delivered as if it happened at the time of the first signal. It will
effectively replace the first aborted signal, overwriting the half-written
frame of the aborted signal. So on sigreturn from the second signal,
control flow will resume happily from the point of control flow where the
original signal was delivered.

The problem is, when shadow stack is active, the shadow stack SSP
register/MSR is updated *before* some of the userspace accesses. This
means if the earlier accesses succeed and the later ones fail, the second
signal will not be delivered at the same spot on the shadow stack as the
first one. So on sigreturn from the second signal, the SSP will be
pointing to the wrong location on the shadow stack (off by a frame).

Pengfei privately reported that while using a shadow stack enabled glibc,
the “signal06” test in the LTP test-suite hung. It turns out it is
testing the above described double signal scenario. When this test was
compiled with shadow stack, the first signal pushed a shadow stack
sigframe, then the second pushed another. When the second signal was
handled, the SSP was at the first shadow stack signal frame instead of
the original location. The test then got stuck as the #CP from the twice
incremented SSP was incorrect and generated segfaults in a loop.

Fix this by adjusting the SSP register only after any userspace accesses,
such that there can be no failures after the SSP is adjusted. Do this by
moving the shadow stack sigframe push logic to happen after all other
userspace accesses.

Note, sigreturn (as opposed to the signal delivery dealt with in this
patch) has ordering behavior that could lead to similar failures. The
ordering issues there extend beyond shadow stack to include the alt stack
restoration. Fixing that would require cross-arch changes, and the
ordering today does not cause any known test or apps breakages. So leave
it as is, for now.

[ dhansen: minor changelog/subject tweak ]

Fixes: 05e36022c0 ("x86/shstk: Handle signals for shadow stack")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com
Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
2023-11-08 08:55:37 -08:00
Linus Torvalds
5e2cb28dd7 configfs-tsm for v6.7
- Introduce configfs-tsm as a shared ABI for confidential computing
   attestation reports
 
 - Convert sev-guest to additionally support configfs-tsm alongside its
   vendor specific ioctl()
 
 - Added signed attestation report retrieval to the tdx-guest driver
   forgoing a new vendor specific ioctl()
 
 - Misc. cleanups and a new __free() annotation for kvfree()
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZUQhiQAKCRDfioYZHlFs
 Z2gMAQCJdtP0f2kH+pvf3oxAkA1OubKBqJqWOppeyrhTsNMpDQEA9ljXH9h7eRB/
 2NQ6USrU6jqcdu3gB5Tzq8J/ZZabMQU=
 =1Eiv
 -----END PGP SIGNATURE-----

Merge tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux

Pull unified attestation reporting from Dan Williams:
 "In an ideal world there would be a cross-vendor standard attestation
  report format for confidential guests along with a common device
  definition to act as the transport.

  In the real world the situation ended up with multiple platform
  vendors inventing their own attestation report formats with the
  SEV-SNP implementation being a first mover to define a custom
  sev-guest character device and corresponding ioctl(). Later, this
  configfs-tsm proposal intercepted an attempt to add a tdx-guest
  character device and a corresponding new ioctl(). It also anticipated
  ARM and RISC-V showing up with more chardevs and more ioctls().

  The proposal takes for granted that Linux tolerates the vendor report
  format differentiation until a standard arrives. From talking with
  folks involved, it sounds like that standardization work is unlikely
  to resolve anytime soon. It also takes the position that kernfs ABIs
  are easier to maintain than ioctl(). The result is a shared configfs
  mechanism to return per-vendor report-blobs with the option to later
  support a standard when that arrives.

  Part of the goal here also is to get the community into the
  "uncomfortable, but beneficial to the long term maintainability of the
  kernel" state of talking to each other about their differentiation and
  opportunities to collaborate. Think of this like the device-driver
  equivalent of the common memory-management infrastructure for
  confidential-computing being built up in KVM.

  As for establishing an "upstream path for cross-vendor
  confidential-computing device driver infrastructure" this is something
  I want to discuss at Plumbers. At present, the multiple vendor
  proposals for assigning devices to confidential computing VMs likely
  needs a new dedicated repository and maintainer team, but that is a
  discussion for v6.8.

  For now, Greg and Thomas have acked this approach and this is passing
  is AMD, Intel, and Google tests.

  Summary:

   - Introduce configfs-tsm as a shared ABI for confidential computing
     attestation reports

   - Convert sev-guest to additionally support configfs-tsm alongside
     its vendor specific ioctl()

   - Added signed attestation report retrieval to the tdx-guest driver
     forgoing a new vendor specific ioctl()

   - Misc cleanups and a new __free() annotation for kvfree()"

* tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux:
  virt: tdx-guest: Add Quote generation support using TSM_REPORTS
  virt: sevguest: Add TSM_REPORTS support for SNP_GET_EXT_REPORT
  mm/slab: Add __free() support for kvfree
  virt: sevguest: Prep for kernel internal get_ext_report()
  configfs-tsm: Introduce a shared ABI for attestation reports
  virt: coco: Add a coco/Makefile and coco/Kconfig
  virt: sevguest: Fix passing a stack buffer as a scatterlist target
2023-11-04 15:58:13 -10:00
Linus Torvalds
0a23fb262d Major microcode loader restructuring, cleanup and improvements by Thomas
Gleixner:
 
 - Restructure the code needed for it and add a temporary initrd mapping
   on 32-bit so that the loader can access the microcode blobs. This in
   itself is a preparation for the next major improvement:
 
 - Do not load microcode on 32-bit before paging has been enabled.
   Handling this has caused an endless stream of headaches, issues, ugly
   code and unnecessary hacks in the past. And there really wasn't any
   sensible reason to do that in the first place. So switch the 32-bit
   loading to happen after paging has been enabled and turn the loader
   code "real purrty" again
 
 - Drop mixed microcode steppings loading on Intel - there, a single patch
   loaded on the whole system is sufficient
 
 - Rework late loading to track which CPUs have updated microcode
   successfully and which haven't, act accordingly
 
 - Move late microcode loading on Intel in NMI context in order to
   guarantee concurrent loading on all threads
 
 - Make the late loading CPU-hotplug-safe and have the offlined threads
   be woken up for the purpose of the update
 
 - Add support for a minimum revision which determines whether late
   microcode loading is safe on a machine and the microcode does not
   change software visible features which the machine cannot use anyway
   since feature detection has happened already. Roughly, the minimum
   revision is the smallest revision number which must be loaded
   currently on the system so that late updates can be allowed
 
 - Other nice leanups, fixess, etc all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVE0xkACgkQEsHwGGHe
 VUrCuBAAhOqqwkYPiGXPWd2hvdn1zGtD5fvEdXn3Orzd+Lwc6YaQTsCxCjIO/0ws
 8inpPFuOeGz4TZcplzipi3G5oatPVc7ORDuW+/BvQQQljZOsSKfhiaC29t6dvS6z
 UG3sbCXKVwlJ5Kwv3Qe4eWur4Ex6GeFDZkIvBCmbaAdGPFlfu1i2uO1yBooNP1Rs
 GiUkp+dP1/KREWwR/dOIsHYL2QjWIWfHQEWit/9Bj46rxE9ERx/TWt3AeKPfKriO
 Wp0JKp6QY78jg6a0a2/JVmbT1BKz69Z9aPp6hl4P2MfbBYOnqijRhdezFW0NyqV2
 pn6nsuiLIiXbnSOEw0+Wdnw5Q0qhICs5B5eaBfQrwgfZ8pxPHv2Ir777GvUTV01E
 Dv0ZpYsHa+mHe17nlK8V3+4eajt0PetExcXAYNiIE+pCb7pLjjKkl8e+lcEvEsO0
 QSL3zG5i5RWUMPYUvaFRgepWy3k/GPIoDQjRcUD3P+1T0GmnogNN10MMNhmOzfWU
 pyafe4tJUOVsq0HJ7V/bxIHk2p+Q+5JLKh5xBm9janE4BpabmSQnvFWNblVfK4ig
 M9ohjI/yMtgXROC4xkNXgi8wE5jfDKBghT6FjTqKWSV45vknF1mNEjvuaY+aRZ3H
 MB4P3HCj+PKWJimWHRYnDshcytkgcgVcYDiim8va/4UDrw8O2ks=
 =JOZu
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loading updates from Borislac Petkov:
 "Major microcode loader restructuring, cleanup and improvements by
  Thomas Gleixner:

   - Restructure the code needed for it and add a temporary initrd
     mapping on 32-bit so that the loader can access the microcode
     blobs. This in itself is a preparation for the next major
     improvement:

   - Do not load microcode on 32-bit before paging has been enabled.

     Handling this has caused an endless stream of headaches, issues,
     ugly code and unnecessary hacks in the past. And there really
     wasn't any sensible reason to do that in the first place. So switch
     the 32-bit loading to happen after paging has been enabled and turn
     the loader code "real purrty" again

   - Drop mixed microcode steppings loading on Intel - there, a single
     patch loaded on the whole system is sufficient

   - Rework late loading to track which CPUs have updated microcode
     successfully and which haven't, act accordingly

   - Move late microcode loading on Intel in NMI context in order to
     guarantee concurrent loading on all threads

   - Make the late loading CPU-hotplug-safe and have the offlined
     threads be woken up for the purpose of the update

   - Add support for a minimum revision which determines whether late
     microcode loading is safe on a machine and the microcode does not
     change software visible features which the machine cannot use
     anyway since feature detection has happened already. Roughly, the
     minimum revision is the smallest revision number which must be
     loaded currently on the system so that late updates can be allowed

   - Other nice leanups, fixess, etc all over the place"

* tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  x86/microcode/intel: Add a minimum required revision for late loading
  x86/microcode: Prepare for minimal revision check
  x86/microcode: Handle "offline" CPUs correctly
  x86/apic: Provide apic_force_nmi_on_cpu()
  x86/microcode: Protect against instrumentation
  x86/microcode: Rendezvous and load in NMI
  x86/microcode: Replace the all-in-one rendevous handler
  x86/microcode: Provide new control functions
  x86/microcode: Add per CPU control field
  x86/microcode: Add per CPU result state
  x86/microcode: Sanitize __wait_for_cpus()
  x86/microcode: Clarify the late load logic
  x86/microcode: Handle "nosmt" correctly
  x86/microcode: Clean up mc_cpu_down_prep()
  x86/microcode: Get rid of the schedule work indirection
  x86/microcode: Mop up early loading leftovers
  x86/microcode/amd: Use cached microcode for AP load
  x86/microcode/amd: Cache builtin/initrd microcode early
  x86/microcode/amd: Cache builtin microcode too
  x86/microcode/amd: Use correct per CPU ucode_cpu_info
  ...
2023-11-04 08:46:37 -10:00
Linus Torvalds
5c5e048b24 Kbuild updates for v6.7
- Implement the binary search in modpost for faster symbol lookup
 
  - Respect HOSTCC when linking host programs written in Rust
 
  - Change the binrpm-pkg target to generate kernel-devel RPM package
 
  - Fix endianness issues for tee and ishtp MODULE_DEVICE_TABLE
 
  - Unify vdso_install rules
 
  - Remove unused __memexit* annotations
 
  - Eliminate stale whitelisting for __devinit/__devexit from modpost
 
  - Enable dummy-tools to handle the -fpatchable-function-entry flag
 
  - Add 'userldlibs' syntax
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmVFIZgVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGeKwP+wd2kCrxAgS4zPffOcO3cVHfZwJe
 AXOrTp/v73gzxb9eHXH6TmEDf1Rv7EwW3fmmGJosopJGD6itBqzJa4bNDrbq40rY
 XStmg0NRmTrIG20CHGgaGWxb8/7WMrYfu0rhFdUXJjmbny6XwJ3US9FvDPC0mZz7
 w9VCq5CZOqMsJcQyGkAR7uCHDRzNWiZ/Vnfbz3aa6abFzp7dsjhOgDy5SQ6qZgQz
 AwHHKNEN+G3HWmGDZqcbV9aDaCk4btnz64h843RAxjy2HNJF360Ohm2KOcdJr5lo
 DSSStkogBkZNSRQPtqtfknDjzITjeF4JAnUw5ivOtt8ERaO3JRUcr5gHjfw5iV/n
 o4pC1SXmFzdfoN4dogoYF9rz3j955mSFlT/DSbSbuQS/ELzQs0nsqERxhV4zNCsX
 KvYPUqKzZLW3i8pHNuhh7z7t4Nbz1zXqUa19FvaLNtFTCtS8/IA868a59S0uqT9I
 EAIqrNy9qAsk8UuQUxWVx0qf9f5wKGYxW62iMIF9F2lsFRWA8H588CFPUuSU9Bhk
 KAsvzq249MUGJd0RAjF92EWJgNz/nYzZfFTEL5HKAVauYY5UCyR3AVjrak761I8z
 ctVskA7eVkaW4eARfcp15Fna15FHVzxBJ3B26oKYIJBQfJLjzZcV8XeMtEcQjEGU
 jzl+oRqB/Q3oD7Nx
 =PeX7
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Implement the binary search in modpost for faster symbol lookup

 - Respect HOSTCC when linking host programs written in Rust

 - Change the binrpm-pkg target to generate kernel-devel RPM package

 - Fix endianness issues for tee and ishtp MODULE_DEVICE_TABLE

 - Unify vdso_install rules

 - Remove unused __memexit* annotations

 - Eliminate stale whitelisting for __devinit/__devexit from modpost

 - Enable dummy-tools to handle the -fpatchable-function-entry flag

 - Add 'userldlibs' syntax

* tag 'kbuild-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
  kbuild: support 'userldlibs' syntax
  kbuild: dummy-tools: pretend we understand -fpatchable-function-entry
  kbuild: Correct missing architecture-specific hyphens
  modpost: squash ALL_{INIT,EXIT}_TEXT_SECTIONS to ALL_TEXT_SECTIONS
  modpost: merge sectioncheck table entries regarding init/exit sections
  modpost: use ALL_INIT_SECTIONS for the section check from DATA_SECTIONS
  modpost: disallow the combination of EXPORT_SYMBOL and __meminit*
  modpost: remove EXIT_SECTIONS macro
  modpost: remove MEM_INIT_SECTIONS macro
  modpost: remove more symbol patterns from the section check whitelist
  modpost: disallow *driver to reference .meminit* sections
  linux/init: remove __memexit* annotations
  modpost: remove ALL_EXIT_DATA_SECTIONS macro
  kbuild: simplify cmd_ld_multi_m
  kbuild: avoid too many execution of scripts/pahole-flags.sh
  kbuild: remove ARCH_POSTLINK from module builds
  kbuild: unify no-compiler-targets and no-sync-config-targets
  kbuild: unify vdso_install rules
  docs: kbuild: add INSTALL_DTBS_PATH
  UML: remove unused cmd_vdso_install
  ...
2023-11-04 08:07:19 -10:00
Linus Torvalds
1f24458a10 TTY/Serial changes for 6.7-rc1
Here is the big set of tty/serial driver changes for 6.7-rc1.  Included
 in here are:
   - console/vgacon cleanups and removals from Arnd
   - tty core and n_tty cleanups from Jiri
   - lots of 8250 driver updates and cleanups
   - sc16is7xx serial driver updates
   - dt binding updates
   - first set of port lock wrapers from Thomas for the printk fixes
     coming in future releases
   - other small serial and tty core cleanups and updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTbaw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk9+gCeKdoRb8FDwGCO/GaoHwR4EzwQXhQAoKXZRmN5
 LTtw9sbfGIiBdOTtgLPb
 =6PJr
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty and serial updates from Greg KH:
 "Here is the big set of tty/serial driver changes for 6.7-rc1. Included
  in here are:

   - console/vgacon cleanups and removals from Arnd

   - tty core and n_tty cleanups from Jiri

   - lots of 8250 driver updates and cleanups

   - sc16is7xx serial driver updates

   - dt binding updates

   - first set of port lock wrapers from Thomas for the printk fixes
     coming in future releases

   - other small serial and tty core cleanups and updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (193 commits)
  serdev: Replace custom code with device_match_acpi_handle()
  serdev: Simplify devm_serdev_device_open() function
  serdev: Make use of device_set_node()
  tty: n_gsm: add copyright Siemens Mobility GmbH
  tty: n_gsm: fix race condition in status line change on dead connections
  serial: core: Fix runtime PM handling for pending tx
  vgacon: fix mips/sibyte build regression
  dt-bindings: serial: drop unsupported samsung bindings
  tty: serial: samsung: drop earlycon support for unsupported platforms
  tty: 8250: Add note for PX-835
  tty: 8250: Fix IS-200 PCI ID comment
  tty: 8250: Add Brainboxes Oxford Semiconductor-based quirks
  tty: 8250: Add support for Intashield IX cards
  tty: 8250: Add support for additional Brainboxes PX cards
  tty: 8250: Fix up PX-803/PX-857
  tty: 8250: Fix port count of PX-257
  tty: 8250: Add support for Intashield IS-100
  tty: 8250: Add support for Brainboxes UP cards
  tty: 8250: Add support for additional Brainboxes UC cards
  tty: 8250: Remove UC-257 and UC-431
  ...
2023-11-03 15:44:25 -10:00
Linus Torvalds
8f6f76a6a2 As usual, lots of singleton and doubleton patches all over the tree and
there's little I can say which isn't in the individual changelogs.
 
 The lengthier patch series are
 
 - "kdump: use generic functions to simplify crashkernel reservation in
   arch", from Baoquan He.  This is mainly cleanups and consolidation of
   the "crashkernel=" kernel parameter handling.
 
 - After much discussion, David Laight's "minmax: Relax type checks in
   min() and max()" is here.  Hopefully reduces some typecasting and the
   use of min_t() and max_t().
 
 - A group of patches from Oleg Nesterov which clean up and slightly fix
   our handling of reads from /proc/PID/task/...  and which remove
   task_struct.therad_group.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA
 jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4
 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0=
 =On4u
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00
Linus Torvalds
ecae0bd517 Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
 
 - Kemeng Shi has contributed some compation maintenance work in the
   series "Fixes and cleanups to compaction".
 
 - Joel Fernandes has a patchset ("Optimize mremap during mutual
   alignment within PMD") which fixes an obscure issue with mremap()'s
   pagetable handling during a subsequent exec(), based upon an
   implementation which Linus suggested.
 
 - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the
   following patch series:
 
 	mm/damon: misc fixups for documents, comments and its tracepoint
 	mm/damon: add a tracepoint for damos apply target regions
 	mm/damon: provide pseudo-moving sum based access rate
 	mm/damon: implement DAMOS apply intervals
 	mm/damon/core-test: Fix memory leaks in core-test
 	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
 
 - In the series "Do not try to access unaccepted memory" Adrian Hunter
   provides some fixups for the recently-added "unaccepted memory' feature.
   To increase the feature's checking coverage.  "Plug a few gaps where
   RAM is exposed without checking if it is unaccepted memory".
 
 - In the series "cleanups for lockless slab shrink" Qi Zheng has done
   some maintenance work which is preparation for the lockless slab
   shrinking code.
 
 - Qi Zheng has redone the earlier (and reverted) attempt to make slab
   shrinking lockless in the series "use refcount+RCU method to implement
   lockless slab shrink".
 
 - David Hildenbrand contributes some maintenance work for the rmap code
   in the series "Anon rmap cleanups".
 
 - Kefeng Wang does more folio conversions and some maintenance work in
   the migration code.  Series "mm: migrate: more folio conversion and
   unification".
 
 - Matthew Wilcox has fixed an issue in the buffer_head code which was
   causing long stalls under some heavy memory/IO loads.  Some cleanups
   were added on the way.  Series "Add and use bdev_getblk()".
 
 - In the series "Use nth_page() in place of direct struct page
   manipulation" Zi Yan has fixed a potential issue with the direct
   manipulation of hugetlb page frames.
 
 - In the series "mm: hugetlb: Skip initialization of gigantic tail
   struct pages if freed by HVO" has improved our handling of gigantic
   pages in the hugetlb vmmemmep optimizaton code.  This provides
   significant boot time improvements when significant amounts of gigantic
   pages are in use.
 
 - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
   rationalization and folio conversions in the hugetlb code.
 
 - Yin Fengwei has improved mlock()'s handling of large folios in the
   series "support large folio for mlock"
 
 - In the series "Expose swapcache stat for memcg v1" Liu Shixin has
   added statistics for memcg v1 users which are available (and useful)
   under memcg v2.
 
 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
   prctl so that userspace may direct the kernel to not automatically
   propagate the denial to child processes.  The series is named "MDWE
   without inheritance".
 
 - Kefeng Wang has provided the series "mm: convert numa balancing
   functions to use a folio" which does what it says.
 
 - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
   makes is possible for a process to propagate KSM treatment across
   exec().
 
 - Huang Ying has enhanced memory tiering's calculation of memory
   distances.  This is used to permit the dax/kmem driver to use "high
   bandwidth memory" in addition to Optane Data Center Persistent Memory
   Modules (DCPMM).  The series is named "memory tiering: calculate
   abstract distance based on ACPI HMAT"
 
 - In the series "Smart scanning mode for KSM" Stefan Roesch has
   optimized KSM by teaching it to retain and use some historical
   information from previous scans.
 
 - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
   series "mm: memcg: fix tracking of pending stats updates values".
 
 - In the series "Implement IOCTL to get and optionally clear info about
   PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
   us to atomically read-then-clear page softdirty state.  This is mainly
   used by CRIU.
 
 - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
   - a bunch of relatively minor maintenance tweaks to this code.
 
 - Matthew Wilcox has increased the use of the VMA lock over file-backed
   page faults in the series "Handle more faults under the VMA lock".  Some
   rationalizations of the fault path became possible as a result.
 
 - In the series "mm/rmap: convert page_move_anon_rmap() to
   folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
   and folio conversions.
 
 - In the series "various improvements to the GUP interface" Lorenzo
   Stoakes has simplified and improved the GUP interface with an eye to
   providing groundwork for future improvements.
 
 - Andrey Konovalov has sent along the series "kasan: assorted fixes and
   improvements" which does those things.
 
 - Some page allocator maintenance work from Kemeng Shi in the series
   "Two minor cleanups to break_down_buddy_pages".
 
 - In thes series "New selftest for mm" Breno Leitao has developed
   another MM self test which tickles a race we had between madvise() and
   page faults.
 
 - In the series "Add folio_end_read" Matthew Wilcox provides cleanups
   and an optimization to the core pagecache code.
 
 - Nhat Pham has added memcg accounting for hugetlb memory in the series
   "hugetlb memcg accounting".
 
 - Cleanups and rationalizations to the pagemap code from Lorenzo
   Stoakes, in the series "Abstract vma_merge() and split_vma()".
 
 - Audra Mitchell has fixed issues in the procfs page_owner code's new
   timestamping feature which was causing some misbehaviours.  In the
   series "Fix page_owner's use of free timestamps".
 
 - Lorenzo Stoakes has fixed the handling of new mappings of sealed files
   in the series "permit write-sealed memfd read-only shared mappings".
 
 - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
   series "Batch hugetlb vmemmap modification operations".
 
 - Some buffer_head folio conversions and cleanups from Matthew Wilcox in
   the series "Finish the create_empty_buffers() transition".
 
 - As a page allocator performance optimization Huang Ying has added
   automatic tuning to the allocator's per-cpu-pages feature, in the series
   "mm: PCP high auto-tuning".
 
 - Roman Gushchin has contributed the patchset "mm: improve performance
   of accounted kernel memory allocations" which improves their performance
   by ~30% as measured by a micro-benchmark.
 
 - folio conversions from Kefeng Wang in the series "mm: convert page
   cpupid functions to folios".
 
 - Some kmemleak fixups in Liu Shixin's series "Some bugfix about
   kmemleak".
 
 - Qi Zheng has improved our handling of memoryless nodes by keeping them
   off the allocation fallback list.  This is done in the series "handle
   memoryless nodes more appropriately".
 
 - khugepaged conversions from Vishal Moola in the series "Some
   khugepaged folio conversions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
 jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
 FgeUPAD1oasg6CP+INZvCj34waNxwAc=
 =E+Y4
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Kemeng Shi has contributed some compation maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

   - More DAMON/DAMOS maintenance and feature work from SeongJae Park i
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

   - In the series 'Do not try to access unaccepted memory' Adrian
     Hunter provides some fixups for the recently-added 'unaccepted
     memory' feature. To increase the feature's checking coverage. 'Plug
     a few gaps where RAM is exposed without checking if it is
     unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

   - In the series 'mm: hugetlb: Skip initialization of gigantic tail
     struct pages if freed by HVO' has improved our handling of gigantic
     pages in the hugetlb vmmemmep optimizaton code. This provides
     significant boot time improvements when significant amounts of
     gigantic pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
     Roesch makes is possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

   - In thes series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
2023-11-02 19:38:47 -10:00
Linus Torvalds
bc3012f4e3 This update includes the following changes:
API:
 
 - Add virtual-address based lskcipher interface.
 - Optimise ahash/shash performance in light of costly indirect calls.
 - Remove ahash alignmask attribute.
 
 Algorithms:
 
 - Improve AES/XTS performance of 6-way unrolling for ppc.
 - Remove some uses of obsolete algorithms (md4, md5, sha1).
 - Add FIPS 202 SHA-3 support in pkcs1pad.
 - Add fast path for single-page messages in adiantum.
 - Remove zlib-deflate.
 
 Drivers:
 
 - Add support for S4 in meson RNG driver.
 - Add STM32MP13x support in stm32.
 - Add hwrng interface support in qcom-rng.
 - Add support for deflate algorithm in hisilicon/zip.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmVB3vgACgkQxycdCkmx
 i6dsOBAAykbnX8BpnpnOXYywE9ZWrl98rAk51MK0N9olZNfg78zRPIv7fFxFdC20
 SDJrDSNPmn0Qvaa5e0EfoAdklsm0k2GkXL/BwPKMKWUsyIoJVYI3WrBMnjBy9xMp
 yfME+h0bKoXJCZKnYkIUSGUejmUPSyRlEylrXoFlH/VWYwAaii/x9zwreQoF+0LR
 KI24A1q8AYs6Dw9HSfndaAub9GOzrqKYs6fSaMG+77Y4UC5aoi5J9Bp2G3uVyHay
 x/0bZtIxKXS9wn+LeG/3GspX23x/I5VwBOdAoMigrYmAIaIg5qgyMszudltTAs4R
 zF1Kh7WsnM5+vpnBSeigzo+/GGOU3QTz8y3tBTg+3ZR7GWGOwQLiizhOYqCyOfAH
 pIm6c++sZw/OOHiL69Nt4HeLKzGNYYWk3s4X/B/6cqoouPfOsfBaQobZNx9zfy7q
 ZNEvSVBjrFX/L6wDSotny1LTWLUNjHbmLaMV5uQZ/SQKEtv19fp2Dl7SsLkHH+3v
 ldOAwfoJR6QcSwz3Ez02TUAvQhtP172Hnxi7u44eiZu2aUboLhCFr7aEU6kVdBCx
 1rIRVHD1oqlOEDRwPRXzhF3I8R4QDORJIxZ6UUhg7yueuI+XCGDsBNC+LqBrBmSR
 IbdjqmSDUBhJyM5yMnt1VFYhqKQ/ZzwZ3JQviwW76Es9pwEIolM=
 =IZmR
 -----END PGP SIGNATURE-----

Merge tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Add virtual-address based lskcipher interface
   - Optimise ahash/shash performance in light of costly indirect calls
   - Remove ahash alignmask attribute

  Algorithms:
   - Improve AES/XTS performance of 6-way unrolling for ppc
   - Remove some uses of obsolete algorithms (md4, md5, sha1)
   - Add FIPS 202 SHA-3 support in pkcs1pad
   - Add fast path for single-page messages in adiantum
   - Remove zlib-deflate

  Drivers:
   - Add support for S4 in meson RNG driver
   - Add STM32MP13x support in stm32
   - Add hwrng interface support in qcom-rng
   - Add support for deflate algorithm in hisilicon/zip"

* tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits)
  crypto: adiantum - flush destination page before unmapping
  crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place
  Documentation/module-signing.txt: bring up to date
  module: enable automatic module signing with FIPS 202 SHA-3
  crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures
  crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support
  crypto: FIPS 202 SHA-3 register in hash info for IMA
  x509: Add OIDs for FIPS 202 SHA-3 hash and signatures
  crypto: ahash - optimize performance when wrapping shash
  crypto: ahash - check for shash type instead of not ahash type
  crypto: hash - move "ahash wrapping shash" functions to ahash.c
  crypto: talitos - stop using crypto_ahash::init
  crypto: chelsio - stop using crypto_ahash::init
  crypto: ahash - improve file comment
  crypto: ahash - remove struct ahash_request_priv
  crypto: ahash - remove crypto_ahash_alignmask
  crypto: gcm - stop using alignmask of ahash
  crypto: chacha20poly1305 - stop using alignmask of ahash
  crypto: ccm - stop using alignmask of ahash
  net: ipv6: stop checking crypto_ahash_alignmask
  ...
2023-11-02 16:15:30 -10:00
Linus Torvalds
6803bd7956 ARM:
* Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest
 
 * Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table
 
 * Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM
 
 * Guest support for memory operation instructions (FEAT_MOPS)
 
 * Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code
 
 * Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use
 
 * Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations
 
 * Miscellaneous kernel and selftest fixes
 
 LoongArch:
 
 * New architecture.  The hardware uses the same model as x86, s390
   and RISC-V, where guest/host mode is orthogonal to supervisor/user
   mode.  The virtualization extensions are very similar to MIPS,
   therefore the code also has some similarities but it's been cleaned
   up to avoid some of the historical bogosities that are found in
   arch/mips.  The kernel emulates MMU, timer and CSR accesses, while
   interrupt controllers are only emulated in userspace, at least for
   now.
 
 RISC-V:
 
 * Support for the Smstateen and Zicond extensions
 
 * Support for virtualizing senvcfg
 
 * Support for virtualized SBI debug console (DBCN)
 
 S390:
 
 * Nested page table management can be monitored through tracepoints
   and statistics
 
 x86:
 
 * Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC,
   which could result in a dropped timer IRQ
 
 * Avoid WARN on systems with Intel IPI virtualization
 
 * Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without
   forcing more common use cases to eat the extra memory overhead.
 
 * Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
   SBPB, aka Selective Branch Predictor Barrier).
 
 * Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
   creating the original vCPU would cause KVM to try to synchronize the vCPU's
   TSC and thus clobber the correct TSC being set by userspace.
 
 * Compute guest wall clock using a single TSC read to avoid generating an
   inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
 
 * "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
   about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
   Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server
   2022.
 
 * Don't apply side effects to Hyper-V's synthetic timer on writes from
   userspace to fix an issue where the auto-enable behavior can trigger
   spurious interrupts, i.e. do auto-enabling only for guest writes.
 
 * Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
   without PML enabled.
 
 * Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
 
 * Harden the fast page fault path to guard against encountering an invalid
   root when walking SPTEs.
 
 * Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
 
 * Use the fast path directly from the timer callback when delivering Xen
   timer events, instead of waiting for the next iteration of the run loop.
   This was not done so far because previously proposed code had races,
   but now care is taken to stop the hrtimer at critical points such as
   restarting the timer or saving the timer information for userspace.
 
 * Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
 
 * Optimize injection of PMU interrupts that are simultaneous with NMIs.
 
 * Usual handful of fixes for typos and other warts.
 
 x86 - MTRR/PAT fixes and optimizations:
 
 * Clean up code that deals with honoring guest MTRRs when the VM has
   non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
 
 * Zap EPT entries when non-coherent DMA assignment stops/start to prevent
   using stale entries with the wrong memtype.
 
 * Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y.
   This was done as a workaround for virtual machine BIOSes that did not
   bother to clear CR0.CD (because ancient KVM/QEMU did not bother to
   set it, in turn), and there's zero reason to extend the quirk to
   also ignore guest PAT.
 
 x86 - SEV fixes:
 
 * Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
   running an SEV-ES guest.
 
 * Clean up the recognition of emulation failures on SEV guests, when KVM would
   like to "skip" the instruction but it had already been partially emulated.
   This makes it possible to drop a hack that second guessed the (insufficient)
   information provided by the emulator, and just do the right thing.
 
 Documentation:
 
 * Various updates and fixes, mostly for x86
 
 * MTRR and PAT fixes and optimizations:
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVBZc0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP1LQf+NgsmZ1lkGQlKdSdijoQ856w+k0or
 l2SV1wUwiEdFPSGK+RTUlHV5Y1ni1dn/CqCVIJZKEI3ZtZ1m9/4HKIRXvbMwFHIH
 hx+E4Lnf8YUjsGjKTLd531UKcpphztZavQ6pXLEwazkSkDEra+JIKtooI8uU+9/p
 bd/eF1V+13a8CHQf1iNztFJVxqBJbVlnPx4cZDRQQvewskIDGnVDtwbrwCUKGtzD
 eNSzhY7si6O2kdQNkuA8xPhg29dYX9XLaCK2K1l8xOUm8WipLdtF86GAKJ5BVuOL
 6ek/2QCYjZ7a+coAZNfgSEUi8JmFHEqCo7cnKmWzPJp+2zyXsdudqAhT1g==
 =UIxm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Generalized infrastructure for 'writable' ID registers, effectively
     allowing userspace to opt-out of certain vCPU features for its
     guest

   - Optimization for vSGI injection, opportunistically compressing
     MPIDR to vCPU mapping into a table

   - Improvements to KVM's PMU emulation, allowing userspace to select
     the number of PMCs available to a VM

   - Guest support for memory operation instructions (FEAT_MOPS)

   - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
     bugs and getting rid of useless code

   - Changes to the way the SMCCC filter is constructed, avoiding wasted
     memory allocations when not in use

   - Load the stage-2 MMU context at vcpu_load() for VHE systems,
     reducing the overhead of errata mitigations

   - Miscellaneous kernel and selftest fixes

  LoongArch:

   - New architecture for kvm.

     The hardware uses the same model as x86, s390 and RISC-V, where
     guest/host mode is orthogonal to supervisor/user mode. The
     virtualization extensions are very similar to MIPS, therefore the
     code also has some similarities but it's been cleaned up to avoid
     some of the historical bogosities that are found in arch/mips. The
     kernel emulates MMU, timer and CSR accesses, while interrupt
     controllers are only emulated in userspace, at least for now.

  RISC-V:

   - Support for the Smstateen and Zicond extensions

   - Support for virtualizing senvcfg

   - Support for virtualized SBI debug console (DBCN)

  S390:

   - Nested page table management can be monitored through tracepoints
     and statistics

  x86:

   - Fix incorrect handling of VMX posted interrupt descriptor in
     KVM_SET_LAPIC, which could result in a dropped timer IRQ

   - Avoid WARN on systems with Intel IPI virtualization

   - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
     without forcing more common use cases to eat the extra memory
     overhead.

   - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
     SBPB, aka Selective Branch Predictor Barrier).

   - Fix a bug where restoring a vCPU snapshot that was taken within 1
     second of creating the original vCPU would cause KVM to try to
     synchronize the vCPU's TSC and thus clobber the correct TSC being
     set by userspace.

   - Compute guest wall clock using a single TSC read to avoid
     generating an inaccurate time, e.g. if the vCPU is preempted
     between multiple TSC reads.

   - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
     complain about a "Firmware Bug" if the bit isn't set for select
     F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
     appease Windows Server 2022.

   - Don't apply side effects to Hyper-V's synthetic timer on writes
     from userspace to fix an issue where the auto-enable behavior can
     trigger spurious interrupts, i.e. do auto-enabling only for guest
     writes.

   - Remove an unnecessary kick of all vCPUs when synchronizing the
     dirty log without PML enabled.

   - Advertise "support" for non-serializing FS/GS base MSR writes as
     appropriate.

   - Harden the fast page fault path to guard against encountering an
     invalid root when walking SPTEs.

   - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

   - Use the fast path directly from the timer callback when delivering
     Xen timer events, instead of waiting for the next iteration of the
     run loop. This was not done so far because previously proposed code
     had races, but now care is taken to stop the hrtimer at critical
     points such as restarting the timer or saving the timer information
     for userspace.

   - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
     flag.

   - Optimize injection of PMU interrupts that are simultaneous with
     NMIs.

   - Usual handful of fixes for typos and other warts.

  x86 - MTRR/PAT fixes and optimizations:

   - Clean up code that deals with honoring guest MTRRs when the VM has
     non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

   - Zap EPT entries when non-coherent DMA assignment stops/start to
     prevent using stale entries with the wrong memtype.

   - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y

     This was done as a workaround for virtual machine BIOSes that did
     not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
     to set it, in turn), and there's zero reason to extend the quirk to
     also ignore guest PAT.

  x86 - SEV fixes:

   - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
     SHUTDOWN while running an SEV-ES guest.

   - Clean up the recognition of emulation failures on SEV guests, when
     KVM would like to "skip" the instruction but it had already been
     partially emulated. This makes it possible to drop a hack that
     second guessed the (insufficient) information provided by the
     emulator, and just do the right thing.

  Documentation:

   - Various updates and fixes, mostly for x86

   - MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
2023-11-02 15:45:15 -10:00
Linus Torvalds
27beb3ca34 pci-v6.7-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmVBaU8UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwEdxAAo++s98+ZaaTdUuoV0Zpft1fuY6Yr
 mR80jUDxjHDbcI1G4iNVUSWG6pGIdlURnrBp5kU74FV9R2Ps3Fl49XQUHowE0HfH
 D/qmihiJQdnMsQKwzw3XGoTSINrDcF6nLafl9brBItVkgjNxfxSEbnweJMBf+Boc
 rpRXHzxbVHVjwwhBLODF2Wt/8sQ24w9c+wcQkpo7im8ZZReoigNMKgEa4J7tLlqA
 vTyPR/K6QeU8IBUk2ObCY3GeYrVuqi82eRK3Uwzu7IkQwA9orE416Okvq3Z026/h
 TUAivtrcygHaFRdGNvzspYLbc2hd2sEXF+KKKb6GNAjxuDWUhVQW4ObY4FgFkZ65
 Gqz/05D6c1dqTS3vTxp3nZYpvPEbNnO1RaGRL4h0/mbU+QSPSlHXWd9Lfg6noVVd
 3O+CcstQK8RzMiiWLeyctRPV5XIf7nGVQTJW5aCLajlHeJWcvygNpNG4N57j/hXQ
 gyEHrz3idXXHXkBKmyWZfre6YpLkxZtKyONZDHWI/AVhU0TgRdJWmqpRfC1kVVUe
 IUWBRcPUF4/r3jEu6t10N/aDWQN1uQzIsJNnCrKzAddPDTTYQJk8VVzKPo8SVxPD
 X+OjEMgBB/fXUfkJ7IMwgYnWaFJhxthrs6/3j1UqRvGYRoulE4NdWwJDky9UYIHd
 qV3dzuAxC/cpv08=
 =G//C
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Use acpi_evaluate_dsm_typed() instead of open-coding _DSM
     evaluation to learn device characteristics (Andy Shevchenko)

   - Tidy multi-function header checks using new PCI_HEADER_TYPE_MASK
     definition (Ilpo Järvinen)

   - Simplify config access error checking in various drivers (Ilpo
     Järvinen)

   - Use pcie_capability_clear_word() (not
     pcie_capability_clear_and_set_word()) when only clearing (Ilpo
     Järvinen)

   - Add pci_get_base_class() to simplify finding devices using base
     class only (ignoring subclass and programming interface) (Sui
     Jingfeng)

   - Add pci_is_vga(), which includes ancient PCI_CLASS_NOT_DEFINED_VGA
     devices from before the Class Code was added to PCI (Sui Jingfeng)

   - Use pci_is_vga() for vgaarb, sysfs "boot_vga", virtio, qxl to
     include ancient VGA devices (Sui Jingfeng)

  Resource management:

   - Make pci_assign_unassigned_resources() non-init because sparc uses
     it after init (Randy Dunlap)

  Driver binding:

   - Retain .remove() and .probe() callbacks (previously __init) because
     sysfs may cause them to be called later (Uwe Kleine-König)

   - Prevent xHCI driver from claiming AMD VanGogh USB3 DRD device, so
     it can be claimed by dwc3 instead (Vicki Pfau)

  PCI device hotplug:

   - Add Ampere Altra Attention Indicator extension driver for acpiphp
     (D Scott Phillips)

  Power management:

   - Quirk VideoPropulsion Torrent QN16e with longer delay after reset
     (Lukas Wunner)

   - Prevent users from overriding drivers that say we shouldn't use
     D3cold (Lukas Wunner)

   - Avoid PME from D3hot/D3cold for AMD Rembrandt and Phoenix USB4
     because wakeup interrupts from those states don't work if amd-pmc
     has put the platform in a hardware sleep state (Mario Limonciello)

  IOMMU:

   - Disable ATS for Intel IPU E2000 devices with invalidation message
     endianness erratum (Bartosz Pawlowski)

  Error handling:

   - Factor out interrupt enable/disable into helpers (Kai-Heng Feng)

  Peer-to-peer DMA:

   - Fix flexible-array usage in struct pci_p2pdma_pagemap in case we
     ever use pagemaps with multiple entries (Gustavo A. R. Silva)

  ASPM:

   - Revert a change that broke when drivers disabled L1 and users later
     enabled an L1.x substate via sysfs, and fix a similar issue when
     users disabled L1 via sysfs (Heiner Kallweit)

  Endpoint framework:

   - Fix double free in __pci_epc_create() (Dan Carpenter)

   - Use IS_ERR_OR_NULL() to simplify endpoint core (Ruan Jinjie)

  Cadence PCIe controller driver:

   - Drop unused "is_rc" member (Li Chen)

  Freescale Layerscape PCIe controller driver:

   - Enable 64-bit addressing in endpoint mode (Guanhua Gao)

  Intel VMD host bridge driver:

   - Fix multi-function header check (Ilpo Järvinen)

  Microsoft Hyper-V host bridge driver:

   - Annotate struct hv_dr_state with __counted_by (Kees Cook)

  NVIDIA Tegra194 PCIe controller driver:

   - Drop setting of LNKCAP_MLW (max link width) since dw_pcie_setup()
     already does this via dw_pcie_link_set_max_link_width() (Yoshihiro
     Shimoda)

  Qualcomm PCIe controller driver:

   - Use PCIE_SPEED2MBS_ENC() to simplify encoding of link speed
     (Manivannan Sadhasivam)

   - Add a .write_dbi2() callback so DBI2 register writes, e.g., for
     setting the BAR size, work correctly (Manivannan Sadhasivam)

   - Enable ASPM for platforms that use 1.9.0 ops, because the PCI core
     doesn't enable ASPM states that haven't been enabled by the
     firmware (Manivannan Sadhasivam)

  Renesas R-Car Gen4 PCIe controller driver:

   - Add DesignWare core support (set max link width, EDMA_UNROLL flag,
     .pre_init(), .deinit(), etc) for use by R-Car Gen4 driver
     (Yoshihiro Shimoda)

   - Add driver and DT schema for DesignWare-based Renesas R-Car Gen4
     controller in both host and endpoint mode (Yoshihiro Shimoda)

  Xilinx NWL PCIe controller driver:

   - Update ECAM size to support 256 buses (Thippeswamy Havalige)

   - Stop setting bridge primary/secondary/subordinate bus numbers,
     since PCI core does this (Thippeswamy Havalige)

  Xilinx XDMA controller driver:

   - Add driver and DT schema for Zynq UltraScale+ MPSoCs devices with
     Xilinx XDMA Soft IP (Thippeswamy Havalige)

  Miscellaneous:

   - Use FIELD_GET()/FIELD_PREP() to simplify and reduce use of _SHIFT
     macros (Ilpo Järvinen, Bjorn Helgaas)

   - Remove logic_outb(), _outw(), outl() duplicate declarations (John
     Sanpe)

   - Replace unnecessary UTF-8 in Kconfig help text because menuconfig
     doesn't render it correctly (Liu Song)"

* tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (102 commits)
  PCI: qcom-ep: Add dedicated callback for writing to DBI2 registers
  PCI: Simplify pcie_capability_clear_and_set_word() to ..._clear_word()
  PCI: endpoint: Fix double free in __pci_epc_create()
  PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver
  dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx XDMA PCIe Root Port Bridge
  PCI: xilinx-cpm: Move IRQ definitions to a common header
  PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses
  PCI: xilinx-nwl: Rename the NWL_ECAM_VALUE_DEFAULT macro
  dt-bindings: PCI: xilinx-nwl: Modify ECAM size in the DT example
  PCI: xilinx-nwl: Remove redundant code that sets Type 1 header fields
  PCI: hotplug: Add Ampere Altra Attention Indicator extension driver
  PCI/AER: Factor out interrupt toggling into helpers
  PCI: acpiphp: Allow built-in drivers for Attention Indicators
  PCI/portdrv: Use FIELD_GET()
  PCI/VC: Use FIELD_GET()
  PCI/PTM: Use FIELD_GET()
  PCI/PME: Use FIELD_GET()
  PCI/ATS: Use FIELD_GET()
  PCI/ATS: Show PASID Capability register width in bitmasks
  PCI/ASPM: Fix L1 substate handling in aspm_attr_store_common()
  ...
2023-11-02 14:05:18 -10:00
Linus Torvalds
426ee5196d sysctl-6.7-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size
 penalty sysctl has been changed to allow us to not require the sentinel, the
 final empty element on the sysctl array. Joel Granados has been doing all this
 work. On the v6.6 kernel we got the major infrastructure changes required to
 support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove
 the sentinel. Both arch and driver changes have been on linux-next for a bit
 less than a month. It is worth re-iterating the value:
 
   - this helps reduce the overall build time size of the kernel and run time
      memory consumed by the kernel by about ~64 bytes per array
   - the extra 64-byte penalty is no longer inncurred now when we move sysctls
     out from kernel/sysctl.c to their own files
 
 For v6.8-rc1 expect removal of all the sentinels and also then the unneeded
 check for procname == NULL.
 
 The last 2 patches are fixes recently merged by Krister Johansen which allow
 us again to use softlockup_panic early on boot. This used to work but the
 alias work broke it. This is useful for folks who want to detect softlockups
 super early rather than wait and spend money on cloud solutions with nothing
 but an eventual hung kernel. Although this hadn't gone through linux-next it's
 also a stable fix, so we might as well roll through the fixes now.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc
 HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q
 YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv
 hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2
 FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn
 WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF
 raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc
 9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD
 eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk
 MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH
 AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7
 6f0KvCogg0fU
 =Qf50
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "To help make the move of sysctls out of kernel/sysctl.c not incur a
  size penalty sysctl has been changed to allow us to not require the
  sentinel, the final empty element on the sysctl array. Joel Granados
  has been doing all this work. On the v6.6 kernel we got the major
  infrastructure changes required to support this. For v6.7-rc1 we have
  all arch/ and drivers/ modified to remove the sentinel. Both arch and
  driver changes have been on linux-next for a bit less than a month. It
  is worth re-iterating the value:

   - this helps reduce the overall build time size of the kernel and run
     time memory consumed by the kernel by about ~64 bytes per array

   - the extra 64-byte penalty is no longer inncurred now when we move
     sysctls out from kernel/sysctl.c to their own files

  For v6.8-rc1 expect removal of all the sentinels and also then the
  unneeded check for procname == NULL.

  The last two patches are fixes recently merged by Krister Johansen
  which allow us again to use softlockup_panic early on boot. This used
  to work but the alias work broke it. This is useful for folks who want
  to detect softlockups super early rather than wait and spend money on
  cloud solutions with nothing but an eventual hung kernel. Although
  this hadn't gone through linux-next it's also a stable fix, so we
  might as well roll through the fixes now"

* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
  watchdog: move softlockup_panic back to early_param
  proc: sysctl: prevent aliased sysctls from getting passed to init
  intel drm: Remove now superfluous sentinel element from ctl_table array
  Drivers: hv: Remove now superfluous sentinel element from ctl_table array
  raid: Remove now superfluous sentinel element from ctl_table array
  fw loader: Remove the now superfluous sentinel element from ctl_table array
  sgi-xp: Remove the now superfluous sentinel element from ctl_table array
  vrf: Remove the now superfluous sentinel element from ctl_table array
  char-misc: Remove the now superfluous sentinel element from ctl_table array
  infiniband: Remove the now superfluous sentinel element from ctl_table array
  macintosh: Remove the now superfluous sentinel element from ctl_table array
  parport: Remove the now superfluous sentinel element from ctl_table array
  scsi: Remove now superfluous sentinel element from ctl_table array
  tty: Remove now superfluous sentinel element from ctl_table array
  xen: Remove now superfluous sentinel element from ctl_table array
  hpet: Remove now superfluous sentinel element from ctl_table array
  c-sky: Remove now superfluous sentinel element from ctl_talbe array
  powerpc: Remove now superfluous sentinel element from ctl_table arrays
  riscv: Remove now superfluous sentinel element from ctl_table array
  x86/vdso: Remove now superfluous sentinel element from ctl_table array
  ...
2023-11-01 20:51:41 -10:00
Linus Torvalds
1e0c505e13 asm-generic updates for v6.7
The ia64 architecture gets its well-earned retirement as planned,
 now that there is one last (mostly) working release that will
 be maintained as an LTS kernel.
 
 The architecture specific system call tables are updated for
 the added map_shadow_stack() syscall and to remove references
 to the long-gone sys_lookup_dcookie() syscall.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVC40IACgkQYKtH/8kJ
 Uidhmw/9EX+aWSXGoObJ3fngaNSMw+PmrEuP8qEKBHxfKHcCdX3hc451Oh4GlhaQ
 tru91pPwgNvN2/rfoKusxT+V4PemGIzfNni/04rp+P0kvmdw5otQ2yNhsQNsfVmq
 XGWvkxF4P2GO6bkjjfR/1dDq7GtlyXtwwPDKeLbYb6TnJOZjtx+EAN27kkfSn1Ms
 R4Sa3zJ+DfHUmHL5S9g+7UD/CZ5GfKNmIskI4Mz5GsfoUz/0iiU+Bge/9sdcdSJQ
 kmbLy5YnVzfooLZ3TQmBFsO3iAMWb0s/mDdtyhqhTVmTUshLolkPYyKnPFvdupyv
 shXcpEST2XJNeaDRnL2K4zSCdxdbnCZHDpjfl9wfioBg7I8NfhXKpf1jYZHH1de4
 LXq8ndEFEOVQw/zSpYWfQq1sux8Jiqr+UK/ukbVeFWiGGIUs91gEWtPAf8T0AZo9
 ujkJvaWGl98O1g5wmBu0/dAR6QcFJMDfVwbmlIFpU8O+MEaz6X8mM+O5/T0IyTcD
 eMbAUjj4uYcU7ihKzHEv/0SS9Of38kzff67CLN5k8wOP/9NlaGZ78o1bVle9b52A
 BdhrsAefFiWHp1jT6Y9Rg4HOO/TguQ9e6EWSKOYFulsiLH9LEFaB9RwZLeLytV0W
 vlAgY9rUW77g1OJcb7DoNv33nRFuxsKqsnz3DEIXtgozo9CzbYI=
 =H1vH
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull ia64 removal and asm-generic updates from Arnd Bergmann:

 - The ia64 architecture gets its well-earned retirement as planned,
   now that there is one last (mostly) working release that will be
   maintained as an LTS kernel.

 - The architecture specific system call tables are updated for the
   added map_shadow_stack() syscall and to remove references to the
   long-gone sys_lookup_dcookie() syscall.

* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  hexagon: Remove unusable symbols from the ptrace.h uapi
  asm-generic: Fix spelling of architecture
  arch: Reserve map_shadow_stack() syscall number for all architectures
  syscalls: Cleanup references to sys_lookup_dcookie()
  Documentation: Drop or replace remaining mentions of IA64
  lib/raid6: Drop IA64 support
  Documentation: Drop IA64 from feature descriptions
  kernel: Drop IA64 support from sig_fault handlers
  arch: Remove Itanium (IA-64) architecture
2023-11-01 15:28:33 -10:00
Linus Torvalds
8999ad99f4 * Refactor and clean up TDX hypercall/module call infrastructure
* Handle retrying/resuming page conversion hypercalls
  * Make sure to use the (shockingly) reliable TSC in TDX guests
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmVBlqMACgkQaDWVMHDJ
 krBrhBAArKay0MvzmdzS4IQs8JqkmuMEHI6WabYv2POPjJNXrn5MelLH972pLuX9
 NJ3+yeOLmNMYwqu5qwLCxyeO5CtqEyT2lNumUrxAtHQG4+oS2RYJYUalxMuoGxt8
 fAHxbItFg0TobBSUtwcnN2R2WdXwPuUW0Co+pJfLlZV4umVM7QANO1nf1g8YmlDD
 sVtpDaeKJRdylmwgWgAyGow0tDKd6oZB9j/vOHvZRrEQ+DMjEtG75fjwbjbu43Cl
 tI/fbxKjzAkOFcZ7PEPsQ8jE1h9DXU+JzTML9Nu/cPMalxMeBg3Dol/JOEbqgreI
 4W8Lg7g071EkUcQDxpwfe4aS6rsfsbwUIV4gJVkg9ZhlT7RayWsFik2CfBpJ4IMQ
 TM8BxtCEGCz3cxVvg3mstX9rRA7eNlXOzcKE/8Y7cpSsp94bA9jtf2GgUSUoi9St
 y+fIEei8mgeHutdiFh8psrmR7hp6iX/ldMFqHtjNo6xatf2KjdVHhVSU13Jz544z
 43ATNi1gZeHOgfwlAlIxLPDVDJidHuux3f6g2vfMkAqItyEqFauC1HA1pIDgckoY
 9FpBPp9vNUToSPp6reB6z/PkEBIrG2XtQh82JLt2CnCb6aTUtnPds+psjtT4sSE/
 a9SQvZLWWmpj+BlI2yrtfJzhy7SwhltgdjItQHidmCNEn0PYfTc=
 =FJ1Y
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 TDX updates from Dave Hansen:
 "The majority of this is a rework of the assembly and C wrappers that
  are used to talk to the TDX module and VMM. This is a nice cleanup in
  general but is also clearing the way for using this code when Linux is
  the TDX VMM.

  There are also some tidbits to make TDX guests play nicer with Hyper-V
  and to take advantage the hardware TSC.

  Summary:

   - Refactor and clean up TDX hypercall/module call infrastructure

   - Handle retrying/resuming page conversion hypercalls

   - Make sure to use the (shockingly) reliable TSC in TDX guests"

[ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM
  confidentiality technology ]

* tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Mark TSC reliable
  x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
  x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP
  x86/virt/tdx: Wire up basic SEAMCALL functions
  x86/tdx: Remove 'struct tdx_hypercall_args'
  x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm
  x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
  x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
  x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
  x86/tdx: Rename __tdx_module_call() to __tdcall()
  x86/tdx: Make macros of TDCALLs consistent with the spec
  x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid
  x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
  x86/tdx: Retry partially-completed page conversion hypercalls
2023-11-01 10:28:32 -10:00
Linus Torvalds
7d461b291e drm for 6.7-rc1
kernel:
 - add initial vmemdup-user-array
 
 core:
 - fix platform remove() to return void
 - drm_file owner updated to reflect owner
 - move size calcs to drm buddy allocator
 - let GPUVM build as a module
 - allow variable number of run-queues in scheduler
 
 edid:
 - handle bad h/v sync_end in EDIDs
 
 panfrost:
 - add Boris as maintainer
 
 fbdev:
 - use fb_ops helpers more
 - only allow logo use from fbcon
 - rename fb_pgproto to pgprot_framebuffer
 - add HPD state to drm_connector_oob_hotplug_event
 - convert to fbdev i/o mem helpers
 
 i915:
 - Enable meteorlake by default
 - Early Xe2 LPD/Lunarlake display enablement
 - Rework subplatforms into IP version checks
 - GuC based TLB invalidation for Meteorlake
 - Display rework for future Xe driver integration
 - LNL FBC features
 - LNL display feature capability reads
 - update recommended fw versions for DG2+
 - drop fastboot module parameter
 - added deviceid for Arrowlake-S
 - drop preproduction workarounds
 - don't disable preemption for resets
 - cleanup inlines in headers
 - PXP firmware loading fix
 - Fix sg list lengths
 - DSC PPS state readout/verification
 - Add more RPL P/U PCI IDs
 - Add new DG2-G12 stepping
 - DP enhanced framing support to state checker
 - Improve shared link bandwidth management
 - stop using GEM macros in display code
 - refactor related code into display code
 - locally enable W=1 warnings
 - remove PSR watchdog timers on LNL
 
 amdgpu:
 - RAS/FRU EEPROM updatse
 - IP discovery updatses
 - GC 11.5 support
 - DCN 3.5 support
 - VPE 6.1 support
 - NBIO 7.11 support
 - DML2 support
 - lots of IP updates
 - use flexible arrays for bo list handling
 - W=1 fixes
 - Enable seamless boot in more cases
 - Enable context type property for HDMI
 - Rework GPUVM TLB flushing
 - VCN IB start/size alignment fixes
 
 amdkfd:
 - GC 10/11 fixes
 - GC 11.5 support
 - use partial migration in GPU faults
 
 radeon:
 - W=1 Fixes
 - fix some possible buffer overflow/NULL derefs
 nouveau:
 - update uapi for NO_PREFETCH
 - scheduler/fence fixes
 - rework suspend/resume for GSP-RM
 - rework display in preparation for GSP-RM
 
 habanalabs:
 - uapi: expose tsc clock
 - uapi: block access to eventfd through control device
 - uapi: force dma-buf export to PAGE_SIZE alignments
 - complete move to accel subsystem
 - move firmware interface include files
 - perform hard reset on PCIe AXI drain event
 - optimise user interrupt handling
 
 msm:
 - DP: use existing helpers for DPCD
 - DPU: interrupts reworked
 - gpu: a7xx (a730/a740) support
 - decouple msm_drv from kms for headless devices
 
 mediatek:
 - MT8188 dsi/dp/edp support
 - DDP GAMMA - 12 bit LUT support
 - connector dynamic selection capability
 
 rockchip:
 - rv1126 mipi-dsi/vop support
 - add planar formats
 
 ast:
 - rename constants
 
 panels:
 - Mitsubishi AA084XE01
 - JDI LPM102A188A
 - LTK050H3148W-CTA6
 
 ivpu:
 - power management fixes
 
 qaic:
 - add detach slice bo api
 
 komeda:
 - add NV12 writeback
 
 tegra:
 - support NVSYNC/NHSYNC
 - host1x suspend fixes
 
 ili9882t:
 - separate into own driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVAgzYACgkQDHTzWXnE
 hr7ZEQ//UXne3tyGOsU3X8r+lstLFDMa90a3hvTg6hX+Q0MjHd/clwkKFkLpkipL
 n7gIZlaHl11dRs0FzrIZA5EVAAgjMLKmIl10NBDFec6ZFA3VERcggx8y61uifI15
 VviMR1VbLHYZaCdyrQOK0A4wcktWnKXyoXp7cwy9crdc2GOBMUZkdIqtvD7jHxQx
 UMIFnzi1CyKUX/Fjt/JceYcNk9y2ZGkzakYO3sHcUdv4DPu9qX4kNzpjF691AZBP
 UeKWvCswTRVg2M0kuo/RYIBzqaTmOlk6dHLWBognIeZPyuyhCcaGC2d64c6tShwQ
 dtHdi+IgyQ8s2qb350ymKTQUP7xA/DfZBwH7LvrZALBxeQGYQN1CnsgDMOS2wcUc
 XrRFiS7PxEOtMMBctcPBnnoV5ttnsLLlPpzM9puh9sUFMn6CgLzcAMqXdqxzMajH
 +dz2aD1N0vMqq4varozOg9SC2QamgUiPN/TQfrulhCTCfQaXczy5x1OYiIz65+Sl
 mKoe2WASuP9Ve8do4N/wEwH5SZY2ItipBdUTRxttY9NTanmV0X5DjZBXH5b9XGci
 Zl5Ar613f9zwm5T5BVA5k6s3ZbGY6QcP5pDNTCPaSgitfFXIdReBZ2CaYzK3MPg/
 Wit/TXrud9yT6VPpI1igboMyasf5QubV1MY1K83kOCWr9u8R2CM=
 =l79u
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Highlights:
   - AMD adds some more upcoming HW platforms
   - Intel made Meteorlake stable and started adding Lunarlake
   - nouveau has a bunch of display rework in prepartion for the NVIDIA
     GSP firmware support
   - msm adds a7xx support
   - habanalabs has finished migration to accel subsystem

  Detail summary:

  kernel:
   - add initial vmemdup-user-array

  core:
   - fix platform remove() to return void
   - drm_file owner updated to reflect owner
   - move size calcs to drm buddy allocator
   - let GPUVM build as a module
   - allow variable number of run-queues in scheduler

  edid:
   - handle bad h/v sync_end in EDIDs

  panfrost:
   - add Boris as maintainer

  fbdev:
   - use fb_ops helpers more
   - only allow logo use from fbcon
   - rename fb_pgproto to pgprot_framebuffer
   - add HPD state to drm_connector_oob_hotplug_event
   - convert to fbdev i/o mem helpers

  i915:
   - Enable meteorlake by default
   - Early Xe2 LPD/Lunarlake display enablement
   - Rework subplatforms into IP version checks
   - GuC based TLB invalidation for Meteorlake
   - Display rework for future Xe driver integration
   - LNL FBC features
   - LNL display feature capability reads
   - update recommended fw versions for DG2+
   - drop fastboot module parameter
   - added deviceid for Arrowlake-S
   - drop preproduction workarounds
   - don't disable preemption for resets
   - cleanup inlines in headers
   - PXP firmware loading fix
   - Fix sg list lengths
   - DSC PPS state readout/verification
   - Add more RPL P/U PCI IDs
   - Add new DG2-G12 stepping
   - DP enhanced framing support to state checker
   - Improve shared link bandwidth management
   - stop using GEM macros in display code
   - refactor related code into display code
   - locally enable W=1 warnings
   - remove PSR watchdog timers on LNL

  amdgpu:
   - RAS/FRU EEPROM updatse
   - IP discovery updatses
   - GC 11.5 support
   - DCN 3.5 support
   - VPE 6.1 support
   - NBIO 7.11 support
   - DML2 support
   - lots of IP updates
   - use flexible arrays for bo list handling
   - W=1 fixes
   - Enable seamless boot in more cases
   - Enable context type property for HDMI
   - Rework GPUVM TLB flushing
   - VCN IB start/size alignment fixes

  amdkfd:
   - GC 10/11 fixes
   - GC 11.5 support
   - use partial migration in GPU faults

  radeon:
   - W=1 Fixes
   - fix some possible buffer overflow/NULL derefs

  nouveau:
   - update uapi for NO_PREFETCH
   - scheduler/fence fixes
   - rework suspend/resume for GSP-RM
   - rework display in preparation for GSP-RM

  habanalabs:
   - uapi: expose tsc clock
   - uapi: block access to eventfd through control device
   - uapi: force dma-buf export to PAGE_SIZE alignments
   - complete move to accel subsystem
   - move firmware interface include files
   - perform hard reset on PCIe AXI drain event
   - optimise user interrupt handling

  msm:
   - DP: use existing helpers for DPCD
   - DPU: interrupts reworked
   - gpu: a7xx (a730/a740) support
   - decouple msm_drv from kms for headless devices

  mediatek:
   - MT8188 dsi/dp/edp support
   - DDP GAMMA - 12 bit LUT support
   - connector dynamic selection capability

  rockchip:
   - rv1126 mipi-dsi/vop support
   - add planar formats

  ast:
   - rename constants

  panels:
   - Mitsubishi AA084XE01
   - JDI LPM102A188A
   - LTK050H3148W-CTA6

  ivpu:
   - power management fixes

  qaic:
   - add detach slice bo api

  komeda:
   - add NV12 writeback

  tegra:
   - support NVSYNC/NHSYNC
   - host1x suspend fixes

  ili9882t:
   - separate into own driver"

* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
  drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
  drm/amdgpu: Remove duplicate fdinfo fields
  drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
  drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
  drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
  drm/amdgpu: Identify data parity error corrected in replay mode
  drm/amdgpu: Fix typo in IP discovery parsing
  drm/amd/display: fix S/G display enablement
  drm/amdxcp: fix amdxcp unloads incompletely
  drm/amd/amdgpu: fix the GPU power print error in pm info
  drm/amdgpu: Use pcie domain of xcc acpi objects
  drm/amd: check num of link levels when update pcie param
  drm/amdgpu: Add a read to GFX v9.4.3 ring test
  drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
  drm/amdgpu: get RAS poison status from DF v4_6_2
  drm/amdgpu: Use discovery table's subrevision
  drm/amd/display: 3.2.256
  drm/amd/display: add interface to query SubVP status
  drm/amd/display: Read before writing Backlight Mode Set Register
  drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
  ...
2023-11-01 06:28:35 -10:00
Linus Torvalds
59fff63cc2 platform-drivers-x86 for v6.7-1
Highlights:
  - asus-wmi:		Support for screenpad and solve brightness key
 			press duplication
  - int3472:		Eliminate the last use of deprecated GPIO functions
  - mlxbf-pmc:		New HW support
  - msi-ec:		Support new EC configurations
  - thinkpad_acpi:	Support reading aux MAC address during passthrough
  - wmi: 		Fixes & improvements
  - x86-android-tablets:	Detection fix and avoid use of GPIO private APIs
  - Debug & metrics interface improvements
  - Miscellaneous cleanups / fixes / improvements
 
 The following is an automated shortlog grouped by driver:
 
 acer-wmi:
  -  Remove void function return
 
 amd/hsmp:
  -  add support for metrics tbl
  -  create plat specific struct
  -  Fix iomem handling
  -  improve the error log
 
 amd/pmc:
  -  Add dump_custom_stb module parameter
  -  Add PMFW command id to support S2D force flush
  -  Handle overflow cases where the num_samples range is higher
  -  Use flex array when calling amd_pmc_stb_debugfs_open_v2()
 
 asus-wireless:
  -  Replace open coded acpi_match_acpi_device()
 
 asus-wmi:
  -  add support for ASUS screenpad
  -  Do not report brightness up/down keys when also reported by acpi_video
 
 gpiolib: acpi:
  -  Add a ignore interrupt quirk for Peaq C1010
  -  Check if a GPIO is listed in ignore_interrupt earlier
 
 hp-bioscfg:
  -  Annotate struct bios_args with __counted_by
 
 inspur-platform-profile:
  -  Add platform profile support
 
 int3472:
  -  Add new skl_int3472_fill_gpiod_lookup() helper
  -  Add new skl_int3472_gpiod_get_from_temp_lookup() helper
  -  Stop using gpiod_toggle_active_low()
  -  Switch to devm_get_gpiod()
 
 intel: bytcrc_pwrsrc:
  -  Convert to platform remove callback returning void
 
 intel/ifs:
  -  Add new CPU support
  -  Add new error code
  -  ARRAY BIST for Sierra Forest
  -  Gen2 scan image loading
  -  Gen2 Scan test support
  -  Metadata validation for start_chunk
  -  Refactor image loading code
  -  Store IFS generation number
  -  Validate image size
 
 intel_speed_select_if:
  -  Remove hardcoded map size
  -  Use devm_ioremap_resource
 
 intel/tpmi:
  -  Add debugfs support for read/write blocked
  -  Add defines to get version information
 
 intel-uncore-freq:
  -  Ignore minor version change
 
 ISST:
  -  Allow level 0 to be not present
  -  Ignore minor version change
  -  Use fuse enabled mask instead of allowed levels
 
 mellanox:
  -  Fix misspelling error in routine name
  -  Rename some init()/exit() functions for consistent naming
 
 mlxbf-bootctl:
  -  Convert to platform remove callback returning void
 
 mlxbf-pmc:
  -  Add support for BlueField-3
 
 mlxbf-tmfifo:
  -  Convert to platform remove callback returning void
 
 mlx-Convert to platform remove callback returning void:
  - mlx-Convert to platform remove callback returning void
 
 mlxreg-hotplug:
  -  Convert to platform remove callback returning void
 
 mlxreg-io:
  -  Convert to platform remove callback returning void
 
 mlxreg-lc:
  -  Convert to platform remove callback returning void
 
 msi-ec:
  -  Add more EC configs
  -  rename fn_super_swap
 
 nvsw-sn2201:
  -  Convert to platform remove callback returning void
 
 sel3350-Convert to platform remove callback returning void:
  - sel3350-Convert to platform remove callback returning void
 
 siemens: simatic-ipc-batt-apollolake:
  -  Convert to platform remove callback returning void
 
 siemens: simatic-ipc-batt:
  -  Convert to platform remove callback returning void
 
 siemens: simatic-ipc-batt-elkhartlake:
  -  Convert to platform remove callback returning void
 
 siemens: simatic-ipc-batt-f7188x:
  -  Convert to platform remove callback returning void
 
 siemens: simatic-ipc-batt:
  -  Simplify simatic_ipc_batt_remove()
 
 surface: acpi-notify:
  -  Convert to platform remove callback returning void
 
 surface: aggregator:
  -  Annotate struct ssam_event with __counted_by
 
 surface: aggregator-cdev:
  -  Convert to platform remove callback returning void
 
 surface: aggregator-registry:
  -  Convert to platform remove callback returning void
 
 surface: dtx:
  -  Convert to platform remove callback returning void
 
 surface: gpe:
  -  Convert to platform remove callback returning void
 
 surface: hotplug:
  -  Convert to platform remove callback returning void
 
 surface: surface3-wmi:
  -  Convert to platform remove callback returning void
 
 think-lmi:
  -  Add bulk save feature
  -  Replace kstrdup() + strreplace() with kstrdup_and_replace()
  -  Use strreplace() to replace a character by nul
 
 thinkpad_acpi:
  -  Add battery quirk for Thinkpad X120e
  -  replace deprecated strncpy with memcpy
  -  sysfs interface to auxmac
 
 tools/power/x86/intel-speed-select:
  -  Display error for core-power support
  -  Increase max CPUs in one request
  -  No TRL for non compute domains
  -  Sanitize integer arguments
  -  turbo-mode enable disable swapped
  -  Update help for TRL
  -  Use cgroup isolate for CPU 0
  -  v1.18 release
 
 wmi:
  -  Decouple probe deferring from wmi_block_list
  -  Decouple WMI device removal from wmi_block_list
  -  Fix opening of char device
  -  Fix probe failure when failing to register WMI devices
  -  Fix refcounting of WMI devices in legacy functions
 
 x86-android-tablets:
  -  Add a comment about x86_android_tablet_get_gpiod()
  -  Create a platform_device from module_init()
  -  Drop "linux,power-supply-name" from lenovo_yt3_bq25892_0_props[]
  -  Fix Lenovo Yoga Tablet 2 830F/L vs 1050F/L detection
  -  Remove invalid_aei_gpiochip from Peaq C1010
  -  Remove invalid_aei_gpiochip support
  -  Stop using gpiolib private APIs
  -  Use platform-device as gpio-keys parent
 
 xo15-ebook:
  -  Replace open coded acpi_match_acpi_device()
 
 Merges:
  -  Merge branch 'pdx86/platform-drivers-x86-int3472' into review-ilpo
  -  Merge branch 'pdx86/platform-drivers-x86-mellanox-init' into review-ilpo
  -  Merge remote-tracking branch 'intel-speed-select/intel-sst' into review-ilpo
  -  Merge remote-tracking branch 'pdx86/platform-drivers-x86-android-tablets' into review-hans
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCZT+lBwAKCRBZrE9hU+XO
 Mck0AQCFU7dYLCF4d1CXtHf1eZhSXLpYdhcO+C08JGGoM+MqSgD+Jyb9KJHk4pxE
 FvKG51I9neyAne9lvNrLodHRzxCYgAo=
 =duM8
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Ilpo Järvinen:

 - asus-wmi: Support for screenpad and solve brightness key press
   duplication

 - int3472: Eliminate the last use of deprecated GPIO functions

 - mlxbf-pmc: New HW support

 - msi-ec: Support new EC configurations

 - thinkpad_acpi: Support reading aux MAC address during passthrough

 - wmi: Fixes & improvements

 - x86-android-tablets: Detection fix and avoid use of GPIO private APIs

 - Debug & metrics interface improvements

 - Miscellaneous cleanups / fixes / improvements

* tag 'platform-drivers-x86-v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (80 commits)
  platform/x86: inspur-platform-profile: Add platform profile support
  platform/x86: thinkpad_acpi: Add battery quirk for Thinkpad X120e
  platform/x86: wmi: Decouple WMI device removal from wmi_block_list
  platform/x86: wmi: Fix opening of char device
  platform/x86: wmi: Fix probe failure when failing to register WMI devices
  platform/x86: wmi: Fix refcounting of WMI devices in legacy functions
  platform/x86: wmi: Decouple probe deferring from wmi_block_list
  platform/x86/amd/hsmp: Fix iomem handling
  platform/x86: asus-wmi: Do not report brightness up/down keys when also reported by acpi_video
  platform/x86: thinkpad_acpi: replace deprecated strncpy with memcpy
  tools/power/x86/intel-speed-select: v1.18 release
  tools/power/x86/intel-speed-select: Use cgroup isolate for CPU 0
  tools/power/x86/intel-speed-select: Increase max CPUs in one request
  tools/power/x86/intel-speed-select: Display error for core-power support
  tools/power/x86/intel-speed-select: No TRL for non compute domains
  tools/power/x86/intel-speed-select: turbo-mode enable disable swapped
  tools/power/x86/intel-speed-select: Update help for TRL
  tools/power/x86/intel-speed-select: Sanitize integer arguments
  platform/x86: acer-wmi: Remove void function return
  platform/x86/amd/pmc: Add dump_custom_stb module parameter
  ...
2023-10-31 17:53:00 -10:00
Linus Torvalds
89ed67ef12 Networking changes for 6.7.
Core & protocols
 ----------------
 
  - Support usec resolution of TCP timestamps, enabled selectively by
    a route attribute.
 
  - Defer regular TCP ACK while processing socket backlog, try to send
    a cumulative ACK at the end. Increase single TCP flow performance
    on a 200Gbit NIC by 20% (100Gbit -> 120Gbit).
 
  - The Fair Queuing (FQ) packet scheduler:
    - add built-in 3 band prio / WRR scheduling
    - support bypass if the qdisc is mostly idle (5% speed up for TCP RR)
    - improve inactive flow reporting
    - optimize the layout of structures for better cache locality
 
  - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern
    replacement for the old MD5 option.
 
  - Add more retransmission timeout (RTO) related statistics to TCP_INFO.
 
  - Support sending fragmented skbs over vsock sockets.
 
  - Make sure we send SIGPIPE for vsock sockets if socket was shutdown().
 
  - Add sysctl for ignoring lower limit on lifetime in Router
    Advertisement PIO, based on an in-progress IETF draft.
 
  - Add sysctl to control activation of TCP ping-pong mode.
 
  - Add sysctl to make connection timeout in MPTCP configurable.
 
  - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps
    limit the number of wakeups.
 
  - Support netlink GET for MDB (multicast forwarding), allowing user
    space to request a single MDB entry instead of dumping the entire
    table.
 
  - Support selective FDB flushing in the VXLAN tunnel driver.
 
  - Allow limiting learned FDB entries in bridges, prevent OOM attacks.
 
  - Allow controlling via configfs netconsole targets which were created
    via the kernel cmdline at boot, rather than via configfs at runtime.
 
  - Support multiple PTP timestamp event queue readers with different
    filters.
 
  - MCTP over I3C.
 
 BPF
 ---
 
  - Add new veth-like netdevice where BPF program defines the logic
    of the xmit routine. It can operate in L3 and L2 mode.
 
  - Support exceptions - allow asserting conditions which should
    never be true but are hard for the verifier to infer.
    With some extra flexibility around handling of the exit / failure.
    https://lwn.net/Articles/938435/
 
  - Add support for local per-cpu kptr, allow allocating and storing
    per-cpu objects in maps. Access to those objects operates on
    the value for the current CPU. This allows to deprecate local
    one-off implementations of per-CPU storage like
    BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps.
 
  - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is
    for systemd to re-implement the LogNamespace feature which allows
    running multiple instances of systemd-journald to process the logs
    of different services.
 
  - Enable open-coded task_vma iteration, after maple tree conversion
    made it hard to directly walk VMAs in tracing programs.
 
  - Add open-coded task, css_task and css iterator support.
    One of the use cases is customizable OOM victim selection via BPF.
 
  - Allow source address selection with bpf_*_fib_lookup().
 
  - Add ability to pin BPF timer to the current CPU.
 
  - Prevent creation of infinite loops by combining tail calls and
    fentry/fexit programs.
 
  - Add missed stats for kprobes to retrieve the number of missed kprobe
    executions and subsequent executions of BPF programs.
 
  - Inherit system settings for CPU security mitigations.
 
  - Add BPF v4 CPU instruction support for arm32 and s390x.
 
 Changes to common code
 ----------------------
 
  - overflow: add DEFINE_FLEX() for on-stack definition of structs
    with flexible array members.
 
  - Process doc update with more guidance for reviewers.
 
 Driver API
 ----------
 
  - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy
    mutex in most places and remove a lot of smaller locks.
 
  - Create a common DPLL configuration API. Allow configuring
    and querying state of PLL circuits used for clock syntonization,
    in network time distribution.
 
  - Unify fragmented and full page allocation APIs in page pool code.
    Let drivers be ignorant of PAGE_SIZE.
 
  - Rework PHY state machine to avoid races with calls to phy_stop().
 
  - Notify DSA drivers of MAC address changes on user ports, improve
    correctness of offloads which depend on matching port MAC addresses.
 
  - Allow antenna control on injected WiFi frames.
 
  - Reduce the number of variants of napi_schedule().
 
  - Simplify error handling when composing devlink health messages.
 
 Misc
 ----
 
  - A lot of KCSAN data race "fixes", from Eric.
 
  - A lot of __counted_by() annotations, from Kees.
 
  - A lot of strncpy -> strscpy and printf format fixes.
 
  - Replace master/slave terminology with conduit/user in DSA drivers.
 
  - Handful of KUnit tests for netdev and WiFi core.
 
 Removed
 -------
 
  - AppleTalk COPS.
 
  - AppleTalk ipddp.
 
  - TI AR7 CPMAC Ethernet driver.
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - Intel (100G, ice, idpf):
      - add a driver for the Intel E2000 IPUs
      - make CRC/FCS stripping configurable
      - cross-timestamping for E823 devices
      - basic support for E830 devices
      - use aux-bus for managing client drivers
      - i40e: report firmware versions via devlink
    - nVidia/Mellanox:
      - support 4-port NICs
      - increase max number of channels to 256
      - optimize / parallelize SF creation flow
    - Broadcom (bnxt):
      - enhance NIC temperature reporting
      - support PAM4 speeds and lane configuration
    - Marvell OcteonTX2:
      - PTP pulse-per-second output support
      - enable hardware timestamping for VFs
    - Solarflare/AMD:
      - conntrack NAT offload and offload for tunnels
    - Wangxun (ngbe/txgbe):
      - expose HW statistics
    - Pensando/AMD:
      - support PCI level reset
      - narrow down the condition under which skbs are linearized
    - Netronome/Corigine (nfp):
      - support CHACHA20-POLY1305 crypto in IPsec offload
 
  - Ethernet NICs embedded, slower, virtual:
    - Synopsys (stmmac):
      - add Loongson-1 SoC support
      - enable use of HW queues with no offload capabilities
      - enable PPS input support on all 5 channels
      - increase TX coalesce timer to 5ms
    - RealTek USB (r8152): improve efficiency of Rx by using GRO frags
    - xen: support SW packet timestamping
    - add drivers for implementations based on TI's PRUSS (AM64x EVM)
 
  - nVidia/Mellanox Ethernet datacenter switches:
    - avoid poor HW resource use on Spectrum-4 by better block selection
      for IPv6 multicast forwarding and ordering of blocks in ACL region
 
  - Ethernet embedded switches:
    - Microchip:
      - support configuring the drive strength for EMI compliance
      - ksz9477: partial ACL support
      - ksz9477: HSR offload
      - ksz9477: Wake on LAN
    - Realtek:
      - rtl8366rb: respect device tree config of the CPU port
 
  - Ethernet PHYs:
    - support Broadcom BCM5221 PHYs
    - TI dp83867: support hardware LED blinking
 
  - CAN:
    - add support for Linux-PHY based CAN transceivers
    - at91_can: clean up and use rx-offload helpers
 
  - WiFi:
    - MediaTek (mt76):
      - new sub-driver for mt7925 USB/PCIe devices
      - HW wireless <> Ethernet bridging in MT7988 chips
      - mt7603/mt7628 stability improvements
    - Qualcomm (ath12k):
      - WCN7850:
        - enable 320 MHz channels in 6 GHz band
        - hardware rfkill support
        - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS
          to make scan faster
        - read board data variant name from SMBIOS
      - QCN9274: mesh support
    - RealTek (rtw89):
      - TDMA-based multi-channel concurrency (MCC)
    - Silicon Labs (wfx):
      - Remain-On-Channel (ROC) support
 
  - Bluetooth:
    - ISO: many improvements for broadcast support
    - mark BCM4378/BCM4387 as BROKEN_LE_CODED
    - add support for QCA2066
    - btmtksdio: enable Bluetooth wakeup from suspend
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmU8XsYACgkQMUZtbf5S
 Irv19RAAnud/24OOF5XMEJkIcYlnfqximh4XO6PujRSYkSkOUJdZTF6iJPgf3pSP
 YpwoHYbYKHYfeOf8+3bTNESiQNSnoVmvmvwiS6/7lZ3behHUrGLQzW9Htc3EZyWH
 2h6QkDZ5OOjfg0bwYSfp3vXkmMH2k8WE9Y0NvCkhcohqZi13Rmp14RnyPmNb2d1V
 yZRYDMSM133KqE6gnBr1Ct65IEvnKeGlCUN2mTGqOJgdn6DZMsyxvtt0y4rmN7Ab
 41+CgPU5SfxfbYpW+Dl2HJpgfte3WrC57KC6AM0PAPJzPmQWgeB/m9mjz/apj6Bg
 bhsEIo7FdvbCnQm3yWPhK2OgCAcSwLr8jfGMU+Q+W4VnL5SRRR3Rm0zjsze+kHNP
 OfqJgxzl3DpvoJqVBy1h5FGcZt0XHwhksm4cTxWqIahsF+veY0ECBXbuBBQx9XTF
 Y7INfI8ulg7wISJs+CJfIClYkgOibTw2u8taBS5ikbtgxNqp5D4QqODn7UefQap1
 PR/IDYODF+zRgmMJLeBqSa6fij6BkfOEDiOWak5kggBoZdtbtmeKI6tzze06CNdW
 lWv1WEhRufxnwK+IuWsEkjhiMbs2WGLvkJ5JbgQV9BfqHfIfiqBCrcWtT/WbQnGt
 lmU46CXh1t/FZEqbmK9h+8vsIIfrcDl6jb5npEiKPRG00vDKRTM=
 =46nS
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Support usec resolution of TCP timestamps, enabled selectively by a
     route attribute.

   - Defer regular TCP ACK while processing socket backlog, try to send
     a cumulative ACK at the end. Increase single TCP flow performance
     on a 200Gbit NIC by 20% (100Gbit -> 120Gbit).

   - The Fair Queuing (FQ) packet scheduler:
       - add built-in 3 band prio / WRR scheduling
       - support bypass if the qdisc is mostly idle (5% speed up for TCP RR)
       - improve inactive flow reporting
       - optimize the layout of structures for better cache locality

   - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern
     replacement for the old MD5 option.

   - Add more retransmission timeout (RTO) related statistics to
     TCP_INFO.

   - Support sending fragmented skbs over vsock sockets.

   - Make sure we send SIGPIPE for vsock sockets if socket was
     shutdown().

   - Add sysctl for ignoring lower limit on lifetime in Router
     Advertisement PIO, based on an in-progress IETF draft.

   - Add sysctl to control activation of TCP ping-pong mode.

   - Add sysctl to make connection timeout in MPTCP configurable.

   - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps
     limit the number of wakeups.

   - Support netlink GET for MDB (multicast forwarding), allowing user
     space to request a single MDB entry instead of dumping the entire
     table.

   - Support selective FDB flushing in the VXLAN tunnel driver.

   - Allow limiting learned FDB entries in bridges, prevent OOM attacks.

   - Allow controlling via configfs netconsole targets which were
     created via the kernel cmdline at boot, rather than via configfs at
     runtime.

   - Support multiple PTP timestamp event queue readers with different
     filters.

   - MCTP over I3C.

  BPF:

   - Add new veth-like netdevice where BPF program defines the logic of
     the xmit routine. It can operate in L3 and L2 mode.

   - Support exceptions - allow asserting conditions which should never
     be true but are hard for the verifier to infer. With some extra
     flexibility around handling of the exit / failure:

          https://lwn.net/Articles/938435/

   - Add support for local per-cpu kptr, allow allocating and storing
     per-cpu objects in maps. Access to those objects operates on the
     value for the current CPU.

     This allows to deprecate local one-off implementations of per-CPU
     storage like BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps.

   - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is
     for systemd to re-implement the LogNamespace feature which allows
     running multiple instances of systemd-journald to process the logs
     of different services.

   - Enable open-coded task_vma iteration, after maple tree conversion
     made it hard to directly walk VMAs in tracing programs.

   - Add open-coded task, css_task and css iterator support. One of the
     use cases is customizable OOM victim selection via BPF.

   - Allow source address selection with bpf_*_fib_lookup().

   - Add ability to pin BPF timer to the current CPU.

   - Prevent creation of infinite loops by combining tail calls and
     fentry/fexit programs.

   - Add missed stats for kprobes to retrieve the number of missed
     kprobe executions and subsequent executions of BPF programs.

   - Inherit system settings for CPU security mitigations.

   - Add BPF v4 CPU instruction support for arm32 and s390x.

  Changes to common code:

   - overflow: add DEFINE_FLEX() for on-stack definition of structs with
     flexible array members.

   - Process doc update with more guidance for reviewers.

  Driver API:

   - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy
     mutex in most places and remove a lot of smaller locks.

   - Create a common DPLL configuration API. Allow configuring and
     querying state of PLL circuits used for clock syntonization, in
     network time distribution.

   - Unify fragmented and full page allocation APIs in page pool code.
     Let drivers be ignorant of PAGE_SIZE.

   - Rework PHY state machine to avoid races with calls to phy_stop().

   - Notify DSA drivers of MAC address changes on user ports, improve
     correctness of offloads which depend on matching port MAC
     addresses.

   - Allow antenna control on injected WiFi frames.

   - Reduce the number of variants of napi_schedule().

   - Simplify error handling when composing devlink health messages.

  Misc:

   - A lot of KCSAN data race "fixes", from Eric.

   - A lot of __counted_by() annotations, from Kees.

   - A lot of strncpy -> strscpy and printf format fixes.

   - Replace master/slave terminology with conduit/user in DSA drivers.

   - Handful of KUnit tests for netdev and WiFi core.

  Removed:

   - AppleTalk COPS.

   - AppleTalk ipddp.

   - TI AR7 CPMAC Ethernet driver.

  Drivers:

   - Ethernet high-speed NICs:
      - Intel (100G, ice, idpf):
         - add a driver for the Intel E2000 IPUs
         - make CRC/FCS stripping configurable
         - cross-timestamping for E823 devices
         - basic support for E830 devices
         - use aux-bus for managing client drivers
         - i40e: report firmware versions via devlink
      - nVidia/Mellanox:
         - support 4-port NICs
         - increase max number of channels to 256
         - optimize / parallelize SF creation flow
      - Broadcom (bnxt):
         - enhance NIC temperature reporting
         - support PAM4 speeds and lane configuration
      - Marvell OcteonTX2:
         - PTP pulse-per-second output support
         - enable hardware timestamping for VFs
      - Solarflare/AMD:
         - conntrack NAT offload and offload for tunnels
      - Wangxun (ngbe/txgbe):
         - expose HW statistics
      - Pensando/AMD:
         - support PCI level reset
         - narrow down the condition under which skbs are linearized
      - Netronome/Corigine (nfp):
         - support CHACHA20-POLY1305 crypto in IPsec offload

   - Ethernet NICs embedded, slower, virtual:
      - Synopsys (stmmac):
         - add Loongson-1 SoC support
         - enable use of HW queues with no offload capabilities
         - enable PPS input support on all 5 channels
         - increase TX coalesce timer to 5ms
      - RealTek USB (r8152): improve efficiency of Rx by using GRO frags
      - xen: support SW packet timestamping
      - add drivers for implementations based on TI's PRUSS (AM64x EVM)

   - nVidia/Mellanox Ethernet datacenter switches:
      - avoid poor HW resource use on Spectrum-4 by better block
        selection for IPv6 multicast forwarding and ordering of blocks
        in ACL region

   - Ethernet embedded switches:
      - Microchip:
         - support configuring the drive strength for EMI compliance
         - ksz9477: partial ACL support
         - ksz9477: HSR offload
         - ksz9477: Wake on LAN
      - Realtek:
         - rtl8366rb: respect device tree config of the CPU port

   - Ethernet PHYs:
      - support Broadcom BCM5221 PHYs
      - TI dp83867: support hardware LED blinking

   - CAN:
      - add support for Linux-PHY based CAN transceivers
      - at91_can: clean up and use rx-offload helpers

   - WiFi:
      - MediaTek (mt76):
         - new sub-driver for mt7925 USB/PCIe devices
         - HW wireless <> Ethernet bridging in MT7988 chips
         - mt7603/mt7628 stability improvements
      - Qualcomm (ath12k):
         - WCN7850:
            - enable 320 MHz channels in 6 GHz band
            - hardware rfkill support
            - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to
              make scan faster
            - read board data variant name from SMBIOS
        - QCN9274: mesh support
      - RealTek (rtw89):
         - TDMA-based multi-channel concurrency (MCC)
      - Silicon Labs (wfx):
         - Remain-On-Channel (ROC) support

   - Bluetooth:
      - ISO: many improvements for broadcast support
      - mark BCM4378/BCM4387 as BROKEN_LE_CODED
      - add support for QCA2066
      - btmtksdio: enable Bluetooth wakeup from suspend"

* tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1816 commits)
  net: pcs: xpcs: Add 2500BASE-X case in get state for XPCS drivers
  net: bpf: Use sockopt_lock_sock() in ip_sock_set_tos()
  net: mana: Use xdp_set_features_flag instead of direct assignment
  vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()
  iavf: delete the iavf client interface
  iavf: add a common function for undoing the interrupt scheme
  iavf: use unregister_netdev
  iavf: rely on netdev's own registered state
  iavf: fix the waiting time for initial reset
  iavf: in iavf_down, don't queue watchdog_task if comms failed
  iavf: simplify mutex_trylock+sleep loops
  iavf: fix comments about old bit locks
  doc/netlink: Update schema to support cmd-cnt-name and cmd-max-name
  tools: ynl: introduce option to process unknown attributes or types
  ipvlan: properly track tx_errors
  netdevsim: Block until all devices are released
  nfp: using napi_build_skb() to replace build_skb()
  net: dsa: microchip: ksz9477: Fix spelling mistake "Enery" -> "Energy"
  net: dsa: microchip: Ensure Stable PME Pin State for Wake-on-LAN
  net: dsa: microchip: Refactor switch shutdown routine for WoL preparation
  ...
2023-10-31 05:10:11 -10:00
Paolo Bonzini
be47941980 KVM SVM changes for 6.7:
- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
    running an SEV-ES guest.
 
  - Clean up handling "failures" when KVM detects it can't emulate the "skip"
    action for an instruction that has already been partially emulated.  Drop a
    hack in the SVM code that was fudging around the emulator code not giving
    SVM enough information to do the right thing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8GHYSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5hwkQAIR8l1gWz/caz29biBzmRnDS+aZOXcYM
 8V8WBJqJgMKE9egibF4sADAlhInXzg19Xr7bQs6VfuvmdXrCn0UJ/nLorX+H85A2
 pph6iNlWO6tyQAjvk/AieaeUyZOqpCFmKOgxfN2Fr/Lrn7u3AdjXC20qPeFJSLXr
 YOTCQ704yvjjJp4yVA8JlclAQu38hanKiO5SZdlLzbuhUgWwQk4DVP2ZsYnhX+RO
 F6exxORvMnYF/LJe/kR2/DMLf2JWvyUmjRrGWoeRoksOw5BlXMc5HyTPHSJ2jDac
 lJaNtmZkTY1bDVWZk7N03ze5aFJa4DaqJdIFLtgujrFW8thog0P48aH6vmKi4UAA
 bXme9GFYbmJTkemaGRnrzidFV12uPNvvanS+1PDOw4sn4HpscoMSpZw5PeH2kBwV
 6uKNCJCwLtk8oe50yroKD7rJ/ASB7CeoqzbIL9s2TA0HSAskIf65T4eZp01uniyd
 Q98yCdrG2mudsg5aU5yMfe0LwZby5BB5kUCqIe4hyRC68GJR8wkAzhaFRgCn4aJE
 yaTyjnT2V3PGMEEJOPFdSF3VQGztljzQiXlEvBVj3zvMGQNTo2NhmS3ka4W+wW5G
 avRYv8dITlGRs6J2gV1vp8Eb5LzDrwRpRURSmzeP5rR58saKdljTZgNfOzfLeFr1
 WhLzonLz52IS
 =U0fq
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.7:

 - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while
   running an SEV-ES guest.

 - Clean up handling "failures" when KVM detects it can't emulate the "skip"
   action for an instruction that has already been partially emulated.  Drop a
   hack in the SVM code that was fudging around the emulator code not giving
   SVM enough information to do the right thing.
2023-10-31 10:22:43 -04:00
Paolo Bonzini
d5cde2e0b3 KVM PMU change for 6.7:
- Handle NMI/SMI requests after PMU/PMI requests so that a PMI=>NMI doesn't
    require redoing the entire run loop due to the NMI not being detected until
    the final kvm_vcpu_exit_request() check before entering the guest.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8G/sSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5/FQP/1B0tk5TMe/Xfe/q4ng+J2eMr10TpbH5
 uWRpxN6seRmH7cqfZwsNH86FubNRf3h9U/jOK3C9Q9dIhrq9MB1dZePDjF/xmZcz
 4lhM76fHTeRNxJ1o+j2ApiK9U2dDAbBTLA8iGi+OTs/sAuvbNUELY7d3Ht2TqJjb
 e9tGT+SavbTsg0UHEmteFHepMCe577AchL2T6jPbUaaVB05N7uD/qvIGDOLQvyaC
 KHWqY5f+eFN+3JdGEefCiS4XCAWXBPSs7Ybq5SduxS7rnB7m96Vkidwk1DLjnyUt
 +KNtb8JXBsMMuyaYZHrl4mPZyvOfmZxXOz9CzCYXzcQlsnkJqIyy3CiZFVEAqdq2
 kXtOhNEqByAKVCWvcoJvfO/VGd/w3KP5XYP3GHXJ8gsS3sDORnL5PYWIvPNfjdlu
 x7nsnk7PbaGdspSPqfKblwUvET1fePs1yjKECUMl4iJ6Wfr+QfKEpPUXQ6f79r+h
 DrhPE9DIWyMMbre0p8E7uTFsteVerUx/GVDh7jtn6LCUKwWmKAZ43sKR2d35GAvG
 x7ZKCcKl5U9vmC8c6q/eAZUE7CeNy1QBGXhYX6oP28NGxl5AzZ/Q8aYMsv0uqhyF
 cwYbVKA5Wl5fovMrnjs8wwkKqa9cHdzy7JmhyhBV5k5ggfSUeD7mG0UM5eRxr6ZM
 TOa/97QeXa7v
 =mjsy
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-pmu-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM PMU change for 6.7:

 - Handle NMI/SMI requests after PMU/PMI requests so that a PMI=>NMI doesn't
   require redoing the entire run loop due to the NMI not being detected until
   the final kvm_vcpu_exit_request() check before entering the guest.
2023-10-31 10:22:23 -04:00
Paolo Bonzini
e122d7a100 KVM x86 Xen changes for 6.7:
- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.
 
  - Use the fast path directly from the timer callback when delivering Xen timer
    events.  Avoid the problematic races with using the fast path by ensuring
    the hrtimer isn't running when (re)starting the timer or saving the timer
    information (for userspace).
 
  - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8He8SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5KyQP+wUH3n6hhJGScsSCpWXK6r8q+Y2ZBftY
 ecXuoTfeBJmsoTbnExF7K600DtbxHY5jjxt3ROmoUCertCFRCoq6pi5v4rbRDDQ1
 fmGkht43A6zAuHQ0Ntvkq4rNEmISAbzLP4EXOxZJ/Hxld91T8IutMFo7NN/YfOSx
 nb+qgb7B25T7ODGvzahRjxnoevCHBN/TdKeDrvsoWeMpVw+CDYqquQOcLfHMaBAN
 DqGwZzpdVqRQqg3TOuBGCiv5IcvskjkFUh0y6cEYkCR/MruLoT6CygoLImEV2naW
 RU0ZU9Y4cjf+BV/faQEdP6mDQwwCUHWLxDpXUVn03KQYQHlA7q6UgRKxy35ixZ5w
 Euxvg4m2ZGgJjsVLqTTMUlbLSNxD6wWZAVxGH7w8XghKrNmoj1IoajPZS+1rwyO2
 5rUynMKf3HMT6oeqqZH95aChlUMiAvaPYPc+ogku8Bt1zJQVv/xnk/6T95Vw6C/t
 KfYsV80rmJd/EL/fUXYX3mCMcZGHyv80QlOEc0uR4f25HGszCG8qHiSaUtnvQUjQ
 xaguSuO1Cf7sdhHPWj4p/US+Jerrgd8nzoQGvKUOkdLsQzU71xwjvTZNlmmBYKKO
 zgGIXZfaXa4JibAqnRrC+V8UdDPOwKvOEzmH0joLEzkTISnIG2LycvZ6tG7sTcMU
 0sIg2dvhJx/G
 =Z2eM
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-xen-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM x86 Xen changes for 6.7:

 - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

 - Use the fast path directly from the timer callback when delivering Xen timer
   events.  Avoid the problematic races with using the fast path by ensuring
   the hrtimer isn't running when (re)starting the timer or saving the timer
   information (for userspace).

 - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.
2023-10-31 10:21:42 -04:00
Paolo Bonzini
f0f59d069e KVM x86 MMU changes for 6.7:
- Clean up code that deals with honoring guest MTRRs when the VM has
    non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.
 
  - Zap EPT entries when non-coherent DMA assignment stops/start to prevent
    using stale entries with the wrong memtype.
 
  - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y, as
    there's zero reason to ignore guest PAT if the effective MTRR memtype is WB.
    This will also allow for future optimizations of handling guest MTRR updates
    for VMs with non-coherent DMA and the quirk enabled.
 
  - Harden the fast page fault path to guard against encountering an invalid
    root when walking SPTEs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8FG8SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5tYMP+gJd3raPnpmai4NyaFaZNP6/5YsXuUMj
 XBvHH7hBGHmjd1sV+O62fhUvNk4+M/1f1rERutP4s7yXEXxQfC9G/MQFgLBfyiW8
 xR+RQkNrz8HsG8mHFBZ0Ei6OofhP+BRTYDRU7kbctKDh/4Hp5AOZAxYHs/ZhOho1
 Lw6upZbQLCkdt72eEKbfocg6Tf400hWEyarBRXFe4KJzWq7KMjAPgqA/3Vx0lF6u
 zX73Zr6tV0mcf3QXd58Q4CUwOuwMo1aTangmOhEeC09JplF2okLV36h6WrCF8qqO
 gvmDrMA450Yc215peOJGBJzoZJrNjMIHZ2m+4Ifag6Z/jJoam4vjzUZmmrzx+Gbj
 Ot5lmXCVRXCdHmUNdYQ6yR27WaVP3C3ItkxwNZGMPoh2G08NGyLLY1kwzRyITEH4
 M9jYTRBZaeue57ad5Ms9FaneBLWwPxajTX90rWZbl2kzfd8PG5cF1VroESBLoa0f
 I2kDcd7988xLTOMl1sfO8ci21Ve7rQc0hA6WlOXrDxb26OvYrftYXeXOCowN6kqP
 czXIu5ZPmLI1btimZQXGMdxKkw5wwe3wDC3y5gKrm+rTfORUXoOUDoITIpmPCnAp
 Dzfr5la3RI1GjHhzR80x4vXQC9BgJ9WrEwJub/RqVfE3T3ohw+NZl+AeM1xB9eT1
 2mJWm6GFEm9Y
 =Zfbr
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-mmu-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM x86 MMU changes for 6.7:

 - Clean up code that deals with honoring guest MTRRs when the VM has
   non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

 - Zap EPT entries when non-coherent DMA assignment stops/start to prevent
   using stale entries with the wrong memtype.

 - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y, as
   there's zero reason to ignore guest PAT if the effective MTRR memtype is WB.
   This will also allow for future optimizations of handling guest MTRR updates
   for VMs with non-coherent DMA and the quirk enabled.

 - Harden the fast page fault path to guard against encountering an invalid
   root when walking SPTEs.
2023-10-31 10:17:43 -04:00
Paolo Bonzini
f292dc8aad KVM x86 misc changes for 6.7:
- Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
    forcing more common use cases to eat the extra memory overhead.
 
  - Add IBPB and SBPB virtualization support.
 
  - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
    creating the original vCPU would cause KVM to try to synchronize the vCPU's
    TSC and thus clobber the correct TSC being set by userspace.
 
  - Compute guest wall clock using a single TSC read to avoid generating an
    inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
 
  - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
     about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
 
  - Don't apply side effects to Hyper-V's synthetic timer on writes from
    userspace to fix an issue where the auto-enable behavior can trigger
    spurious interrupts, i.e. do auto-enabling only for guest writes.
 
  - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
    without PML enabled.
 
  - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
 
  - Use octal notation for file permissions through KVM x86.
 
  - Fix a handful of typo fixes and warts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8EugSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5xS0P+gPTDO81CUZO70LrO2W4E7toRBf/F9x1
 /v5D/76p9hG32Z6+BJs/xxDxJFagw75MtoR5oKivtXiip3TxbfOyDOlaQkIRo85E
 /d95il/LRidL3Mv3TXRj1lykXnxSSz9tigAGEZti1Y9Fn9fXEIwurJH7dU5cBI1E
 fin5bsDaTNRjG4jjTiEUbnKPRTlD/S7CQJn4CaYvZhMv/eJkYDLyBBVy4VLoLzvD
 ctL6VJQLGPVxbxr9mEmulaqMrSuDIQQLkRVQJAViKyerBInTEc5d/GPCHuE8O3zi
 0r/QSJbMS9titWLz07NhJ1UH4VJNyaEhRlyJPSFhBW4h6dzUb3EXdUe0Hwa+JH/S
 H2cVqsANItTCIhvDtuEGIRDahu0eD+63h90InJ0gEVL1kSJS+UWZHB71PkUEQgAV
 2OsuT1D26fuxrv+0b9ioBZURycqKw++zGsrwyVhe77eBgqBJ12tbL4TAD+QNjaQ5
 HZTCe6YV83gZoOMeVkoTGSf96s9lGORgxsaAIXmFuLB9RVCVXhVh0ph2HZsnV8Hw
 ZXEXpBEFo7GUhb0NIvsk2W73QL87A3fLv15yITWc8KuC7/dXP9z6KpSKjFySS69X
 uWD1MVx6shhvbg97UzoJlXc3/z0aVzmdZJudE5d0gcFvAjIItqp6ICPOoKxfj8pT
 tqRZu3kVHd61
 =sfp8
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.7:

 - Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
   forcing more common use cases to eat the extra memory overhead.

 - Add IBPB and SBPB virtualization support.

 - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
   creating the original vCPU would cause KVM to try to synchronize the vCPU's
   TSC and thus clobber the correct TSC being set by userspace.

 - Compute guest wall clock using a single TSC read to avoid generating an
   inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.

 - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
    about a "Firmware Bug" if the bit isn't set for select F/M/S combos.

 - Don't apply side effects to Hyper-V's synthetic timer on writes from
   userspace to fix an issue where the auto-enable behavior can trigger
   spurious interrupts, i.e. do auto-enabling only for guest writes.

 - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
   without PML enabled.

 - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.

 - Use octal notation for file permissions through KVM x86.

 - Fix a handful of typo fixes and warts.
2023-10-31 10:15:15 -04:00
Paolo Bonzini
f233646760 KVM x86 APIC changes for 6.7:
- Purge VMX's posted interrupt descriptor *before* loading APIC state when
    handling KVM_SET_LAPIC.  Purging the PID after loading APIC state results in
    lost APIC timer IRQs as the APIC timer can be armed as part of loading APIC
    state, i.e. can immediately pend an IRQ if the expiry is in the past.
 
  - Clear the ICR.BUSY bit when handling trap-like x2APIC writes to suppress a
    WARN due to KVM expecting the BUSY bit to be cleared when sending IPIs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmU8DrkSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5Z54P/jQbjmwbvAVDa1g2HGr16ouFb3v+rCgV
 eXiN2/8tDlQl9JkvZsjOY1AtJauS3RZu/SncT3IxCCvHVrKKt0xgnhWT//rAMX3N
 zrrFxXJmilFdd2ijsuK8flHVFO5i06Kx1NRjSmAMb76et1et1EI0I62F1rom8ocn
 l0tryTkV0qhWKlgxzLBNzzhzxxA2Mvb6vFhut+nfa0IisnQ+roh5dV4JS7Cdhy+5
 qIQ4J86JdDR6kmraZLCv8r7vIQn5EQPVCAC2WWalds7iWUW3ixy01NLweVLaVtA3
 xWRHYK7azvu3SJ9HvmSMRZzzTElQBYnE2d37ycyYewHbDUJ6FCXxh7H+BehYaVBW
 CK6GLaMHY8vyXLcB2cRAzZ7GgBSSDevL11/CMc2ueGc51AOsDCxFh6v69TU+/PiQ
 dMOYiNTuwV9QC3raIXPCc2Q9rqGp20xBqXIv5XLEyIxqcFQESvtRggg6zJdBZ3zB
 T5/xAXDf++rKFiu1wskFAbwsk3M4T36u1WlK9iayC9dPvCDGRzsle3tt9npmoj2X
 3e0Uz7fkRLki1YZW1qLmvqoYJ9bX9B9BjuKkORIqSqXuwcgPYEzHt1GcRByI2FSk
 KSTnNwgs/Bs498LsDdU1Z740ZkYFGmu2nK+CUvwxP+uejPsnlhik5kGqLSIeA7zA
 9kDpcdV+tZKJ
 =atZ2
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-apic-6.7' of https://github.com/kvm-x86/linux into HEAD

KVM x86 APIC changes for 6.7:

 - Purge VMX's posted interrupt descriptor *before* loading APIC state when
   handling KVM_SET_LAPIC.  Purging the PID after loading APIC state results in
   lost APIC timer IRQs as the APIC timer can be armed as part of loading APIC
   state, i.e. can immediately pend an IRQ if the expiry is in the past.

 - Clear the ICR.BUSY bit when handling trap-like x2APIC writes.  This avoids a
   WARN, due to KVM expecting the BUSY bit to be cleared when sending IPIs.
2023-10-31 10:11:19 -04:00
Linus Torvalds
455cdcb45f Rust changes for v6.7
A small one compared to the previous one in terms of features. In terms
 of lines, as usual, the 'alloc' version upgrade accounts for most of them.
 
 Toolchain and infrastructure:
 
  - Upgrade to Rust 1.73.0.
 
    This time around, due to how the kernel and Rust schedules have
    aligned, there are two upgrades in fact. They contain the fixes for
    a few issues we reported to the Rust project.
 
    In addition, a few cleanups indicated by the upgraded compiler
    or possible thanks to it. For instance, the compiler now detects
    redundant explicit links.
 
  - A couple changes to the Rust 'Makefile' so that it can be used with
    toybox tools, allowing Rust to be used in the Android kernel build.
 
 x86:
 
  - Enable IBT if enabled in C.
 
 Documentation:
 
  - Add "The Rust experiment" section to the Rust index page.
 
 MAINTAINERS
 
  - Add Maintainer Entry Profile field ('P:').
 
  - Update our 'W:' field to point to the webpage we have been building
    this year.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmU79aIACgkQGXyLc2ht
 IW03Rw//ZcbRDxVEbD9UE+aM6wtKQ1eY8QIB7gNivctv1R8spKlnsLpF/VdBQV4v
 yWmMPG/Vnp7Xbkcg2kZrbo2J82NgD6ACJWxWHVb8K2N/hoIVwrQXiQUmtg8bAa5B
 aDso+8WcWZOF6uzu5ww7kv8AbAOO3ReCxnIVPexeWQtZVAGeBd4BiVecoTL0mCbA
 7MMNyyKxjnRSo72Sj4iFoVPjb/IkOHYRaPQA/QOvG3bwin5nTvxM+94v+UZ4D7fp
 THWuZjJiC19L/C2/GGweK6mnpV2lExdZl4RC3JInu8s3R6jwGuRxUNE4vCnO9DlY
 QBkeUV3qwMCG/LnAb+iyClDM5aEU2wWBFl1NwNy0yEQM/gBqk6+4HQxxB177Wte3
 V65f4Sz19baci1SNCk+rFe/1EK8/UoD2jo42DXnuIUnGq5VJtVRNxbn/2tR0kNZn
 9DGwR1U8shMytNen5xFLi03q+p1Zuez/YKzmTmahv8zn2JysUgj3ctFK2M3mCiFu
 +HWvPFBbUbOH+/g3txBXtnvY757Nei5siSKONVZTOez6VYR//jlvLNNBbUp3vPFE
 w2TuR6HcFVR8z/kjcTxIgS+xzbsT8FkugJwRdLeQU4Ky03o2Z6uosM3nky9TzWjb
 oVR7rFgb9KygvP67+uVQS5OETQVO0EwutlEsQzzs7cbnoSt0ckk=
 =rKLl
 -----END PGP SIGNATURE-----

Merge tag 'rust-6.7' of https://github.com/Rust-for-Linux/linux

Pull rust updates from Miguel Ojeda:
 "A small one compared to the previous one in terms of features. In
  terms of lines, as usual, the 'alloc' version upgrade accounts for
  most of them.

  Toolchain and infrastructure:

   - Upgrade to Rust 1.73.0

     This time around, due to how the kernel and Rust schedules have
     aligned, there are two upgrades in fact. They contain the fixes for
     a few issues we reported to the Rust project.

     In addition, a few cleanups indicated by the upgraded compiler or
     possible thanks to it. For instance, the compiler now detects
     redundant explicit links.

   - A couple changes to the Rust 'Makefile' so that it can be used with
     toybox tools, allowing Rust to be used in the Android kernel build.

  x86:

   - Enable IBT if enabled in C

  Documentation:

   - Add "The Rust experiment" section to the Rust index page

  MAINTAINERS:

   - Add Maintainer Entry Profile field ('P:').

   - Update our 'W:' field to point to the webpage we have been building
     this year"

* tag 'rust-6.7' of https://github.com/Rust-for-Linux/linux:
  docs: rust: add "The Rust experiment" section
  x86: Enable IBT in Rust if enabled in C
  rust: Use grep -Ev rather than relying on GNU grep
  rust: Use awk instead of recent xargs
  rust: upgrade to Rust 1.73.0
  rust: print: use explicit link in documentation
  rust: task: remove redundant explicit link
  rust: kernel: remove `#[allow(clippy::new_ret_no_self)]`
  MAINTAINERS: add Maintainer Entry Profile field for Rust
  MAINTAINERS: update Rust webpage
  rust: upgrade to Rust 1.72.1
  rust: arc: add explicit `drop()` around `Box::from_raw()`
2023-10-30 20:30:49 -10:00
Linus Torvalds
befaa609f4 hardening updates for v6.7-rc1
- Add LKDTM test for stuck CPUs (Mark Rutland)
 
 - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)
 
 - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva)
 
 - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh)
 
 - Convert group_info.usage to refcount_t (Elena Reshetova)
 
 - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)
 
 - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn)
 
 - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook)
 
 - Fix strtomem() compile-time check for small sources (Kees Cook)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmU/3cUWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsEoEACBGPSiOmfSWdH3TOnIG270PD24
 jGjg8KFv7RC/JTOdYmpLl0okdlGT9LvjN/ToSSDEw3PIayxoXUdhkbYy0MYtiV3m
 yz2ozDTzJuplQX/W2fPE+nXSzIwHao2zjPPFjHnT7lt8IIjhgjiOtLfZ2gGUkW99
 Mdu2aWh3u0r4tC8OS23++yN5ibRc5l72efsjDWjZ0aPXnxE1bjmLMiIPiizpndIf
 beasPuDBs98sJVYouemCwnsPXuXOPz3Q1Cpo/fTd+TMTJCLSemCQZCTuOBU0acI/
 ZjLCgCaJU1yIYKBMtrIN4G9kITZniXX3/Nm4o6NQMVlcCqMeNaHuflomqWoqWfhE
 UPbRo2eghZOaMNiCKLLvZDIqPrh1IcsiEl6Ef3W4hICc42GTK96IuGisIvDXwQ4N
 /SzTOupJuN42noh3z1M3XuZy5RoXJ99IYDNY5CTKf9IdqvA0bbGkU3nb1gZH/xw9
 BjTqKzR/7K1kTXuSgagDZ1Wceej9pZxhX7E3IHYsP8ZOvKug3EeL4yybVwQ3HRfq
 Qnzcp/qPB9cOkLSQXveRTFTsj2mX28Gixct/iDuc1jIYwGQlY1gI6dcUcqby6ptM
 BrQti7eR2NH2+T3aE2UVCIWsZVhx7NaSF+z8JxfAuu56jicc4xJVsi8zrNveWX5M
 m2VXyBl3121BVtKi4w==
 =0iVF
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "One of the more voluminous set of changes is for adding the new
  __counted_by annotation[1] to gain run-time bounds checking of
  dynamically sized arrays with UBSan.

   - Add LKDTM test for stuck CPUs (Mark Rutland)

   - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)

   - Refactor more 1-element arrays into flexible arrays (Gustavo A. R.
     Silva)

   - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem
     Shaikh)

   - Convert group_info.usage to refcount_t (Elena Reshetova)

   - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)

   - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas
     Bulwahn)

   - Fix randstruct GCC plugin performance mode to stay in groups (Kees
     Cook)

   - Fix strtomem() compile-time check for small sources (Kees Cook)"

* tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits)
  hwmon: (acpi_power_meter) replace open-coded kmemdup_nul
  reset: Annotate struct reset_control_array with __counted_by
  kexec: Annotate struct crash_mem with __counted_by
  virtio_console: Annotate struct port_buffer with __counted_by
  ima: Add __counted_by for struct modsig and use struct_size()
  MAINTAINERS: Include stackleak paths in hardening entry
  string: Adjust strtomem() logic to allow for smaller sources
  hardening: x86: drop reference to removed config AMD_IOMMU_V2
  randstruct: Fix gcc-plugin performance mode to stay in group
  mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by
  drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by
  irqchip/imx-intmux: Annotate struct intmux_data with __counted_by
  KVM: Annotate struct kvm_irq_routing_table with __counted_by
  virt: acrn: Annotate struct vm_memory_region_batch with __counted_by
  hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by
  sparc: Annotate struct cpuinfo_tree with __counted_by
  isdn: kcapi: replace deprecated strncpy with strscpy_pad
  isdn: replace deprecated strncpy with strscpy
  NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by
  nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by
  ...
2023-10-30 19:09:55 -10:00
Linus Torvalds
2656821f1f RCU pull request for v6.7
This pull request contains the following branches:
 
 rcu/torture: RCU torture, locktorture and generic torture infrastructure
 	updates that include various fixes, cleanups and consolidations.
 	Among the user visible things, ftrace dumps can now be found into
 	their own file, and module parameters get better documented and
 	reported on dumps.
 
 rcu/fixes: Generic and misc fixes all over the place. Some highlights:
 
 	* Hotplug handling has seen some light cleanups and comments.
 
 	* An RCU barrier can now be triggered through sysfs to serialize
 	memory stress testing and avoid OOM.
 
 	* Object information is now dumped in case of invalid callback
 	invocation.
 
 	* Also various SRCU issues, too hard to trigger to deserve urgent
 	pull requests, have been fixed.
 
 rcu/docs: RCU documentation updates
 
 rcu/refscale: RCU reference scalability test minor fixes and doc
 	improvements.
 
 rcu/tasks: RCU tasks minor fixes
 
 rcu/stall: Stall detection updates. Introduce RCU CPU Stall notifiers
 	that allows a subsystem to provide informations to help debugging.
 	Also cure some false positive stalls.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmU21h0ACgkQhSRUR1CO
 jHdUgA/+Myy5K5OxNrqlF/gIK+flOSg635RyZ0DBx8OMXZ/fAg9qRI+PKt5I4Lha
 eXAg6EtmwSgHmIbjcg8WzsvwniEsqqjOF+n1qil447fHUI2Qqw6c7fIm/MXQkeHJ
 qA7CODDRtsAnwnjmTteasmMeGV0bmXDENxhNrAZBFnVkRgTqfyDbFcn+nxOaPK6b
 fmbKvnB07WUg1KOV8/MbEtAZPb8QgHo58bXSZRKjKkiqRQWB/D3On+tShFK7SYJi
 wIqQ96MLyUXLaIWQ47v6xEO4PZO+3o1wAryvP1DRdb5UrPjO6yKFfQaoo5Mza92G
 zhBJhnXkVvCoNoCU7GKJIDV54SgDHaB6Sf1GN5cjwfujOkLuGCyg0CpKktCGm7uH
 n3X66PVep608Uj2Y/pAo/hv3Hbv7lCu4nfrERvVLG9YoxUvTJDsKmBv+SF/g2mxF
 rHqFa39HUPr1yHA5WjqOQS3lLdqCXEGKvNi6zXCvOceiDbHbiJFkBo6p8TVrbSMX
 FCOWZ3LoE+6uiLu/lLOEroTjeBd8GhDh1LgWgyVK7o0LhP1018DSBolrpcSwnmOo
 Q/E4G2x+aPWs+5NTOmMGOIPY70khKQIM3c8YZelSRffJBo6O3yV68h6X45NQxYvx
 keLvrDaza8h4hKwaof/QaX4ZJgTOZ0xjpawr1vR0hbK8LNtPrUw=
 =cVD7
 -----END PGP SIGNATURE-----

Merge tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks

Pull RCU updates from Frederic Weisbecker:

 - RCU torture, locktorture and generic torture infrastructure updates
   that include various fixes, cleanups and consolidations.

   Among the user visible things, ftrace dumps can now be found into
   their own file, and module parameters get better documented and
   reported on dumps.

 - Generic and misc fixes all over the place. Some highlights:

     * Hotplug handling has seen some light cleanups and comments

     * An RCU barrier can now be triggered through sysfs to serialize
       memory stress testing and avoid OOM

     * Object information is now dumped in case of invalid callback
       invocation

     * Also various SRCU issues, too hard to trigger to deserve urgent
       pull requests, have been fixed

 - RCU documentation updates

 - RCU reference scalability test minor fixes and doc improvements.

 - RCU tasks minor fixes

 - Stall detection updates. Introduce RCU CPU Stall notifiers that
   allows a subsystem to provide informations to help debugging. Also
   cure some false positive stalls.

* tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits)
  srcu: Only accelerate on enqueue time
  locktorture: Check the correct variable for allocation failure
  srcu: Fix callbacks acceleration mishandling
  rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  rcu: Standardize explicit CPU-hotplug calls
  rcu: Conditionally build CPU-hotplug teardown callbacks
  rcu: Remove references to rcu_migrate_callbacks() from diagrams
  rcu: Assume rcu_report_dead() is always called locally
  rcu: Assume IRQS disabled from rcu_report_dead()
  rcu: Use rcu_segcblist_segempty() instead of open coding it
  rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
  srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  torture: Convert parse-console.sh to mktemp
  rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()
  rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20
  torture: Add kvm.sh --debug-info argument
  locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers
  doc: Catch-up update for locktorture module parameters
  locktorture: Add call_rcu_chains module parameter
  locktorture: Add new module parameters to lock_torture_print_module_parms()
  ...
2023-10-30 18:01:41 -10:00
Linus Torvalds
eb55307e67 X86 core code updates:
- Limit the hardcoded topology quirk for Hygon CPUs to those which have a
     model ID less than 4. The newer models have the topology CPUID leaf 0xB
     correctly implemented and are not affected.
 
   - Make SMT control more robust against enumeration failures
 
     SMT control was added to allow controlling SMT at boottime or
     runtime. The primary purpose was to provide a simple mechanism to
     disable SMT in the light of speculation attack vectors.
 
     It turned out that the code is sensible to enumeration failures and
     worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
     which means the primary thread mask is not set up correctly. By chance
     a XEN/PV boot ends up with smp_num_siblings == 2, which makes the
     hotplug control stay at its default value "enabled". So the mask is
     never evaluated.
 
     The ongoing rework of the topology evaluation caused XEN/PV to end up
     with smp_num_siblings == 1, which sets the SMT control to "not
     supported" and the empty primary thread mask causes the hotplug core to
     deny the bringup of the APS.
 
     Make the decision logic more robust and take 'not supported' and 'not
     implemented' into account for the decision whether a CPU should be
     booted or not.
 
   - Fake primary thread mask for XEN/PV
 
     Pretend that all XEN/PV vCPUs are primary threads, which makes the
     usage of the primary thread mask valid on XEN/PV. That is consistent
     with because all of the topology information on XEN/PV is fake or even
     non-existent.
 
   - Encapsulate topology information in cpuinfo_x86
 
     Move the randomly scattered topology data into a separate data
     structure for readability and as a preparatory step for the topology
     evaluation overhaul.
 
   - Consolidate APIC ID data type to u32
 
     It's fixed width hardware data and not randomly u16, int, unsigned long
     or whatever developers decided to use.
 
   - Cure the abuse of cpuinfo for persisting logical IDs.
 
     Per CPU cpuinfo is used to persist the logical package and die
     IDs. That's really not the right place simply because cpuinfo is
     subject to be reinitialized when a CPU goes through an offline/online
     cycle.
 
     Use separate per CPU data for the persisting to enable the further
     topology management rework. It will be removed once the new topology
     management is in place.
 
   - Provide a debug interface for inspecting topology information
 
     Useful in general and extremly helpful for validating the topology
     management rework in terms of correctness or "bug" compatibility.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmU+yX0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoROUD/4vlvKEcpm9rbI5DzLcaq4DFHKbyEZF
 cQtzuOSM/9vTc9DHnuoNNLl9TWSYxiVYnejf3E21evfsqspYlzbTH8bId9XBCUid
 6B68AJW842M2erNuwj0b0HwF1z++zpDmBDyhGOty/KQhoM8pYOHMvntAmbzJbuso
 Dgx6BLVFcboTy6RwlfRa0EE8f9W5V+JbmG/VBDpdyCInal7VrudoVFZmWQnPIft7
 zwOJpAoehkp8OKq7geKDf79yWxu9a1sNPd62HtaVEvfHwehHqE6OaMLss1us+0vT
 SJ/D6gmRQBOwcXaZL0wL1dG7Km9Et4AisOvzhXGvTa5b2D5oljVoqJ7V7FTf5g3u
 y3aqWbeUJzERUbeJt1HoGVAKyA4GtZOvg+TNIysf6F1Z4khl9alfa9jiqjj4g1au
 zgItq/ZMBEBmJ7X4FxQUEUVBG2CDsEidyNBDRcimWQUDfBakV/iCs0suD8uu8ZOD
 K5jMx8Hi2+xFx7r1YqsfsyMBYOf/zUZw65RbNe+kI992JbJ9nhcODbnbo5MlAsyv
 vcqlK5FwXgZ4YAC8dZHU/tyTiqAW7oaOSkqKwTP5gcyNEqsjQHV//q6v+uqtjfYn
 1C4oUsRHT2vJiV9ktNJTA4GQHIYF4geGgpG8Ih2SjXsSzdGtUd3DtX1iq0YiLEOk
 eHhYsnniqsYB5g==
 =xrz8
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Thomas Gleixner:

 - Limit the hardcoded topology quirk for Hygon CPUs to those which have
   a model ID less than 4.

   The newer models have the topology CPUID leaf 0xB correctly
   implemented and are not affected.

 - Make SMT control more robust against enumeration failures

   SMT control was added to allow controlling SMT at boottime or
   runtime. The primary purpose was to provide a simple mechanism to
   disable SMT in the light of speculation attack vectors.

   It turned out that the code is sensible to enumeration failures and
   worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
   which means the primary thread mask is not set up correctly. By
   chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
   the hotplug control stay at its default value "enabled". So the mask
   is never evaluated.

   The ongoing rework of the topology evaluation caused XEN/PV to end up
   with smp_num_siblings == 1, which sets the SMT control to "not
   supported" and the empty primary thread mask causes the hotplug core
   to deny the bringup of the APS.

   Make the decision logic more robust and take 'not supported' and 'not
   implemented' into account for the decision whether a CPU should be
   booted or not.

 - Fake primary thread mask for XEN/PV

   Pretend that all XEN/PV vCPUs are primary threads, which makes the
   usage of the primary thread mask valid on XEN/PV. That is consistent
   with because all of the topology information on XEN/PV is fake or
   even non-existent.

 - Encapsulate topology information in cpuinfo_x86

   Move the randomly scattered topology data into a separate data
   structure for readability and as a preparatory step for the topology
   evaluation overhaul.

 - Consolidate APIC ID data type to u32

   It's fixed width hardware data and not randomly u16, int, unsigned
   long or whatever developers decided to use.

 - Cure the abuse of cpuinfo for persisting logical IDs.

   Per CPU cpuinfo is used to persist the logical package and die IDs.
   That's really not the right place simply because cpuinfo is subject
   to be reinitialized when a CPU goes through an offline/online cycle.

   Use separate per CPU data for the persisting to enable the further
   topology management rework. It will be removed once the new topology
   management is in place.

 - Provide a debug interface for inspecting topology information

   Useful in general and extremly helpful for validating the topology
   management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
  x86/cpu: Provide debug interface
  x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  x86/apic: Use u32 for [gs]et_apic_id()
  x86/apic: Use u32 for phys_pkg_id()
  x86/apic: Use u32 for cpu_present_to_apicid()
  x86/apic: Use u32 for check_apicid_used()
  x86/apic: Use u32 for APIC IDs in global data
  x86/apic: Use BAD_APICID consistently
  x86/cpu: Move cpu_l[l2]c_id into topology info
  x86/cpu: Move logical package and die IDs into topology info
  x86/cpu: Remove pointless evaluation of x86_coreid_bits
  x86/cpu: Move cu_id into topology info
  x86/cpu: Move cpu_core_id into topology info
  hwmon: (fam15h_power) Use topology_core_id()
  scsi: lpfc: Use topology_core_id()
  x86/cpu: Move cpu_die_id into topology info
  x86/cpu: Move phys_proc_id into topology info
  x86/cpu: Encapsulate topology information in cpuinfo_x86
  ...
2023-10-30 17:37:47 -10:00
Linus Torvalds
943af0e73a Updates for the X86 APIC:
- Make the quirk for non-maskable MSI interrupts in the affinity setter
     functional again.
 
     It was broken by a MSI core code update, which restructured the code in
     a way that the quirk flag was not longer set correctly.
 
     Trying to restore the core logic caused a deeper inspection and it
     turned out that the extra quirk flag is not required at all because
     it's the inverse of the reservation mode bit, which only can be set
     when the MSI interrupt is maskable.
 
     So the trivial fix is to use the reservation mode check in the affinity
     setter function and remove almost 40 lines of code related to the
     no-mask quirk flag.
 
   - Cure a Kconfig dependency issue which causes compile fails by correcting
     the conditionals in the affected heaer files.
 
   - Clean up coding style in the UV APIC driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmU+w2ETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodRIEACugQpAiND53Cz9MTJE1XWGheBn8n+y
 WZtbVckK9LXHfnf99dtigtq3ycPBg2Mx/pItT1c71iN/NyVPNCC/3mM+ntP53etX
 06v7VoIgRpF+GRQsIbjvVfBkKWS3G5zqHq6hazE0lcPhPEiOIMSBcusSQaA3C3sp
 BT23rrx/59JKCb/187pum0Obx+n1oH+MG91wJ1v0zNOGgfx1u5gBPyKDZ0JAsMPx
 r05991Lp4K80ooEsk/PKgpcZuPZD3QRwFDLHJwh2ITjsC22ItOnhN/c/1J8MZtk9
 5YBaIzy+vyQd+nksqmDtB48FD0ioW/WLUR5zV9MMpD3RQAGgrxQmXeTqScu/j8tn
 I4QZGi80HZSWLwFjWL2DoAw5LHZDh8ksxIYZBHE+8LHx0ymyX+7//naZnNFn9cXM
 K5orouZFQi+dvfD7MAla73fiibPs6cHjGzRKfsfNlLBNNj+/ffcEolLlKGhEX9fx
 R1w7gbtNs/RDnfoa45cvmez8UxJB5zei7l2HWbSecR2DDTQ9aGfEU/0NEr9muBzK
 cuZIpmgD09VvS84jCxdYbaXA5Cau3NLOE5Os6UN9Wa3dUjWP5PArWkc8RzxP3Bgx
 PANtytesqJ9YeDDg9SyylPw6d3EYNldtFZDXlD32K2O2SDmUZxxJa+0og8K9YNJN
 aIUcqiSaYhg5rQ==
 =2lhV
 -----END PGP SIGNATURE-----

Merge tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:

 - Make the quirk for non-maskable MSI interrupts in the affinity setter
   functional again.

   It was broken by a MSI core code update, which restructured the code
   in a way that the quirk flag was not longer set correctly.

   Trying to restore the core logic caused a deeper inspection and it
   turned out that the extra quirk flag is not required at all because
   it's the inverse of the reservation mode bit, which only can be set
   when the MSI interrupt is maskable.

   So the trivial fix is to use the reservation mode check in the
   affinity setter function and remove almost 40 lines of code related
   to the no-mask quirk flag.

 - Cure a Kconfig dependency issue which causes compile failures by
   correcting the conditionals in the affected header files.

 - Clean up coding style in the UV APIC driver.

* tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/msi: Fix misconfigured non-maskable MSI quirk
  x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC
  x86/platform/uv/apic: Clean up inconsistent indenting
2023-10-30 17:27:56 -10:00
Linus Torvalds
ecb8cd2a9f Enable CONFIG_DEBUG_ENTRY=y in the x86 defconfigs.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9DBQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j8ORAArwXCeaxkCzMpyhnHbTPcTRsApkQnfKPf
 svhRDc2+jj1j4RzPKK+IzjoEUOa2MmAhc3iaRr2v+BKMWbs0AunuJrlVv7mL0xy6
 VHQvz53rtvTgZCdZ9Pde9wmyg2nT0HCyukDG9jNI+LdPws8KrAm6L5YOnD87epsD
 sKB2YGQ64T37qgQYbD3vomv8TDo18eEeRY1LNkUsu/P0PVJ3JjYkV0Lt7JTFoGiQ
 svA4uZ0DYsCqZK07qNxd8Z9SPguK/gLgbtXK6k1zwU7HhgzyUTnINm9AF9Zl5eK8
 RhelY4ElxO8JuGJtP/bJ8dstDko3tswSe1YumW2z2dxZGi3tn0j1RHszaU3y20cl
 Lj3qpLRauFjroia83U6S5I2QgVBkjqivSiS9Sqnur426psLkBrCyXhlJzxRuvy9m
 SPfb6mtxS1YZ/C5+xRO0Ny51SaDsLz+BnmfTwnAlq9qf70kqHYnJSY2hbc0avIiG
 sYChWbpE+dnzfDps1dy0dIqd7OZGRyWHZ8hugeZA4tWOk05NXh8kN9CT7zSJ83CG
 /FXZhtImPAdfndVH6/i37gTFdMem+ud8kP4zwUyNEYtN0uB/knqurWNoTIr9R/e1
 oyRpFCeMfqXk1GnPmceTMtLGvcPMEQr3BPOnrsdGP1KVX6CPUL/SidSPrTZFlEh/
 xTMiFYGZBTo=
 =Foih
 -----END PGP SIGNATURE-----

Merge tag 'x86-build-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 build update from Ingo Molnar:
 "Enable CONFIG_DEBUG_ENTRY=y in the x86 defconfigs"

* tag 'x86-build-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/defconfig: Enable CONFIG_DEBUG_ENTRY=y
2023-10-30 16:10:06 -10:00
Linus Torvalds
f0d25b5d0f x86 MM handling code changes for v6.7:
- Add new NX-stack self-test
  - Improve NUMA partial-CFMWS handling
  - Fix #VC handler bugs resulting in SEV-SNP boot failures
  - Drop the 4MB memory size restriction on minimal NUMA nodes
  - Reorganize headers a bit, in preparation to header dependency reduction efforts
  - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9Ek4RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gIJQ/+Mg6mzMaThyNXqhJszeZJBmDaBv2sqjAB
 5tcferg1nJBdNBzX8bJ95UFt9fIqeYAcgH00qlQCYSmyzbC1TQTk9U2Pre1zbOw4
 042ONK8sygKSje1zdYleHoBeqwnxD2VNM0NwBElhGjumwHRng/tbLiI9wx6qiz+C
 VsFXavkBszHGA1pjy9wZLGixYIH5jCygMpH134Wp+CIhpS+C4nftcGdIL1D5Oil1
 6Tm2XeI6uyfiQhm9IOwDjfoYeC7gUjx1rp8rHseGUMJxyO/BX9q5j1ixbsVriqfW
 97ucYuRL9mza7ic516C9v7OlAA3AGH2xWV+SYOGK88i9Co4kYzP4WnamxXqOsD8+
 popxG55oa6QelhaouTBZvgERpZ4fWupSDs/UccsDaE9leMCerNEbGHEzt/Mm/2sw
 xopjMQ0y5Kn6/fS0dLv8U+XHu4ANkvXJkFd6Ny0h/WfgGefuQOOTG9ruYgfeqqB8
 dViQ4R7CO8ySjD45KawAZl/EqL86x1M/CI1nlt0YY4vNwUuOJbebL7Jn8w3Fjxm5
 FVfUlDmcPdhZfL9Vnrsi6MIou1cU1yJPw4D6sXJ4sg4s7A4ebBcRRrjayVQ4msjv
 Q7cvBOMnWEHhOV11pvP50FmQuj74XW3bUqiuWrnK1SypvnhHavF6kc1XYpBLs1xZ
 y8nueJW2qPw=
 =tT5F
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm handling updates from Ingo Molnar:

 - Add new NX-stack self-test

 - Improve NUMA partial-CFMWS handling

 - Fix #VC handler bugs resulting in SEV-SNP boot failures

 - Drop the 4MB memory size restriction on minimal NUMA nodes

 - Reorganize headers a bit, in preparation to header dependency
   reduction efforts

 - Misc cleanups & fixes

* tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
  selftests/x86/lam: Zero out buffer for readlink()
  x86/sev: Drop unneeded #include
  x86/sev: Move sev_setup_arch() to mem_encrypt.c
  x86/tdx: Replace deprecated strncpy() with strtomem_pad()
  selftests/x86/mm: Add new test that userspace stack is in fact NX
  x86/sev: Make boot_ghcb_page[] static
  x86/boot: Move x86_cache_alignment initialization to correct spot
  x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
  x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
  x86_64: Show CR4.PSE on auxiliaries like on BSP
  x86/iommu/docs: Update AMD IOMMU specification document URL
  x86/sev/docs: Update document URL in amd-memory-encryption.rst
  x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h>
  ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window
  x86/numa: Introduce numa_fill_memblks()
2023-10-30 15:40:57 -10:00
Linus Torvalds
1641b9b040 Fix out-of-order NMI nesting checks resulting in false positive warnings.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9D8IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1il7BAAq+Ke3O07Ru+aOpBljqaqDigZWHf9PG5w
 f7M2fx/XFKOPmTqhft+c2WoqymQql23WXIucdLR795xM10aLLsTKwk1govaQFIqj
 HmKMyyK4PC4g/1Vw60PtrGh4uWxykxqrz0tgYaIp+UxsGzBvAwxX9fhI0fENt6th
 DI+TuTqf5hiBB6/18Glwu4hFZOxsGDSyIdR833TeU8vdIJNnWRVYEMRd1SPoa7k4
 l8+Slh4XoNwfRU8+iEqT3BJUupEcfsPx37Y67NCVl4Bv1rKXMeJbCq65j2TOmQjW
 X5yGonFfu9QbDBjf413gan+ToYFTsDrMni3pd8lt3WII7AOEchwThPCJuLVG3Q3j
 4GxNO6Ul8tspiPXXe8IghPkHPWsjwM6NoXHJp3ZeEgWbiUDSakmOdIQ6yNV38CPv
 uoh5vd7MlLZu+eWeCGoErp2U3H0CVHj7UDASHDgGK4cGGi4Aoc+5ZAY2shU7jAYz
 qiiSn+ufaoka6lIyhKqhiMuybRmXwrSLfrNnIJBg8FJpPI89cOIZJzlD4L9y7p2f
 Fm37zT3n68H1UnAgSs18ZV7GCmJN02O9Cwnsu+9VHPXurPpTDzmvu+7TrXJ0fMgT
 NdqzV8V2wOWfsij45P+2aA1pR8EypP54PYr1N4Vwt5noOBqM0Ak71dXVenYbwAYM
 w3Ymx5kIxFM=
 =/att
 -----END PGP SIGNATURE-----

Merge tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq fix from Ingo Molnar:
 "Fix out-of-order NMI nesting checks resulting in false positive
  warnings"

* tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
2023-10-30 15:39:38 -10:00
Linus Torvalds
ed766c2611 Changes to the x86 entry code in v6.7:
- Make IA32_EMULATION boot time configurable with
    the new ia32_emulation=<bool> boot option.
 
  - Clean up fast syscall return validation code: convert
    it to C and refactor the code.
 
  - As part of this, optimize the canonical RIP test code.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9DiARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iNAw//cLn9gBXMVPDiCDVUOTqjkZ+OwIF11Y9v
 WatksSe5hrw0Bzl5CiSvtrWpTkKPnhyM8Lc1WD8l0YSMKprdkQfNAvQOPv0IMLjk
 XP1pgQhAiXwB87XL/G2sA6RunuK56zlnl7KJiDrQThrS/WOfrq3UkB2vyYEP4GtP
 69WZ/WM++u74uEml0+HZ0Z9HVvzwYl1VQPdTYfl52S4H3U8MXL89YEsPr13Ttq88
 FMKdXJ/VvItuVM/ZHHqFkGvRJjUtDWePLu29b684Ap6onDJ7uMMw86Gj5UxXtdpB
 Axsjuwlca8sCPotcqohay6IdyxIth6lMdvjPv0KhA+/QMrHbDaluv88YQs4k7Add
 1GPULH6oeDTHxMPOcJmFuSTpMY8HP6O9ZIXB6ogQRkLaDJKaWr5UQU7L2VBQ/WUy
 NRa6mba0XHYrz6U7DmtsdL0idWBJeJokHmaIcGJ/pp6gMznvufm2+SoJ6w6wcYva
 VTSTyrAAj/N9/TzJ5i8S2+yDPI9GanFpZJfYbW/rT9XGutvXWVKe3AmUNgR8O+hE
 JiEMfpR0TtXXlrik74jur/RPZhaFIE8MeCvJrkJ3oxQlPThYSTMBAlUOtD7kOfNT
 onjPrumREX4hOIBU+nnC9VrJMqxX9lz4xDzqw3jvX99Ma0o8Wx/UndWELX8tAYwd
 j8M8NWAbv90=
 =YkaP
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry updates from Ingo Molnar:

 - Make IA32_EMULATION boot time configurable with
   the new ia32_emulation=<bool> boot option

 - Clean up fast syscall return validation code: convert
   it to C and refactor the code

 - As part of this, optimize the canonical RIP test code

* tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/32: Clean up syscall fast exit tests
  x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test
  x86/entry/64: Convert SYSRET validation tests to C
  x86/entry/32: Remove SEP test for SYSEXIT
  x86/entry/32: Convert do_fast_syscall_32() to bool return type
  x86/entry/compat: Combine return value test from syscall handler
  x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
  x86: Make IA32_EMULATION boot time configurable
  x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()
  x86/elf: Make loading of 32bit processes depend on ia32_enabled()
  x86/entry: Compile entry_SYSCALL32_ignore() unconditionally
  x86/entry: Rename ignore_sysret()
  x86: Introduce ia32_enabled()
2023-10-30 15:27:27 -10:00
Linus Torvalds
5780e39edb x86 assembly code improvements for v6.7 are:
- Micro-optimize the x86 bitops code
 - Define target-specific {raw,this}_cpu_try_cmpxchg{64,128}() to improve code generation
 - Define and use raw_cpu_try_cmpxchg() preempt_count_set()
 - Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
 - Remove the unused __sw_hweight64() implementation on x86-32
 - Misc fixes and cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9BJkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iBKBAAl++uUEmjJM2nfS1AtKS5Rn0YJGoj7ZLy
 MMjFJwYzw7nWqbkCZJqQATicmOVvdEUacYibYYfX4QkH3ylC9Av3ta1HeUEzqLpX
 qG8W4jJPu/qlAOtGI4mQkq/yVminea6xy6l0vcCF+pezUKnwADP/YSL2Zsg03UsX
 Nelty29NrpN/qCcLUJk40CHRjBhfYBVEk+HtCMahnftzLSNZWHYTgoYA9x4zm4Hg
 LGiION+dwJKwaafaNw8/k1ikRJYfc5c1ZImi7YoOeCXXtBq7VHOHf76axok42m4s
 2FJ7QefioQ/Gs1Gojxd99080F4H/Elt6uj05hR2493ncN4WVNKqYBOMezrlG686D
 CuoOZvMaJ1LyaEntu1YWF3dtHIDTVjHhe5dyVclgu2a+v1gP/16xTLIf2z/cWtuX
 whOcalFc7AdGXnLtGACHnhbc5Yh/Ex9Y49+6WYgBrDcgNDCGdTOa/m8kFmKFgOjh
 x8Ot1xUjFI3utGgMlfKpYqw281ws+DjPiVvGi+fmUf8eBsq2IJuO9/f9oyfFfFSj
 1bzwrL5Qop/ccZAOxtuLnVVVZ+Cz/5wHmMTI0rRE5BsoiVUrtWZfD+E0/vqdavjG
 0eabDTdwUgXNBp6ex0TajXKtQY7FKg3tzIuPqHRvhRCvlt77MoSdcWRCc37ac5GJ
 M6mAeLcitu8=
 =7WUt
 -----END PGP SIGNATURE-----

Merge tag 'x86-asm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 assembly code updates from Ingo Molnar:

 - Micro-optimize the x86 bitops code

 - Define target-specific {raw,this}_cpu_try_cmpxchg{64,128}() to
   improve code generation

 - Define and use raw_cpu_try_cmpxchg() preempt_count_set()

 - Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op

 - Remove the unused __sw_hweight64() implementation on x86-32

 - Misc fixes and cleanups

* tag 'x86-asm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/lib: Address kernel-doc warnings
  x86/entry: Fix typos in comments
  x86/entry: Remove unused argument %rsi passed to exc_nmi()
  x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32
  x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
  x86/percpu: Use raw_cpu_try_cmpxchg() in preempt_count_set()
  x86/percpu: Define raw_cpu_try_cmpxchg and this_cpu_try_cmpxchg()
  x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
  x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions
2023-10-30 14:18:00 -10:00
Linus Torvalds
2b95bb0526 Changes to the x86 boot code in v6.7:
- Rework PE header generation, primarily to generate a modern, 4k aligned
    kernel image view with narrower W^X permissions.
 
  - Further refine init-lifetime annotations
 
  - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9B6ERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jXOg/+NAOQKhIYK0uFqAM+CEhZX4cqsJ9Ck0ze
 bqQ8pf5iCkbVZ+6ByiMSOszScTgVTSalRfKMYR+Fa9PVkLK4SNAeYPnGYugmLRoj
 U3lZYFpNDEwsZOmFwvqn7p+bGBQcBYKZuVI6bQh5U7Go4v6ujPjK4zTAK8SWDdTp
 DtEzhj9tELcYlm1NSV2OYu/k0IWAFV3Fc++G3WAm85xOK7oXVOYeMIlaVkpOyAXu
 th3yCw+Q0u1tuBS++77FwsEPt1KTzKGcTL7HpPrb4e4e4snOhmri+KAM/Noef7Vm
 lWqo8fTAeYwpYQ80oFsXVDhuI5LsfsuQgQid20sy1cWwswe1o1A73/AeP4pRogWl
 zLJuRcuNg2/VhPvMLdBWn5QdgJjH7CngeH+r/YkZPssPo6tfwa5UW7HOTCQvLsO9
 a+xy098qkk9d+8Za0sYMuv8/4+Ev5II2haP8edLgNWQ8S5qKIUQaY+r6268pIN/F
 0fGP9B3wblBjiNWCnd8UBh6T271g1O4vaMUt2URdcW3QObEq2EGnNiTc5tx9OPnP
 ZxQdAIl6pB0H0HIe9/7PABF40biKn84zmSl+KuXrhvh1f5FjYjJWVNyKlAKdSpSR
 wjvzg1KbhLiAHV05oQSHR7txMHJxfjpxAKmus0Hpqo6qVQ9FgrKiru9VHKocIpKU
 z66g+wEKUuY=
 =sxZJ
 -----END PGP SIGNATURE-----

Merge tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 boot updates from Ingo Molnar:

 - Rework PE header generation, primarily to generate a modern, 4k
   aligned kernel image view with narrower W^X permissions.

 - Further refine init-lifetime annotations

 - Misc cleanups & fixes

* tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/boot: efistub: Assign global boot_params variable
  x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'
  x86/head/64: Move the __head definition to <asm/init.h>
  x86/head/64: Add missing __head annotation to startup_64_load_idt()
  x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata
  x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls
  x86/boot: Fix incorrect startup_gdt_descr.size
  x86/boot: Compile boot code with -std=gnu11 too
  x86/boot: Increase section and file alignment to 4k/512
  x86/boot: Split off PE/COFF .data section
  x86/boot: Drop PE/COFF .reloc section
  x86/boot: Construct PE/COFF .text section from assembler
  x86/boot: Derive file size from _edata symbol
  x86/boot: Define setup size in linker script
  x86/boot: Set EFI handover offset directly in header asm
  x86/boot: Grab kernel_info offset from zoffset header directly
  x86/boot: Drop references to startup_64
  x86/boot: Drop redundant code setting the root device
  x86/boot: Omit compression buffer from PE/COFF image memory footprint
  x86/boot: Remove the 'bugger off' message
  ...
2023-10-30 14:11:57 -10:00
Linus Torvalds
3b8b4b4fc4 Replace <asm/export.h> uses with <linux/export.h> and then remove <asm/export.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU9DyYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iZqQ/+MZif5+F024nXciPOeogsMxtT5zStTd9a
 D+rBOCg6iodBIKKcfoqWUkjwXiREwnMxLq9JbArWIqOIBEEfMUsxz7S7PsnFA5dS
 /wXtJ6LHxiudeW6Vs41zLpJVANeAwxFXXyGGuRFk3Hxb7Zwh5jwhU76PKfBiUxrh
 7PzAqfOcVcbvlMXD/oMn9PlhRA4U+nrz+4ao00NCrbWBH/+ECHLb24IoB39jV8Gb
 XMDSsS5vxmCAR+ObRZm209lEf+tIv0T4I57GF85i5fX2aOrwPVvJDtnNMZY8xBDh
 TOyrq+JUTrl6JDKsRfiqvwxIG52bvytK0RHGfzUfObVtVa1AqlXwZaOUYItxlVjx
 f99DkIKhdUV6B3fVvYjsVP+g551kFKbh4EzYnLyp9XlMDIW9WVJv6u4GhR84piju
 3gxkYYBqnD58d/sf3xlAm5NDiqm2vGqw6qWKPkw7HrhX9QGaVkqIKMZTGVk2zM95
 LPVtXfqM+D3nscOmzoD9IsWmi7Z5ep8O0OfM0plOTRLYUlON2F5ewi4l0dNURH1g
 xHO3kr/TpzKeyv2NsG3ugqcO1evItJuCDx/PdzpR88HML/aXSeBh6IlEpMEnXQeq
 wQqB3YpgpTpV0/onIKfRs9qI4posWg0z0Rnfpp+nvVAjpsNrvndddZr65p5lkynZ
 9ccJQ7ipaw0=
 =O8Ce
 -----END PGP SIGNATURE-----

Merge tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 header file cleanup from Ingo Molnar:
 "Replace <asm/export.h> uses with <linux/export.h> and then remove
  <asm/export.h>"

* tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers: Remove <asm/export.h>
  x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
  x86/headers: Remove unnecessary #include <asm/export.h>
2023-10-30 14:04:23 -10:00
Linus Torvalds
bceb7accb7 Performance events changes for v6.7 are:
- Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
  - Simplify & clean up the uncore management code
  - Fall back from RDPMC to RDMSR on certain uncore PMUs
  - Improve per-package and cstate event reading
  - Extend the Intel ref-cycles event to GP counters
  - Fix Intel MTL event constraints
  - Improve the Intel hybrid CPU handling code
  - Micro-optimize the RAPL code
  - Optimize perf_cgroup_switch()
  - Improve large AUX area error handling
 
  - Misc fixes and cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU89YsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iQqQ/9EF9mG4te5By4qN+B7jADCmE71xG5ViKz
 sp4Thl86SHxhwFuiHn8dMUixrp+qbcemi5yTbQ9TF8cKl4s3Ju2CihU8jaauUp0a
 iS5W0IliMqLD1pxQoXAPLuPVInVYgrNOCbR4l6l7D6ervh5Z6PVEf7SVeAP3L5wo
 QV/V3NKkrYeNQL+FoKhCH8Vhxw0HxUmKJO7UhW6yuCt7BAok9Es18h3OVnn+7es4
 BB7VI/JvdmXf2ioKhTPnDXJjC+vh5vnwiBoTcdQ2W9ADhWUvfL4ozxOXT6z7oC3A
 nwBOdXf8w8Rqnqqd8hduop1QUrusMxlEVgOMCk27qHx97uWgPceZWdoxDXGHBiRK
 fqJAwXERf9wp5/M57NDlPwyf/43Hocdx2CdLkQBpfD78/k/sB5hW0KxnzY0FUI9x
 jBRQyWD05IDJATBaMHz+VbrexS+Itvjp2QvSiSm9zislYD4zA9fQ3lAgFhEpcUbA
 ZA/nN4t+CbiGEAsJEuBPlvSC1ahUwVP/0nz3PFlVWFDqAx0mXgVNKBe083A9yh7I
 dVisVY6KPAVDzyOc1LqzU8WFXNFnIkIIaLrb6fRHJVEM8MDfpLPS/a+7AHdRcDP4
 yq6fjVVjyP7e9lSQLYBUP3/3uiVnWQj92l6V6CrcgDMX5rDOb0VN+BQrmhPR6fWY
 WEim6WZrZj4=
 =OLsv
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance event updates from Ingo Molnar:
 - Add AMD Unified Memory Controller (UMC) events introduced with Zen 4
 - Simplify & clean up the uncore management code
 - Fall back from RDPMC to RDMSR on certain uncore PMUs
 - Improve per-package and cstate event reading
 - Extend the Intel ref-cycles event to GP counters
 - Fix Intel MTL event constraints
 - Improve the Intel hybrid CPU handling code
 - Micro-optimize the RAPL code
 - Optimize perf_cgroup_switch()
 - Improve large AUX area error handling
 - Misc fixes and cleanups

* tag 'perf-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  perf/x86/amd/uncore: Pass through error code for initialization failures, instead of -ENODEV
  perf/x86/amd/uncore: Fix uninitialized return value in amd_uncore_init()
  x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
  perf: Optimize perf_cgroup_switch()
  perf/x86/amd/uncore: Add memory controller support
  perf/x86/amd/uncore: Add group exclusivity
  perf/x86/amd/uncore: Use rdmsr if rdpmc is unavailable
  perf/x86/amd/uncore: Move discovery and registration
  perf/x86/amd/uncore: Refactor uncore management
  perf/core: Allow reading package events from perf_event_read_local
  perf/x86/cstate: Allow reading the package statistics from local CPU
  perf/x86/intel/pt: Fix kernel-doc comments
  perf/x86/rapl: Annotate 'struct rapl_pmus' with __counted_by
  perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability
  perf/x86/rapl: Fix "Using plain integer as NULL pointer" Sparse warning
  perf/x86/rapl: Use local64_try_cmpxchg in rapl_event_update()
  perf/x86/rapl: Stop doing cpu_relax() in the local64_cmpxchg() loop in rapl_event_update()
  perf/core: Bail out early if the request AUX area is out of bound
  perf/x86/intel: Extend the ref-cycles event to GP counters
  perf/x86/intel: Fix broken fixed event constraints extension
  ...
2023-10-30 13:44:35 -10:00
Linus Torvalds
cd063c8b9e Misc fixes and cleanups:
- Fix potential MAX_NAME_LEN limit related build failures
  - Fix scripts/faddr2line symbol filtering bug
  - Fix scripts/faddr2line on LLVM=1
  - Fix scripts/faddr2line to accept readelf output with mapping symbols
  - Minor cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU88VYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g2rQ//dvzezrAs+ZEhKLbRLSabbAlCeJ+J9zuP
 c0xBmaLwUh47sSDKfBLLEFN3IMDfgMdKjfb3E32vT/WQ+ASdfEMs6FfwRtaErypG
 XfZFpfC2WE1+Gq0MAgrXYuQgDv1Lygdimoy0aCwMlrgb7ZgWL1xorG0VSEemyKhd
 CoRFURKjeJIKJN1oOvTXKhp/SZyk39KHXeF4qSAjIGkrzsfDtEUSNR6NjBmeGUS4
 zNVWus/CucHK/6MMpHtdWw1/Ygemc1CBzYC3ZSMGimqy4Rqe2RsiGa0Y3XhlMCyn
 ekNFuUm9bxStaTknM3ZXga0xHPdKnTPkihxykLDzo0Nh9eysuFlmFrFJ2xL/B87k
 IxlpXvwxjxTSmGDhGQFVnXma6M2le3YFWGClS8UyhSPG08qg09ClwZ8OtVDi8ITI
 rj0VoFbFLuc8aeHF/tyF2t323JmcMHq0aHi+kMUElszm6+B+fPnD54gHU+REXVxO
 YIRkK9RY52mfU4KFf8xlO/UhFF6nP8pgE8pVnNF4lC034M0t4z+i/TLjOsspjVt3
 yMoZakD7sfUkAaCBq4mVfdWwo5UzTVse0BarbEcKxoME6wLEfN+efE850zGdy7n1
 iRC9AddddEyo4BnSHbWdWu/PDYJKPiH7dAtHBcfnEMJjLQewnRHlsHHbCA55jtrX
 363jNE3x6K4=
 =9U5x
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:
 "Misc fixes and cleanups:

   - Fix potential MAX_NAME_LEN limit related build failures

   - Fix scripts/faddr2line symbol filtering bug

   - Fix scripts/faddr2line on LLVM=1

   - Fix scripts/faddr2line to accept readelf output with mapping
     symbols

   - Minor cleanups"

* tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  scripts/faddr2line: Skip over mapping symbols in output from readelf
  scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
  scripts/faddr2line: Don't filter out non-function symbols from readelf
  objtool: Remove max symbol name length limitation
  objtool: Propagate early errors
  objtool: Use 'the fallthrough' pseudo-keyword
  x86/speculation, objtool: Use absolute relocations for annotations
  x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
2023-10-30 13:20:02 -10:00
Linus Torvalds
63ce50fff9 Scheduler changes for v6.7 are:
- Fair scheduler (SCHED_OTHER) improvements:
 
     - Remove the old and now unused SIS_PROP code & option
     - Scan cluster before LLC in the wake-up path
     - Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
 
  - NUMA scheduling improvements:
 
     - Improve the VMA access-PID code to better skip/scan VMAs
     - Extend tracing to cover VMA-skipping decisions
     - Improve/fix the recently introduced sched_numa_find_nth_cpu() code
     - Generalize numa_map_to_online_node()
 
  - Energy scheduling improvements:
 
     - Remove the EM_MAX_COMPLEXITY limit
     - Add tracepoints to track energy computation
     - Make the behavior of the 'sched_energy_aware' sysctl more consistent
     - Consolidate and clean up access to a CPU's max compute capacity
     - Fix uclamp code corner cases
 
  - RT scheduling improvements:
 
     - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
     - Drive the ->rto_mask with rt_rq->pushable_tasks updates
 
  - Scheduler scalability improvements:
 
     - Rate-limit updates to tg->load_avg
     - On x86 disable IBRS when CPU is offline to improve single-threaded performance
     - Micro-optimize in_task() and in_interrupt()
     - Micro-optimize the PSI code
     - Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
 
  - Core scheduler infrastructure improvements:
 
     - Use saved_state to reduce some spurious freezer wakeups
     - Bring in a handful of fast-headers improvements to scheduler headers
     - Make the scheduler UAPI headers more widely usable by user-space
     - Simplify the control flow of scheduler syscalls by using lock guards
     - Fix sched_setaffinity() vs. CPU hotplug race
 
  - Scheduler debuggability improvements:
     - Disallow writing invalid values to sched_rt_period_us
     - Fix a race in the rq-clock debugging code triggering warnings
     - Fix a warning in the bandwidth distribution code
     - Micro-optimize in_atomic_preempt_off() checks
     - Enforce that the tasklist_lock is held in for_each_thread()
     - Print the TGID in sched_show_task()
     - Remove the /proc/sys/kernel/sched_child_runs_first sysctl
 
  - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU8/NoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gN+xAAvKGYNZBCBG4jowxccgqAbCx81KOhhsy/
 KUaOmdLPg9WaXuqjZ5sggXQCMT0wUqBYAmqV7ts53VhWcma2I1ap4dCM6Jj+RLrc
 vNwkeNetsikiZtarMoCJs5NahL8ULh3liBaoAkkToPjQ5r43aZ/eKwDovEdIKc+g
 +Vgn7jUY8ssIrAOKT1midSwY1y8kAU2AzWOSFDTgedkJP4PgOu9/lBl9jSJ2sYaX
 N4XqONYPXTwOHUtvmzkYILxLz0k0GgJ7hmt78E8Xy2rC4taGCRwCfCMBYxREuwiP
 huo3O1P/iIe5svm4/EBUvcpvf44eAWTV+CD0dnJPwOc9IvFhpSzqSZZAsyy/JQKt
 Lnzmc/xmyc1PnXCYJfHuXrw2/m+MyUHaegPzh5iLJFrlqa79GavOElj0jNTAMzbZ
 39fybzPtuFP+64faRfu0BBlQZfORPBNc/oWMpPKqgP58YGuveKTWaUF5rl5lM7Ne
 nm07uOmq02JVR8YzPl/FcfhU2dPMawWuMwUjEr2eU+lAunY3PF88vu0FALj7iOBd
 66F8qrtpDHJanOxrdEUwSJ7hgw79qY1iw66Db7cQYjMazFKZONxArQPqFUZ0ngLI
 n9hVa7brg1bAQKrQflqjcIAIbpVu3SjPEl15cKpAJTB/gn5H66TQgw8uQ6HfG+h2
 GtOsn1nlvuk=
 =GDqb
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Fair scheduler (SCHED_OTHER) improvements:
   - Remove the old and now unused SIS_PROP code & option
   - Scan cluster before LLC in the wake-up path
   - Use candidate prev/recent_used CPU if scanning failed for cluster
     wakeup

  NUMA scheduling improvements:
   - Improve the VMA access-PID code to better skip/scan VMAs
   - Extend tracing to cover VMA-skipping decisions
   - Improve/fix the recently introduced sched_numa_find_nth_cpu() code
   - Generalize numa_map_to_online_node()

  Energy scheduling improvements:
   - Remove the EM_MAX_COMPLEXITY limit
   - Add tracepoints to track energy computation
   - Make the behavior of the 'sched_energy_aware' sysctl more
     consistent
   - Consolidate and clean up access to a CPU's max compute capacity
   - Fix uclamp code corner cases

  RT scheduling improvements:
   - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
   - Drive the ->rto_mask with rt_rq->pushable_tasks updates

  Scheduler scalability improvements:
   - Rate-limit updates to tg->load_avg
   - On x86 disable IBRS when CPU is offline to improve single-threaded
     performance
   - Micro-optimize in_task() and in_interrupt()
   - Micro-optimize the PSI code
   - Avoid updating PSI triggers and ->rtpoll_total when there are no
     state changes

  Core scheduler infrastructure improvements:
   - Use saved_state to reduce some spurious freezer wakeups
   - Bring in a handful of fast-headers improvements to scheduler
     headers
   - Make the scheduler UAPI headers more widely usable by user-space
   - Simplify the control flow of scheduler syscalls by using lock
     guards
   - Fix sched_setaffinity() vs. CPU hotplug race

  Scheduler debuggability improvements:
   - Disallow writing invalid values to sched_rt_period_us
   - Fix a race in the rq-clock debugging code triggering warnings
   - Fix a warning in the bandwidth distribution code
   - Micro-optimize in_atomic_preempt_off() checks
   - Enforce that the tasklist_lock is held in for_each_thread()
   - Print the TGID in sched_show_task()
   - Remove the /proc/sys/kernel/sched_child_runs_first sysctl

  ... and misc cleanups & fixes"

* tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  sched/fair: Remove SIS_PROP
  sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
  sched/fair: Scan cluster before scanning LLC in wake-up path
  sched: Add cpus_share_resources API
  sched/core: Fix RQCF_ACT_SKIP leak
  sched/fair: Remove unused 'curr' argument from pick_next_entity()
  sched/nohz: Update comments about NEWILB_KICK
  sched/fair: Remove duplicate #include
  sched/psi: Update poll => rtpoll in relevant comments
  sched: Make PELT acronym definition searchable
  sched: Fix stop_one_cpu_nowait() vs hotplug
  sched/psi: Bail out early from irq time accounting
  sched/topology: Rename 'DIE' domain to 'PKG'
  sched/psi: Delete the 'update_total' function parameter from update_triggers()
  sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
  sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
  sched/numa: Complete scanning of inactive VMAs when there is no alternative
  sched/numa: Complete scanning of partial VMAs regardless of PID activity
  sched/numa: Move up the access pid reset logic
  sched/numa: Trace decisions related to skipping VMAs
  ...
2023-10-30 13:12:15 -10:00
Linus Torvalds
3cf3fabccb Locking changes in this cycle are:
- Futex improvements:
 
     - Add the 'futex2' syscall ABI, which is an attempt to get away from the
       multiplex syscall and adds a little room for extentions, while lifting
       some limitations.
 
     - Fix futex PI recursive rt_mutex waiter state bug
 
     - Fix inter-process shared futexes on no-MMU systems
 
     - Use folios instead of pages
 
  - Micro-optimizations of locking primitives:
 
     - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
       architectures, to improve lockref code generation.
 
     - Improve the x86-32 lockref_get_not_zero() main loop by adding
       build-time CMPXCHG8B support detection for the relevant lockref code,
       and by better interfacing the CMPXCHG8B assembly code with the compiler.
 
     - Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg()
       code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg().
 
     - Micro-optimize rcuref_put_slowpath()
 
  - Locking debuggability improvements:
 
     - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
 
     - Enforce atomicity of sched_submit_work(), which is de-facto atomic but
       was un-enforced previously.
 
     - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics
 
     - Fix ww_mutex self-tests
 
     - Clean up const-propagation in <linux/seqlock.h> and simplify
       the API-instantiation macros a bit.
 
  - RT locking improvements:
 
     - Provide the rt_mutex_*_schedule() primitives/helpers and use them
       in the rtmutex code to avoid recursion vs. rtlock on the PI state.
 
     - Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock()
       and rwbase_read_lock().
 
  - Plus misc fixes & cleanups
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU877IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g9jw/+N7rxQ78dmFCYh4UWnLCYvuKP0/ivHErG
 493JcB8MupuA2tfJHIkDdr4aM2mNq2E61w69/WlZAQWWD6pdOhwgF5Xf5eoEcJm0
 vsAhWBGLxihXdtevPuMAx0dEpg3AMp2wc6i5PkN831KdPUgCNsrKq9Bfnfef7/G8
 MQTSHjmtba6jxleyxfEa4tE2xe5PJX825nRfkX2e1cf+stkYua+uJFxVxUfxFWGE
 4pBy70D9OC7MsJ44WWOA1gwkVtMMiBTmRPNjlP8Gz2GQ0f3ERHRwYk3jDHOPHZI6
 0GNt7pE3IMXQn2UuDtfkvv9IFTd+U5qD+APnWIn2ntWXqzGLFqOlmovMrobVn7El
 olYDCyweWPG71m1Qblsb1VK2QjRPQVJ9NAEg8RlDHIu2ThxHbMysDVGPVOYnPFq4
 S8QFpmldzbNoPU4rDJyT1fAmoUIrusBHkl+Us3yGfC74iM+fHnDEvaSoMZbzEdY1
 x/Nocj9XgKEgfXdYzrCWFmZ9xXqHkO25/wDL6yKqBdQtvaEalXuHTT6mQcYxrUPm
 Xx1BPan2Jg7p4u2oOFcVtKewUtRH9KBx8qytr5S+JK4PJbrBsixMnr84HLd/3X2V
 ykYkO+367T5MTYv4TnJDE5vdurzUqekKSCFPY3skPujPJfdLj1vsPzYf9iMkCLdo
 hU2f/R+Wpdk=
 =36Ff
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Info Molnar:
 "Futex improvements:

   - Add the 'futex2' syscall ABI, which is an attempt to get away from
     the multiplex syscall and adds a little room for extentions, while
     lifting some limitations.

   - Fix futex PI recursive rt_mutex waiter state bug

   - Fix inter-process shared futexes on no-MMU systems

   - Use folios instead of pages

  Micro-optimizations of locking primitives:

   - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
     architectures, to improve lockref code generation

   - Improve the x86-32 lockref_get_not_zero() main loop by adding
     build-time CMPXCHG8B support detection for the relevant lockref
     code, and by better interfacing the CMPXCHG8B assembly code with
     the compiler

   - Introduce arch_sync_try_cmpxchg() on x86 to improve
     sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
     users to sync_try_cmpxchg().

   - Micro-optimize rcuref_put_slowpath()

  Locking debuggability improvements:

   - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well

   - Enforce atomicity of sched_submit_work(), which is de-facto atomic
     but was un-enforced previously.

   - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
     semantics

   - Fix ww_mutex self-tests

   - Clean up const-propagation in <linux/seqlock.h> and simplify the
     API-instantiation macros a bit

  RT locking improvements:

   - Provide the rt_mutex_*_schedule() primitives/helpers and use them
     in the rtmutex code to avoid recursion vs. rtlock on the PI state.

   - Add nested blocking lockdep asserts to rt_mutex_lock(),
     rtlock_lock() and rwbase_read_lock()

  .. plus misc fixes & cleanups"

* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  futex: Don't include process MM in futex key on no-MMU
  locking/seqlock: Fix grammar in comment
  alpha: Fix up new futex syscall numbers
  locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
  locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
  locking/seqlock: Change __seqprop() to return the function pointer
  locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
  locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
  locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
  locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
  locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
  locking/seqlock: Fix typo in comment
  futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
  locking/local, arch: Rewrite local_add_unless() as a static inline function
  locking/debug: Fix debugfs API return value checks to use IS_ERR()
  locking/ww_mutex/test: Make sure we bail out instead of livelock
  locking/ww_mutex/test: Fix potential workqueue corruption
  locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
  futex: Add sys_futex_requeue()
  futex: Add flags2 argument to futex_requeue()
  ...
2023-10-30 12:38:48 -10:00
Linus Torvalds
9cda4eb04a - A kernel-doc fix
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU8zJkACgkQEsHwGGHe
 VUpXxg/+Jn2nZG6jJr1PelUnGGBMY3GZuXmB947Ueb9BgbZEvi2VCp93ipO6EkWl
 ba0/3Uw0nO29FHsVM508sbLTHqo5RaMsjfg933HKOwiNx9pPIVtYfB4bs2b1Habq
 qCSCCUTNhNuDnxDduE/tglDVS5vab0fhODLnkYnzrvJ7D/3Zn5BwETbTYslRdnmD
 sIA8R1msot8n+a9aqHb99Lp+wmkinCsuhJbf+rIajVZUAFCSF+sL3hhaVIExgIWq
 3spI7ldKuxcNunmwB/LZYFGtQBeJb5r1JAg2LPOhui1W2D+oRbdXZCh8K2wA7Hec
 z+oSkep0qVbOED+GBfTDqxHYAiBkM0b1QwZdyr1cI3qSSZlb9Z7DreAewS5cxbj/
 hhYQl7voNDn7JH3ROOVyoo5seYIhtjO6GMQ7pPmXkvtXgm05cm7v8MDVN6m/ECPm
 1Iv9paI6EfVGORQLcG7ZndzLkN91KgT3h3OfYGISTrNObmDzMNxPSJT3/GY7TJ8l
 V/QR5hzfw7nuWBES1FRypfB9H8qiezfitMuaEmCHawJiSIwOBbsDHX0mAJzbeU9U
 64HFn98NHDRQLWFIq3s1Vl4omgVGMF0LI6FNOADlEIW0cWzxMk1tLkfVqhhUk6Ar
 b3f8DFAYXSCLSiTvm9yNErdQoSn5g2e+V1Ugs8Qu2pgRhmd1nh0=
 =E12/
 -----END PGP SIGNATURE-----

Merge tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu fixlet from Borislav Petkov:

 - kernel-doc fix

* tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Address kernel-doc warning
2023-10-30 12:36:41 -10:00
Linus Torvalds
f155f3b3ed - Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are
actually used in the amd_nb.c enumeration
 
 - Add support for extracting NUMA information from devicetree for
   Hyper-V usages
 
 - Add PCI device IDs for the new AMD MI300 AI accelerators
 
 - Annotate an array in struct uv_rtc_timer_head with the new
   __counted_by attribute
 
 - Rework UV's NMI action parameter handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU78I4ACgkQEsHwGGHe
 VUoDFQ//a8KYzcj2wC+GA+tAScMCP4lT3hsSYoHvFE6Fd4HBngMUBcaEHY0InxWc
 jfXRHFZBS54sLMH2XPVOnozTBDLUEF+4cnApj9iVf//RwjSz/mnJ1P/fqwGQHmfY
 eyOKn38nzkdcf+MW69HAyOeOLxOhOcxEdrzJ4AuPijjTH0GfHWXtuZrqCgaRB6XZ
 yYw7PwO9azSBIYA3c87IPo5uGDa0Ht/BWWHv+XD21DjH0E2n2jaGPgExojBwpI1F
 zJZkx8xJWlQANSD9qiYWoYFdEluGAJ6Fyn989hMRVUqFLbGB9ppEu0BD5TWor08A
 UMX3LYIXM+706EnpOYKc5m5dajbUNOEUl73M6S6J67ugvRUWjCZpoejEx1Aejup7
 aCcYrkbSiuVPFn8Wj9/JdkItu+zyzzsp3fjEdYlYVfefLh3EY4IrFAkQGUk7A0cA
 p6Idau8j4Cx+4lvNEFGMk/u7SAxcInJx7NDy8rJL9CG1BdbjzkwJH+GJFELuI/zM
 dSd+N2r5YUMClHR8Tor17PDh61i+ovg1kPhOyDyf7bNv0HQG0Hbcded4lyiVrsXa
 m5GNYRXEQFZXMeBxo6dJKDAj2rdSPsACC2FBTa9B+bciklcWid4FVTvf/9sSFeQv
 cvGFfGbFdW/uddAHsWqHMiuTfBsQneZ9UbZCKq/kAkHAtfdhRa8=
 =7JVZ
 -----END PGP SIGNATURE-----

Merge tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 platform updates from Borislav Petkov:

 - Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are
   actually used in the amd_nb.c enumeration

 - Add support for extracting NUMA information from devicetree for
   Hyper-V usages

 - Add PCI device IDs for the new AMD MI300 AI accelerators

 - Annotate an array in struct uv_rtc_timer_head with the new
   __counted_by attribute

 - Rework UV's NMI action parameter handling

* tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs
  x86/numa: Add Devicetree support
  x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
  x86/amd_nb: Add AMD Family MI300 PCI IDs
  x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_by
  x86/platform/uv: Rework NMI "action" modparam handling
2023-10-30 12:32:48 -10:00
Linus Torvalds
ca2e9c3bee - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
virtualization support is disabled in the BIOS on AMD and Hygon
   platforms
 
 - A minor cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU77KoACgkQEsHwGGHe
 VUrophAAtfsB+WhRydin0V6kjQeH+RbiWyx/jOw6eNqvzOzaOPxVXn0cAHRSgAO4
 +S8tKIqaWpXNNNKpOIKBVaDkh9qr50/p36/jfVkXi8GOLYrK633F0BMjcG4+/vYQ
 A9b5iNiJhZ7xWE6+qRrqdg+o+a6UyPUGz34HNp3KwJVTdaHU2OnXXwuWeiUkgRrJ
 uQSfLc4+UIeefIzNy8Tqg083iaENBYMya7U90rzewD64NF0bsA15AEPut/6tnUVq
 ej3UU3cqO7nKXyhuZX+zpt856MZFa1rNYVXUAfoAO4xhqdN0Q5LFWO506sqajNx/
 hqbT+hKDoC03zuLmbZO21s/uWQdtVFo63FU0h9QBRp1m6Ug5P3rQQCK8ydJc5xwr
 Yd7je6UPK9jIKBo9VP1qmsyzGwADNevNf1qGExHI2T6Wml7HgDmPysAHnGiKqRGI
 1o9+Yqa+VBt8Wml9M8Ny+dLyr5F/2uq8sMrQedQlXdFMSzVm2JYecukJ5BvUWE/r
 Qyll8mTpIdgGXjBt56lMrgH7ibMC5ct/4MvTHOHuA997g/PwuwtWj7QyKXpUq2Rf
 o/c3zKKWIFxevjzwU86haCBaz+5xAQlB6dJw61ExxsmUuT/kZzkN15w6aqGZtpns
 PsARwnvuwZJ7vfqFLIa0ZkPN4OgnkRX7HlNqrVyKpONDTocZd9E=
 =i9On
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
   virtualization support is disabled in the BIOS on AMD and Hygon
   platforms

 - A minor cleanup

* tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Remove redundant 'break' statement
  x86/cpu: Clear SVM feature if disabled by BIOS
2023-10-30 12:10:24 -10:00
Linus Torvalds
9ab021a1b5 - Add support for non-contiguous capacity bitmasks being added to
Intel's CAT implementation
 
 - Other improvements to resctrl code: better configuration,
   simplifications, debugging support, fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU7x3YACgkQEsHwGGHe
 VUqj4BAAn9HiQPuBWW5UPVpLoHBmKHtoNuIn2AWD3wcRwFwd+mO1JbPgQzMude0C
 QV/Dpm+PPxyFNATtCiRtqns3qHSt8wVMy/mrOKT7R20mmBxIhMb+323YvoamFSzc
 gamSKDZJFEp8Dqj2ccnwpFIdPjlTZGuOCumcxrHbrEs10ezsZ1UCSOjlQrRJSrFX
 M/KCVmrwVt6VR24Sz3K7e6atWgnl5Gj926VB0wXFSVAHII22Pirx7rdsHZXkgI0z
 pBHomVyZxyjzo3XV9szG1h+3iTPIebWH6A+25YGpmh7PZeFzuJhn6XXbpZ35tjSw
 3EKjkbwJyDXLLfAV+MMzVYZeCzpxy5MEuTW6aNi4y59k6GAhyeHClq9HWePo8rp7
 lMXVfSeFpdtG0n7WUVF2ctm7mAqTF8id2WGNfvXoP/bzB2mkXQ4x1GV+TvxAyex8
 OAk7iHk0IhrakfDj1XAE1o0BiSkGKaNo0eT8LnuByaUvHQHSBo/24fFcFOC9V1NL
 K4eGfgn7yyXBFWvJch5LtdmG3LHQEJ9Dh4zZ8TkyHOt42Lc32JdHtBrRGdWRMyq9
 p5lhLvPwuumjfjTeaXG4ABacdED1a8fiUzydumHmAux+an7irTqxwP51Y9MrxR1O
 37+YBgEcO2nmubCUKUjvBga4ztFo0f/hMVqGqnME+tYwr7NzXx8=
 =GGQX
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Add support for non-contiguous capacity bitmasks being added to
   Intel's CAT implementation

 - Other improvements to resctrl code: better configuration,
   simplifications, debugging support, fixes

* tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Display RMID of resource group
  x86/resctrl: Add support for the files of MON groups only
  x86/resctrl: Display CLOSID for resource group
  x86/resctrl: Introduce "-o debug" mount option
  x86/resctrl: Move default group file creation to mount
  x86/resctrl: Unwind properly from rdt_enable_ctx()
  x86/resctrl: Rename rftype flags for consistency
  x86/resctrl: Simplify rftype flag definitions
  x86/resctrl: Add multiple tasks to the resctrl group at once
  Documentation/x86: Document resctrl's new sparse_masks
  x86/resctrl: Add sparse_masks file in info
  x86/resctrl: Enable non-contiguous CBMs in Intel CAT
  x86/resctrl: Rename arch_has_sparse_bitmaps
  x86/resctrl: Fix remaining kernel-doc warnings
2023-10-30 12:07:29 -10:00
Linus Torvalds
f84a52eef5 - A bunch of improvements, cleanups and fixlets to the SRSO mitigation
machinery and other, general cleanups to the hw mitigations code,
   by Josh Poimboeuf
 
 - Improve the return thunk detection by objtool as it is absolutely
   important that the default return thunk is not used after returns
   have been patched. Future work to detect and report this better is
   pending
 
 - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU7mFEACgkQEsHwGGHe
 VUpbBxAAtS4X5LCntPWUsDEBU80SBYAunEp0Wd0ttYEj+UrEk4tvnWVGFiIEr47A
 PrRKK9JCJtC6ko0+dwPtMi66L/T7mCpoNPI1kzfRG1IHJBfvCTGJhzZsesogvkA2
 1X9Je+RCVW4xVybIryxhjMGdB6jUoGEU1a4DmQXq481qiLB3ilvA1bIAaNo9BBYP
 rxKPrPcdOxn2NjxuOWg+FXjSc8LuAVSu3HqsgCW2AHJ6XIKEYWEq9FkXhwj9OJOr
 ax1F4qD1IY++jYZO9DJiltjeJyj0wC+yp8kDDURoLbcTk85WHlpD5vK0g64mELOA
 y0375thHep+vsrtQ/qZAmi/eVTaTekgbi7McahjoZebK7FbKOYRk6GZ+5+m29AVr
 DfQSJ7xQQqbCbpimeFmZ+gQf7mFexyDWvjUPyBl+OelOY1umdPM9IZVTnqib5LPr
 D2M+uqWfJhSwACi2o05LRv0gyhkAz0bGHrwZPmCVuxE5kBbhOpj4aT87fetUp/MW
 8lEFa3PHx/gkh2VOJ7ZgKzpeD75Vjo8TRAXOe4O2jn/L54gNEJ+1mukvrjW3+lp1
 ShmcZokl3ldPq6F5ioE+u45hVAfHkaruWM+5Rj3hsA/fdFN3isTVLhIRIsypPTKc
 p1ITT8Yhek8vkm9PcRBE5xWRmEZ2XE5ooDld930nJxra8QNVVQw=
 =E7c4
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 hw mitigation updates from Borislav Petkov:

 - A bunch of improvements, cleanups and fixlets to the SRSO mitigation
   machinery and other, general cleanups to the hw mitigations code, by
   Josh Poimboeuf

 - Improve the return thunk detection by objtool as it is absolutely
   important that the default return thunk is not used after returns
   have been patched. Future work to detect and report this better is
   pending

 - Other misc cleanups and fixes

* tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/retpoline: Document some thunk handling aspects
  x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
  x86/callthunks: Delete unused "struct thunk_desc"
  x86/vdso: Run objtool on vdso32-setup.o
  objtool: Fix return thunk patching in retpolines
  x86/srso: Remove unnecessary semicolon
  x86/pti: Fix kernel warnings for pti= and nopti cmdline options
  x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
  x86/nospec: Refactor UNTRAIN_RET[_*]
  x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros
  x86/srso: Disentangle rethunk-dependent options
  x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
  x86/bugs: Remove default case for fully switched enums
  x86/srso: Remove 'pred_cmd' label
  x86/srso: Unexport untraining functions
  x86/srso: Improve i-cache locality for alias mitigation
  x86/srso: Fix unret validation dependencies
  x86/srso: Fix vulnerability reporting for missing microcode
  x86/srso: Print mitigation for retbleed IBPB case
  x86/srso: Print actual mitigation if requested mitigation isn't possible
  ...
2023-10-30 11:48:49 -10:00
Linus Torvalds
01ae815c50 - Specify what error addresses reported on AMD are actually usable
memory error addresses for further decoding
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmU7kb8ACgkQEsHwGGHe
 VUq3/A//VTrsON+RRS+M7PVewXMiTbwjVytum/9gWXtuUBEFdWCQjCe4TSaI6+mX
 v8inAomBE7s3SoQYkosF1VO2l0r68aJLOm6hczzbjz+ZjGvramDiv5qCs0iMM8m4
 Nvwyjeo1+2G6JeaX2rR7fqnZkA4NcYE1/s05pksNEaXMsAhpSOWenRgUK1EyQXLE
 y1u63G5GLMT4cpjEmEcbp9Lb02WwQzB9inZ1f4MFoujkI5VJ/9b68D+DpGwHd4Ag
 HNrg6LR/YpVwioVnsa+xEiQSxxwuRCHvS8kbc27d3qhfT4cRhmtAIsHYkyeO75TJ
 jkXU1Gme/k2RDEYHOz7heSVWgmG3y9/swc3UZJFE0QAnjdajY7mUsM9+o5uCm4Y6
 rALf8z7t0+oMpG1YML5Y+0wvgcPk9pih6Mm9tbBlFCXPi2OQ5bieNkHe7RQXHcQx
 xwFoQI0ByWvW7omu+jqA8iN4YSLaQhST2wzghPF1Wu7KAewu5lpU+9kmgmL7utme
 aHIQFdhRFusYEABqlr8XQSew5FMtIfOeWWCdgWHghUQp6LCsA0QeUxcQR9VdY9Th
 IgY1j4G2lQeLpDlnWE9VPMCWk4cuTABRyVWEu1B4wScU3xRWD8jOh+LcF76RdoYz
 k7GW0d68DGDCRgU7q86LNM/bNH0zyIIteTj64uHBzQm0ygJcdZU=
 =NcVX
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - Specify what error addresses reported on AMD are actually usable
   memory error addresses for further decoding

* tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Cleanup mce_usable_address()
  x86/mce: Define amd_mce_usable_address()
  x86/MCE/AMD: Split amd_mce_is_memory_error()
2023-10-30 11:47:03 -10:00
Linus Torvalds
df9c65b5fc vfs-6.7.iov_iter
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTppQwAKCRCRxhvAZXjc
 om2kAP4u+eLsrhJHfUPUttGUEkSkZE+5/s/f1A/1GcV51usLSgEAu8urxAnP49GW
 INaDABXaFfKx8/KI/H2YFZPKGwlNEwY=
 =PDDE
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull iov_iter updates from Christian Brauner:
 "This contain's David's iov_iter cleanup work to convert the iov_iter
  iteration macros to inline functions:

   - Remove last_offset from iov_iter as it was only used by ITER_PIPE

   - Add a __user tag on copy_mc_to_user()'s dst argument on x86 to
     match that on powerpc and get rid of a sparse warning

   - Convert iter->user_backed to user_backed_iter() in the sound PCM
     driver

   - Convert iter->user_backed to user_backed_iter() in a couple of
     infiniband drivers

   - Renumber the type enum so that the ITER_* constants match the order
     in iterate_and_advance*()

   - Since the preceding patch puts UBUF and IOVEC at 0 and 1, change
     user_backed_iter() to just use the type value and get rid of the
     extra flag

   - Convert the iov_iter iteration macros to always-inline functions to
     make the code easier to follow. It uses function pointers, but they
     get optimised away

   - Move the check for ->copy_mc to _copy_from_iter() and
     copy_page_from_iter_atomic() rather than in memcpy_from_iter_mc()
     where it gets repeated for every segment. Instead, we check once
     and invoke a side function that can use iterate_bvec() rather than
     iterate_and_advance() and supply a different step function

   - Move the copy-and-csum code to net/ where it can be in proximity
     with the code that uses it

   - Fold memcpy_and_csum() in to its two users

   - Move csum_and_copy_from_iter_full() out of line and merge in
     csum_and_copy_from_iter() since the former is the only caller of
     the latter

   - Move hash_and_copy_to_iter() to net/ where it can be with its only
     caller"

* tag 'vfs-6.7.iov_iter' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter, net: Move hash_and_copy_to_iter() to net/
  iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
  iov_iter, net: Fold in csum_and_memcpy()
  iov_iter, net: Move csum_and_copy_to/from_iter() to net/
  iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  iov_iter: Convert iterate*() to inline funcs
  iov_iter: Derive user-backedness from the iterator type
  iov_iter: Renumber ITER_* constants
  infiniband: Use user_backed_iter() to see if iterator is UBUF/IOVEC
  sound: Fix snd_pcm_readv()/writev() to use iov access functions
  iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user()
  iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE
2023-10-30 09:24:21 -10:00
Linus Torvalds
2af9b20dbb Misc fixes:
- Fix a possible CPU hotplug deadlock bug caused by the new
    TSC synchronization code.
 
  - Fix a legacy PIC discovery bug that results in device troubles on
    affected systems, such as non-working keybards, etc.
 
  - Add a new Intel CPU model number to <asm/intel-family.h>.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU85uARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jSrhAAr3FrzkDWqLUUuDMWFEEVkTGz9GLc2uMP
 bGT7HApp301wGpFv35bbBVp7HKLSB9nFeLsWKQF6MBa2uSUiCxSBcSNfHVJ6jX1O
 Ac4hgl0Q/tZl0yZoag3u4j3FS57d9Qkzpg0QLTz9MVnQXEPTcDkli9tfSbRpxx3B
 WvcGwAD+VeMjeghIJHxTJW9KwouIf3gAzqoKE7TKEozJ4S2LbNs3nEsDGsRHJsPN
 jy/Fxlny6pdlkJArP/ILA9WO3yqBn2doIN9oqWKQCQtYbt1USnuJb5oCOXYAOijG
 A9iLJmMjDyN2TZ8lKYHBcHVmEVTeB+6VnikejcAaT4EMrArj5mdjL3K7McZ24NFh
 s8mx/S+v4mBExwNsaoq8kuUEZySWxNbFPLBjOq8BKIZYYDQUp1NK58CRWv9BAB6y
 GjgYlMYI6EGDb+QAoTKZ5KNqrwtUPuk2ijjJTB15DpgLkl7XmyoVOulrtVDrhuGZ
 Uuge9h0CLKnqD7lLfJdI7NLYxmZxPiQKbL9vrIqlk989UM5ItvttxWBNzhAdSTsG
 VIaXAOJgGDj74dJC+3b0umugKJcrycUFBy5/RxOg+OcnSPd5g/L4XWcLLk+OElE3
 pFBetWmO2HU4pxhL1GdWsDuF+RkYug4XKQqQP2LJ8Zepff/jkF+hVJoLYUhQIF6A
 OMQv1N6/nSg=
 =k19e
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix a possible CPU hotplug deadlock bug caused by the new TSC
   synchronization code

 - Fix a legacy PIC discovery bug that results in device troubles on
   affected systems, such as non-working keybards, etc

 - Add a new Intel CPU model number to <asm/intel-family.h>

* tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Defer marking TSC unstable to a worker
  x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
  x86/cpu: Add model number for Intel Arrow Lake mobile processor
2023-10-28 08:15:07 -10:00
Masahiro Yamada
7f6d8f7e43 kbuild: remove ARCH_POSTLINK from module builds
The '%.ko' rule in arch/*/Makefile.postlink does nothing but call the
'true' command.

Remove the unneeded code.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2023-10-28 21:10:08 +09:00
Masahiro Yamada
56769ba4b2 kbuild: unify vdso_install rules
Currently, there is no standard implementation for vdso_install,
leading to various issues:

 1. Code duplication

    Many architectures duplicate similar code just for copying files
    to the install destination.

    Some architectures (arm, sparc, x86) create build-id symlinks,
    introducing more code duplication.

 2. Unintended updates of in-tree build artifacts

    The vdso_install rule depends on the vdso files to install.
    It may update in-tree build artifacts. This can be problematic,
    as explained in commit 19514fc665 ("arm, kbuild: make
    "make install" not depend on vmlinux").

 3. Broken code in some architectures

    Makefile code is often copied from one architecture to another
    without proper adaptation.

    'make vdso_install' for parisc does not work.

    'make vdso_install' for s390 installs vdso64, but not vdso32.

To address these problems, this commit introduces a generic vdso_install
rule.

Architectures that support vdso_install need to define vdso-install-y
in arch/*/Makefile. vdso-install-y lists the files to install.

For example, arch/x86/Makefile looks like this:

  vdso-install-$(CONFIG_X86_64)           += arch/x86/entry/vdso/vdso64.so.dbg
  vdso-install-$(CONFIG_X86_X32_ABI)      += arch/x86/entry/vdso/vdsox32.so.dbg
  vdso-install-$(CONFIG_X86_32)           += arch/x86/entry/vdso/vdso32.so.dbg
  vdso-install-$(CONFIG_IA32_EMULATION)   += arch/x86/entry/vdso/vdso32.so.dbg

These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix,
if exists, stripped away.

vdso-install-y can optionally take the second field after the colon
separator. This is needed because some architectures install a vdso
file as a different base name.

The following is a snippet from arch/arm64/Makefile.

  vdso-install-$(CONFIG_COMPAT_VDSO)      += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so

This will rename vdso.so.dbg to vdso32.so during installation. If such
architectures change their implementation so that the base names match,
this workaround will go away.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Guo Ren <guoren@kernel.org>
Acked-by: Helge Deller <deller@gmx.de>  # parisc
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2023-10-28 21:09:02 +09:00
Mingwei Zhang
fad505b2cb KVM: x86: Service NMI requests after PMI requests in VM-Enter path
Service NMI and SMI requests after PMI requests in vcpu_enter_guest() so
that KVM does not need to cancel and redo the VM-Enter if the guest
configures its PMIs to be delivered as NMIs (likely) or SMIs (unlikely).
Because APIC emulation "injects" NMIs via KVM_REQ_NMI, handling PMI
requests after NMI requests (the likely case) means KVM won't detect the
pending NMI request until the final check for outstanding requests.
Detecting requests at the final stage is costly as KVM has already loaded
guest state, potentially queued events for injection, disabled IRQs,
dropped SRCU, etc., most of which needs to be unwound.

Note that changing the order of request processing doesn't change the end
result, as KVM's final check for outstanding requests prevents entering
the guest until all requests are serviced.  I.e. KVM will ultimately
coalesce events (or not) regardless of the ordering.

Using SPEC2017 benchmark programs running along with Intel vtune in a VM
demonstrates that the following code change reduces 800~1500 canceled
VM-Enters per second.

Some glory details:

Probe the invocation to vmx_cancel_injection():

    $ perf probe -a vmx_cancel_injection
    $ perf stat -a -e probe:vmx_cancel_injection -I 10000 # per 10 seconds

Partial results when SPEC2017 with Intel vtune are running in the VM:

On kernel without the change:
    10.010018010              14254      probe:vmx_cancel_injection
    20.037646388              15207      probe:vmx_cancel_injection
    30.078739816              15261      probe:vmx_cancel_injection
    40.114033258              15085      probe:vmx_cancel_injection
    50.149297460              15112      probe:vmx_cancel_injection
    60.185103088              15104      probe:vmx_cancel_injection

On kernel with the change:
    10.003595390                 40      probe:vmx_cancel_injection
    20.017855682                 31      probe:vmx_cancel_injection
    30.028355883                 34      probe:vmx_cancel_injection
    40.038686298                 31      probe:vmx_cancel_injection
    50.048795162                 20      probe:vmx_cancel_injection
    60.069057747                 19      probe:vmx_cancel_injection

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20231002040839.2630027-1-mizhang@google.com
[sean: hoist PMU/PMI above SMI too, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-27 13:20:29 -07:00
Thomas Gleixner
bd94d86f49 x86/tsc: Defer marking TSC unstable to a worker
Tetsuo reported the following lockdep splat when the TSC synchronization
fails during CPU hotplug:

   tsc: Marking TSC unstable due to check_tsc_sync_source failed
  
   WARNING: inconsistent lock state
   inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
   ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0
   {IN-HARDIRQ-W} state was registered at:
     _raw_spin_lock_irqsave+0x3f/0x60
     clocksource_mark_unstable+0x1b/0x90
     mark_tsc_unstable+0x41/0x50
     check_tsc_sync_source+0x14f/0x180
     sysvec_call_function_single+0x69/0x90

   Possible unsafe locking scenario:
     lock(watchdog_lock);
     <Interrupt>
       lock(watchdog_lock);

   stack backtrace:
    _raw_spin_lock+0x30/0x40
    clocksource_watchdog+0x23/0x5a0
    run_timer_softirq+0x2a/0x50
    sysvec_apic_timer_interrupt+0x6e/0x90

The reason is the recent conversion of the TSC synchronization function
during CPU hotplug on the control CPU to a SMP function call. In case
that the synchronization with the upcoming CPU fails, the TSC has to be
marked unstable via clocksource_mark_unstable().

clocksource_mark_unstable() acquires 'watchdog_lock', but that lock is
taken with interrupts enabled in the watchdog timer callback to minimize
interrupt disabled time. That's obviously a possible deadlock scenario,

Before that change the synchronization function was invoked in thread
context so this could not happen.

As it is not crucical whether the unstable marking happens slightly
delayed, defer the call to a worker thread which avoids the lock context
problem.

Fixes: 9d349d47f0 ("x86/smpboot: Make TSC synchronization function call based")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87zg064ceg.ffs@tglx
2023-10-27 20:36:57 +02:00
Thomas Gleixner
128b0c9781 x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
David and a few others reported that on certain newer systems some legacy
interrupts fail to work correctly.

Debugging revealed that the BIOS of these systems leaves the legacy PIC in
uninitialized state which makes the PIC detection fail and the kernel
switches to a dummy implementation.

Unfortunately this fallback causes quite some code to fail as it depends on
checks for the number of legacy PIC interrupts or the availability of the
real PIC.

In theory there is no reason to use the PIC on any modern system when
IO/APIC is available, but the dependencies on the related checks cannot be
resolved trivially and on short notice. This needs lots of analysis and
rework.

The PIC detection has been added to avoid quirky checks and force selection
of the dummy implementation all over the place, especially in VM guest
scenarios. So it's not an option to revert the relevant commit as that
would break a lot of other scenarios.

One solution would be to try to initialize the PIC on detection fail and
retry the detection, but that puts the burden on everything which does not
have a PIC.

Fortunately the ACPI/MADT table header has a flag field, which advertises
in bit 0 that the system is PCAT compatible, which means it has a legacy
8259 PIC.

Evaluate that bit and if set avoid the detection routine and keep the real
PIC installed, which then gets initialized (for nothing) and makes the rest
of the code with all the dependencies work again.

Fixes: e179f69141 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately")
Reported-by: David Lazar <dlazar@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Lazar <dlazar@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003
Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
2023-10-27 20:36:49 +02:00
Tony Luck
b99d70c0d1 x86/cpu: Add model number for Intel Arrow Lake mobile processor
For "reasons" Intel has code-named this CPU with a "_H" suffix.

[ dhansen: As usual, apply this and send it upstream quickly to
	   make it easier for anyone who is doing work that
	   consumes this. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231025202513.12358-1-tony.luck%40intel.com
2023-10-27 10:19:26 -07:00
Kan Liang
3374491619 perf/x86/intel: Support branch counters logging
The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. It can
provide a means to attribute exposed retirement latency to combinations
of events across a block of instructions. It also provides a means of
attributing Timed LBR latencies to events.

The feature is first introduced on SRF/GRR. It is an enhancement of the
ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences
of events on the GP counters. The information is displayed by the order
of counters.

The design proposed in this patch requires that the events which are
logged must be in a group with the event that has LBR. If there are
more than one LBR group, the counters logging information only from the
current group (overflowed) are stored for the perf tool, otherwise the
perf tool cannot know which and when other groups are scheduled
especially when multiplexing is triggered. The user can ensure it uses
the maximum number of counters that support LBR info (4 by now) by
making the group large enough.

The HW only logs events by the order of counters. The order may be
different from the order of enabling which the perf tool can understand.
When parsing the information of each branch entry, convert the counter
order to the enabled order, and store the enabled order in the extension
space.

Unconditionally reset LBRs for an LBR event group when it's deleted. The
logged counter information is only valid for the current LBR group. If
another LBR group is scheduled later, the information from the stale
LBRs would be otherwise wrongly interpreted.

Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode
is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since it will break the call stack. Also, there is no obvious usage
with the call stack mode for now.)

Only applying the PERF_SAMPLE_BRANCH_COUNTERS doesn't require any branch
stack setup.

Expose the maximum number of supported counters and the width of the
counters into the sysfs. The perf tool can use the information to parse
the logged counters in each branch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-5-kan.liang@linux.intel.com
2023-10-27 15:05:11 +02:00
Kan Liang
318c498591 perf/x86/intel: Reorganize attrs and is_visible
Some attrs and is_visible implementations are rather far away from one
another which makes the whole thing hard to interpret.

There are only two attribute groups which have both .attrs and
.is_visible, group_default and group_caps_lbr. Move them together.

No functional changes.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-4-kan.liang@linux.intel.com
2023-10-27 15:05:10 +02:00
Kan Liang
1f2376cd03 perf: Add branch_sample_call_stack
Add a helper function to check call stack sample type.

The later patch will invoke the function in several places.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-3-kan.liang@linux.intel.com
2023-10-27 15:05:09 +02:00
Kan Liang
85846b2707 perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
Currently, branch_sample_type !=0 is used to check whether a branch
stack setup is required. But it doesn't check the sample type,
unnecessary branch stack setup may be done for a counting event. E.g.,
perf record -e "{branch-instructions,branch-misses}:S" -j any
Also, the event only with the new PERF_SAMPLE_BRANCH_COUNTERS branch
sample type may not require a branch stack setup either.

Add a new flag NEEDS_BRANCH_STACK to indicate whether the event requires
a branch stack setup. Replace the needs_branch_stack() by checking the
new flag.

The counting event check is implemented here. The later patch will take
the new PERF_SAMPLE_BRANCH_COUNTERS into account.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-2-kan.liang@linux.intel.com
2023-10-27 15:05:09 +02:00
Kan Liang
571d91dcad perf: Add branch stack counters
Currently, the additional information of a branch entry is stored in a
u64 space. With more and more information added, the space is running
out. For example, the information of occurrences of events will be added
for each branch.

Two places were suggested to append the counters.
https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
One place is right after the flags of each branch entry. It changes the
existing struct perf_branch_entry. The later ARCH specific
implementation has to be really careful to consistently pick
the right struct.
The other place is right after the entire struct perf_branch_stack.
The disadvantage is that the pointer of the extra space has to be
recorded. The common interface perf_sample_save_brstack() has to be
updated.

The latter is much straightforward, and should be easily understood and
maintained. It is implemented in the patch.

Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate
the event which is recorded in the branch info.

The "u64 counters" may store the occurrences of several events. The
information regarding the number of events/counters and the width of
each counter should be exposed via sysfs as a reference for the perf
tool. Define the branch_counter_nr and branch_counter_width ABI here.
The support will be implemented later in the Intel-specific patch.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
2023-10-27 15:05:08 +02:00
Jakub Kicinski
ec4c20ca09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mac80211/rx.c
  91535613b6 ("wifi: mac80211: don't drop all unprotected public action frames")
  6c02fab724 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value")

Adjacent changes:

drivers/net/ethernet/apm/xgene/xgene_enet_main.c
  61471264c0 ("net: ethernet: apm: Convert to platform remove callback returning void")
  d2ca43f306 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF")

net/vmw_vsock/virtio_transport.c
  64c99d2d6a ("vsock/virtio: support to send non-linear skb")
  53b08c4985 ("vsock/virtio: initialize the_virtio_vsock before using VQs")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 13:46:28 -07:00
Koichiro Den
b56ebe7c89 x86/apic/msi: Fix misconfigured non-maskable MSI quirk
commit ef8dd01538 ("genirq/msi: Make interrupt allocation less
convoluted"), reworked the code so that the x86 specific quirk for affinity
setting of non-maskable PCI/MSI interrupts is not longer activated if
necessary.

This could be solved by restoring the original logic in the core MSI code,
but after a deeper analysis it turned out that the quirk flag is not
required at all.

The quirk is only required when the PCI/MSI device cannot mask the MSI
interrupts, which in turn also prevents reservation mode from being enabled
for the affected interrupt.

This allows ot remove the NOMASK quirk bit completely as msi_set_affinity()
can instead check whether reservation mode is enabled for the interrupt,
which gives exactly the same answer.

Even in the momentary non-existing case that the reservation mode would be
not set for a maskable MSI interrupt this would not cause any harm as it
just would cause msi_set_affinity() to go needlessly through the
functionaly equivalent slow path, which works perfectly fine with maskable
interrupts as well.

Rework msi_set_affinity() to query the reservation mode and remove all
NOMASK quirk logic from the core code.

[ tglx: Massaged changelog ]

Fixes: ef8dd01538 ("genirq/msi: Make interrupt allocation less convoluted")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp
2023-10-26 13:53:06 +02:00
Ingo Molnar
70c8dc9104 x86/defconfig: Enable CONFIG_DEBUG_ENTRY=y
A bug was recently found via CONFIG_DEBUG_ENTRY=y, and the x86
tree kinda is the main source of changes to the x86 entry code,
so enable this debug option by default in our defconfigs.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-10-24 19:12:37 +02:00
Ashok Raj
cf5ab01c87 x86/microcode/intel: Add a minimum required revision for late loading
In general users, don't have the necessary information to determine
whether late loading of a new microcode version is safe and does not
modify anything which the currently running kernel uses already, e.g.
removal of CPUID bits or behavioural changes of MSRs.

To address this issue, Intel has added a "minimum required version"
field to a previously reserved field in the microcode header.  Microcode
updates should only be applied if the current microcode version is equal
to, or greater than this minimum required version.

Thomas made some suggestions on how meta-data in the microcode file could
provide Linux with information to decide if the new microcode is suitable
candidate for late loading. But even the "simpler" option requires a lot of
metadata and corresponding kernel code to parse it, so the final suggestion
was to add the 'minimum required version' field in the header.

When microcode changes visible features, microcode will set the minimum
required version to its own revision which prevents late loading.

Old microcode blobs have the minimum revision field always set to 0, which
indicates that there is no information and the kernel considers it
unsafe.

This is a pure OS software mechanism. The hardware/firmware ignores this
header field.

For early loading there is no restriction because OS visible features
are enumerated after the early load and therefore a change has no
effect.

The check is always enabled, but by default not enforced. It can be
enforced via Kconfig or kernel command line.

If enforced, the kernel refuses to late load microcode with a minimum
required version field which is zero or when the currently loaded
microcode revision is smaller than the minimum required revision.

If not enforced the load happens independent of the revision check to
stay compatible with the existing behaviour, but it influences the
decision whether the kernel is tainted or not. If the check signals that
the late load is safe, then the kernel is not tainted.

Early loading is not affected by this.

[ tglx: Massaged changelog and fixed up the implementation ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.776467264@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
9407bda845 x86/microcode: Prepare for minimal revision check
Applying microcode late can be fatal for the running kernel when the
update changes functionality which is in use already in a non-compatible
way, e.g. by removing a CPUID bit.

There is no way for admins which do not have access to the vendors deep
technical support to decide whether late loading of such a microcode is
safe or not.

Intel has added a new field to the microcode header which tells the
minimal microcode revision which is required to be active in the CPU in
order to be safe.

Provide infrastructure for handling this in the core code and a command
line switch which allows to enforce it.

If the update is considered safe the kernel is not tainted and the annoying
warning message not emitted. If it's enforced and the currently loaded
microcode revision is not safe for late loading then the load is aborted.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
8f849ff63b x86/microcode: Handle "offline" CPUs correctly
Offline CPUs need to be parked in a safe loop when microcode update is
in progress on the primary CPU. Currently, offline CPUs are parked in
mwait_play_dead(), and for Intel CPUs, its not a safe instruction,
because the MWAIT instruction can be patched in the new microcode update
that can cause instability.

  - Add a new microcode state 'UCODE_OFFLINE' to report status on per-CPU
  basis.
  - Force NMI on the offline CPUs.

Wake up offline CPUs while the update is in progress and then return
them back to mwait_play_dead() after microcode update is complete.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.660850472@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
9cab5fb776 x86/apic: Provide apic_force_nmi_on_cpu()
When SMT siblings are soft-offlined and parked in one of the play_dead()
variants they still react on NMI, which is problematic on affected Intel
CPUs. The default play_dead() variant uses MWAIT on modern CPUs, which is
not guaranteed to be safe when updated concurrently.

Right now late loading is prevented when not all SMT siblings are online,
but as they still react on NMI, it is possible to bring them out of their
park position into a trivial rendezvous handler.

Provide a function which allows to do that. I does sanity checks whether
the target is in the cpus_booted_once_mask and whether the APIC driver
supports it.

Mark X2APIC and XAPIC as capable, but exclude 32bit and the UV and NUMACHIP
variants as that needs feedback from the relevant experts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.603100036@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
1582c0f4a2 x86/microcode: Protect against instrumentation
The wait for control loop in which the siblings are waiting for the
microcode update on the primary thread must be protected against
instrumentation as instrumentation can end up in #INT3, #DB or #PF,
which then returns with IRET. That IRET reenables NMI which is the
opposite of what the NMI rendezvous is trying to achieve.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.545969323@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
7eb314a228 x86/microcode: Rendezvous and load in NMI
stop_machine() does not prevent the spin-waiting sibling from handling
an NMI, which is obviously violating the whole concept of rendezvous.

Implement a static branch right in the beginning of the NMI handler
which is nopped out except when enabled by the late loading mechanism.

The late loader enables the static branch before stop_machine() is
invoked. Each CPU has an nmi_enable in its control structure which
indicates whether the CPU should go into the update routine.

This is required to bridge the gap between enabling the branch and
actually being at the point where it is required to enter the loader
wait loop.

Each CPU which arrives in the stopper thread function sets that flag and
issues a self NMI right after that. If the NMI function sees the flag
clear, it returns. If it's set it clears the flag and enters the
rendezvous.

This is safe against a real NMI which hits in between setting the flag
and sending the NMI to itself. The real NMI will be swallowed by the
microcode update and the self NMI will then let stuff continue.
Otherwise this would end up with a spurious NMI.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.489900814@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
0bf8716512 x86/microcode: Replace the all-in-one rendevous handler
with a new handler which just separates the control flow of primary and
secondary CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.433704135@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
6067788f04 x86/microcode: Provide new control functions
The current all in one code is unreadable and really not suited for
adding future features like uniform loading with package or system
scope.

Provide a set of new control functions which split the handling of the
primary and secondary CPUs. These will replace the current rendezvous
all in one function in the next step. This is intentionally a separate
change because diff makes an complete unreadable mess otherwise.

So the flow separates the primary and the secondary CPUs into their own
functions which use the control field in the per CPU ucode_ctrl struct.

   primary()			secondary()
    wait_for_all()		 wait_for_all()
    apply_ucode()		 wait_for_release()
    release()			 apply_ucode()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.377922731@linutronix.de
2023-10-24 15:05:55 +02:00
Thomas Gleixner
ba3aeb97cb x86/microcode: Add per CPU control field
Add a per CPU control field to ucode_ctrl and define constants for it
which are going to be used to control the loading state machine.

In theory this could be a global control field, but a global control does
not cover the following case:

 15 primary CPUs load microcode successfully
  1 primary CPU fails and returns with an error code

With global control the sibling of the failed CPU would either try again or
the whole operation would be aborted with the consequence that the 15
siblings do not invoke the apply path and end up with inconsistent software
state. The result in dmesg would be inconsistent too.

There are two additional fields added and initialized:

ctrl_cpu and secondaries. ctrl_cpu is the CPU number of the primary thread
for now, but with the upcoming uniform loading at package or system scope
this will be one CPU per package or just one CPU. Secondaries hands the
control CPU a CPU mask which will be required to release the secondary CPUs
out of the wait loop.

Preparatory change for implementing a properly split control flow for
primary and secondary CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.319959519@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
4b753955e9 x86/microcode: Add per CPU result state
The microcode rendezvous is purely acting on global state, which does
not allow to analyze fails in a coherent way.

Introduce per CPU state where the results are written into, which allows to
analyze the return codes of the individual CPUs.

Initialize the state when walking the cpu_present_mask in the online
check to avoid another for_each_cpu() loop.

Enhance the result print out with that.

The structure is intentionally named ucode_ctrl as it will gain control
fields in subsequent changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211723.632681010@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
0772b9aa1a x86/microcode: Sanitize __wait_for_cpus()
The code is too complicated for no reason:

 - The return value is pointless as this is a strict boolean.

 - It's way simpler to count down from num_online_cpus() and check for
   zero.

  - The timeout argument is pointless as this is always one second.

  - Touching the NMI watchdog every 100ns does not make any sense, neither
    does checking every 100ns. This is really not a hotpath operation.

Preload the atomic counter with the number of online CPUs and simplify the
whole timeout logic. Delay for one microsecond and touch the NMI watchdog
once per millisecond.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.204251527@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
6f059e634d x86/microcode: Clarify the late load logic
reload_store() is way too complicated. Split the inner workings out and
make the following enhancements:

 - Taint the kernel only when the microcode was actually updated. If. e.g.
   the rendezvous fails, then nothing happened and there is no reason for
   tainting.

 - Return useful error codes

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20231002115903.145048840@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
634ac23ad6 x86/microcode: Handle "nosmt" correctly
On CPUs where microcode loading is not NMI-safe the SMT siblings which
are parked in one of the play_dead() variants still react to NMIs.

So if an NMI hits while the primary thread updates the microcode the
resulting behaviour is undefined. The default play_dead() implementation on
modern CPUs is using MWAIT which is not guaranteed to be safe against
a microcode update which affects MWAIT.

Take the cpus_booted_once_mask into account to detect this case and
refuse to load late if the vendor specific driver does not advertise
that late loading is NMI safe.

AMD stated that this is safe, so mark the AMD driver accordingly.

This requirement will be partially lifted in later changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.087472735@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
ba48aa3238 x86/microcode: Clean up mc_cpu_down_prep()
This function has nothing to do with suspend. It's a hotplug
callback. Remove the bogus comment.

Drop the pointless debug printk. The hotplug core provides tracepoints
which track the invocation of those callbacks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115903.028651784@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
2e1997335c x86/microcode: Get rid of the schedule work indirection
Scheduling work on all CPUs to collect the microcode information is just
another extra step for no value. Let the CPU hotplug callback registration
do it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211723.354748138@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
8529e8ab6c x86/microcode: Mop up early loading leftovers
Get rid of the initrd_gone hack which was required to keep
find_microcode_in_initrd() functional after init.

As find_microcode_in_initrd() is now only used during init, mark it
accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211723.298854846@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
5af05b8d51 x86/microcode/amd: Use cached microcode for AP load
Now that the microcode cache is initialized before the APs are brought
up, there is no point in scanning builtin/initrd microcode during AP
loading.

Convert the AP loader to utilize the cache, which in turn makes the CPU
hotplug callback which applies the microcode after initrd/builtin is
gone, obsolete as the early loading during late hotplug operations
including the resume path depends now only on the cache.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211723.243426023@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
a7939f0167 x86/microcode/amd: Cache builtin/initrd microcode early
There is no reason to scan builtin/initrd microcode on each AP.

Cache the builtin/initrd microcode in an early initcall so that the
early AP loader can utilize the cache.

The existing fs initcall which invoked save_microcode_in_initrd_amd() is
still required to maintain the initrd_gone flag. Rename it accordingly.
This will be removed once the AP loader code is converted to use the
cache.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211723.187566507@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
d419d28261 x86/microcode/amd: Cache builtin microcode too
save_microcode_in_initrd_amd() fails to cache builtin microcode and only
scans initrd.

Use find_blobs_in_containers() instead which covers both.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231010150702.495139089@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
ecfd410893 x86/microcode/amd: Use correct per CPU ucode_cpu_info
find_blobs_in_containers() is invoked on every CPU but overwrites
unconditionally ucode_cpu_info of CPU0.

Fix this by using the proper CPU data and move the assignment into the
call site apply_ucode_from_containers() so that the function can be
reused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231010150702.433454320@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
b48b26f992 x86/microcode: Remove pointless apply() invocation
Microcode is applied on the APs during early bringup. There is no point
in trying to apply the microcode again during the hotplug operations and
neither at the point where the microcode device is initialized.

Collect CPU info and microcode revision in setup_online_cpu() for now.
This will move to the CPU hotplug callback later.

  [ bp: Leave the starting notifier for the following scenario:

    - boot, late load, suspend to disk, resume

    without the starting notifier, only the last core manages to update the
    microcode upon resume:

    # rdmsr -a 0x8b
    10000bf
    10000bf
    10000bf
    10000bf
    10000bf
    10000dc <----

    This is on an AMD F10h machine.

    For the future, one should check whether potential unification of
    the CPU init path could cover the resume path too so that this can
    be simplified even more.

  tglx: This is caused by the odd handling of APs which try to find the
  microcode blob in builtin or initrd instead of caching the microcode
  blob during early init before the APs are brought up. Will be cleaned
  up in a later step. ]

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231017211723.018821624@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
b7fcd995b2 x86/microcode/intel: Rework intel_find_matching_signature()
Take a cpu_signature argument and work from there. Move the match()
helper next to the callsite as there is no point for having it in
a header.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.797820205@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
11f96ac4c2 x86/microcode/intel: Reuse intel_cpu_collect_info()
No point for an almost duplicate function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.741173606@linutronix.de
2023-10-24 15:05:54 +02:00
Thomas Gleixner
164aa1ca53 x86/microcode/intel: Rework intel_cpu_collect_info()
Nothing needs struct ucode_cpu_info. Make it take struct cpu_signature,
let it return a boolean and simplify the implementation. Rename it now
that the silly name clash with collect_cpu_info() is gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211722.851573238@linutronix.de
2023-10-24 15:05:53 +02:00
Thomas Gleixner
3973718cff x86/microcode/intel: Unify microcode apply() functions
Deduplicate the early and late apply() functions.

  [ bp: Rename the function which does the actual application to
      __apply_microcode() to differentiate it from
      microcode_ops.apply_microcode(). ]

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231017211722.795508212@linutronix.de
2023-10-24 15:05:53 +02:00
Thomas Gleixner
f24f204405 x86/microcode/intel: Switch to kvmalloc()
Microcode blobs are getting larger and might soon reach the kmalloc()
limit. Switch over kvmalloc().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.564323243@linutronix.de
2023-10-24 15:05:53 +02:00
Thomas Gleixner
2a1dada3d1 x86/microcode/intel: Save the microcode only after a successful late-load
There are situations where the late microcode is loaded into memory but
is not applied:

  1) The rendezvous fails
  2) The microcode is rejected by the CPUs

If any of this happens then the pointer which was updated at firmware
load time is stale and subsequent CPU hotplug operations either fail to
update or create inconsistent microcode state.

Save the loaded microcode in a separate pointer before the late load is
attempted and when successful, update the hotplug pointer accordingly
via a new microcode_ops callback.

Remove the pointless fallback in the loader to a microcode pointer which
is never populated.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.505491309@linutronix.de
2023-10-24 15:05:53 +02:00
Thomas Gleixner
dd5e3e3ca6 x86/microcode/intel: Simplify early loading
The early loading code is overly complicated:

  - It scans the builtin/initrd for microcode not only on the BSP, but also
    on all APs during early boot and then later in the boot process it
    scans again to duplicate and save the microcode before initrd goes
    away.

    That's a pointless exercise because this can be simply done before
    bringing up the APs when the memory allocator is up and running.

 - Saving the microcode from within the scan loop is completely
   non-obvious and a left over of the microcode cache.

   This can be done at the call site now which makes it obvious.

Rework the code so that only the BSP scans the builtin/initrd microcode
once during early boot and save it away in an early initcall for later
use.

  [ bp: Test and fold in a fix from tglx ontop which handles the need to
    distinguish what save_microcode() does depending on when it is
    called:

     - when on the BSP during early load, it needs to find a newer
       revision than the one currently loaded on the BSP

     - later, before SMP init, it still runs on the BSP and gets the BSP
       revision just loaded and uses that revision to know which patch
       to save for the APs. For that it needs to find the exact one as
       on the BSP.
   ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211722.629085215@linutronix.de
2023-10-24 15:02:36 +02:00
Matthew Maurer
6a5c032c4b x86: Enable IBT in Rust if enabled in C
These flags are not made conditional on compiler support because at the
moment exactly one version of rustc supported, and that one supports
these flags.

Building without these additional flags will manifest as objtool
printing a large number of errors about missing ENDBR and if CFI is
enabled (not currently possible) will result in incorrectly structured
function prefixes.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231009224347.2076221-1-mmaurer@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-10-23 15:35:00 +02:00
Ingo Molnar
4e5b65a22b Linux 6.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmU1ngkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGrsIH/0k/+gdBBYFFdEym
 foRhKir9WV3ZX4oIozJjA1f7T+qVYclKs6kaYm3gNepRBb6AoG8pdgv4MMAqhYsf
 QMe2XHi0MrO/qKBgfNfivxEa9jq+0QK5uvTbqCRqCAB8LfwVyDqapCmg3EuiZcPW
 UbMITmnwLIfXgPxvp9rabmCsTqO6FLbf0GDOVIkNSAIDBXMpcO1iffjrWUbhRa7n
 oIoiJmWJLcXLxPWDsRKbpJwzw2cIG08YhfQYAiQnC3YaeRm1FKLDIICRBsmfYzja
 rWv9r4dn4TDfV4/AnjggQnsZvz2yPCxNaFSQIT88nIeiLvyuUTJ9j8aidsSfMZQf
 xZAbzbA=
 =NoQv
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc7' into sched/core, to pick up fixes

Pick up recent sched/urgent fixes merged upstream.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-23 11:32:25 +02:00
Dave Airlie
7cd62eab9b Linux 6.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmU1ngkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGrsIH/0k/+gdBBYFFdEym
 foRhKir9WV3ZX4oIozJjA1f7T+qVYclKs6kaYm3gNepRBb6AoG8pdgv4MMAqhYsf
 QMe2XHi0MrO/qKBgfNfivxEa9jq+0QK5uvTbqCRqCAB8LfwVyDqapCmg3EuiZcPW
 UbMITmnwLIfXgPxvp9rabmCsTqO6FLbf0GDOVIkNSAIDBXMpcO1iffjrWUbhRa7n
 oIoiJmWJLcXLxPWDsRKbpJwzw2cIG08YhfQYAiQnC3YaeRm1FKLDIICRBsmfYzja
 rWv9r4dn4TDfV4/AnjggQnsZvz2yPCxNaFSQIT88nIeiLvyuUTJ9j8aidsSfMZQf
 xZAbzbA=
 =NoQv
 -----END PGP SIGNATURE-----

BackMerge tag 'v6.6-rc7' into drm-next

This is needed to add the msm pr which is based on a higher base.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-10-23 18:20:06 +10:00
Borislav Petkov (AMD)
9d9c22cc44 x86/retpoline: Document some thunk handling aspects
After a lot of experimenting (see thread Link points to) document for
now the issues and requirements for future improvements to the thunk
handling and potential issuing of a diagnostic when the default thunk
hasn't been patched out.

This documentation is only temporary and that close before the merge
window it is only a placeholder for those future improvements.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231010171020.462211-1-david.kaplan@amd.com
2023-10-20 13:17:14 +02:00
Alexey Dobriyan
321a145137 x86/callthunks: Delete unused "struct thunk_desc"
It looks like it was never used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/843bf596-db67-4b33-a865-2bae4a4418e5@p183
2023-10-20 12:58:48 +02:00
David Kaplan
b587fef124 x86/vdso: Run objtool on vdso32-setup.o
vdso32-setup.c is part of the main kernel image and should not be
excluded from objtool.  Objtool is necessary in part for ensuring that
returns in this file are correctly patched to the appropriate return
thunk at runtime.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231010171020.462211-3-david.kaplan@amd.com
2023-10-20 12:58:27 +02:00
Yang Li
904e1ddd0b x86/srso: Remove unnecessary semicolon
scripts/coccinelle/misc/semicolon.cocci reports:

  arch/x86/kernel/cpu/bugs.c:713:2-3: Unneeded semicolon

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230810010550.25733-1-yang.lee@linux.alibaba.com
2023-10-20 12:50:35 +02:00
Jo Van Bulck
0bd7feb2df x86/pti: Fix kernel warnings for pti= and nopti cmdline options
Parse the pti= and nopti cmdline options using early_param to fix 'Unknown
kernel command line parameters "nopti", will be passed to user space'
warnings in the kernel log when nopti or pti= are passed to the kernel
cmdline on x86 platforms.

Additionally allow the kernel to warn for malformed pti= options.

Signed-off-by: Jo Van Bulck <jo.vanbulck@cs.kuleuven.be>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20230819080921.5324-2-jo.vanbulck@cs.kuleuven.be
2023-10-20 12:50:14 +02:00
Josh Poimboeuf
99ee56c765 x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
For consistency with the other return thunks, rename __x86_return_skl()
to call_depth_return_thunk().

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ae44e9f9976934e3b5b47a458d523ccb15867561.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:45:48 +02:00
Josh Poimboeuf
e8efc0800b x86/nospec: Refactor UNTRAIN_RET[_*]
Factor out the UNTRAIN_RET[_*] common bits into a helper macro.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/f06d45489778bd49623297af2a983eea09067a74.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:41:57 +02:00
Josh Poimboeuf
0a3c49178c x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros
Macros already exist for unaligned code block symbols.  Use them.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/26d461bd509cc840af24c94586561c06d39812b2.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:40:42 +02:00
Josh Poimboeuf
34a3cae747 x86/srso: Disentangle rethunk-dependent options
CONFIG_RETHUNK, CONFIG_CPU_UNRET_ENTRY and CONFIG_CPU_SRSO are all
tangled up.  De-spaghettify the code a bit.

Some of the rethunk-related code has been shuffled around within the
'.text..__x86.return_thunk' section, but otherwise there are no
functional changes.  srso_alias_untrain_ret() and srso_alias_safe_ret()
((which are very address-sensitive) haven't moved.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/2845084ed303d8384905db3b87b77693945302b4.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:30:50 +02:00
Josh Poimboeuf
351236947a x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
Simplify the code flow a bit by moving the retbleed IBPB check into the
existing 'has_microcode' block.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/0a22b86b1f6b07f9046a9ab763fc0e0d1b7a91d4.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:29:25 +02:00
Josh Poimboeuf
0a0ce0da7f x86/bugs: Remove default case for fully switched enums
For enum switch statements which handle all possible cases, remove the
default case so a compiler warning gets printed if one of the enums gets
accidentally omitted from the switch statement.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/fcf6feefab991b72e411c2aed688b18e65e06aed.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:28:44 +02:00
Josh Poimboeuf
55ca9010c4 x86/srso: Remove 'pred_cmd' label
SBPB is only enabled in two distinct cases:

1) when SRSO has been disabled with srso=off

2) when SRSO has been fixed (in future HW)

Simplify the control flow by getting rid of the 'pred_cmd' label and
moving the SBPB enablement check to the two corresponding code sites.
This makes it more clear when exactly SBPB gets enabled.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/bb20e8569cfa144def5e6f25e610804bc4974de2.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:26:59 +02:00
Josh Poimboeuf
eb54be26b0 x86/srso: Unexport untraining functions
These functions aren't called outside of retpoline.S.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/1ae080f95ce7266c82cba6d2adde82349b832654.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:21:59 +02:00
Josh Poimboeuf
aa730cff0c x86/srso: Improve i-cache locality for alias mitigation
Move srso_alias_return_thunk() to the same section as
srso_alias_safe_ret() so they can share a cache line.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/eadaf5530b46a7ae8b936522da45ae555d2b3393.1693889988.git.jpoimboe@kernel.org
2023-10-20 12:04:18 +02:00
Josh Poimboeuf
eeb9f34df0 x86/srso: Fix unret validation dependencies
CONFIG_CPU_SRSO isn't dependent on CONFIG_CPU_UNRET_ENTRY (AMD
Retbleed), so the two features are independently configurable.  Fix
several issues for the (presumably rare) case where CONFIG_CPU_SRSO is
enabled but CONFIG_CPU_UNRET_ENTRY isn't.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/299fb7740174d0f2335e91c58af0e9c242b4bac1.1693889988.git.jpoimboe@kernel.org
2023-10-20 11:46:59 +02:00
Josh Poimboeuf
dc6306ad5b x86/srso: Fix vulnerability reporting for missing microcode
The SRSO default safe-ret mitigation is reported as "mitigated" even if
microcode hasn't been updated.  That's wrong because userspace may still
be vulnerable to SRSO attacks due to IBPB not flushing branch type
predictions.

Report the safe-ret + !microcode case as vulnerable.

Also report the microcode-only case as vulnerable as it leaves the
kernel open to attacks.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/a8a14f97d1b0e03ec255c81637afdf4cf0ae9c99.1693889988.git.jpoimboe@kernel.org
2023-10-20 11:46:09 +02:00
Josh Poimboeuf
de9f5f7b06 x86/srso: Print mitigation for retbleed IBPB case
When overriding the requested mitigation with IBPB due to retbleed=ibpb,
print the mitigation in the usual format instead of a custom error
message.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ec3af919e267773d896c240faf30bfc6a1fd6304.1693889988.git.jpoimboe@kernel.org
2023-10-20 11:45:24 +02:00
Josh Poimboeuf
3fc7b28e83 x86/srso: Print actual mitigation if requested mitigation isn't possible
If the kernel wasn't compiled to support the requested option, print the
actual option that ends up getting used.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/7e7a12ea9d85a9f76ca16a3efb71f262dee46ab1.1693889988.git.jpoimboe@kernel.org
2023-10-20 11:44:26 +02:00
Josh Poimboeuf
1d1142ac51 x86/srso: Fix SBPB enablement for (possible) future fixed HW
Make the SBPB check more robust against the (possible) case where future
HW has SRSO fixed but doesn't have the SRSO_NO bit set.

Fixes: 1b5277c0ea ("x86/srso: Add SRSO_NO support")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/cee5050db750b391c9f35f5334f8ff40e66c01b9.1693889988.git.jpoimboe@kernel.org
2023-10-20 11:34:51 +02:00
Mike Rapoport (IBM)
a1e2b8b368 x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
Qi Zheng reported crashes in a production environment and provided a
simplified example as a reproducer:

 |  For example, if we use Qemu to start a two NUMA node kernel,
 |  one of the nodes has 2M memory (less than NODE_MIN_SIZE),
 |  and the other node has 2G, then we will encounter the
 |  following panic:
 |
 |    BUG: kernel NULL pointer dereference, address: 0000000000000000
 |    <...>
 |    RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
 |    <...>
 |    Call Trace:
 |      <TASK>
 |      deactivate_slab()
 |      bootstrap()
 |      kmem_cache_init()
 |      start_kernel()
 |      secondary_startup_64_no_verify()

The crashes happen because of inconsistency between the nodemask that
has nodes with less than 4MB as memoryless, and the actual memory fed
into the core mm.

The commit:

  9391a3f9c7 ("[PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing")

... that introduced minimal size of a NUMA node does not explain why
a node size cannot be less than 4MB and what boot failures this
restriction might fix.

Fixes have been submitted to the core MM code to tighten up the
memory topologies it accepts and to not crash on weird input:

  mm: page_alloc: skip memoryless nodes entirely
  mm: memory_hotplug: drop memoryless node from fallback lists

Andrew has accepted them into the -mm tree, but there are no
stable SHA1's yet.

This patch drops the limitation for minimal node size on x86:

  - which works around the crash without the fixes to the core MM.
  - makes x86 topologies less weird,
  - removes an arbitrary and undocumented limitation on NUMA topologies.

[ mingo: Improved changelog clarity. ]

Reported-by: Qi Zheng <zhengqi.arch@bytedance.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/ZS+2qqjEO5/867br@gmail.com
2023-10-20 10:40:22 +02:00
Eric Biggers
796b06f5c9 crypto: x86/nhpoly1305 - implement ->digest
Implement the ->digest method to improve performance on single-page
messages by reducing the number of indirect calls.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Eric Biggers
fdcac2ddc7 crypto: x86/sha256 - implement ->digest for sha256
Implement a ->digest function for sha256-ssse3, sha256-avx, sha256-avx2,
and sha256-ni.  This improves the performance of crypto_shash_digest()
with these algorithms by reducing the number of indirect calls that are
made.

For now, don't bother with this for sha224, since sha224 is rarely used.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20 13:39:25 +08:00
Linus Torvalds
0df072ab65 Take care of a race between when the #VC exception is raised and when
the guest kernel gets to emulate certain instructions in SEV-{ES,SNP}
 guests by:
 
 - disabling emulation of MMIO instructions when coming from user mode
 
 - checking the IO permission bitmap before emulating IO instructions and
   verifying the memory operands of INS/OUTS insns.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmUufCwACgkQEsHwGGHe
 VUoTHA//YO81VH8JkvfKwxh322mbD+TDTkgWgcpClsWnkIZQdyCpKVTwsWWuhwX5
 FmCEc3I75hRK3ts3sdhZYOS94gKVUyWf2ERm2qMD02+08tS3K/TxJyx5xBMz9U03
 VOiWRC1rp33MZ0eCrXenTbA7Xay6AhU34pz4qSdEvkUKUU6YIdCfnspFXSi84Uqy
 tgmyPDJhSH/3hE46EJSHd4m6c8PO3Su/oUJHMy/refbxAscf9NNdWpGlPY285Aox
 RTA0mOYQRRKf0YFkGabLY9IIcL0w+NXMhMVEMFNiXyxFvaM8CONhK6SDmzvcUngB
 gOfsN6nD4JDqfH11gXCdxS3n0IZuAAMHyEigktvp1qnyNEDTBUtbfUkyqvITg+JC
 u3KMFSSYB58colTK/bkhE0IHnH2bKzhkDuVKzmJn/OCTxf0xxfGsnjbdw0JxMO81
 /9ORx8/QKWzv411AH2DUNh4vIJqDxVTJJb8zkScnYStX2ust6Ra+jYIr+mmf46md
 +Rzo5qoe/GnAtReCdGFg3w339nEbUz51n5uqm9KN4QnH39wg5R8nPiAUMHOlO1Zm
 PNvNgSZUkiiJpMci/KBbyFzPJTO7YjjRql7GWRwhWrclSPOrq49kocK5eIEYS4ol
 cd5cKF92hHsnwycz2dZsDQwYqEQ5J+c6kZTwfUwJcoUBxCWP/qI=
 =MNCv
 -----END PGP SIGNATURE-----

Merge tag 'sev_fixes_for_v6.6' of //git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "Take care of a race between when the #VC exception is raised and when
  the guest kernel gets to emulate certain instructions in SEV-{ES,SNP}
  guests by:

   - disabling emulation of MMIO instructions when coming from user mode

   - checking the IO permission bitmap before emulating IO instructions
     and verifying the memory operands of INS/OUTS insns"

* tag 'sev_fixes_for_v6.6' of //git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Check for user-space IOIO pointing to kernel space
  x86/sev: Check IOBM for IOIO exceptions from user-space
  x86/sev: Disable MMIO emulation from user mode
2023-10-19 18:12:08 -07:00
Kuppuswamy Sathyanarayanan
f4738f56d1 virt: tdx-guest: Add Quote generation support using TSM_REPORTS
In TDX guest, the attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the
guest. The first step in the attestation process is TDREPORT
generation, which involves getting the guest measurement data in the
format of TDREPORT, which is further used to validate the authenticity
of the TDX guest. TDREPORT by design is integrity-protected and can
only be verified on the local machine.

To support remote verification of the TDREPORT in a SGX-based
attestation, the TDREPORT needs to be sent to the SGX Quoting Enclave
(QE) to convert it to a remotely verifiable Quote. SGX QE by design can
only run outside of the TDX guest (i.e. in a host process or in a
normal VM) and guest can use communication channels like vsock or
TCP/IP to send the TDREPORT to the QE. But for security concerns, the
TDX guest may not support these communication channels. To handle such
cases, TDX defines a GetQuote hypercall which can be used by the guest
to request the host VMM to communicate with the SGX QE. More details
about GetQuote hypercall can be found in TDX Guest-Host Communication
Interface (GHCI) for Intel TDX 1.0, section titled
"TDG.VP.VMCALL<GetQuote>".

Trusted Security Module (TSM) [1] exposes a common ABI for Confidential
Computing Guest platforms to get the measurement data via ConfigFS.
Extend the TSM framework and add support to allow an attestation agent
to get the TDX Quote data (included usage example below).

  report=/sys/kernel/config/tsm/report/report0
  mkdir $report
  dd if=/dev/urandom bs=64 count=1 > $report/inblob
  hexdump -C $report/outblob
  rmdir $report

GetQuote TDVMCALL requires TD guest pass a 4K aligned shared buffer
with TDREPORT data as input, which is further used by the VMM to copy
the TD Quote result after successful Quote generation. To create the
shared buffer, allocate a large enough memory and mark it shared using
set_memory_decrypted() in tdx_guest_init(). This buffer will be re-used
for GetQuote requests in the TDX TSM handler.

Although this method reserves a fixed chunk of memory for GetQuote
requests, such one time allocation can help avoid memory fragmentation
related allocation failures later in the uptime of the guest.

Since the Quote generation process is not time-critical or frequently
used, the current version uses a polling model for Quote requests and
it also does not support parallel GetQuote requests.

Link: https://lore.kernel.org/lkml/169342399185.3934343.3035845348326944519.stgit@dwillia2-xfh.jf.intel.com/ [1]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Erdem Aktas <erdemaktas@google.com>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-19 18:12:00 -07:00
Jakub Kicinski
041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2 ("wifi: mac80211: fix error path key leak")
  2a8b665e6b ("wifi: mac80211: remove key_mtx")
  7d6904bf26 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae950 ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Maciej S. Szmigiero
2770d47220 KVM: x86: Ignore MSR_AMD64_TW_CFG access
Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen
since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED +
STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP
in the guest kernel).

This is because Windows tries to set bit 8 in MSR_AMD64_TW_CFG and can't
handle receiving a #GP when doing so.

Give this MSR the same treatment that commit 2e32b71906
("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave
MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant
only.
Although apparently it was then needed for Linux guests, not Windows as in
this case.

With this change, the aforementioned guest setup is able to finish booting
successfully.

This issue can be reproduced either on a Summit Ridge Ryzen (with
just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since
EPYC is ordinarily stepping 2).

Alternatively, userspace could solve the problem by using MSR filters, but
forcing every userspace to define a filter isn't very friendly and doesn't
add much, if any, value.  The only potential hiccup is if one of these
"baremetal-only" MSRs ever requires actual emulation and/or has F/M/S
specific behavior.  But if that happens, then KVM can still punt *that*
handling to userspace since userspace MSR filters "win" over KVM's default
handling.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1ce85d9c7c9e9632393816cf19c902e0a3f411f1.1697731406.git.maciej.szmigiero@oracle.com
[sean: call out MSR filtering alternative]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-19 10:55:14 -07:00
Liang Chen
122ae01c51 KVM: x86: remove the unused assigned_dev_head from kvm_arch
Legacy device assignment was dropped years ago. This field is not used
anymore.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Link: https://lore.kernel.org/r/20231019043336.8998-1-liangchen.linux@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-19 08:42:22 -07:00
Thomas Gleixner
0177669ee6 x86/microcode/intel: Cleanup code further
Sanitize the microcode scan loop, fixup printks and move the loading
function for builtin microcode next to the place where it is used and mark
it __init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.389400871@linutronix.de
2023-10-19 14:10:50 +02:00
Thomas Gleixner
6b072022ab x86/microcode/intel: Simplify and rename generic_load_microcode()
so it becomes less obfuscated and rename it because there is nothing
generic about it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.330295409@linutronix.de
2023-10-19 14:10:00 +02:00
Thomas Gleixner
b0f0bf5eef x86/microcode/intel: Simplify scan_microcode()
Make it readable and comprehensible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231002115902.271940980@linutronix.de
2023-10-19 12:32:40 +02:00
Ashok Raj
ae76d951f6 x86/microcode/intel: Rip out mixed stepping support for Intel CPUs
Mixed steppings aren't supported on Intel CPUs. Only one microcode patch
is required for the entire system. The caching of microcode blobs which
match the family and model is therefore pointless and in fact is
dysfunctional as CPU hotplug updates use only a single microcode blob,
i.e. the one where *intel_ucode_patch points to.

Remove the microcode cache and make it an AMD local feature.

  [ tglx:
     - save only at the end. Otherwise random microcode ends up in the
  	  pointer for early loading
     - free the ucode patch pointer in save_microcode_patch() only
    after kmemdup() has succeeded, as reported by Andrew Cooper ]

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211722.404362809@linutronix.de
2023-10-19 12:29:39 +02:00
Li zeming
1de9992f9d KVM: x86/mmu: Remove unnecessary ‘NULL’ values from sptep
Don't initialize "spte" and "sptep" in fast_page_fault() as they are both
guaranteed (for all intents and purposes) to be written at the start of
every loop iteration.  Add a sanity check that "sptep" is non-NULL after
walking the shadow page tables, as encountering a NULL root would result
in "spte" not being written, i.e. would lead to uninitialized data or the
previous value being consumed.

Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20230905182006.2964-1-zeming@nfschina.com
[sean: rewrite changelog with --verbose]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18 14:34:28 -07:00
Matthew Wilcox (Oracle)
247dbcdbf7 bitops: add xor_unlock_is_negative_byte()
Replace clear_bit_and_unlock_is_negative_byte() with
xor_unlock_is_negative_byte().  We have a few places that like to lock a
folio, set a flag and unlock it again.  Allow for the possibility of
combining the latter two operations for efficiency.  We are guaranteed
that the caller holds the lock, so it is safe to unlock it with the xor. 
The caller must guarantee that nobody else will set the flag without
holding the lock; it is not safe to do this with the PG_dirty flag, for
example.

Link: https://lkml.kernel.org/r/20231004165317.1061855-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 14:34:16 -07:00
Jim Mattson
329369caec x86: KVM: Add feature flag for CPUID.80000021H:EAX[bit 1]
Define an X86_FEATURE_* flag for CPUID.80000021H:EAX.[bit 1], and
advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID.

Per AMD's "Processor Programming Reference (PPR) for AMD Family 19h
Model 61h, Revision B1 Processors (56713-B1-PUB)," this CPUID bit
indicates that a WRMSR to MSR_FS_BASE, MSR_GS_BASE, or
MSR_KERNEL_GS_BASE is non-serializing. This is a change in previously
architected behavior.

Effectively, this CPUID bit is a "defeature" bit, or a reverse
polarity feature bit. When this CPUID bit is clear, the feature
(serialization on WRMSR to any of these three MSRs) is available. When
this CPUID bit is set, the feature is not available.

KVM_GET_SUPPORTED_CPUID must pass this bit through from the underlying
hardware, if it is set. Leaving the bit clear claims that WRMSR to
these three MSRs will be serializing in a guest running under
KVM. That isn't true. Though KVM could emulate the feature by
intercepting writes to the specified MSRs, it does not do so
today. The guest is allowed direct read/write access to these MSRs
without interception, so the innate hardware behavior is preserved
under KVM.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231005031237.1652871-1-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18 13:50:28 -07:00
Dongli Zhang
2081a8450e KVM: x86: remove always-false condition in kvmclock_sync_fn
The 'kvmclock_periodic_sync' is a readonly param that cannot change after
bootup.

The kvm_arch_vcpu_postcreate() is not going to schedule the
kvmclock_sync_work if kvmclock_periodic_sync == false.

As a result, the "if (!kvmclock_periodic_sync)" can never be true if the
kvmclock_sync_work = kvmclock_sync_fn() is scheduled.

Link: https://lore.kernel.org/kvm/a461bf3f-c17e-9c3f-56aa-726225e8391d@oracle.com
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Link: https://lore.kernel.org/r/20231001213637.76686-1-dongli.zhang@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-10-18 13:49:29 -07:00