linux-stable/arch/x86/kernel
Borislav Petkov 9a90ed065a x86/thermal: Fix LVT thermal setup for SMI delivery mode
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-31 22:32:26 +02:00
..
acpi Trivial cleanups and fixes all over the place. 2021-04-26 09:25:47 -07:00
apic x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing 2021-05-29 11:41:14 +02:00
cpu - Enable -Wundef for the compressed kernel build stage 2021-05-16 09:31:06 -07:00
fpu x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
kprobes - turn the stack canary into a normal __percpu variable on 32-bit which 2021-04-27 17:45:09 -07:00
.gitignore
alternative.c The x86 MM changes in this cycle were: 2021-04-29 11:41:43 -07:00
amd_gart_64.c dma-mapping: split <linux/dma-mapping.h> 2020-10-06 07:07:03 +02:00
amd_nb.c x86: Fix various typos in comments, take #2 2021-03-21 23:50:28 +01:00
aperture_64.c
apm_32.c x86: Fix various typos in comments, take #2 2021-03-21 23:50:28 +01:00
asm-offsets.c x86/paravirt: Switch iret pvops to ALTERNATIVE 2021-03-11 19:58:54 +01:00
asm-offsets_32.c x86/stackprotector/32: Make the canary into a regular percpu variable 2021-03-08 13:19:05 +01:00
asm-offsets_64.c x86/xen: Drop USERGS_SYSRET64 paravirt call 2021-02-10 12:32:07 +01:00
audit_64.c
bootflag.c
check.c
cpuid.c smp: Cleanup smp_call_function*() 2020-11-24 16:47:49 +01:00
crash.c Devicetree updates for v5.13: 2021-04-28 15:50:24 -07:00
crash_core_32.c
crash_core_64.c
crash_dump_32.c x86/crashdump/32: Simplify copy_oldmem_page() 2020-11-24 14:42:09 +01:00
crash_dump_64.c
devicetree.c x86/devicetree: Fix the ioapic interrupt type table 2020-10-28 20:26:24 +01:00
doublefault_32.c x86/stackprotector/32: Make the canary into a regular percpu variable 2021-03-08 13:19:05 +01:00
dumpstack.c - Remove all uses of TIF_IA32 and TIF_X32 and reclaim the two bits in the end 2020-12-14 13:45:26 -08:00
dumpstack_32.c
dumpstack_64.c x86/irq/64: Adjust the per CPU irq stack pointer by 8 2021-02-10 23:34:14 +01:00
e820.c Power management updates for 5.13-rc1 2021-04-26 15:10:25 -07:00
early-quirks.c x86/gpu: Add Alderlake-S stolen memory support 2021-01-26 06:55:09 -08:00
early_printk.c
ebda.c
eisa.c
espfix_64.c
ftrace.c x86: Remove dynamic NOP selection 2021-03-15 16:24:59 +01:00
ftrace_32.S
ftrace_64.S x86/ftrace: Support objtool vmlinux.o validation in ftrace_64.S 2021-01-26 11:33:02 -06:00
head32.c
head64.c x86/sev-es: Rename sev-es.{ch} to sev.{ch} 2021-05-10 07:40:27 +02:00
head_32.S x86/stackprotector/32: Make the canary into a regular percpu variable 2021-03-08 13:19:05 +01:00
head_64.S - Fix the vmlinux size check on 64-bit along with adding useful clarifications on the topic 2020-12-14 13:54:50 -08:00
hpet.c x86/hpet: Use irq_find_matching_fwspec() to find remapping irqdomain 2020-10-28 20:26:28 +01:00
hw_breakpoint.c x86/debug: Prevent data breakpoints on cpu_dr7 2021-02-05 20:13:12 +01:00
i8237.c
i8253.c
i8259.c
idt.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
io_delay.c
ioport.c
irq.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
irq_32.c softirq: Move do_softirq_own_stack() to generic asm header 2021-02-10 23:34:16 +01:00
irq_64.c x86/softirq/64: Inline do_softirq_own_stack() 2021-02-10 23:34:17 +01:00
irq_work.c
irqflags.S x86/pv: Rework arch_local_irq_restore() to not use popf 2021-02-10 12:36:45 +01:00
irqinit.c x86/headers: Remove APIC headers from <asm/smp.h> 2020-08-06 16:13:09 +02:00
itmt.c
jailhouse.c locking/seqlock, headers: Untangle the spaghetti monster 2020-08-06 16:13:13 +02:00
jump_label.c x86: Remove dynamic NOP selection 2021-03-15 16:24:59 +01:00
kdebugfs.c
kexec-bzimage64.c x86: Use ELF fields defined in 'struct kimage' 2021-03-08 12:06:29 -07:00
kgdb.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
ksysfs.c
kvm.c x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline() 2021-05-07 06:06:11 -04:00
kvmclock.c x86/kvm: Disable all PV features on crash 2021-05-07 06:06:10 -04:00
ldt.c x86/ldt: Use tlb_gather_mmu_fullmm() when freeing LDT page-tables 2021-01-29 20:02:29 +01:00
machine_kexec_32.c
machine_kexec_64.c Devicetree updates for v5.13: 2021-04-28 15:50:24 -07:00
Makefile x86/sev-es: Rename sev-es.{ch} to sev.{ch} 2021-05-10 07:40:27 +02:00
mmconf-fam10h_64.c x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG 2021-05-10 07:51:38 +02:00
module.c x86/build: Treat R_386_PLT32 relocation as R_386_PC32 2021-01-28 12:24:06 +01:00
mpparse.c Surgery of the MSI interrupt handling to prepare the support of upcoming 2020-10-12 11:40:41 -07:00
msr.c x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too 2021-01-27 19:06:47 +01:00
nmi.c x86/sev-es: Rename sev-es.{ch} to sev.{ch} 2021-05-10 07:40:27 +02:00
nmi_selftest.c
paravirt-spinlocks.c x86/paravirt: Add new features for paravirt patching 2021-03-11 19:51:49 +01:00
paravirt.c The x86 MM changes in this cycle were: 2021-04-29 11:41:43 -07:00
pci-dma.c dma-mapping: move dma-debug.h to kernel/dma/ 2020-10-06 07:07:05 +02:00
pci-iommu_table.c x86: Remove definition of DEBUG 2021-01-15 08:23:10 +01:00
pci-swiotlb.c
pcspeaker.c
perf_regs.c - Remove all uses of TIF_IA32 and TIF_X32 and reclaim the two bits in the end 2020-12-14 13:45:26 -08:00
platform-quirks.c
pmem.c
probe_roms.c
process.c x86/process: setup io_threads more like normal user space threads 2021-05-05 17:47:41 -06:00
process.h
process_32.c
process_64.c x86/irq: Sanitize irq stack tracking 2021-02-10 23:34:13 +01:00
ptrace.c x86/ptrace: Clean up PTRACE_GETREGS/PTRACE_PUTREGS regset selection 2021-02-04 12:33:15 +01:00
pvclock.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
quirks.c x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() 2020-10-06 11:18:04 +02:00
reboot.c x86: 2021-02-21 13:31:43 -08:00
reboot_fixups_32.c
relocate_kernel_32.S x86: Fix various typos in comments, take #2 2021-03-21 23:50:28 +01:00
relocate_kernel_64.S x86: Fix various typos in comments, take #2 2021-03-21 23:50:28 +01:00
resource.c
rtc.c
setup.c x86/thermal: Fix LVT thermal setup for SMI delivery mode 2021-05-31 22:32:26 +02:00
setup_percpu.c x86/stackprotector/32: Make the canary into a regular percpu variable 2021-03-08 13:19:05 +01:00
sev-shared.c x86/sev-es: Invalidate the GHCB after completing VMGEXIT 2021-05-18 07:06:29 +02:00
sev.c x86/sev-es: Use __put_user()/__get_user() for data accesses 2021-05-19 18:45:37 +02:00
sev_verify_cbit.S x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path 2020-10-29 18:06:52 +01:00
signal.c Merge branch 'linus' into x86/cleanups, to resolve conflict 2021-03-21 22:16:08 +01:00
signal_compat.c signal: Deliver all of the siginfo perf data in _perf 2021-05-18 16:20:54 -05:00
smp.c x86: Fix various typos in comments, take #2 2021-03-21 23:50:28 +01:00
smpboot.c x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations 2021-05-13 12:10:24 +02:00
stacktrace.c stacktrace: Move documentation for arch_stack_walk_reliable() to header 2021-03-10 15:52:31 +01:00
static_call.c x86: Remove dynamic NOP selection 2021-03-15 16:24:59 +01:00
step.c entry: Ensure trap after single-step on system call return 2021-02-06 00:21:42 +01:00
sys_ia32.c x86: switch to kernel_clone() 2020-08-20 13:12:58 +02:00
sys_x86_64.c x86/mm: Refine mmap syscall implementation 2021-01-05 19:07:42 +01:00
sysfb.c
sysfb_efi.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
sysfb_simplefb.c
tboot.c x86/boot/tboot: Avoid Wstringop-overread-warning 2021-03-23 00:16:13 +01:00
time.c
tls.c x86/stackprotector/32: Make the canary into a regular percpu variable 2021-03-08 13:19:05 +01:00
tls.h
topology.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
trace_clock.c
tracepoint.c
traps.c - turn the stack canary into a normal __percpu variable on 32-bit which 2021-04-27 17:45:09 -07:00
tsc.c Trivial cleanups and fixes all over the place. 2021-04-26 09:25:47 -07:00
tsc_msr.c Misc fixes and small updates all around the place: 2020-08-15 10:38:03 -07:00
tsc_sync.c x86: Fix various typos in comments, take #2 2021-03-21 23:50:28 +01:00
umip.c - turn the stack canary into a normal __percpu variable on 32-bit which 2021-04-27 17:45:09 -07:00
unwind_frame.c
unwind_guess.c
unwind_orc.c x86/unwind/orc: Silence warnings caused by missing ORC data 2021-03-06 13:09:45 +01:00
uprobes.c x86/uprobes: Convert to insn_decode() 2021-03-15 12:05:03 +01:00
verify_cpu.S
vm86_32.c x86/vm86/32: Remove VM86_SCREEN_BITMAP support 2021-01-21 20:08:53 +01:00
vmlinux.lds.S x86/build: Fix vmlinux size check on 64-bit 2020-10-29 21:54:35 +01:00
vsmp_64.c
x86_init.c x86/apic: Support 15 bits of APIC ID in MSI where available 2020-10-28 20:26:29 +01:00