linux-stable/arch
Jue Wang ba37c73be3 x86/mce: Work around an erratum on fast string copy instructions
[ Upstream commit 8ca97812c3 ]

A rare kernel panic scenario can happen when the following conditions
are met due to an erratum on fast string copy instructions:

1) An uncorrected error.
2) That error must be in the first cache line of a page.
3) Kernel must execute copy_page from the page immediately before that
page.

The fast string copy instructions ("REP; MOVS*") could consume an
uncorrectable memory error in the cache line _right after_ the desired
region to copy and raise an MCE.
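
The address relationship in conditions 2) and 3) can be sketched as a small
user-space predicate. This is only an illustration, not kernel code: the helper
name poison_adjacent_to_copy() is hypothetical, and the 4 KiB page size and
64-byte cache line size are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES  4096u  /* assumption: 4 KiB pages */
#define CACHE_LINE_BYTES 64u    /* assumption: 64-byte cache lines */

/*
 * Hypothetical helper: returns true when poison_addr lies in the first
 * cache line of the page immediately after the page starting at
 * src_page, i.e. the line "right after" a full-page copy source.
 */
static bool poison_adjacent_to_copy(uint64_t src_page, uint64_t poison_addr)
{
	uint64_t next_page = src_page + PAGE_SIZE_BYTES;

	return poison_addr >= next_page &&
	       poison_addr < next_page + CACHE_LINE_BYTES;
}
```

With a source page at 0x1000, only poison in 0x2000..0x203f satisfies the
predicate; poison anywhere else in the following page does not.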

Bit 0 of MSR_IA32_MISC_ENABLE can be cleared to disable fast string
copy and will avoid such spurious machine checks. However, that is less
preferable due to the permanent performance impact. Considering memory
poison is rare, it's desirable to keep fast string copy enabled until an
MCE is seen.
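
As a rough illustration of the bit manipulation involved, the sketch below
clears bit 0 in a plain 64-bit value standing in for the contents of
MSR_IA32_MISC_ENABLE. The macro and function names are made up for this
example; real kernel code would go through the MSR accessors rather than
operate on a local variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the MSR value, as described above; the macro name is illustrative. */
#define FAST_STRING_ENABLE_BIT (UINT64_C(1) << 0)

/* Returns true while fast string copy is still enabled in the value. */
static bool fast_string_enabled(uint64_t misc_enable)
{
	return (misc_enable & FAST_STRING_ENABLE_BIT) != 0;
}

/* Returns the value with bit 0 cleared and all other bits untouched. */
static uint64_t clear_fast_string(uint64_t misc_enable)
{
	return misc_enable & ~FAST_STRING_ENABLE_BIT;
}
```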

Intel has confirmed the following:
1. The CPU erratum of fast string copy only applies to Skylake,
Cascade Lake and Cooper Lake generations.

Directly returning from the MCE handler:
2. Will result in complete execution of the "REP; MOVS*" with no data
loss or corruption.
3. Will not result in another MCE firing on the next poisoned cache line
due to "REP; MOVS*".
4. Will resume execution from a correct point in code.
5. Will result in the same instruction that triggered the MCE firing a
second MCE immediately for any other software recoverable data fetch
errors.
6. Is not safe without disabling the fast string copy, as the next fast
string copy of the same buffer on the same CPU would result in a PANIC
MCE.
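
The decision flow implied by the points above might look roughly like the
following user-space model. It is a sketch under stated assumptions, not the
code of this patch: struct cpu_state, handle_mce() and the
matches_erratum_signature flag are hypothetical stand-ins for the real CPU
model check and MCE bank decoding.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAST_STRING_ENABLE (UINT64_C(1) << 0)	/* bit 0, per the text */

/* Hypothetical per-CPU state used only by this model. */
struct cpu_state {
	bool affected_model;	/* Skylake, Cascade Lake or Cooper Lake */
	uint64_t misc_enable;	/* stand-in for the MSR contents */
};

enum mce_action { MCE_RETURN_DIRECTLY, MCE_NORMAL_HANDLING };

static enum mce_action handle_mce(struct cpu_state *cpu,
				  bool matches_erratum_signature)
{
	/* Only the affected generations get the workaround. */
	if (!cpu->affected_model)
		return MCE_NORMAL_HANDLING;

	/* Fast string copy already disabled: the MCE cannot be spurious. */
	if (!(cpu->misc_enable & FAST_STRING_ENABLE))
		return MCE_NORMAL_HANDLING;

	if (!matches_erratum_signature)
		return MCE_NORMAL_HANDLING;

	/* Disable fast string copy first; returning without this is unsafe. */
	cpu->misc_enable &= ~FAST_STRING_ENABLE;
	return MCE_RETURN_DIRECTLY;
}
```

In this model the first matching MCE on a CPU disables fast string copy and
returns directly; any later MCE on that CPU takes the normal handling path.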

Disabling fast string copy on the first such MCE and returning directly
from the handler should mitigate the erratum completely. The only caveat
is that fast string copy stays disabled on the affected hyper thread,
which degrades performance there.

This is still better than the OS crashing on MCEs raised in an
unrelated process due to "REP; MOVS*" accesses in a kernel context,
e.g., copy_page.

Tested:

Injected errors on the first cache line of 8 anonymous pages of process
'proc1' and observed MCE consumption from 'proc2' with no panic
(directly returned).

Without the fix, the host panicked within a few minutes on a
random 'proc2' process due to kernel access from copy_page.

  [ bp: Fix comment style + touch ups, zap an unlikely(), improve the
    quirk function's readability. ]

Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20220218013209.2436006-1-juew@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13 19:27:16 +02:00
alpha bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
arc uaccess: fix type mismatch warnings from access_ok() 2022-04-08 13:58:44 +02:00
arm ARM: dts: spear13xx: Update SPI dma properties 2022-04-08 13:59:03 +02:00
arm64 KVM: arm64: Do not change the PMU event filter after a VCPU has run 2022-04-13 19:27:13 +02:00
csky uaccess: fix type mismatch warnings from access_ok() 2022-04-08 13:58:44 +02:00
h8300 bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
hexagon uaccess: fix integer overflow on access_ok() 2022-03-28 10:03:21 +02:00
ia64 ia64: make IA64_MCA_RECOVERY bool instead of tristate 2022-01-30 09:56:58 +02:00
m68k m68k: coldfire/device.c: only build for MCF_EDMA when h/w macros are defined 2022-04-08 13:57:51 +02:00
microblaze uaccess: fix nios2 and microblaze get_user_8() 2022-04-08 13:57:49 +02:00
mips mips: Enable KCSAN - take 2 2022-04-08 13:58:59 +02:00
nds32 nds32: fix access_ok() checks in get/put_user 2022-03-28 10:03:22 +02:00
nios2 uaccess: fix type mismatch warnings from access_ok() 2022-04-08 13:58:44 +02:00
openrisc bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
parisc parisc: Fix handling off probe non-access faults 2022-04-08 13:58:38 +02:00
powerpc powerpc/set_memory: Avoid spinlock recursion in change_page_attr() 2022-04-13 19:27:14 +02:00
riscv riscv module: remove (NOLOAD) 2022-04-08 13:58:57 +02:00
s390 KVM: s390x: fix SCK locking 2022-04-08 13:57:30 +02:00
sh bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
sparc uaccess: fix type mismatch warnings from access_ok() 2022-04-08 13:58:44 +02:00
um um: fix and optimize xor select template for CONFIG64 and timetravel mode 2022-04-13 19:27:06 +02:00
x86 x86/mce: Work around an erratum on fast string copy instructions 2022-04-13 19:27:16 +02:00
xtensa xtensa: add missing XCHAL_HAVE_WINDOWED check 2022-04-08 13:58:17 +02:00
.gitignore
Kconfig stack: Constrain and fix stack offset randomization with Clang builds 2022-04-08 13:57:34 +02:00