linux-stable/arch
Tony Luck 619d747c18 x86/mce: Avoid infinite loop for copy from user recovery
commit 81065b35e2 upstream.

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry for ever.

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever enforce a limit of 10.

 [ bp: Massage commit message, drop noinstr, fix typo, extend panic
   messages. ]

Fixes: 5567d11c21 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:28:07 +02:00
..
alpha alpha: Send stop IPI to send to online CPUs 2021-08-12 13:22:20 +02:00
arc ARC: export clear_user_page() for modules 2021-09-22 12:28:04 +02:00
arm ARM: tegra: tamonten: Fix UART pad setting 2021-09-18 13:40:28 +02:00
arm64 KVM: arm64: Handle PSCI resets before userspace touches vCPU state 2021-09-22 12:28:04 +02:00
c6x
csky csky: syscache: Fixup duplicate cache flush 2021-07-14 16:56:52 +02:00
h8300 h8300: fix PREEMPTION build, TI_PRE_COUNT undefined 2021-02-17 11:02:28 +01:00
hexagon hexagon: use common DISCARDS macro 2021-07-20 16:05:53 +02:00
ia64 mm/page_alloc: fix memory map initialization for descending nodes 2021-07-25 14:36:18 +02:00
m68k m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch 2021-09-18 13:40:31 +02:00
microblaze
mips MIPS: Malta: fix alignment of the devicetree buffer 2021-09-18 13:40:16 +02:00
nds32 nds32: fix up stack guard gap 2021-07-28 14:35:46 +02:00
nios2 nios2: fixed broken sys_clone syscall 2021-03-04 11:38:16 +01:00
openrisc openrisc: don't printk() unconditionally 2021-09-18 13:40:13 +02:00
parisc parisc: fix crash with signals and alloca 2021-09-18 13:40:35 +02:00
powerpc KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers 2021-09-22 12:27:59 +02:00
riscv riscv: Fixup patch_text panic in ftrace 2021-09-03 10:09:29 +02:00
s390 s390/bpf: Fix branch shortening during codegen pass 2021-09-22 12:28:02 +02:00
sh sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
sparc bpf: Introduce BPF nospec instruction for mitigating Spectre v4 2021-08-04 12:46:44 +02:00
um um: fix error return code in winch_tramp() 2021-07-20 16:05:51 +02:00
x86 x86/mce: Avoid infinite loop for copy from user recovery 2021-09-22 12:28:07 +02:00
xtensa xtensa: ISS: don't panic in rs_init 2021-09-18 13:40:22 +02:00
.gitignore
Kconfig