linux-stable/drivers/acpi/apei
Shuai Xue 410063c9e1 ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
[ Upstream commit a70297d221 ]

There are two major types of uncorrected recoverable (UCR) errors:

 - Synchronous error: The error is detected and raised at the point of
   the consumption in the execution flow, e.g. when a CPU tries to
   access a poisoned cache line. The CPU will take a synchronous error
   exception such as Synchronous External Abort (SEA) on Arm64 and
   Machine Check Exception (MCE) on x86. The OS must take action (for
   example, offline the failing page or kill the failing thread) to
   recover from this uncorrectable error.

 - Asynchronous error: The error is detected outside of processor
   execution context, e.g. when an error is detected by a background
   scrubber. Some data in memory is corrupted, but it has not yet been
   consumed. The OS may optionally take action to recover from this
   uncorrectable error.

When APEI firmware-first mode is enabled, a platform may describe one error
source for handling synchronous errors (e.g. MCE or SEA notification), or
for handling asynchronous errors (e.g. SCI or External Interrupt
notification). In other words, synchronous errors can be distinguished from
asynchronous ones by the APEI notification type. For synchronous errors, the
kernel will kill the current process, which is accessing the poisoned page,
by sending SIGBUS with BUS_MCEERR_AR. For asynchronous errors, the kernel
will notify the process that owns the poisoned page by sending SIGBUS with
BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets
mf_flags to 0, so all synchronous errors are handled as asynchronous errors
in memory failure.
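
Seen from user space, the two cases above differ only in the si_code that
accompanies SIGBUS. The following is a minimal sketch of how a process might
tell them apart. It assumes the Linux si_code values BUS_MCEERR_AR and
BUS_MCEERR_AO; the enum poison_urgency and the helper classify_sigbus() are
hypothetical names introduced here for illustration and are not part of this
commit.

```c
#define _GNU_SOURCE
#include <signal.h>

/* Fallbacks for C libraries that do not expose the Linux-specific codes.
 * The numeric values mirror the Linux definitions. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification used only in this sketch. */
enum poison_urgency {
    POISON_NONE,            /* SIGBUS not caused by a memory error      */
    POISON_ACTION_REQUIRED, /* poisoned data was consumed (synchronous)  */
    POISON_ACTION_OPTIONAL  /* poison found but not consumed (async)     */
};

/* Map the si_code delivered with SIGBUS to an urgency level. */
static enum poison_urgency classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return POISON_ACTION_REQUIRED;
    case BUS_MCEERR_AO:
        return POISON_ACTION_OPTIONAL;
    default:
        return POISON_NONE;
    }
}
```

A handler installed with SA_SIGINFO could pass info->si_code to such a helper
and, for example, discard and rebuild the affected page in the action-optional
case while treating the action-required case as unrecoverable for the
faulting thread.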

To fix this, set the memory failure flags to MF_ACTION_REQUIRED on
synchronous events.
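
The idea can be sketched in a few lines of stand-alone C. This is an
illustration of the rule described in this message, not the code in ghes.c:
the enum sketch_notify, the constant SKETCH_MF_ACTION_REQUIRED (with an
illustrative value) and both helper functions are assumptions made for this
example, and the set of notification types the real driver treats as
synchronous may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the APEI notification types named above. */
enum sketch_notify {
    SKETCH_NOTIFY_SCI,          /* asynchronous */
    SKETCH_NOTIFY_EXTERNAL_IRQ, /* asynchronous */
    SKETCH_NOTIFY_MCE,          /* synchronous  */
    SKETCH_NOTIFY_SEA           /* synchronous  */
};

/* Illustrative flag value for this sketch only. */
#define SKETCH_MF_ACTION_REQUIRED (1 << 1)

/* A notification is synchronous when it is raised at the point of
 * consumption, per the classification given in this message. */
static bool sketch_is_sync_notify(enum sketch_notify type)
{
    return type == SKETCH_NOTIFY_MCE || type == SKETCH_NOTIFY_SEA;
}

/* Before the change the flags were always 0; afterwards synchronous
 * events request immediate action. */
static int sketch_mf_flags(enum sketch_notify type)
{
    return sketch_is_sync_notify(type) ? SKETCH_MF_ACTION_REQUIRED : 0;
}
```

With this rule, an error reported through an SEA source yields
SKETCH_MF_ACTION_REQUIRED, while one reported through SCI keeps the flags at
0 and continues to be handled as an asynchronous event.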

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:14:15 +00:00
..
Kconfig ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue 2019-02-07 23:10:45 +01:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
apei-base.c ACPI: APEI: Remove a useless include 2022-12-02 20:18:50 +01:00
apei-internal.h efi: fix missing prototype warnings 2023-05-25 09:26:19 +02:00
bert.c ACPI: APEI: mark bert_disable as __initdata 2023-06-12 19:23:25 +02:00
einj.c ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform 2023-03-27 20:46:08 +02:00
erst-dbg.c ACPI: APEI: Fix missing ERST record id 2022-04-13 20:29:24 +02:00
erst.c ACPI: APEI: Remove unneeded result variables 2022-09-24 18:50:42 +02:00
ghes.c ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events 2024-02-05 20:14:15 +00:00
hest.c ACPI: APEI: fix return value of __setup handlers 2022-03-08 19:43:39 +01:00