linux-stable/drivers/base
zhenwei pi 67f22ba775 mm/memory-failure: disable unpoison once hw error happens
Currently unpoison_memory(unsigned long pfn) is designed for soft
poison(hwpoison-inject) only.  Since 17fae1294a, the KPTE gets cleared
on a x86 platform once hardware memory corrupts.

Unpoisoning a hardware corrupted page puts page back buddy only, the
kernel has a chance to access the page with *NOT PRESENT* KPTE.  This
leads BUG during accessing on the corrupted KPTE.

Suggested by David&Naoya, disable unpoison mechanism when a real HW error
happens to avoid BUG like this:

 Unpoison: Software-unpoisoned page 0x61234
 BUG: unable to handle page fault for address: ffff888061234000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
 RIP: 0010:clear_page_erms+0x7/0x10
 Code: ...
 RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
 FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  prep_new_page+0x151/0x170
  get_page_from_freelist+0xca0/0xe20
  ? sysvec_apic_timer_interrupt+0xab/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
  __alloc_pages+0x17e/0x340
  __folio_alloc+0x17/0x40
  vma_alloc_folio+0x84/0x280
  __handle_mm_fault+0x8d4/0xeb0
  handle_mm_fault+0xd5/0x2a0
  do_user_addr_fault+0x1d0/0x680
  ? kvm_read_and_reset_apf_flags+0x3b/0x50
  exc_page_fault+0x78/0x170
  asm_exc_page_fault+0x27/0x30

Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
Fixes: 847ce401df ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294a ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:11:32 -07:00
..
firmware_loader firmware_loader: enable XZ by default if compressed support is enabled 2022-06-03 15:46:03 -07:00
power Thermal control updates for 5.19-rc1 2022-05-24 16:19:30 -07:00
regmap regmap: Add missing map->bus check 2022-05-09 12:48:45 +01:00
test driver core: Simplify async probe test code by using ktime_ms_delta() 2021-12-29 10:57:22 +01:00
Kconfig devtmpfs: mount with noexec and nosuid 2021-12-30 13:54:42 +01:00
Makefile driver core: Add sysfs support for physical location of a device 2022-04-27 09:51:57 +02:00
arch_numa.c mm: percpu: add generic pcpu_populate_pte() function 2022-01-20 08:52:52 +02:00
arch_topology.c arch_topology: Trace the update thermal pressure 2022-05-06 09:57:38 +02:00
attribute_container.c driver core: attribute_container: fix W=1 warnings 2021-05-14 13:37:10 +02:00
auxiliary.c Documentation/auxiliary_bus: Move the text into the code 2021-12-03 16:41:50 +01:00
base.h driver core: Extend deferred probe timeout on driver registration 2022-05-19 19:32:33 +02:00
bus.c driver: base: fix UAF when driver_attach failed 2022-05-19 19:28:42 +02:00
cacheinfo.c cacheinfo: clear cache_leaves(cpu) in free_cache_attributes() 2021-07-21 17:29:40 +02:00
class.c block: remove genhd.h 2022-02-02 07:49:59 -07:00
component.c component: Add common helper for compare/release functions 2022-02-25 12:16:12 +01:00
container.c
core.c Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
cpu.c sched/isolation: Use single feature type while referring to housekeeping cpumask 2022-02-16 15:57:55 +01:00
dd.c driver core: Set default deferred_probe_timeout back to 0. 2022-06-03 11:58:54 -07:00
devcoredump.c devcoredump: remove contact information 2021-06-04 15:05:44 +02:00
devres.c devres: fix typos in comments 2022-03-18 14:30:12 +01:00
devtmpfs.c Driver core changes for 5.18-rc1 2022-03-28 12:41:28 -07:00
driver.c driver core: Extend deferred probe timeout on driver registration 2022-05-19 19:32:33 +02:00
firmware.c
hypervisor.c
init.c drivers/base/node: consolidate node device subsystem initialization in node_dev_init() 2022-03-22 15:57:10 -07:00
isa.c bus: Make remove callback return void 2021-07-21 11:53:42 +02:00
map.c driver: base: Prefer unsigned int to bare use of unsigned 2021-07-21 17:30:09 +02:00
memory.c mm/memory-failure: disable unpoison once hw error happens 2022-06-16 19:11:32 -07:00
module.c
node.c drivers/base/node.c: fix compaction sysfs file leak 2022-04-28 23:16:06 -07:00
physical_location.c driver core: location: Add "back" as a possible output for panel 2022-05-19 19:28:32 +02:00
physical_location.h driver core: Add sysfs support for physical location of a device 2022-04-27 09:51:57 +02:00
pinctrl.c
platform-msi.c platform-msi: Simplify platform device MSI code 2021-12-16 22:22:19 +01:00
platform.c Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
property.c USB / Thunderbolt changes for 5.19-rc1 2022-06-03 11:17:49 -07:00
soc.c base: soc: Make soc_device_match() simpler and easier to read 2022-03-18 14:28:07 +01:00
swnode.c software node: fix wrong node passed to find nargs_prop 2021-12-22 18:26:18 +01:00
syscore.c
topology.c topology: Fix up build warning in topology_is_visible() 2022-04-23 12:53:11 +02:00
trace.c devres: Enable trace events 2021-06-15 17:14:36 +02:00
trace.h devres: Enable trace events 2021-06-15 17:14:36 +02:00
transport_class.c