linux-stable/include
Naoya Horiguchi b16a6af974 mm: hwpoison: disable memory error handling on 1GB hugepage
commit 31286a8484 upstream.

Recently the following BUG was reported:

    Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000
    Memory failure: 0x3c0000: recovery action for huge page: Recovered
    BUG: unable to handle kernel paging request at ffff8dfcc0003000
    IP: gup_pgd_range+0x1f0/0xc20
    PGD 17ae72067 P4D 17ae72067 PUD 0
    Oops: 0000 [#1] SMP PTI
    ...
    CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014

You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on
a 1GB hugepage.  This happens because get_user_pages_fast() is not aware
of a migration entry on pud that was created in the 1st madvise() event.

I think that conversion to pud-aligned migration entry is working, but
other MM code walking over page table isn't prepared for it.  We need
some time and effort to make all this work properly, so this patch
avoids the reported bug by just disabling error handling for 1GB
hugepage.

[n-horiguchi@ah.jp.nec.com: v2]
  Link: http://lkml.kernel.org/r/1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com
Link: http://lkml.kernel.org/r/1517207283-15769-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11 16:29:20 +02:00
..
acpi ACPI / EC: Fix regression related to PM ops support in ECDT device 2017-12-05 11:26:33 +01:00
asm-generic bug.h: work around GCC PR82365 in BUG() 2018-05-30 07:52:00 +02:00
clocksource License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crypto crypto: af_alg - Fix race around ctx->rcvused by making it atomic_t 2018-03-03 10:24:29 +01:00
drm drm/syncobj: Stop reusing the same struct file for all syncobj -> fd 2018-03-28 18:24:47 +02:00
dt-bindings dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4 2018-04-24 09:36:34 +02:00
keys License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kvm arm/arm64: KVM: Add PSCI version selection API 2018-05-01 12:58:27 -07:00
linux mm: hwpoison: disable memory error handling on 1GB hugepage 2018-07-11 16:29:20 +02:00
math-emu
media License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memory
misc
net netfilter: nf_tables: bogus EBUSY in chain deletions 2018-07-08 15:30:49 +02:00
pcmcia
ras License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
rdma IB/hfi1: Optimize kthread pointer locking when queuing CQ entries 2018-07-03 11:24:54 +02:00
scsi scsi: core: Make SCSI Status CONDITION MET equivalent to GOOD 2018-05-25 16:17:50 +02:00
soc soc: bcm2835: Make !RASPBERRYPI_FIRMWARE dummies return failure 2018-06-21 04:02:42 +09:00
sound ALSA: control: Hardening for potential Spectre v1 2018-05-01 12:58:16 -07:00
target target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK 2017-11-30 08:40:51 +00:00
trace tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all} 2018-05-22 18:53:57 +02:00
uapi btrfs: define SUPER_FLAG_METADUMP_V2 2018-06-11 22:49:18 +02:00
video License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xen xen/balloon: Mark unallocated host memory as UNUSABLE 2018-03-03 10:24:28 +01:00