linux-stable/mm
Miaohe Lin d72b771191 mm/huge_memory: don't unpoison huge_zero_folio
commit fe6f86f4b4 upstream.

When I did memory failure tests recently, below panic occurs:

 kernel BUG at include/linux/mm.h:1135!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
 Call Trace:
  <TASK>
  do_shrink_slab+0x14f/0x6a0
  shrink_slab+0xca/0x8c0
  shrink_node+0x2d0/0x7d0
  balance_pgdat+0x33a/0x720
  kswapd+0x1f3/0x410
  kthread+0xd5/0x100
  ret_from_fork+0x2f/0x50
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 Modules linked in: mce_inject hwpoison_inject
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0

The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt.  But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.

Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.

Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com
Fixes: 478d134e95 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-21 14:40:38 +02:00
..
damon
kasan
kfence
kmsan kmsan: do not wipe out origin when doing partial unpoisoning 2024-06-16 13:51:06 +02:00
backing-dev.c
balloon_compaction.c
bootmem_info.c
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-06-16 13:51:07 +02:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c
debug.c
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c
dmapool.c
dmapool_test.c
early_ioremap.c
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c
folio-compat.c
gup.c
gup_test.c
gup_test.h
highmem.c
hmm.c
huge_memory.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:51:04 +02:00
hugetlb.c mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid 2024-06-16 13:51:07 +02:00
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
io-mapping.c
ioremap.c
Kconfig
Kconfig.debug
khugepaged.c
kmemleak.c
ksm.c mm/ksm: fix ksm_zero_pages accounting 2024-06-16 13:51:06 +02:00
list_lru.c
maccess.c
madvise.c
Makefile
mapping_dirty_helpers.c
memblock.c memblock: make memblock_set_node() also warn about use of MAX_NUMNODES 2024-06-21 14:40:29 +02:00
memcontrol.c
memfd.c
memory-failure.c mm/huge_memory: don't unpoison huge_zero_folio 2024-06-21 14:40:38 +02:00
memory-tiers.c
memory.c
memory_hotplug.c
mempolicy.c
mempool.c
memremap.c
memtest.c
migrate.c
migrate_device.c
mincore.c
mlock.c
mm_init.c
mm_slot.h
mmap.c
mmap_lock.c
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page-writeback.c
page_alloc.c
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:51:04 +02:00
process_vm_access.c
ptdump.c
readahead.c
rmap.c
rodata_test.c
secretmem.c
shmem.c
shmem_quota.c
show_mem.c
shrinker.c
shrinker_debug.c
shuffle.c
shuffle.h
slab.h
slab_common.c
slub.c
sparse-vmemmap.c
sparse.c
swap.c
swap.h
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c
truncate.c
usercopy.c
userfaultfd.c mm/userfaultfd: Do not place zeropages when zeropages are disallowed 2024-05-30 09:44:07 +02:00
util.c
vmalloc.c mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL 2024-06-16 13:51:08 +02:00
vmpressure.c
vmscan.c
vmstat.c
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c
zswap.c