linux-stable/mm
Rongwei Wang 111a79d9b9 mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
commit 6fe7d6b992 upstream.

The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption.  The only place we have found where this
happens is in the swapoff path.  This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.

It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g.  stress-ng-swap).

However, in the worst case, panic can be caused by the above scene.  In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap.  This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270

Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bf ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:02:11 +02:00
..
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-06 07:46:34 +01:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 10:16:54 +01:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2018-10-13 09:27:30 +02:00
bootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cleancache.c fs: switch ->s_uuid to uuid_t 2017-06-05 16:59:12 +02:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-08-06 19:05:23 +02:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-06-15 11:54:51 +02:00
compaction.c mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 12:48:13 +02:00
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2018-02-25 11:08:03 +01:00
fadvise.c mm/fadvise.c: fix signed overflow UBSAN complaint 2018-09-15 09:45:28 +02:00
failslab.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
filemap.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-25 17:36:55 +01:00
frame_vector.c v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails 2022-12-08 11:16:33 +01:00
frontswap.c
gup.c gup: document and work around "COW can break either way" issue 2021-04-28 12:08:42 +02:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2019-01-13 10:01:02 +01:00
huge_memory.c mm/huge_memory.c: don't discard hugepage if other processes are mapping it 2021-07-20 16:17:41 +02:00
hugetlb.c mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages 2022-11-03 23:50:54 +09:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 17:59:33 +01:00
hwpoison-inject.c mm: hwpoison: call shake_page() unconditionally 2017-05-03 15:52:12 -07:00
init-mm.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-07-11 12:48:10 +02:00
interval_tree.c lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
Kconfig mm/hmm: select mmu notifier when selecting HMM 2019-06-15 11:54:51 +02:00
Kconfig.debug kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
khugepaged.c mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths 2023-01-18 09:26:04 +01:00
kmemleak-test.c
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 12:23:51 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-22 10:57:39 +02:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-19 08:20:54 +02:00
maccess.c uaccess: Add non-pagefault user-space write function 2020-09-09 19:03:11 +02:00
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-10 08:54:22 +02:00
Makefile kmemcheck: rip it out 2018-02-22 15:42:24 +01:00
memblock.c memblock: use kfree() to release kmalloced memblock regions 2022-03-02 11:34:00 +01:00
memcontrol.c memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:26:13 +01:00
memory-failure.c mm/memory-failure: make sure wait for page writeback in memory_failure 2021-06-30 08:48:48 -04:00
memory.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2023-01-18 09:26:04 +01:00
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 11:45:34 +02:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-22 12:46:04 +01:00
mempool.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm/migrate_device.c: flush TLB while holding PTL 2022-10-26 13:16:50 +02:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-21 18:50:16 +02:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-07-10 09:54:36 +02:00
mm_init.c
mmap.c mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() 2022-09-20 11:51:30 +02:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:28:56 +02:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-11 18:03:02 +01:00
mremap.c mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) 2022-04-20 09:08:29 +02:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c x86/mm: split vmalloc_sync_all() 2020-04-02 16:34:20 +02:00
oom_kill.c mm, oom: do not trigger out_of_memory from the #PF 2021-11-26 11:40:37 +01:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:20:32 +01:00
page_alloc.c mm: prevent page_frag_alloc() from corrupting the memory 2022-10-26 13:16:49 +02:00
page_counter.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:31:27 +02:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:15:59 +02:00
page_io.c swap: fix swapfile read/write offset 2021-03-07 11:27:46 +01:00
page_isolation.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
page_owner.c mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages 2020-07-31 16:44:45 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:31:28 +02:00
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-07-11 12:48:12 +02:00
pagewalk.c mm: pagewalk: fix termination condition in walk_pte_range() 2020-10-01 13:12:32 +02:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:46:05 +01:00
percpu-stats.c percpu: fix starting offset for chunk statistics traversal 2017-09-27 14:45:57 -07:00
percpu-vm.c percpu: add __GFP_NORETRY semantics to the percpu balancing path 2018-04-08 14:26:29 +02:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-23 10:46:36 +02:00
pgtable-generic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_vm_access.c
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c readahead: stricter check for bdi io_pages 2018-09-09 19:55:53 +02:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:25:06 +02:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-08 19:01:58 +01:00
slab.c mm/slab.c: fix an infinite loop in leaks_show() 2019-06-15 11:54:51 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 11:40:40 +01:00
slab_common.c mm/slab: use memzero_explicit() in kzfree() 2020-06-30 15:38:08 -04:00
slob.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 10:56:50 +02:00
sparse-vmemmap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sparse.c mm: sections are not offlined during memory hotremove 2018-05-16 10:10:27 +02:00
swap.c mm: avoid marking swap cached page as lazyfree 2017-10-03 17:54:24 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_state.c mm/swap_state: fix a data race in swapin_nr_pages 2020-10-01 13:12:46 +02:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-20 12:02:11 +02:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-08 13:03:40 +01:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-10-11 18:18:34 +02:00
userfaultfd.c mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() 2022-05-15 19:40:27 +02:00
util.c mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags 2023-02-06 07:46:35 +01:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-19 22:43:48 +02:00
vmalloc.c mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap() 2020-06-03 08:18:11 +02:00
vmpressure.c mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
vmscan.c mm: memcg: make sure memory.events is uptodate when waking pollers 2021-04-07 12:47:03 +02:00
vmstat.c mm, vmstat: drop zone->lock in /proc/pagetypeinfo 2021-06-03 08:36:10 +02:00
workingset.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:42:54 +01:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:20:57 +02:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-09-05 09:26:30 +02:00