linux-stable/arch/x86
Omar Sandoval c12cd77cb0 mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Commit 3ee48b6af4 ("mm, x86: Saving vmcore with non-lazy freeing of
vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
purge the vmap areas instead of doing it lazily.

Commit 690467c81b ("mm/vmalloc: Move draining areas out of caller
context") moved the purging from the vunmap() caller to a worker thread.
Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
(possibly forever).  For example, consider the following scenario:

 1. Thread reads from /proc/vmcore. This eventually calls
    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
    vmap_lazy_nr to lazy_max_pages() + 1.

 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
    pages (one page plus the guard page) to the purge list and
    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
    drain_vmap_work is scheduled.

 3. Thread returns from the kernel and is scheduled out.

 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
    frees the 2 pages on the purge list. vmap_lazy_nr is now
    lazy_max_pages() + 1.

 5. This is still over the threshold, so it tries to purge areas again,
    but doesn't find anything.

 6. Repeat 5.

If the system is running with only one CPU (which is typicial for kdump)
and preemption is disabled, then this will never make forward progress:
there aren't any more pages to purge, so it hangs.  If there is more
than one CPU or preemption is enabled, then the worker thread will spin
forever in the background.  (Note that if there were already pages to be
purged at the time that set_iounmap_nonlazy() was called, this bug is
avoided.)

This can be reproduced with anything that reads from /proc/vmcore
multiple times.  E.g., vmcore-dmesg /proc/vmcore.

It turns out that improvements to vmap() over the years have obsoleted
the need for this "optimization".  I benchmarked `dd if=/proc/vmcore
of=/dev/null` with 4k and 1M read sizes on a system with a 32GB vmcore.
The test was run on 5.17, 5.18-rc1 with a fix that avoided the hang, and
5.18-rc1 with set_iounmap_nonlazy() removed entirely:

    |5.17  |5.18+fix|5.18+removal
  4k|40.86s|  40.09s|      26.73s
  1M|24.47s|  23.98s|      21.84s

The removal was the fastest (by a wide margin with 4k reads).  This
patch removes set_iounmap_nonlazy().

Link: https://lkml.kernel.org/r/52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com
Fixes: 690467c81b  ("mm/vmalloc: Move draining areas out of caller context")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
..
boot memcpy updates for v5.18-rc1 2022-03-26 12:19:04 -07:00
coco x86/coco: Add API to handle encryption mask 2022-02-23 19:14:29 +01:00
configs x86/config: Make the x86 defconfigs a bit more usable 2022-03-27 20:58:35 +02:00
crypto This push fixes the following issues: 2022-03-31 11:17:39 -07:00
entry Kbuild updates for v5.18 2022-03-31 11:59:03 -07:00
events perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids 2022-04-05 09:59:44 +02:00
hyperv hyperv-next for 5.17 2022-01-16 15:53:00 +02:00
ia32 audit/stable-5.16 PR 20211101 2021-11-01 21:17:39 -07:00
include mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore 2022-04-15 14:49:56 -07:00
kernel mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore 2022-04-15 14:49:56 -07:00
kvm KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU 2022-04-11 13:29:51 -04:00
lib A set of x86 fixes and updates: 2022-04-03 12:15:47 -07:00
math-emu x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
mm x86/mm/tlb: Revert retpoline avoidance approach 2022-04-04 19:41:36 +02:00
net x86,bpf: Avoid IBT objtool warning 2022-04-07 11:27:02 +02:00
pci PCI/sysfs: Find shadow ROM before static attribute initialization 2022-01-26 10:41:21 -06:00
platform objtool,efi: Update __efi64_thunk annotation 2022-03-15 10:32:32 +01:00
power x86/speculation: Restore speculation related MSRs during S3 resume 2022-04-05 10:18:31 -07:00
purgatory x86/purgatory: Remove -nostdlib compiler flag 2021-12-30 14:13:06 +01:00
ras
realmode - Flush *all* mappings from the TLB after switching to the trampoline 2022-01-10 09:51:38 -08:00
tools x86/build: Use the proper name CONFIG_FW_LOADER 2021-12-29 22:20:38 +01:00
um Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2022-04-01 19:57:03 -07:00
video
xen xen: branch for v5.18-rc1 2022-03-28 14:32:39 -07:00
.gitignore
Kbuild x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c} 2022-02-23 18:25:58 +01:00
Kconfig Revert the RT related signal changes. They need to be reworked and 2022-04-03 12:08:26 -07:00
Kconfig.assembler
Kconfig.cpu x86/mmx_32: Remove X86_USE_3DNOW 2021-12-11 09:09:45 +01:00
Kconfig.debug
Makefile x86: Remove toolchain check for X32 ABI capability 2022-03-15 10:32:48 +01:00
Makefile.um
Makefile_32.cpu