linux-stable/include/asm-generic
Linus Torvalds 5df397dec7 mm: delay page_remove_rmap() until after the TLB has been flushed
When we remove a page table entry, we are very careful to only free the
page after we have flushed the TLB, because other CPUs could still be
using the page through stale TLB entries until after the flush.

However, we have removed the rmap entry for that page early, which means
that functions like folio_mkclean() would end up not serializing with the
page table lock because the page had already been made invisible to rmap.

And that is a problem, because while the TLB entry exists, we could end up
with the following situation:

 (a) one CPU could come in and clean it, never seeing our mapping of the
     page

 (b) another CPU could continue to use the stale and dirty TLB entry and
     continue to write to said page

resulting in a page that has been dirtied, but then marked clean again,
all while another CPU might have dirtied it some more.

End result: possibly lost dirty data.

This extends our current TLB gather infrastructure to optionally track a
"should I do a delayed page_remove_rmap() for this page after flushing the
TLB".  It uses the newly introduced 'encoded page pointer' to do that
without having to keep separate data around.

Note, this is complicated by a couple of issues:

 - we want to delay the rmap removal, but not past the page table lock,
   because that simplifies the memcg accounting

 - only SMP configurations want to delay TLB flushing, since on UP
   there are obviously no remote TLBs to worry about, and the page
   table lock means there are no preemption issues either

 - s390 has its own mmu_gather model that doesn't delay TLB flushing,
   and as a result also does not want the delayed rmap. As such, we can
   treat S390 like the UP case and use a common fallback for the "no
   delays" case.

 - we can track an enormous number of pages in our mmu_gather structure,
   with MAX_GATHER_BATCH_COUNT batches of MAX_TABLE_BATCH pages each,
   all set up to be approximately 10k pending pages.

   We do not want to have a huge number of batched pages that we then
   need to check for delayed rmap handling inside the page table lock.

Particularly that last point results in a noteworthy detail, where the
normal page batch gathering is limited once we have delayed rmaps pending,
in such a way that only the last batch (the so-called "active batch") in
the mmu_gather structure can have any delayed entries.

NOTE!  While the "possibly lost dirty data" sounds catastrophic, for this
all to happen you need to have a user thread doing either madvise() with
MADV_DONTNEED or a full re-mmap() of the area concurrently with another
thread continuing to use said mapping.

So arguably this is about user space doing crazy things, but from a VM
consistency standpoint it's better if we track the dirty bit properly even
when user space goes off the rails.

[akpm@linux-foundation.org: fix UP build, per Linus]
Link: https://lore.kernel.org/all/B88D3073-440A-41C7-95F4-895D3F657EF2@gmail.com/
Link: https://lkml.kernel.org/r/20221109203051.1835763-4-torvalds@linux-foundation.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Tested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:50 -08:00
..
bitops wait_on_bit: add an acquire memory barrier 2022-08-26 09:30:25 -07:00
vdso
access_ok.h uaccess: remove CONFIG_SET_FS 2022-02-25 09:36:06 +01:00
archrandom.h random: handle archrandom with multiple longs 2022-07-25 13:26:14 +02:00
asm-offsets.h
asm-prototypes.h
atomic.h locking/atomic: delete !ARCH_ATOMIC remnants 2021-05-26 13:20:52 +02:00
atomic64.h locking/atomic: delete !ARCH_ATOMIC remnants 2021-05-26 13:20:52 +02:00
audit_change_attr.h
audit_dir_write.h
audit_read.h
audit_signal.h
audit_write.h
barrier.h asm-generic: Add memory barrier dma_mb() 2022-06-23 18:34:58 +01:00
bitops.h include: move find.h from asm_generic to linux 2022-01-15 08:47:31 -08:00
bitsperlong.h lib: extend the scope of small_const_nbits() macro 2021-05-06 19:24:11 -07:00
bug.h treewide: Drop WARN_ON_FUNCTION_MISMATCH 2022-09-26 10:13:14 -07:00
bugs.h
cache.h
cacheflush.h asm-generic: instrument usercopy in cacheflush.h 2022-10-03 14:03:18 -07:00
checksum.h unify generic instances of csum_partial_copy_nocheck() 2020-08-20 15:45:14 -04:00
cmpxchg-local.h locking/atomic: cmpxchg: make generic a prefix 2021-05-26 13:20:50 +02:00
cmpxchg.h locking/atomic: delete !ARCH_ATOMIC remnants 2021-05-26 13:20:52 +02:00
compat.h asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual() 2022-11-01 10:20:11 +11:00
current.h
delay.h
device.h
div64.h ARM: 9117/1: asm-generic: div64: Remove always-true __div64_const32_is_OK() 2021-08-20 11:39:28 +01:00
dma-mapping.h
dma.h
early_ioremap.h mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-09-08 11:50:24 -07:00
emergency-restart.h
error-injection.h asm-generic/error-injection.h: fix a spelling mistake, and a coding style issue 2021-12-17 14:12:14 +01:00
exec.h
export.h kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS 2022-05-24 16:33:20 +09:00
extable.h
fb.h
fixmap.h
flat.h
ftrace.h
futex.h futex: Fix additional regressions 2021-12-11 23:31:51 +01:00
getorder.h asm-generic: force inlining of get_order() to work around gcc10 poor decision 2020-12-15 22:46:15 -08:00
gpio.h
hardirq.h irqstat: Move declaration into asm-generic/hardirq.h 2020-11-23 10:31:06 +01:00
hugetlb.h mm: change huge_ptep_clear_flush() to return the original pte 2022-05-13 16:48:55 -07:00
hw_irq.h
hyperv-tlfs.h KVM: x86: Add checks for reserved-to-zero Hyper-V hypercall fields 2022-02-10 13:50:36 -05:00
ide_iops.h
int-ll64.h
io.h asm-generic: updates for 6.0 2022-08-05 10:07:23 -07:00
ioctl.h
iomap.h parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled 2021-09-19 10:36:09 -07:00
irq.h
irq_regs.h
irq_work.h
irqflags.h
Kbuild xen: branch for v6.0-rc1 2022-08-04 15:10:55 -07:00
kdebug.h
kmap_size.h mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL 2020-11-24 14:42:08 +01:00
kprobes.h treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kvm_para.h
kvm_types.h KVM: Move x86's version of struct kvm_mmu_memory_cache to common code 2020-07-09 13:29:42 -04:00
linkage.h
local.h
local64.h
logic_io.h logic_io instance of iounmap() needs volatile on argument 2021-12-21 21:31:08 +01:00
mcs_spinlock.h
memory_model.h mm: remove CONFIG_DISCONTIGMEM 2021-06-29 10:53:55 -07:00
mm_hooks.h
mmiowb.h asm-generic/mmiowb: Allow mmiowb_set_pending() when preemptible() 2020-07-17 10:02:03 +01:00
mmiowb_types.h
mmu.h
mmu_context.h asm-generic: add generic MMU versions of mmu context functions 2020-10-26 16:45:03 +01:00
module.h
module.lds.h kbuild: preprocess module linker script 2020-09-25 00:36:41 +09:00
mshyperv.h hyperv: simplify and rename generate_guest_id 2022-09-28 13:36:56 +00:00
msi.h Generic interrupt and irqchips subsystem: 2020-12-15 15:03:31 -08:00
nommu_context.h asm-generic: add generic MMU versions of mmu context functions 2020-10-26 16:45:03 +01:00
numa.h numa: Move numa implementation to common code 2021-01-14 15:08:55 -08:00
page.h c6x: remove architecture 2021-01-20 09:30:45 +01:00
param.h
parport.h
pci.h asm-generic: Add new pci.h and use it 2022-07-22 17:34:57 -05:00
pci_iomap.h PCI: Stub __pci_ioport_map() for arches that don't support it at all 2022-07-29 12:01:00 -05:00
percpu.h asm-generic: percpu: avoid Wshadow warning 2020-10-26 23:54:48 +00:00
pgalloc.h asm-generic: Prepare for riscv use of pud_alloc_one and pud_free 2022-01-19 17:54:08 -08:00
pgtable-nop4d.h mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * 2021-07-08 11:48:22 -07:00
pgtable-nopmd.h riscv/mm: fix two page table check related issues 2022-05-19 14:08:48 -07:00
pgtable-nopud.h mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * 2021-07-08 11:48:22 -07:00
pgtable_uffd.h
preempt.h sched/core: Initialize the idle task with preemption disabled 2021-05-12 13:01:45 +02:00
qrwlock.h asm-generic changes for 5.19 2022-05-26 10:50:30 -07:00
qrwlock_types.h locking/qrwlock: Change "queue rwlock" to "queued rwlock" 2022-05-11 16:27:04 +02:00
qspinlock.h asm-generic: qspinlock: Indicate the use of mixed-size atomics 2022-05-11 11:49:47 -07:00
qspinlock_types.h locking/qspinlock: Do not include atomic.h from qspinlock_types.h 2020-07-29 16:14:19 +02:00
resource.h
rwonce.h asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h' 2020-07-21 10:50:36 +01:00
seccomp.h seccomp: Use -1 marker for end of mode 1 syscall list 2020-07-10 16:01:52 -07:00
sections.h asm-generic: sections: refactor memory_intersects 2022-08-28 14:02:45 -07:00
serial.h
set_memory.h
shmparam.h
signal.h asm-generic: Remove empty #ifdef SA_RESTORER 2022-09-10 09:56:53 +02:00
simd.h
softirq_stack.h asm-generic: Conditionally enable do_softirq_own_stack() via Kconfig. 2022-09-05 17:20:55 +02:00
spinlock.h asm-generic: ticket-lock: New generic ticket-based spinlock 2022-05-11 11:49:38 -07:00
spinlock_types.h asm-generic: ticket-lock: New generic ticket-based spinlock 2022-05-11 11:49:38 -07:00
statfs.h
string.h
switch_to.h
syscall.h ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h 2022-03-10 13:35:08 -06:00
syscalls.h
timex.h
tlb.h mm: delay page_remove_rmap() until after the TLB has been flushed 2022-11-30 15:58:50 -08:00
tlbflush.h
topology.h mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA 2021-06-29 10:53:55 -07:00
trace_clock.h
uaccess.h uaccess: remove CONFIG_SET_FS 2022-02-25 09:36:06 +01:00
unaligned.h asm-generic: make parameter types consistent in _unaligned_be48() 2022-09-11 21:55:12 -07:00
user.h
vermagic.h
vga.h
vmlinux.lds.h ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph() 2022-10-20 17:10:27 +02:00
vtime.h
word-at-a-time.h
xor.h lib/xor: make xor prototypes more friendly to compiler vectorization 2022-02-11 20:39:39 +11:00