linux-stable/mm
Mel Gorman 72b252aed5 mm: send one IPI per CPU to TLB flush all entries after unmapping pages
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accessed by other CPUs.  There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process, as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machines get
larger with more cores and more memory, the cost of these IPIs can be
high.  This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped.  When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

        If a clean page is unmapped and not immediately flushed, the
        architecture must guarantee that a write to that linear address
        from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on, but the window is
much larger with this patch applied, which is worth highlighting.  The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry.  An additional
architecture helper called flush_tlb_local is required.  It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure.  The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite.  The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User          581.00       611.43
System       5804.93      4111.76
Elapsed       161.03       122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time.  From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupted roughly 180K times per second.

The impact is lower on a single socket machine.

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User           58.09        57.64
System        111.82        76.56
Elapsed        27.29        22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads that have no memory pressure or
relatively few mapped pages.  It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes.  Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
..
kasan x86/kasan, mm: Introduce generic kasan_populate_zero_shadow() 2015-08-22 14:54:55 +02:00
backing-dev.c writeback: don't drain bdi_writeback_congested on bdi destruction 2015-07-02 08:46:00 -06:00
balloon_compaction.c mm/balloon_compaction: fix deflation when compaction is disabled 2014-10-29 16:33:15 -07:00
bootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
cleancache.c cleancache: remove limit on the number of cleancache enabled filesystems 2015-04-14 16:49:03 -07:00
cma.c mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute 2015-06-24 17:49:44 -07:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
compaction.c mm/compaction.c: fix "suitable_migration_target() unused" warning 2015-04-15 16:35:20 -07:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c tracing: Rename ftrace_event.h to trace_events.h 2015-05-13 14:05:12 -04:00
dmapool.c mm/dmapool.c: fixed a brace coding style issue 2014-10-09 22:26:00 -04:00
early_ioremap.c mm: create generic early_ioremap() support 2014-04-07 16:36:15 -07:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c
filemap.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: use READ_ONCE() for non-scalar types 2015-04-15 16:35:18 -07:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c userfaultfd: propagate the full address in THP faults 2015-09-04 16:54:41 -07:00
hugetlb.c mm/hugetlb: remove unused arch hook prepare/release_hugepage 2015-06-25 17:00:35 -07:00
hugetlb_cgroup.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
hwpoison-inject.c mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling 2015-06-24 17:49:42 -07:00
init-mm.c
internal.h mm: send one IPI per CPU to TLB flush all entries after unmapping pages 2015-09-04 16:54:41 -07:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition 2015-07-23 20:59:41 +02:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() 2015-06-24 17:49:46 -07:00
ksm.c mm: remove rest of ACCESS_ONCE() usages 2015-04-15 16:35:18 -07:00
list_lru.c memcg: reparent list_lrus and free kmemcg_id on css offline 2015-02-12 18:54:10 -08:00
maccess.c lib: move strncpy_from_unsafe() into mm/maccess.c 2015-08-31 12:36:10 -07:00
madvise.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
Makefile userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation 2015-09-04 16:54:41 -07:00
memblock.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
memcontrol.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
memory-failure.c mm/hwpoison: fix panic due to split huge zero page 2015-08-14 15:56:32 -07:00
memory.c userfaultfd: call handle_userfault() for userfaultfd_missing() faults 2015-09-04 16:54:41 -07:00
memory_hotplug.c memory-hotplug: add hot-added memory ranges to memblock before allocate node_data for a node. 2015-09-04 16:54:41 -07:00
mempolicy.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mempool.c mm/mempool.c: kasan: poison mempool elements 2015-04-15 16:35:20 -07:00
memtest.c mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute 2015-06-24 17:49:44 -07:00
migrate.c mm/memory-failure: set PageHWPoison before migrate_pages() 2015-08-07 04:39:42 +03:00
mincore.c mincore: apply page table walker on do_mincore() 2015-02-11 17:06:06 -08:00
mlock.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mmu_context.c
mmu_notifier.c mmu_notifier: add the callback for mmu_notifier_invalidate_range() 2014-11-13 13:46:09 +11:00
mmzone.c mm: microoptimize zonelist operations 2015-02-11 17:06:02 -08:00
mprotect.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mremap.c mm: new arch_remap() hook 2015-06-24 17:49:41 -07:00
msync.c mm: remove rest usage of VM_NONLINEAR and pte_file() 2015-02-10 14:30:31 -08:00
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2015-09-01 18:46:42 -07:00
oom_kill.c mm/oom_kill.c: print points as unsigned int 2015-06-24 17:49:44 -07:00
page-writeback.c writeback: fix initial dirty limit 2015-08-07 04:39:42 +03:00
page_alloc.c mm: make page pfmemalloc check more robust 2015-08-21 14:30:10 -07:00
page_counter.c mm: page_counter: pull "-1" handling out of page_counter_memparse() 2015-02-11 17:06:02 -08:00
page_ext.c mm/page_owner: keep track of page owners 2014-12-13 12:42:48 -08:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c CMA: page_isolation: check buddy before accessing it 2015-05-14 17:55:51 -07:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
pagewalk.c mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers 2015-03-25 16:20:30 -07:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c percpu: clean up of schunk->map[] assignment in pcpu_setup_first_chunk 2015-07-21 11:31:00 -04:00
pgtable-generic.c mm: clarify that the function operates on hugepage pte 2015-06-24 17:49:44 -07:00
process_vm_access.c process_vm_access: switch to {compat_,}import_iovec() 2015-04-11 22:27:12 -04:00
quicklist.c
readahead.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
rmap.c mm: send one IPI per CPU to TLB flush all entries after unmapping pages 2015-09-04 16:54:41 -07:00
shmem.c ipc: use private shmem or hugetlbfs inodes for shm segments. 2015-08-07 04:39:41 +03:00
slab.c slab: infrastructure for bulk object allocation and freeing 2015-09-04 16:54:41 -07:00
slab.h mm/slab.h: fix argument order in cache_from_obj's error message 2015-09-04 16:54:41 -07:00
slab_common.c slab: infrastructure for bulk object allocation and freeing 2015-09-04 16:54:41 -07:00
slob.c slab: infrastructure for bulk object allocation and freeing 2015-09-04 16:54:41 -07:00
slub.c mm/slub: don't wait for high-order page allocation 2015-09-04 16:54:41 -07:00
sparse-vmemmap.c
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap.c mm: drop bogus VM_BUG_ON_PAGE assert in put_page() codepath 2015-06-24 17:49:42 -07:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: remove rest of ACCESS_ONCE() usages 2015-04-15 16:35:18 -07:00
swapfile.c vfs: add seq_file_path() helper 2015-06-23 18:01:07 -04:00
truncate.c memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
userfaultfd.c userfaultfd: avoid mmap_sem read recursion in mcopy_atomic 2015-09-04 16:54:41 -07:00
util.c mm: uninline and cleanup page-mapping related helpers 2015-04-15 16:35:19 -07:00
vmacache.c mm,vmacache: count number of system-wide flushes 2014-12-13 12:42:48 -08:00
vmalloc.c mm/vmalloc: get rid of dirty bitmap inside vmap_block structure 2015-04-15 16:35:18 -07:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c mm: send one IPI per CPU to TLB flush all entries after unmapping pages 2015-09-04 16:54:41 -07:00
vmstat.c vmstat: Reduce time interval to stat update on idle cpu 2015-02-11 17:06:07 -08:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c zpool: remove zpool_evict() 2015-06-25 17:00:37 -07:00
zpool.c zpool: remove zpool_evict() 2015-06-25 17:00:37 -07:00
zsmalloc.c zpool: remove zpool_evict() 2015-06-25 17:00:37 -07:00
zswap.c zswap: runtime enable/disable 2015-06-25 17:00:37 -07:00