linux-stable/mm
Johannes Weiner 52c29b0482 mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
The original cgroup memory controller has an extension to account slab
memory (and other "kernel memory" consumers) in a separate "kmem"
counter, once the user set an explicit limit on that "kmem" pool.

However, this includes various consumers whose sizes are directly linked
to userspace activity.  Accounting them as an optional "kmem" extension
is problematic for several reasons:

1. It leaves the main memory interface with incomplete semantics. A
   user who puts their workload into a cgroup and configures a memory
   limit does not expect us to leave holes in the containment as big
   as the dentry and inode cache, or the kernel stack pages.

2. If the limit set on this random historical subgroup of consumers is
   reached, subsequent allocations will fail even when the main memory
   pool available to the cgroup is not yet exhausted and/or has
   reclaimable memory in it.

3. Calling it 'kernel memory' is misleading. The dentry and inode
   caches are no more 'kernel' (or no less 'user') memory than the
   page cache itself. Treating these consumers as different classes is
   a historical implementation detail that should not leak to users.

So, in addition to page cache, anonymous memory, and network socket
memory, account the following memory consumers per default in the
cgroup2 memory controller:

     - threadinfo
     - task_struct
     - task_delay_info
     - pid
     - cred
     - mm_struct
     - vm_area_struct and vm_region (nommu)
     - anon_vma and anon_vma_chain
     - signal_struct
     - sighand_struct
     - fs_struct
     - files_struct
     - fdtable and fdtable->full_fds_bits
     - dentry and external_name
     - inode for all filesystems.

This should give us reasonable memory isolation for most common
workloads out of the box.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
..
kasan UBSAN: run-time undefined behavior sanity checker 2016-01-20 17:09:18 -08:00
Kconfig mm: re-enable THP 2016-01-15 17:56:32 -08:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
backing-dev.c mm: memcontrol: export root_mem_cgroup 2016-01-14 16:00:49 -08:00
balloon_compaction.c virtio_balloon: fix race between migration and ballooning 2016-01-12 20:47:06 +02:00
bootmem.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
cleancache.c cleancache: remove limit on the number of cleancache enabled filesystems 2015-04-14 16:49:03 -07:00
cma.c mm/cma.c: suppress warning 2015-11-05 19:34:48 -08:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
compaction.c mm/compaction.c: __compact_pgdat() code cleanuup 2016-01-14 16:00:49 -08:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: rework mapcount accounting to enable 4k mapping of THPs 2016-01-15 17:56:32 -08:00
dmapool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM 2015-11-06 17:50:42 -08:00
filemap.c mm: differentiate page_mapped() from page_mapcount() for compound pages 2016-01-15 17:56:32 -08:00
frame_vector.c mm: fix docbook comment for get_vaddr_frames() 2015-11-05 19:34:48 -08:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: bring in additional flag for fixup_user_fault to signal unlock 2016-01-15 17:56:32 -08:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c thp: fix interrupt unsafe locking in split_huge_page() 2016-01-20 17:09:18 -08:00
hugetlb.c mm: rework mapcount accounting to enable 4k mapping of THPs 2016-01-15 17:56:32 -08:00
hugetlb_cgroup.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h thp: reintroduce split_huge_page() 2016-01-15 17:56:32 -08:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c Revert "gfp: add __GFP_NOACCOUNT" 2016-01-14 16:00:49 -08:00
ksm.c mm/ksm.c: mark stable page dirty 2016-01-15 17:56:32 -08:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c mm/maccess.c: actually return -EFAULT from strncpy_from_unsafe 2015-11-05 19:34:48 -08:00
madvise.c mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called 2016-01-15 17:56:32 -08:00
memblock.c mm/memblock: introduce for_each_memblock_type() 2016-01-14 16:00:49 -08:00
memcontrol.c mm: memcontrol: account "kmem" consumers in cgroup2 memory controller 2016-01-20 17:09:18 -08:00
memory-failure.c mm: soft-offline: exit with failure for non anonymous thp 2016-01-15 17:56:32 -08:00
memory.c mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd 2016-01-15 17:56:32 -08:00
memory_hotplug.c x86, mm: introduce vmem_altmap to augment vmemmap_populate() 2016-01-15 17:56:32 -08:00
mempolicy.c mm: mempolicy: skip non-migratable VMAs when setting MPOL_MF_LAZY 2016-01-15 17:56:32 -08:00
mempool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c thp: introduce deferred_split_huge_page() 2016-01-15 17:56:32 -08:00
mincore.c mm, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
mlock.c mm/mlock.c: change can_do_mlock return value type to boolean 2016-01-15 17:56:32 -08:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm: fix locking order in mm_take_all_locks() 2016-01-15 17:56:32 -08:00
mmu_context.c
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c mm/mmzone.c: memmap_valid_within() can be boolean 2016-01-14 16:00:49 -08:00
mprotect.c mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd 2016-01-15 17:56:32 -08:00
mremap.c mm, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
nommu.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
oom_kill.c mm, shmem: add internal shmem resident memory accounting 2016-01-14 16:00:49 -08:00
page-writeback.c mm: page_alloc: generalize the dirty balance reserve 2016-01-14 16:00:49 -08:00
page_alloc.c mm/page_alloc.c: remove unused struct zone *z variable 2016-01-15 17:56:32 -08:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: add page_check_address_transhuge() helper 2016-01-15 17:56:32 -08:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm/page_isolation: do some cleanup in "undo_isolate_page_range" 2016-01-15 17:56:32 -08:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c mm/percpu: use offset_in_page macro 2015-11-05 19:34:48 -08:00
pgtable-generic.c mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd 2016-01-15 17:56:32 -08:00
process_vm_access.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
quicklist.c
readahead.c mm: move lru_to_page to mm_inline.h 2016-01-14 16:00:49 -08:00
rmap.c mm: fix locking order in mm_take_all_locks() 2016-01-15 17:56:32 -08:00
shmem.c memcg: adjust to support new THP refcounting 2016-01-15 17:56:32 -08:00
slab.c mm/slab.c: add a helper function get_first_slab 2016-01-14 16:00:49 -08:00
slab.h mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
slab_common.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
slob.c slab/slub: adjust kmem_cache_alloc_bulk API 2015-11-22 11:58:44 -08:00
slub.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
sparse-vmemmap.c x86, mm: introduce vmem_altmap to augment vmemmap_populate() 2016-01-15 17:56:32 -08:00
sparse.c x86, mm: introduce vmem_altmap to augment vmemmap_populate() 2016-01-15 17:56:32 -08:00
swap.c mm, x86: get_user_pages() for dax mappings 2016-01-15 17:56:32 -08:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: support madvise(MADV_FREE) 2016-01-15 17:56:32 -08:00
swapfile.c mm: make swapoff more robust against soft dirty 2016-01-15 17:56:32 -08:00
truncate.c memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
userfaultfd.c memcg: adjust to support new THP refcounting 2016-01-15 17:56:32 -08:00
util.c proc read mm's {arg,env}_{start,end} with mmap semaphore taken. 2016-01-20 17:09:18 -08:00
vmacache.c mm/vmacache: inline vmacache_valid_mm() 2015-11-05 19:34:48 -08:00
vmalloc.c mm/vmalloc.c: use macro IS_ALIGNED to judge the aligment 2016-01-15 17:56:32 -08:00
vmpressure.c memcg: avoid vmpressure oops when memcg disabled 2016-01-14 16:00:49 -08:00
vmscan.c mm: memcontrol: give the kmem states more descriptive names 2016-01-20 17:09:18 -08:00
vmstat.c mm: support madvise(MADV_FREE) 2016-01-15 17:56:32 -08:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: fix migrate_zspage-zs_free race condition 2016-01-20 17:09:18 -08:00
zswap.c mm/zswap: change incorrect strncmp use to strcmp 2015-12-18 14:25:40 -08:00