linux-stable/fs/proc
Vasily Averin 425b9c7f51 memcg: accounting for objects allocated for new netdevice
Creating a new netdevice allocates at least ~50Kb of memory for various
kernel objects, but only ~5Kb of them are accounted to memcg. As a result,
creating an unlimited number of netdevice inside a memcg-limited container
does not fall within memcg restrictions, consumes a significant part
of the host's memory, can cause global OOM and lead to random kills of
host processes.

The main consumers of non-accounted memory are:
 ~10Kb   80+ kernfs nodes
 ~6Kb    ipv6_add_dev() allocations
  6Kb    __register_sysctl_table() allocations
  4Kb    neigh_sysctl_register() allocations
  4Kb    __devinet_sysctl_register() allocations
  4Kb    __addrconf_sysctl_register() allocations

Accounting of these objects allows to increase the share of memcg-related
memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice
on typical VM with default Fedora 35 kernel) and this should be enough
to somehow protect the host from misuse inside container.

Other related objects are quite small and may not be taken into account
to minimize the expected performance degradation.

It should be separately mentonied ~300 bytes of percpu allocation
of struct ipstats_mib in snmp6_alloc_dev(), on huge multi-cpu nodes
it can become the main consumer of memory.

This patch does not enables kernfs accounting as it affects
other parts of the kernel and should be discussed separately.
However, even without kernfs, this patch significantly improves the
current situation and allows to take into account more than half
of all netdevice allocations.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-04 19:16:46 -07:00
..
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Makefile proc: bootconfig: Add /proc/bootconfig to show boot config list 2020-01-13 13:19:39 -05:00
array.c tracehook: Remove tracehook.h 2022-03-10 16:51:51 -06:00
base.c ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
bootconfig.c proc: bootconfig: Add null pointer check 2022-04-02 08:40:09 -04:00
cmdline.c
consoles.c
cpuinfo.c proc/cpuinfo: switch to ->read_iter 2020-11-06 10:05:18 -08:00
devices.c block: move block-related definitions out of fs.h 2020-06-24 09:16:02 -06:00
fd.c procfs/dmabuf: add inode number to /proc/*/fdinfo 2021-07-01 11:06:04 -07:00
fd.h fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
generic.c fs: proc: store PDE()->data into inode->i_private 2022-01-22 08:33:37 +02:00
inode.c fs: allocate inode by using alloc_inode_sb() 2022-03-22 15:57:03 -07:00
internal.h fs: proc: store PDE()->data into inode->i_private 2022-01-22 08:33:37 +02:00
interrupts.c
kcore.c fs/proc/kcore: use page_offline_(freeze|thaw) 2021-06-30 20:47:28 -07:00
kmsg.c proc: faster open/read/close with "permanent" files 2020-04-07 10:43:42 -07:00
loadavg.c sched: Make nr_running() return 32-bit value 2021-05-12 21:34:14 +02:00
meminfo.c mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages 2021-02-24 13:38:29 -08:00
namespaces.c Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
nommu.c mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
page.c mm: don't include <linux/memremap.h> in <linux/mm.h> 2022-03-03 12:47:33 -05:00
proc_net.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
proc_sysctl.c memcg: accounting for objects allocated for new netdevice 2022-05-04 19:16:46 -07:00
proc_tty.c
root.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
self.c Revert "proc: don't allow async path resolution of /proc/self components" 2021-02-23 20:32:11 -07:00
softirqs.c
stat.c fs/proc/uptime.c: Fix idle time reporting in /proc/uptime 2021-10-05 15:51:35 +02:00
task_mmu.c proc: fix documentation and description of pagemap 2022-03-05 11:08:33 -08:00
task_nommu.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
thread_self.c Revert "proc: don't allow async path resolution of /proc/thread-self components" 2021-02-23 20:32:11 -07:00
uptime.c fs/proc/uptime.c: Fix idle time reporting in /proc/uptime 2021-10-05 15:51:35 +02:00
util.c
version.c
vmcore.c proc/vmcore: fix vmcore_alloc_buf() kernel-doc comment 2022-03-23 19:00:33 -07:00