linux-stable/kernel
Suren Baghdasaryan fc7d33941b mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
[ Upstream commit 67197a4f28 ]

Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm.  This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.

Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important.  Such operation can happen frequently.  We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression.  Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us.  Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.

Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK).  Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes.  To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex.  Its scope is changed to global.

The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork().  To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified.  Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare.  Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.

With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.

[surenb@google.com: v3]
  Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com

Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:07:08 +01:00
..
bpf bpf: Remove recursion prevention from rcu free callback 2020-10-01 13:12:35 +02:00
cgroup cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone() 2020-08-21 09:48:02 +02:00
configs
debug kgdb: Avoid suspicious RCU usage warning 2020-07-09 09:36:30 +02:00
events perf: Fix task_function_call() error handling 2020-10-14 09:51:14 +02:00
gcov gcov: add support for GCC 10.1 2020-09-23 10:46:32 +02:00
irq genirq/affinity: Make affinity setting if activated opt-in 2020-08-21 09:48:23 +02:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:55:09 +02:00
locking locking/lockdep: Fix overflow in presentation of average lock-time 2020-09-03 11:22:27 +02:00
power PM: hibernate: Freeze kernel threads in software_resume() 2020-05-05 19:15:50 +02:00
printk printk: handle blank console arguments passed in. 2020-10-01 13:12:45 +02:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:47:33 -07:00
sched sched: correct SD_flags returned by tl->sd_flags() 2020-08-21 09:48:02 +02:00
time timekeeping: Prevent 32bit truncation in scale64_check_overflow() 2020-10-01 13:12:36 +02:00
trace ftrace: Move RCU is watching check after recursion check 2020-10-14 09:51:11 +02:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:31:17 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:23:05 +01:00
audit.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
audit_fsnotify.c
audit_tree.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
audit_watch.c audit: CONFIG_CHANGE don't log internal bookkeeping as an event 2020-10-01 13:12:33 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:14:03 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c make 'user_access_begin()' do 'access_ok()' 2020-06-20 10:24:58 +02:00
configs.c
context_tracking.c
cpu.c x86/speculation: Remove redundant arch_smt_update() invocation 2020-04-24 08:00:41 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-20 10:25:19 +02:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-17 09:45:27 +01:00
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:17:54 +01:00
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:18:42 +02:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c exit: Move preemption fixup up, move blocking operations down 2020-06-20 10:25:10 +02:00
extable.c extable: Enable RCU if it is not watching in kernel_text_address() 2017-09-23 16:50:20 -04:00
fork.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:07:08 +01:00
freezer.c
futex.c futex: Unbreak futex hashing 2020-04-02 16:34:21 +02:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:15:05 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:15:38 +02:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:23:15 +01:00
kexec_file.c
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-24 08:00:44 +02:00
kprobes.c kprobes: Fix to check probe enabled before disarm_kprobe_ftrace() 2020-10-01 13:12:51 +02:00
ksysfs.c
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-08-03 07:50:21 +02:00
latencytop.c
Makefile y2038: futex: Move compat implementation into futex.c 2019-12-05 15:38:22 +01:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:01:02 +01:00
module-internal.h
module.c kernel/module: Fix memleak in module_add_modinfo_attrs() 2020-02-14 16:32:06 -05:00
module_signing.c
notifier.c x86/mm: split vmalloc_sync_all() 2020-04-02 16:34:20 +02:00
nsproxy.c
padata.c padata: purge get_cpu and reorder_via_wq from padata_do_serial 2020-05-27 16:43:05 +02:00
panic.c panic: ensure preemption is disabled during panic() 2019-10-17 13:43:19 -07:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid.c
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-31 07:28:21 +02:00
profile.c
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-23 08:20:32 +01:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c
relay.c kernel/relay.c: fix memleak on destroy relay channel 2020-08-26 10:29:54 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-22 18:54:04 +02:00
signal.c signal: Extend exec_id to 64bits 2020-04-24 08:00:38 +02:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:46:13 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c kernel/sys.c: avoid copying possible padding bytes in copy_to_user 2020-10-01 13:12:30 +02:00
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c kernel: sysctl: make drop_caches write-only 2020-01-04 13:59:57 +01:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-25 14:26:21 +01:00
taskstats.c taskstats: fix data-race 2020-01-09 10:17:53 +01:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-09 09:51:50 +02:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 09:51:10 +02:00
up.c
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-02-28 16:36:05 +01:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue.c workqueue: don't use wq_select_unbound_cpu() for bound works 2020-03-20 10:54:16 +01:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00