linux-stable/kernel
Vincent Guittot 6b94780e45 sched/core: Use load_avg for selecting idlest group
find_idlest_group() only compares the runnable_load_avg when looking
for the least loaded group. But on fork intensive use case like
hackbench where tasks blocked quickly after the fork, this can lead to
selecting the same CPU instead of other CPUs, which have similar
runnable load but a lower load_avg.

When the runnable_load_avg of 2 CPUs are close, we now take into
account the amount of blocked load as a 2nd selection factor. There is
now 3 zones for the runnable_load of the rq:

 - [0 .. (runnable_load - imbalance)]:
	Select the new rq which has significantly less runnable_load

 - [(runnable_load - imbalance) .. (runnable_load + imbalance)]:
	The runnable loads are close so we use load_avg to chose
	between the 2 rq

 - [(runnable_load + imbalance) .. ULONG_MAX]:
	Keep the current rq which has significantly less runnable_load

The scale factor that is currently used for comparing runnable_load,
doesn't work well with small value. As an example, the use of a
scaling factor fails as soon as this_runnable_load == 0 because we
always select local rq even if min_runnable_load is only 1, which
doesn't really make sense because they are just the same. So instead
of scaling factor, we use an absolute margin for runnable_load to
detect CPUs with similar runnable_load and we keep using scaling
factor for blocked load.

For use case like hackbench, this enable the scheduler to select
different CPUs during the fork sequence and to spread tasks across the
system.

Tests have been done on a Hikey board (ARM based octo cores) for
several kernel. The result below gives min, max, avg and stdev values
of 18 runs with each configuration.

The patches depend on the "no missing update_rq_clock()" work.

hackbench -P -g 1

         ea86cb4b76  7dc603c902  v4.8        v4.8+patches
  min    0.049         0.050         0.051       0,048
  avg    0.057         0.057(0%)     0.057(0%)   0,055(+5%)
  max    0.066         0.068         0.070       0,063
  stdev  +/-9%         +/-9%         +/-8%       +/-9%

More performance numbers here:

  https://lkml.kernel.org/r/20161203214707.GI20785@codeblueprint.co.uk

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: kernellwp@gmail.com
Cc: umgwanakikbuti@gmail.com
Cc: yuyang.du@intel.comc
Link: http://lkml.kernel.org/r/1481216215-24651-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11 13:10:57 +01:00
..
bpf bpf: fix states equal logic for varlen access 2016-11-30 14:50:52 -05:00
configs config: android: enable CONFIG_SECCOMP 2016-10-11 15:06:32 -07:00
debug
events perf/core: Remove invalid warning from list_update_cgroup_even()t 2016-12-06 09:44:29 +01:00
gcov gcov: add support for gcc version >= 6 2016-07-15 14:54:27 +09:00
irq genirq: Use irq type from irqdata instead of irqdesc 2016-11-08 15:15:19 +01:00
livepatch livepatch/module: make TAINT_LIVEPATCH module-specific 2016-08-26 14:42:08 +02:00
locking lockdep: Fix report formatting 2016-12-06 10:40:08 +01:00
power PM / sleep: fix device reference leak in test_suspend 2016-11-02 05:10:04 +01:00
printk Revert "printk: make reading the kernel log flush pending lines" 2016-11-14 09:31:52 -08:00
rcu This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
sched sched/core: Use load_avg for selecting idlest group 2016-12-11 13:10:57 +01:00
time sched/cputime: Simplify task_cputime() 2016-11-15 09:51:05 +01:00
trace ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records 2016-11-14 16:31:49 -05:00
.gitignore
acct.c
async.c
audit.c Merge branch 'stable-4.9' of git://git.infradead.org/users/pcmoore/audit 2016-10-04 14:21:41 -07:00
audit.h Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-07-29 17:54:17 -07:00
audit_fsnotify.c
audit_tree.c audit: cleanup prune_tree_thread 2016-04-04 09:46:47 -04:00
audit_watch.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-09-01 15:55:56 -07:00
auditfilter.c audit: add fields to exclude filter by reusing user filter 2016-06-27 11:01:00 -04:00
auditsc.c Merge branch 'stable-4.9' of git://git.infradead.org/users/pcmoore/audit 2016-10-04 14:21:41 -07:00
backtracetest.c
bounds.c
capability.c kernel: Add noaudit variant of ns_capable() 2016-06-06 20:16:18 +10:00
cgroup.c Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-10-14 12:18:50 -07:00
cgroup_freezer.c
cgroup_pids.c cgroup: Use lld instead of ld when printing pids controller events_limit 2016-06-21 15:03:36 -04:00
compat.c
configs.c
context_tracking.c
cpu.c cpu/hotplug: Use distinct name for cpu_hotplug.dep_map 2016-10-16 11:09:32 +02:00
cpu_pm.c
cpuset.c Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-10-14 12:18:50 -07:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-06-30 18:05:09 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c sched/autogroup: Do not use autogroup->tg in zombie threads 2016-11-22 12:33:43 +01:00
extable.c
fork.c kthread: Make struct kthread kmalloc'ed 2016-12-08 14:36:18 +01:00
freezer.c freezer, oom: check TIF_MEMDIE on the correct task 2016-07-28 16:07:41 -07:00
futex.c futex: Add some more function commentry 2016-09-05 17:20:18 +02:00
futex_compat.c
groups.c cred: simpler, 1D supplementary groups 2016-10-07 18:46:30 -07:00
hung_task.c hung_task: allow hung_task_panic when hung_task_warnings is 0 2016-10-11 15:06:33 -07:00
irq_work.c
jump_label.c powerpc updates for 4.8 #2 2016-08-05 09:00:54 -04:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: add missing #include <linux/sched.h> 2016-12-07 17:10:00 -08:00
kexec.c kexec: allow architectures to override boot mapping 2016-08-02 19:35:27 -04:00
kexec_core.c kexec: add restriction on kexec_load() segment sizes 2016-08-02 19:35:31 -04:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-01 17:52:01 -07:00
kexec_internal.h
kmod.c
kprobes.c kprobes: include <asm/sections.h> instead of <asm-generic/sections.h> 2016-10-11 15:06:31 -07:00
ksysfs.c kexec: add a kexec_crash_loaded() function 2016-08-02 19:35:30 -04:00
kthread.c kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() 2016-12-08 14:36:20 +01:00
latencytop.c
Makefile userns: Add per user namespace sysctls. 2016-08-08 13:18:58 -05:00
membarrier.c
memremap.c mm: fix cache mode of dax pmd mappings 2016-09-09 17:34:46 -07:00
module-internal.h
module.c Re-enable CONFIG_MODVERSIONS in a slightly weaker form 2016-11-29 16:01:30 -08:00
module_signing.c KEYS: Move the point of trust determination to __key_link() 2016-04-11 22:43:43 +01:00
notifier.c
nsproxy.c
padata.c padata: Convert to hotplug state machine 2016-09-19 21:44:30 +02:00
panic.c x86/panic: replace smp_send_stop() with kdump friendly version in panic path 2016-10-11 15:06:32 -07:00
params.c
pid.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
pid_namespace.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
profile.c profile: Convert to hotplug state machine 2016-07-15 10:41:42 +02:00
ptrace.c mm: replace access_process_vm() write parameter with gup_flags 2016-10-19 08:31:25 -07:00
range.c
reboot.c
relay.c relay: Use irq_work instead of plain timer for deferred wakeup 2016-10-11 15:06:32 -07:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c seccomp: Fix tracer exit notifications during fatal signals 2016-08-30 16:12:46 -07:00
signal.c x86/signal: Add SA_{X32,IA32}_ABI sa_flags 2016-09-14 21:28:11 +02:00
smp.c smp: Allocate smp_call_on_cpu() workqueue on stack too 2016-09-22 14:49:10 +02:00
smpboot.c kthread/smpboot: do not park in kthread_create_on_cpu() 2016-10-11 15:06:33 -07:00
smpboot.h
softirq.c softirq: Display IRQ_POLL for irq-poll statistics 2016-10-21 15:45:47 -06:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-10-03 13:39:00 -07:00
sys.c prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable 2016-05-23 17:04:14 -07:00
sys_ni.c x86/pkeys: Fix pkeys build breakage for some non-x86 arches 2016-09-13 14:41:36 +02:00
sysctl.c sched/fair: Kill the unused 'sched_shares_window_ns' tunable 2016-10-20 08:44:57 +02:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works 2016-08-02 19:35:02 -04:00
taskstats.c taskstats: fix the length of cgroupstats_cmd_get_policy 2016-11-03 16:55:58 -04:00
test_kprobes.c
torture.c torture: Convert torture_shutdown() to hrtimer 2016-08-22 10:01:49 -07:00
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c
ucount.c mntns: Add a limit on the number of mount namespaces. 2016-08-31 07:28:35 -05:00
uid16.c cred: simpler, 1D supplementary groups 2016-10-07 18:46:30 -07:00
up.c smp: Add function to execute a function synchronously on a CPU 2016-09-05 13:52:39 +02:00
user-return-notifier.c
user.c
user_namespace.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
utsname.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
utsname_sysctl.c
watchdog.c Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" 2016-07-10 20:58:36 +02:00
workqueue.c kthread: rename probe_kthread_data() to kthread_probe_data() 2016-10-11 15:06:33 -07:00
workqueue_internal.h