linux-stable/kernel/sched
Vik Heyndrickx 20878232c5 sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.

Likewise but not as obviously noticeable, a fully loaded system with no
processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by number of cores).

Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

By changing the way the internally kept value is rounded, that internal
value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
increasing load, the internally kept load value is rounded up, when the
load is decreasing, the load value is rounded down.

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
tested on single, dual, and octal cores system. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.

Tested-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0f004f5a69 ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:34 +02:00
..
auto_group.c sched/core: Move the sched_to_prio[] arrays out of line 2015-12-04 10:34:46 +01:00
auto_group.h sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
clock.c sched/clock: Make local_clock()/cpu_clock() inline 2016-04-13 12:25:22 +02:00
completion.c sched/completion: Serialize completion_done() with complete() 2015-02-18 14:27:40 +01:00
core.c sched/nohz: Fix affine unpinned timers mess 2016-05-12 09:55:32 +02:00
cpuacct.c sched/cpuacct: Check for NULL when using task_pt_regs() 2016-04-13 13:22:37 +02:00
cpuacct.h sched/cpuacct: Simplify the cpuacct code 2016-03-21 11:00:28 +01:00
cpudeadline.c sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpudeadline.h sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpufreq.c cpufreq: Move scheduler-related code to the sched directory 2016-03-10 20:44:47 +01:00
cpupri.c Merge commit '3cf2f34' into sched/core, to fix build error 2014-06-12 13:46:37 +02:00
cpupri.h sched/cpupri: Remove unnecessary definitions in cpupri.h 2014-11-16 10:58:59 +01:00
cputime.c sched/cputime: Fix steal_account_process_tick() to always return jiffies 2016-03-08 12:24:56 +01:00
deadline.c Merge branch 'sched/urgent' into sched/core to pick up fixes 2016-05-12 09:18:13 +02:00
debug.c sched/debug: Print out idle balance values even on !CONFIG_SCHEDSTATS kernels 2016-05-05 09:41:09 +02:00
fair.c sched/fair: Correct unit of load_above_capacity 2016-05-12 09:55:33 +02:00
features.h sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define 2015-09-13 09:52:55 +02:00
idle.c rcu: Make CPU_DYING_IDLE an explicit call 2016-03-01 20:36:58 +01:00
idle_task.c locking/lockdep, sched/core: Implement a better lock pinning scheme 2016-05-05 09:23:59 +02:00
loadavg.c sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems 2016-05-12 09:55:34 +02:00
Makefile kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
rt.c Merge branch 'sched/urgent' into sched/core to pick up fixes 2016-05-12 09:18:13 +02:00
sched.h sched/core: Kill sched_class::task_waking to clean up the migration logic 2016-05-12 09:55:31 +02:00
stats.c sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:37 -08:00
stats.h sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
stop_task.c locking/lockdep, sched/core: Implement a better lock pinning scheme 2016-05-05 09:23:59 +02:00
swait.c wait.[ch]: Introduce the simple waitqueue (swait) implementation 2016-02-25 11:27:16 +01:00
wait.c sched/wait: Fix the signal handling fix 2015-12-13 14:30:59 -08:00