linux-stable/kernel
Vik Heyndrickx 20878232c5 sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.

Uptime and /proc/loadavg on all systems with kernels released during the
last five years, up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.

Likewise, but less noticeably, a fully loaded system with no processes
waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95 (multiplied
by the number of cores).

For the 15-minute average, once the (old) load becomes 93 or higher (in
fixed-point units, where 2048 represents 1.00), it mathematically can
never get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.
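
The stuck values can be reproduced outside the kernel. What follows is a
minimal user-space sketch, not the kernel source: it assumes the pre-patch
step computes load * exp + active * (FIXED_1 - exp) and then applies a +0.5
rounding, with FSHIFT = 11 (so FIXED_1 = 2048) and the 15-minute decay
factor 2037. The names calc_load_old() and settle_old() are hypothetical
and exist only for this illustration.

```c
#define FSHIFT   11                 /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.00 in fixed point (2048) */
#define EXP_15   2037UL             /* 15-minute decay factor */

/* Assumed pre-patch step: exponential decay plus a +0.5 rounding term. */
static unsigned long calc_load_old(unsigned long load, unsigned long exp,
                                   unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	load += 1UL << (FSHIFT - 1);    /* the +0.5 (= 1024/2048) rounding */
	return load >> FSHIFT;
}

/* Hypothetical helper: repeat the step until the value stops changing. */
static unsigned long settle_old(unsigned long load, unsigned long active)
{
	unsigned long next;
	int i;

	for (i = 0; i < 100000; i++) {  /* cap iterations defensively */
		next = calc_load_old(load, EXP_15, active);
		if (next == load)
			break;
		load = next;
	}
	return load;
}
```

Under these assumptions, settle_old(FIXED_1, 0) stops at 93 instead of 0
on an idle system, and settle_old(0, FIXED_1) stops at 1955 instead of
2048 on a fully loaded one.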

It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.

By changing the way the internally kept value is rounded, that internal
value equivalent can now reach 0.00 on idle and 1.00 on full load. When
the load is increasing, the internally kept load value is rounded up;
when the load is decreasing, it is rounded down.
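
The direction-dependent rounding described above can be sketched in user
space as follows. This is an illustration under stated assumptions rather
than the patch itself: it assumes "rounded up" means adding FIXED_1 - 1
before the division whenever the active load is at least the old load, and
plain truncation otherwise. The names calc_load_new() and settle_new() are
hypothetical.

```c
#define FSHIFT   11                 /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.00 in fixed point (2048) */
#define EXP_15   2037UL             /* 15-minute decay factor */

/* Sketch of the new step: round up when rising, round down when falling. */
static unsigned long calc_load_new(unsigned long load, unsigned long exp,
                                   unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1 - 1;     /* rising or steady: round up */

	return newload / FIXED_1;       /* falling: division truncates */
}

/* Hypothetical helper: repeat the step until the value stops changing. */
static unsigned long settle_new(unsigned long load, unsigned long active)
{
	unsigned long next;
	int i;

	for (i = 0; i < 100000; i++) {  /* cap iterations defensively */
		next = calc_load_new(load, EXP_15, active);
		if (next == load)
			break;
		load = next;
	}
	return load;
}
```

With this rounding, settle_new(FIXED_1, 0) decays all the way to 0 and
settle_new(0, FIXED_1) climbs all the way to 2048, so no "ghost" load
term is fed back into the next iteration.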

The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on CentOS 7.1 kernel 3.10.0-327. It was
tested on single-, dual-, and octa-core systems. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.

Tested-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0f004f5a69 ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:55:34 +02:00
..
bpf bpf: fix check_map_func_compatibility logic 2016-04-28 17:29:45 -04:00
configs
debug mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings 2016-02-22 08:51:37 +01:00
events perf/core: Change the default paranoia level to 2 2016-05-09 17:57:12 -07:00
gcov gcov: use within_module() helper. 2015-12-04 22:46:25 +01:00
irq genirq: Dont allow affinity mask to be updated on IPIs 2016-04-21 12:05:15 +02:00
livepatch livepatch/module: remove livepatch module notifier 2016-03-17 09:45:10 +01:00
locking Merge branch 'sched/urgent' into sched/core to pick up fixes 2016-05-12 09:18:13 +02:00
power Power management and ACPI material for v4.6-rc1, part 2 2016-03-24 22:59:58 -07:00
printk printk: add clear_idx symbol to vmcoreinfo 2016-03-17 15:09:34 -07:00
rcu kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
sched sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems 2016-05-12 09:55:34 +02:00
time sched/fair: Correctly handle nohz ticks CPU load accounting 2016-04-23 14:20:42 +02:00
trace tracing: Don't display trigger file for events that can't be enabled 2016-05-03 12:59:30 -04:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit.c Merge branch 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit 2016-03-19 17:52:49 -07:00
audit.h security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c Merge branch 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit 2016-03-19 17:52:49 -07:00
auditfilter.c audit: Fix typo in comment 2016-02-08 11:25:39 -05:00
auditsc.c auditsc: for seccomp events, log syscall compat state using in_compat_syscall 2016-03-22 15:36:02 -07:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback 2016-04-25 15:45:14 -04:00
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup_pids: fix a typo. 2015-12-14 14:54:37 -05:00
compat.c
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu.c sched/hotplug: Make activate() the last hotplug step 2016-05-06 14:58:25 +02:00
cpu_pm.c
cpuset.c cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback 2016-04-25 15:45:14 -04:00
crash_dump.c
cred.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space 2016-03-25 16:37:42 -07:00
extable.c
fork.c kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
freezer.c
futex.c futex: Acknowledge a new waiter in counter before plist 2016-04-21 11:06:09 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
groups.c
hung_task.c kernel/hung_task.c: use timeout diff when timeout is updated 2016-03-22 15:36:02 -07:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't profile branches in kcov 2016-04-28 19:34:04 -07:00
kexec.c kexec: set KEXEC_TYPE_CRASH before sanity_check_segment_list() 2016-01-20 17:09:18 -08:00
kexec_core.c kexec: export OFFSET(page.compound_head) to find out compound tail page 2016-04-28 19:34:04 -07:00
kexec_file.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-03-17 11:33:45 -07:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c
ksysfs.c rcu: Remove TINY_RCU bloat from pointless boot parameters 2015-12-07 16:59:37 -08:00
kthread.c
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
Makefile kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
membarrier.c
memremap.c memremap: add MEMREMAP_WC flag 2016-03-22 15:36:02 -07:00
module-internal.h
module.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2016-03-17 21:46:32 -07:00
module_signing.c X.509: Make algo identifiers text instead of enum 2016-03-03 21:49:27 +00:00
notifier.c
nsproxy.c cgroup: introduce cgroup namespaces 2016-02-16 13:04:58 -05:00
padata.c
panic.c panic: change nmi_panic from macro to function 2016-03-22 15:36:02 -07:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:44:04 -08:00
pid_namespace.c
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2016-03-22 15:36:02 -07:00
ptrace.c ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock 2016-03-22 15:36:02 -07:00
range.c
reboot.c
relay.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c seccomp: check in_compat_syscall, not is_compat_task, in strict mode 2016-03-22 15:36:02 -07:00
signal.c kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE 2016-03-22 15:36:02 -07:00
smp.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 13:50:29 -07:00
smpboot.c cpu/hotplug: Unpark smpboot threads from the state machine 2016-03-01 20:36:56 +01:00
smpboot.h cpu/hotplug: Create hotplug threads 2016-03-01 20:36:56 +01:00
softirq.c arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: remove CONFIG_SMP dependencies 2016-01-16 11:17:24 -08:00
sys.c timer: convert timer_slack_ns from unsigned long to u64 2016-03-17 15:09:34 -07:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sysctl.c mm: scale kswapd watermarks in proportion to memory 2016-03-17 15:09:34 -07:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
task_work.c
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c time, acct: Drop irq save & restore from __acct_update_integrals() 2016-02-29 09:53:09 +01:00
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c kernel/*: switch to memdup_user_nul() 2016-01-04 10:27:55 -05:00
utsname.c
utsname_sysctl.c
watchdog.c watchdog: don't run proc_watchdog_update if new value is same as old 2016-03-17 15:09:34 -07:00
workqueue.c Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-04-27 12:03:59 -07:00
workqueue_internal.h sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() 2016-03-02 10:28:47 -05:00