linux-stable/kernel
Xunlei Pang d93cef31a5 sched/fair: Fix bandwidth timer clock drift condition
commit 512ac999d2 upstream.

I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 > cpu.cfs_quota_us
  echo $$ > tasks

then repeat:

  cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}->runtime_expires" after period timer is re-armed.

The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong, the two runtime_expires are actually the same when clock
drift happens, so this condtion can never hit. The orginal design was
correctly done by this commit:

  a9cf55b286 ("sched: Expire invalid runtime")

... but was changed to be the current implementation due to its locking bug.

This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[alakeshh: backport: Fixed merge conflicts:
 - sched.h: Fix the indentation and order in which the variables are
   declared to match with coding style of the existing code in 4.14
   Struct members of same type were declared in separate lines in
   upstream patch which has been changed back to having multiple
   members of same type in the same line.
   e.g. int a; int b; ->  int a, b; ]
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 51f2176d74 ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23 08:09:46 +01:00
..
bpf bpf: Prevent memory disambiguation attack 2018-12-05 19:41:10 +01:00
cgroup cgroup: fix CSS_TASK_ITER_PROCS 2019-01-09 17:14:50 +01:00
configs ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES. 2017-08-22 18:43:23 -07:00
debug kdb: use memmove instead of overlapping memcpy 2018-12-08 13:03:36 +01:00
events uprobes: Fix handle_swbp() vs. unregister() + register() race once more 2018-12-08 13:03:36 +01:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq/affinity: Don't return with empty affinity masks on error 2019-01-13 10:01:02 +01:00
livepatch livepatch: Validate module/old func name length 2018-09-09 19:55:58 +02:00
locking locking/qspinlock, x86: Provide liveness guarantee 2018-12-21 14:13:09 +01:00
power PM / sleep: wakeup: Fix build error caused by missing SRCU support 2018-09-09 19:55:58 +02:00
printk printk: Wake klogd when passing console_lock owner 2018-12-17 09:28:55 +01:00
rcu rcu: Make need_resched() respond to urgent RCU-QS needs 2018-12-01 09:43:00 +01:00
sched sched/fair: Fix bandwidth timer clock drift condition 2019-01-23 08:09:46 +01:00
time posix-timers: Fix division by zero bug 2018-12-29 13:39:09 +01:00
trace tracing: Fix memory leak of instance function hash filters 2018-12-21 14:13:06 +01:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:31:17 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:23:05 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:52:39 +02:00
audit.h ipc: mqueue: Replace timespec with timespec64 2017-09-03 20:21:24 -04:00
audit_fsnotify.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
audit_tree.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:38:09 +02:00
auditfilter.c audit: allow not equal op for audit by executable 2018-08-03 07:50:39 +02:00
auditsc.c audit: fix potential null dereference 'context->module.name' 2018-08-06 16:20:49 +02:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c compat: fix 4-byte infoleak via uninitialized struct field 2018-05-16 10:10:26 +02:00
configs.c
context_tracking.c
cpu.c x86/speculation: Rework SMT state change 2018-12-05 19:41:20 +01:00
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-17 09:45:27 +01:00
crash_dump.c
cred.c doc: ReSTify credentials.txt 2017-05-18 10:30:19 -06:00
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c kernel/exit.c: export abort() to modules 2018-02-13 10:19:49 +01:00
extable.c extable: Enable RCU if it is not watching in kernel_text_address() 2017-09-23 16:50:20 -04:00
fork.c fork: record start_time late 2019-01-13 10:01:02 +01:00
freezer.c
futex.c futex: Fix OWNER_DEAD fixup 2018-02-03 17:38:47 +01:00
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
hung_task.c kernel/hung_task.c: show all hung tasks before panic 2018-08-03 07:50:23 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL) 2017-07-10 16:32:34 -07:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c kdump: protect vmcoreinfo data under the crash memory 2017-07-12 16:26:00 -07:00
kexec_core.c x86/mm, kexec: Allow kexec to be used with SME 2017-07-18 11:38:04 +02:00
kexec_file.c kexec_file: adjust declaration of kexec_purgatory 2017-07-12 16:26:02 -07:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Return error if we fail to reuse kprobe instead of BUG_ON() 2018-11-13 11:14:55 -08:00
ksysfs.c kexec: move vmcoreinfo out of the kernel's .bss section 2017-07-12 16:25:59 -07:00
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-08-03 07:50:21 +02:00
latencytop.c
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:01:02 +01:00
module-internal.h
module.c module: exclude SHN_UNDEF symbols from kallsyms api 2018-10-03 17:00:53 -07:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: Avoid nested calls to cpus_read_lock() in pcrypt_init_padata() 2017-05-26 10:10:37 +02:00
panic.c panic: avoid deadlocks in re-entrant console drivers 2018-12-29 13:39:10 +01:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid.c pids: make task_tgid_nr_ns() safe 2017-08-21 12:47:31 -07:00
pid_namespace.c userns,pidns: Verify the userns for new pid namespaces 2017-07-20 07:43:58 -05:00
profile.c
ptrace.c ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS 2018-12-05 19:41:21 +01:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-05-30 07:52:00 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-22 18:54:04 +02:00
signal.c signal: Guard against negative signal numbers in copy_siginfo_from_user32 2018-11-13 11:15:07 -08:00
smp.c cpu/hotplug: Fix SMT supported evaluation 2018-08-15 18:13:00 +02:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-12-01 09:42:59 +01:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-25 14:26:21 +01:00
taskstats.c taskstats: add e/u/stime for TGID command 2017-05-08 17:15:12 -07:00
test_kprobes.c
torture.c torture: Fix typo suppressing CPU-hotplug statistics 2017-07-25 13:04:45 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-09 09:51:50 +02:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
umh.c kmod: split out umh code into its own file 2017-09-08 18:26:50 -07:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00