linux-stable/kernel/sched
Xunlei Pang d93cef31a5 sched/fair: Fix bandwidth timer clock drift condition
commit 512ac999d2 upstream.

I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 > cpu.cfs_quota_us
  echo $$ > tasks

then repeat:

  cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}->runtime_expires" after period timer is re-armed.

The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong, the two runtime_expires are actually the same when clock
drift happens, so this condtion can never hit. The orginal design was
correctly done by this commit:

  a9cf55b286 ("sched: Expire invalid runtime")

... but was changed to be the current implementation due to its locking bug.

This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[alakeshh: backport: Fixed merge conflicts:
 - sched.h: Fix the indentation and order in which the variables are
   declared to match with coding style of the existing code in 4.14
   Struct members of same type were declared in separate lines in
   upstream patch which has been changed back to having multiple
   members of same type in the same line.
   e.g. int a; int b; ->  int a, b; ]
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 51f2176d74 ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-23 08:09:46 +01:00
..
autogroup.c sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] 2018-05-16 10:10:30 +02:00
autogroup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
clock.c sched/clock: Fix early boot preempt assumption in __set_sched_clock_stable() 2017-05-24 09:10:00 +02:00
completion.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
core.c sched/smt: Make sched_smt_present track topology 2018-12-05 19:41:20 +01:00
cpuacct.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpuacct.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpudeadline.c sched/deadline: Change return value of cpudl_find() 2017-08-10 12:18:17 +02:00
cpudeadline.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpufreq.c cpufreq / sched: Pass flags to cpufreq_update_util() 2016-08-16 22:14:55 +02:00
cpufreq_schedutil.c cpufreq: schedutil: Avoid using invalid next_freq 2018-05-16 10:10:29 +02:00
cpupri.c sched/cpupri: Don't re-initialize 'struct cpupri' 2017-08-10 12:18:14 +02:00
cpupri.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cputime.c sched/cputime: Don't use smp_processor_id() in preemptible context 2017-07-14 10:27:15 +02:00
deadline.c sched/deadline: Fix switching to -deadline 2018-09-15 09:45:35 +02:00
debug.c sched/debug: Remove unused variable 2017-09-29 10:09:09 +02:00
fair.c sched/fair: Fix bandwidth timer clock drift condition 2019-01-23 08:09:46 +01:00
features.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
idle.c PM / s2idle: Rename ->enter_freeze to ->enter_s2idle 2017-08-11 01:29:56 +02:00
idle_task.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
loadavg.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
membarrier.c membarrier: Disable preemption when calling smp_call_function_many() 2018-01-17 09:45:23 +01:00
rt.c sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE 2018-09-05 09:26:29 +02:00
sched-pelt.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sched.h sched/fair: Fix bandwidth timer clock drift condition 2019-01-23 08:09:46 +01:00
stats.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
stats.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
stop_task.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swait.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
topology.c sched/rt: Up the root domain ref count when passing it around via IPIs 2018-02-16 20:22:44 +01:00
wait.c sched/core: Use smp_mb() in wake_woken_function() 2018-09-26 08:38:10 +02:00
wait_bit.c sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming 2017-06-20 12:19:14 +02:00