linux-stable/kernel/sched
Michael Wang dd39eadc71 sched: Avoid scale real weight down to zero
[ Upstream commit 26cf52229e ]

During our testing, we found a case in which shares no longer
work correctly; the cgroup topology is:

  /sys/fs/cgroup/cpu/A		(shares=102400)
  /sys/fs/cgroup/cpu/A/B	(shares=2)
  /sys/fs/cgroup/cpu/A/B/C	(shares=1024)

  /sys/fs/cgroup/cpu/D		(shares=1024)
  /sys/fs/cgroup/cpu/D/E	(shares=1024)
  /sys/fs/cgroup/cpu/D/E/F	(shares=1024)

The same benchmark runs in groups C and F with no other tasks
running; the benchmark is capable of consuming all the CPUs.

We would expect group C to win more CPU resources, since it can
enjoy all the shares of group A, but it is F that wins much more.

The reason is that group B has its shares set to 2: since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
A->cfs_rq.load.weight becomes very small.
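
The arithmetic can be reproduced in user space; a minimal sketch,
assuming 4 CPUs and the 64-bit definitions from kernel/sched/sched.h
(SCHED_FIXEDPOINT_SHIFT == 10):

  #include <stdio.h>

  /* Mirrors the 64-bit definitions in kernel/sched/sched.h. */
  #define SCHED_FIXEDPOINT_SHIFT  10
  #define scale_load(w)           ((unsigned long)(w) << SCHED_FIXEDPOINT_SHIFT)
  #define scale_load_down(w)      ((unsigned long)(w) >> SCHED_FIXEDPOINT_SHIFT)

  int main(void)
  {
          unsigned long nr_cpus = 4;                   /* assumed CPU count */
          /* cpu.shares values are scaled up by scale_load() when stored: */
          unsigned long b_shares = scale_load(2);      /* B: shares=2 -> 2048 */
          unsigned long a_weight = b_shares / nr_cpus; /* == 512 per CPU */

          /* A->cfs_rq.load.weight == 512, but it scales down to 0: */
          printf("weight = %lu, scaled = %lu\n",
                 a_weight, scale_load_down(a_weight));
          return 0;
  }

This prints "weight = 512, scaled = 0".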

And in calc_group_shares() we calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since 'cfs_rq->load.weight' is too small, the load becomes 0
after the scale down, and calc_group_shares() finally clamps the
result to the range [MIN_SHARES, tg_shares]: although 'tg_shares'
is 102400, the shares of the se which stands for group A on the
root cfs_rq become just 2.
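
Plugging these numbers into a simplified model of that computation
(reusing the macros from the sketch above; the real
calc_group_shares() in kernel/sched/fair.c also folds in
cfs_rq->avg.load_avg and the group-wide tg_weight) shows where the
2 comes from:

  #define MIN_SHARES      2

  /* Simplified model; not the exact kernel code. */
  static unsigned long calc_group_shares_model(unsigned long tg_shares,
                                               unsigned long weight,
                                               unsigned long tg_weight)
  {
          unsigned long load = scale_load_down(weight);
          unsigned long shares = 0;

          if (tg_weight)
                  shares = (tg_shares * load) / tg_weight;

          /* clamp_t(long, shares, MIN_SHARES, tg_shares) in the kernel: */
          if (shares < MIN_SHARES)
                  shares = MIN_SHARES;
          if (shares > tg_shares)
                  shares = tg_shares;
          return shares;
  }

  /*
   * calc_group_shares_model(scale_load(102400), 512, 512) yields:
   *   load   == scale_load_down(512) == 0
   *   shares == 0 regardless of tg_weight, clamped up to MIN_SHARES == 2
   */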

Meanwhile, the weight of D's se on the root cfs_rq is far bigger
than 2, so D wins the battle.

Thus, when scale_load_down() scales the real weight down to 0, it
no longer tells the real story: the caller gets the wrong
information and the calculation goes wrong.

This patch adds a check in scale_load_down() so that a non-zero
real weight never scales down below MIN_SHARES; with the patch
applied, group C wins as expected.
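
The resulting change to scale_load_down() in kernel/sched/sched.h
is along these lines (the 64-bit case; see the upstream commit
referenced above):

  # define scale_load_down(w) \
  ({ \
          unsigned long __w = (w); \
          if (__w) \
                  __w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
          __w; \
  })

A weight of 0 stays 0, while any non-zero weight now scales down to
at least MIN_SHARES (2), so the ratio computed in calc_group_shares()
keeps working for groups with very small shares.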

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:50:02 +02:00
autogroup.c sched/autogroup: Make autogroup_path() always available 2019-06-24 19:23:40 +02:00
autogroup.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
clock.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
completion.c sched/Documentation: Update wake_up() & co. memory-barrier guarantees 2018-07-17 09:30:34 +02:00
core.c sched/fair: Prevent unlimited runtime on throttled group 2020-03-05 16:43:36 +01:00
cpuacct.c sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpudeadline.c Linux 5.2-rc5 2019-06-17 12:12:27 +02:00
cpudeadline.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-31 16:46:06 +01:00
cpufreq_schedutil.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-31 16:46:06 +01:00
cpupri.c Linux 5.2-rc5 2019-06-17 12:12:27 +02:00
cpupri.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cputime.c sched/vtime: Fix guest/system mis-accounting on task switch 2019-10-09 12:38:03 +02:00
deadline.c sched/core: Further clarify sched_class::set_next_task() 2020-01-26 10:01:03 +01:00
debug.c Linux 5.2-rc6 2019-06-24 19:19:53 +02:00
fair.c sched/fair: Optimize select_idle_cpu 2020-03-05 16:43:48 +01:00
features.h sched/fair: Replace source_load() & target_load() with weighted_cpuload() 2019-06-03 11:49:39 +02:00
idle.c sched/core: Further clarify sched_class::set_next_task() 2020-01-26 10:01:03 +01:00
isolation.c sched/isolation: Prefer housekeeping CPU in local node 2019-07-25 15:51:55 +02:00
loadavg.c timers/nohz: Update NOHZ load in remote tick 2020-03-05 16:43:36 +01:00
Makefile psi: pressure stall information for CPU, memory, and IO 2018-10-26 16:26:32 -07:00
membarrier.c membarrier: Fix RCU locking bug caused by faulty merge 2019-10-01 21:27:50 +02:00
pelt.c sched/debug: Add new tracepoint to track PELT at se level 2019-06-24 19:23:42 +02:00
pelt.h sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() 2019-06-24 19:23:39 +02:00
psi.c sched/psi: Fix OOB write when writing 0 bytes to PSI files 2020-02-28 17:22:21 +01:00
rt.c sched/core: Further clarify sched_class::set_next_task() 2020-01-26 10:01:03 +01:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched: Avoid scale real weight down to zero 2020-04-17 10:50:02 +02:00
stats.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
stats.h sched/stats: Fix unlikely() use of sched_info_on() 2019-07-25 15:51:55 +02:00
stop_task.c sched/core: Further clarify sched_class::set_next_task() 2020-01-26 10:01:03 +01:00
swait.c kernel/sched/: remove caller signal_pending branch predictions 2019-01-04 13:13:48 -08:00
topology.c sched/topology: Assert non-NUMA topology masks don't (partially) overlap 2020-02-24 08:36:52 +01:00
wait.c sched/wait: Deduplicate code with do-while 2019-06-24 19:23:40 +02:00
wait_bit.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00