linux-stable/kernel
Yicong Yang 32d937f94b sched/fair: Don't balance task to its current running CPU
[ Upstream commit 0dd37d6dd3 ]

We've run into the case that the balancer tries to balance a migration
disabled task and trigger the warning in set_task_cpu() like below:

 ------------[ cut here ]------------
 WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
 Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
 CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G           O       6.1.0-rc4+ #1
 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
 pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : set_task_cpu+0x188/0x240
 lr : load_balance+0x5d0/0xc60
 sp : ffff80000803bc70
 x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
 x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
 x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
 x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
 x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
 x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
 x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
 x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
 x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
 x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
 Call trace:
  set_task_cpu+0x188/0x240
  load_balance+0x5d0/0xc60
  rebalance_domains+0x26c/0x380
  _nohz_idle_balance.isra.0+0x1e0/0x370
  run_rebalance_domains+0x6c/0x80
  __do_softirq+0x128/0x3d8
  ____do_softirq+0x18/0x24
  call_on_irq_stack+0x2c/0x38
  do_softirq_own_stack+0x24/0x3c
  __irq_exit_rcu+0xcc/0xf4
  irq_exit_rcu+0x18/0x24
  el1_interrupt+0x4c/0xe4
  el1h_64_irq_handler+0x18/0x2c
  el1h_64_irq+0x74/0x78
  arch_cpu_idle+0x18/0x4c
  default_idle_call+0x58/0x194
  do_idle+0x244/0x2b0
  cpu_startup_entry+0x30/0x3c
  secondary_start_kernel+0x14c/0x190
  __secondary_switched+0xb0/0xb4
 ---[ end trace 0000000000000000 ]---

Further investigation shows that the warning is superfluous, the migration
disabled task is just going to be migrated to its current running CPU.
This is because that on load balance if the dst_cpu is not allowed by the
task, we'll re-select a new_dst_cpu as a candidate. If no task can be
balanced to dst_cpu we'll try to balance the task to the new_dst_cpu
instead. In this case when the migration disabled task is not on CPU it
only allows to run on its current CPU, load balance will select its
current CPU as new_dst_cpu and later triggers the warning above.

The new_dst_cpu is chosen from the env->dst_grpmask. Currently it
contains CPUs in sched_group_span() and if we have overlapped groups it's
possible to run into this case. This patch makes env->dst_grpmask of
group_balance_mask() which exclude any CPUs from the busiest group and
solve the issue. For balancing in a domain with no overlapped groups
the behaviour keeps same as before.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11 11:33:48 +02:00
..
bpf bpf: Adjust insufficient default bpf_jit_limit 2023-04-05 11:14:15 +02:00
cgroup cgroup: Do not corrupt task iteration when rebinding subsystem 2023-06-28 10:14:19 +02:00
configs
debug kdb: Make memory allocations more robust 2021-03-03 18:22:36 +01:00
events treewide: Remove uninitialized_var() usage 2023-08-11 11:33:32 +02:00
gcov gcov: add support for checksum field 2023-01-18 09:26:35 +01:00
irq irqdomain: Drop bogus fwspec-mapping error handling 2023-03-11 16:26:47 +01:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 13:16:57 +02:00
locking treewide: Remove uninitialized_var() usage 2023-08-11 11:33:32 +02:00
power PM: hibernate: Fix mistake in kerneldoc comment 2023-01-18 09:26:09 +01:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-20 09:08:15 +02:00
rcu rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait() 2023-03-11 16:26:41 +01:00
sched sched/fair: Don't balance task to its current running CPU 2023-08-11 11:33:48 +02:00
time posix-timers: Ensure timer ID search-loop limit is valid 2023-08-11 11:33:48 +02:00
trace ring-buffer: Fix deadloop issue on reading trace_pipe 2023-08-11 11:33:46 +02:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-18 09:26:30 +01:00
async.c treewide: Remove uninitialized_var() usage 2023-08-11 11:33:32 +02:00
audit.c treewide: Remove uninitialized_var() usage 2023-08-11 11:33:32 +02:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-09-05 10:25:02 +02:00
audit_tree.c
audit_watch.c audit: CONFIG_CHANGE don't log internal bookkeeping as an event 2020-10-01 13:12:33 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:14:03 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:14:19 +02:00
configs.c
context_tracking.c
cpu.c random: clear fast pool, crng, and batches in cpuhp bring up 2022-06-25 11:46:35 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-20 10:25:19 +02:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-17 09:45:27 +01:00
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:17:54 +01:00
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c
exec_domain.c
exit.c treewide: Remove uninitialized_var() usage 2023-08-11 11:33:32 +02:00
extable.c kernel/extable.c: use address-of operator on section symbols 2023-06-09 10:22:53 +02:00
fork.c mm/hugetlb: initialize hugetlb_usage in mm_init 2021-09-22 11:45:32 +02:00
freezer.c
futex.c treewide: Remove uninitialized_var() usage 2023-08-11 11:33:32 +02:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:15:05 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:15:38 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-08-11 11:33:34 +02:00
kexec_file.c kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] 2022-07-02 16:18:11 +02:00
kexec_internal.h
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-24 08:00:44 +02:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-03-11 16:26:46 +01:00
ksysfs.c
kthread.c kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() 2021-07-11 12:48:13 +02:00
latencytop.c
Makefile elfcore: fix building with clang 2021-02-10 09:12:08 +01:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:01:02 +01:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-02-06 07:46:31 +01:00
module_signing.c
notifier.c x86/mm: split vmalloc_sync_all() 2020-04-02 16:34:20 +02:00
nsproxy.c
padata.c padata: purge get_cpu and reorder_via_wq from padata_do_serial 2020-05-27 16:43:05 +02:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-06 07:46:35 +01:00
params.c
pid.c
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-22 11:45:32 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-25 11:11:24 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-14 16:53:43 +02:00
range.c
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-18 18:28:02 +01:00
relay.c kernel/relay.c: fix memleak on destroy relay channel 2020-08-26 10:29:54 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:44:52 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 20:42:47 +02:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:08:33 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 14:47:41 +01:00
smpboot.h
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:05:18 +01:00
sys_ni.c
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-08 11:16:33 +01:00
sysctl_binary.c
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:17:53 +01:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Do not fail unregistering a probe due to memory failure 2021-03-03 18:22:47 +01:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-02-23 11:57:34 +01:00
ucount.c
uid16.c
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 09:51:10 +02:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-22 10:57:35 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-02-28 16:36:05 +01:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-08-11 11:33:34 +02:00
workqueue.c workqueue: clean up WORK_* constant types, clarify masking 2023-08-11 11:33:43 +02:00
workqueue_internal.h