commit f8858d9606
Author: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Date:   2023-09-02 12:56:04 +02:00

sched/fair: Optimize should_we_balance() for large SMT systems

should_we_balance() is called from load_balance() to determine whether the
CPU that is trying to do the load balance is the right one to do it.

With commit:

  b1bfeab9b002 ("sched/fair: Consider the idle state of the whole core for load balance")

the code tries to find an idle core to do the load balancing
and falls back on an idle sibling CPU if there is no idle core.

However, on larger SMT systems, it could end up needlessly iterating to find
an idle CPU by scanning all the CPUs of a non-idle core. Once the core is
known to be non-idle and its first idle SMT sibling has been found, there is
no need to check the remaining SMT siblings for idleness, as the sketch
below shows.
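
A minimal sketch of the resulting loop, simplified from should_we_balance()
in kernel/sched/fair.c (illustrative, not the verbatim upstream diff):

  for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
  	if (!idle_cpu(cpu))
  		continue;

  	/*
  	 * Don't pick an idle SMT CPU in a busy core right away;
  	 * remember the first one and keep looking for a fully
  	 * idle core.
  	 */
  	if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
  		if (idle_smt == -1)
  			idle_smt = cpu;
  #ifdef CONFIG_SCHED_SMT
  		/*
  		 * The core is busy and its first idle sibling is
  		 * already recorded: jump to the last CPU of this
  		 * SMT mask so the iterator skips the remaining
  		 * siblings of the core.
  		 */
  		cpu = cpumask_last(cpu_smt_mask(cpu));
  #endif
  		continue;
  	}

  	/* Found a CPU on a fully idle core: are we it? */
  	return cpu == env->dst_cpu;
  }

Setting cpu to the last sibling works because for_each_cpu_and() resumes
from cpu + 1 on the next iteration.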

Let's say on an SMT4 system, Core0 has CPUs 0, 2, 4 and 6, CPU0 is BUSY and
the rest are IDLE, and the balancing domain is MC/DIE. CPU2 will be set as
the first idle_smt, and then the same process would be repeated for CPU4 and
CPU6, which is unnecessary. Since each is_core_idle() call loops through all
CPUs in the SMT mask, the wasted effort is multiplied by the weight of the
smt_mask: here, with 1 CPU busy, we skip the loop for 2 CPUs (CPU4 and CPU6)
and thereby skip iterating over 8 CPUs in total. The effect would be larger
in the DIE/NUMA domains, where there are more cores.
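
For reference, is_core_idle() itself walks the core's whole SMT mask, which
is why the redundant calls multiply; a simplified sketch of its shape
(assumed from the kernel's fair.c, details may vary):

  static inline bool is_core_idle(int cpu)
  {
  #ifdef CONFIG_SCHED_SMT
  	int sibling;

  	/* A core is idle only if every SMT sibling is idle. */
  	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
  		if (cpu == sibling)
  			continue;
  		if (!idle_cpu(sibling))
  			return false;
  	}
  #endif
  	return true;
  }

Each avoided call thus saves a full pass over the smt_mask, which is where
the 2 calls x 4 siblings = 8 CPUs figure above comes from.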

Testing and performance evaluation
==================================

The test has been done on this system, which has 12 big cores, i.e. 24 small
cores with SMT=4:

  lscpu
  Architecture:            ppc64le
    Byte Order:            Little Endian
  CPU(s):                  96
    On-line CPU(s) list:   0-95
  Model name:              POWER10 (architected), altivec supported
    Thread(s) per core:    8

Used the bcc funclatency tool to evaluate the time taken by
should_we_balance(). For the base tip/sched/core kernel, the time was
collected by marking should_we_balance() noinline so that it is traceable.
Times are in nanoseconds. The values were collected by running the
funclatency tracer for 60 seconds and are the average of 3 such runs. This
represents the expected time reduction with the patch.
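
For reference, the measurement can be reproduced with something along these
lines (assuming the bcc tools are installed; exact options may vary between
bcc versions):

  # histogram of should_we_balance() latency, traced for 60 seconds
  funclatency -d 60 should_we_balance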

tip/sched/core was at commit:

  2f88c8e802 ("sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")

Results:

	------------------------------------------------------------------------------
	workload			   tip/sched/core	with_patch(%gain)
	------------------------------------------------------------------------------
	idle system				 809.3		 695.0(16.45)
	stress ng – 12 threads -l 100		1013.5		 893.1(13.49)
	stress ng – 24 threads -l 100		1073.5		 980.0(9.54)
	stress ng – 48 threads -l 100		 683.0		 641.0(6.55)
	stress ng – 96 threads -l 100		2421.0		2300.0(5.26)
	stress ng – 96 threads -l 15		 375.5		 377.5(-0.53)
	stress ng – 96 threads -l 25		 635.5		 637.5(-0.31)
	stress ng – 96 threads -l 35		 934.0		 891.0(4.83)

Ran schbench (old), hackbench and stress_ng to evaluate workload performance
between tip/sched/core and tip/sched/core + patch; representative
invocations are sketched below. No modification was made to tip/sched/core
itself for these runs.
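
Representative invocations would look roughly like this (reconstructed from
the tables below using the standard tool flags; the -util column is assumed
to map to stress-ng's --cpu-load, and any wrapper scripts are not shown in
this log):

  # hackbench: e.g. 10 process groups, 50000 loops per run
  hackbench -g 10 -l 50000

  # stress-ng: 96 CPU stressors, 100% load, bounded by 100000 cpu ops
  stress-ng --cpu 96 --cpu-load 100 --cpu-ops 100000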

TL;DR:

Good improvement is seen with schbench. When hackbench and stress_ng run
for longer, good improvement is seen as well.

	------------------------------------------------------------------------------
	schbench(old)		            tip		+patch(%gain)
	10 iterations			sched/core
	------------------------------------------------------------------------------
	1 Threads
	50.0th:		      		    8.00       9.00(-12.50)
	75.0th:   			    9.60       9.00(6.25)
	90.0th:   			   11.80      10.20(13.56)
	95.0th:   			   12.60      10.40(17.46)
	99.0th:   			   13.60      11.90(12.50)
	99.5th:   			   14.10      12.60(10.64)
	99.9th:   			   15.90      14.60(8.18)
	2 Threads
	50.0th:   			    9.90       9.20(7.07)
	75.0th:   			   12.60      10.10(19.84)
	90.0th:   			   15.50      12.00(22.58)
	95.0th:   			   17.70      14.00(20.90)
	99.0th:   			   21.20      16.90(20.28)
	99.5th:   			   22.60      17.50(22.57)
	99.9th:   			   30.40      19.40(36.18)
	4 Threads
	50.0th:   			   12.50      10.60(15.20)
	75.0th:   			   15.30      12.00(21.57)
	90.0th:   			   18.60      14.10(24.19)
	95.0th:   			   21.30      16.20(23.94)
	99.0th:   			   26.00      20.70(20.38)
	99.5th:   			   27.60      22.50(18.48)
	99.9th:   			   33.90      31.40(7.37)
	8 Threads
	50.0th:   			   16.30      14.30(12.27)
	75.0th:   			   20.20      17.40(13.86)
	90.0th:   			   24.50      21.90(10.61)
	95.0th:   			   27.30      24.70(9.52)
	99.0th:   			   35.00      31.20(10.86)
	99.5th:   			   46.40      33.30(28.23)
	99.9th:   			   89.30      57.50(35.61)
	16 Threads
	50.0th:   			   22.70      20.70(8.81)
	75.0th:   			   30.10      27.40(8.97)
	90.0th:   			   36.00      32.80(8.89)
	95.0th:   			   39.60      36.40(8.08)
	99.0th:   			   49.20      44.10(10.37)
	99.5th:   			   64.90      50.50(22.19)
	99.9th:   			  143.50     100.60(29.90)
	32 Threads
	50.0th:   			   34.60      35.50(-2.60)
	75.0th:   			   48.20      50.50(-4.77)
	90.0th:   			   59.20      62.40(-5.41)
	95.0th:   			   65.20      69.00(-5.83)
	99.0th:   			   80.40      83.80(-4.23)
	99.5th:   			  102.10      98.90(3.13)
	99.9th:   			  727.10     506.80(30.30)

schbench does improve in general. There is some run-to-run variation with
schbench, so a validation run was done to confirm that the trend is similar.

	------------------------------------------------------------------------------
	hackbench				tip	   +patch(%gain)
	20 iterations, 50000 loops	     sched/core
	------------------------------------------------------------------------------
	Process 10 groups                :      11.74      11.70(0.34)
	Process 20 groups                :      22.73      22.69(0.18)
	Process 30 groups                :      33.39      33.40(-0.03)
	Process 40 groups                :      43.73      43.61(0.27)
	Process 50 groups                :      53.82      54.35(-0.98)
	Process 60 groups                :      64.16      65.29(-1.76)
	thread 10 Time                   :      12.81      12.79(0.16)
	thread 20 Time                   :      24.63      24.47(0.65)
	Process(Pipe) 10 Time            :       6.40       6.34(0.94)
	Process(Pipe) 20 Time            :      10.62      10.63(-0.09)
	Process(Pipe) 30 Time            :      15.09      14.84(1.66)
	Process(Pipe) 40 Time            :      19.42      19.01(2.11)
	Process(Pipe) 50 Time            :      24.04      23.34(2.91)
	Process(Pipe) 60 Time            :      28.94      27.51(4.94)
	thread(Pipe) 10 Time             :       6.96       6.87(1.29)
	thread(Pipe) 20 Time             :      11.74      11.73(0.09)

hackbench shows a slight improvement with pipes and a slight degradation
with processes.

	------------------------------------------------------------------------------
	stress_ng				tip        +patch(%gain)
	10 iterations 100000 cpu_ops	     sched/core
	------------------------------------------------------------------------------

	--cpu=96 -util=100 Time taken    :       5.30,       5.01(5.47)
	--cpu=48 -util=100 Time taken    :       7.94,       6.73(15.24)
	--cpu=24 -util=100 Time taken    :      11.67,       8.75(25.02)
	--cpu=12 -util=100 Time taken    :      15.71,      15.02(4.39)
	--cpu=96 -util=10 Time taken     :      22.71,      22.19(2.29)
	--cpu=96 -util=20 Time taken     :      12.14,      12.37(-1.89)
	--cpu=96 -util=30 Time taken     :       8.76,       8.86(-1.14)
	--cpu=96 -util=40 Time taken     :       7.13,       7.14(-0.14)
	--cpu=96 -util=50 Time taken     :       6.10,       6.13(-0.49)
	--cpu=96 -util=60 Time taken     :       5.42,       5.41(0.18)
	--cpu=96 -util=70 Time taken     :       4.94,       4.94(0.00)
	--cpu=96 -util=80 Time taken     :       4.56,       4.53(0.66)
	--cpu=96 -util=90 Time taken     :       4.27,       4.26(0.23)

Good improvement is seen with 24 CPUs: in that case only one CPU per core is
busy, so no core is idle. A decent improvement is seen in the 100%
utilization cases, and no difference at the other utilization levels.

Fixes: b1bfeab9b0 ("sched/fair: Consider the idle state of the whole core for load balance")
Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230902081204.232218-1-sshegde@linux.vnet.ibm.com