linux-stable/kernel
Neeraj Upadhyay 4f2bfd9494 srcu: Make expedited RCU grace periods block even less frequently
The purpose of commit 282d8998e9 ("srcu: Prevent expedited GPs
and blocking readers from consuming CPU") was to prevent a long
series of never-blocking expedited SRCU grace periods from blocking
kernel-live-patching (KLP) progress.  Although it was successful, it also
resulted in excessive boot times on certain embedded workloads running
under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
means increasing the boot time up into the three-to-four minute range.
This increase in boot time was due to the more than 6000 back-to-back
invocations of synchronize_rcu_expedited() within the KVM host OS, which
in turn resulted from qemu's emulation of a long series of MMIO accesses.

Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
periods") did not significantly help this particular use case.

Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
of non-sleeping per phase counts on a system with preemption enabled,
and observed the following boot times:

+──────────────────────────+────────────────+
| SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
+──────────────────────────+────────────────+
| 100                      | 30.053         |
| 150                      | 25.151         |
| 200                      | 20.704         |
| 250                      | 15.748         |
| 500                      | 11.401         |
| 1000                     | 11.443         |
| 10000                    | 11.258         |
| 1000000                  | 11.154         |
+──────────────────────────+────────────────+

Analysis on the experiment results show additional improvements with
CPU-bound delays approaching one jiffy in duration. This improvement was
also seen when number of per-phase iterations were scaled to one jiffy.

This commit therefore scales per-grace-period phase number of non-sleeping
polls so that non-sleeping polls extend for about one jiffy. In addition,
the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
replaced with a simple check for an expedited grace period.  This change
schedules callback invocation immediately after expedited grace periods
complete, which results in greatly improved boot times.  Testing done
by Marc and Zhangfei confirms that this change recovers most of the
performance degradation in boottime; for CONFIG_HZ_250 configuration,
specifically, boot times improve from 3m50s to 41s on Marc's setup;
and from 2m40s to ~9.7s on Zhangfei's setup.

In addition to the changes to default per phase delays, this
change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
This allows users to configure the srcu grace period scanning delays in
order to more quickly react to additional use cases.

Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
Fixes: 282d8998e9 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reported-by: yueluck <yueluck@163.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-07-19 11:39:59 -07:00
..
bpf bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs 2022-06-07 10:41:20 -07:00
cgroup Merge branch 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2022-05-25 11:47:25 -07:00
configs x86/configs: Add x86 debugging Kconfig fragment plus docs 2022-04-06 19:56:29 +02:00
debug Modules updates for v5.19-rc1 2022-05-26 17:13:43 -07:00
dma swiotlb: fix setting ->force_bounce 2022-06-02 07:17:59 +02:00
entry * Fix syzkaller NULL pointer dereference 2022-06-08 09:16:31 -07:00
events Two small perf updates: 2022-06-05 10:40:31 -07:00
futex drm for 5.19-rc1 2022-05-25 16:18:27 -07:00
gcov gcov: Remove compiler version check 2021-12-02 17:25:21 +09:00
irq genirq: PM: Use runtime PM for chained interrupts 2022-06-09 15:58:13 +01:00
kcsan linux-kselftest-kunit-5.19-rc1 2022-05-25 11:32:53 -07:00
livepatch Livepatching changes for 5.19 2022-06-02 08:55:01 -07:00
locking locking/lockdep: Use sched_clock() for random numbers 2022-06-13 10:29:57 +02:00
module module: Fix prefix for module.sig_enforce module param 2022-06-02 12:44:33 -07:00
power cxl for 5.19 2022-05-27 21:24:19 -07:00
printk printk: Wait for the global console lock when the system is going down 2022-06-15 22:04:15 +02:00
rcu srcu: Make expedited RCU grace periods block even less frequently 2022-07-19 11:39:59 -07:00
sched sched: Fix balance_push() vs __sched_setscheduler() 2022-06-13 10:15:07 +02:00
time While looking at the ptrace problems with PREEMPT_RT and the problems 2022-06-03 16:13:25 -07:00
trace Networking fixes for 5.19-rc2, including fixes from bpf and netfilter. 2022-06-09 12:06:52 -07:00
.gitignore
acct.c kernel/acct: move acct sysctls to its own file 2022-04-06 13:43:44 -07:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-03 11:20:34 -08:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-01-25 13:22:51 -05:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-02-22 13:51:40 -05:00
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2022-04-25 14:37:18 +02:00
audit_tree.c audit: use fsnotify group lock helpers 2022-04-25 14:37:28 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2022-04-25 14:37:12 +02:00
auditfilter.c audit/stable-5.17 PR 20220110 2022-01-11 13:08:21 -08:00
auditsc.c audit: free module name 2022-06-15 19:28:44 -04:00
backtracetest.c
bounds.c
capability.c xfs: don't generate selinux audit messages for capability testing 2022-03-09 10:32:06 -08:00
cfi.c cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle 2022-06-13 09:18:46 -07:00
compat.c
configs.c
context_tracking.c
cpu.c Intel Trust Domain Extensions 2022-05-23 17:51:12 -07:00
cpu_pm.c
crash_core.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
crash_dump.c
cred.c x86: Mark __invalid_creds() __noreturn 2022-03-15 10:32:44 +01:00
delayacct.c delayacct: track delays from write-protect copy 2022-06-01 15:55:25 -07:00
dma.c
exec_domain.c
exit.c ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
extable.c lkdtm: Really write into kernel text in WRITE_KERN 2022-02-16 23:25:12 +11:00
fail_function.c
fork.c This set of changes updates init and user mode helper tasks to be 2022-06-03 16:03:05 -07:00
freezer.c
gen_kheaders.sh kheaders: Have cpio unconditionally replace files 2022-05-08 03:16:59 +09:00
groups.c
hung_task.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
iomem.c
irq_work.c irq_work: use kasan_record_aux_stack_noalloc() record callstack 2022-04-15 14:49:55 -07:00
jump_label.c
kallsyms.c ftrace: Add ftrace_lookup_symbols function 2022-05-10 14:42:06 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
kcov.c kcov: update pos before writing pc in trace function 2022-05-25 13:05:42 -07:00
kexec.c
kexec_core.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
kexec_elf.c
kexec_file.c RISC-V Patches for the 5.19 Merge Window, Part 1 2022-05-31 14:10:54 -07:00
kexec_internal.h
kheaders.c
kmod.c
kprobes.c tracing updates for 5.19: 2022-05-29 10:31:36 -07:00
ksysfs.c kernel/ksysfs.c: use helper macro __ATTR_RW 2022-03-23 19:00:33 -07:00
kthread.c kthread: unexport kthread_blkcg 2022-05-02 14:06:20 -06:00
latencytop.c latencytop: move sysctl to its own file 2022-04-21 11:40:59 -07:00
Makefile kernel: add platform_has() infrastructure 2022-06-06 08:06:00 +02:00
module_signature.c
notifier.c notifier: Add blocking/atomic_notifier_chain_register_unique_prio() 2022-05-19 19:30:30 +02:00
nsproxy.c
padata.c padata: replace cpumask_weight with cpumask_empty in padata.c 2022-01-31 11:21:46 +11:00
panic.c Merge branch 'rework/kthreads' into for-linus 2022-06-17 16:36:48 +02:00
params.c kobject: remove kset from struct kset_uevent_ops callbacks 2021-12-28 11:26:18 +01:00
pid.c
pid_namespace.c kernel: pid_namespace: use NULL instead of using plain integer as pointer 2022-04-29 14:38:00 -07:00
platform-feature.c kernel: add platform_has() infrastructure 2022-06-06 08:06:00 +02:00
profile.c exit: Remove profile_handoff_task 2022-01-08 12:43:57 -06:00
ptrace.c While looking at the ptrace problems with PREEMPT_RT and the problems 2022-06-03 16:13:25 -07:00
range.c
reboot.c printk fixes for 5.19-rc3 2022-06-17 14:57:42 -05:00
regset.c
relay.c relay: remove redundant assignment to pointer buf 2022-05-12 20:38:37 -07:00
resource.c kernel/resource: fix kfree() of bootmem memory again 2022-03-23 19:00:35 -07:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-02-02 13:11:34 +01:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-04-11 17:07:29 -07:00
scs.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
seccomp.c seccomp: Add wait_killable semantic to seccomp user notifier 2022-05-03 14:11:58 -07:00
signal.c While looking at the ptrace problems with PREEMPT_RT and the problems 2022-06-03 16:13:25 -07:00
smp.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
smpboot.c cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again. 2022-04-12 14:13:01 +02:00
smpboot.h
softirq.c smp: Make softirq handling RT safe in flush_smp_call_function_queue() 2022-05-01 10:03:43 +02:00
stackleak.c stackleak: add on/off stack variants 2022-05-08 01:33:09 -07:00
stacktrace.c uaccess: remove CONFIG_SET_FS 2022-02-25 09:36:06 +01:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
stop_machine.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
sys.c arm64/sme: Implement vector length configuration prctl()s 2022-04-22 18:50:54 +01:00
sys_ni.c mm/mempolicy: wire up syscall set_mempolicy_home_node 2022-01-15 16:30:30 +02:00
sysctl-test.c
sysctl.c sysctl changes for v5.19-rc1 2022-05-26 16:57:20 -07:00
task_work.c task_work: allow TWA_SIGNAL without a rescheduling IPI 2022-04-30 08:39:32 -06:00
taskstats.c kernel: make taskstats available from all net namespaces 2022-04-29 14:38:03 -07:00
torture.c torture: Wake up kthreads after storing task_struct pointer 2022-02-01 17:24:39 -08:00
tracepoint.c
tsacct.c taskstats: version 12 with thread group and exe info 2022-04-29 14:38:03 -07:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-17 09:11:57 -06:00
uid16.c
uid16.h
umh.c kthread: Don't allocate kthread_struct for init and umh 2022-05-06 14:49:44 -05:00
up.c
user-return-notifier.c
user.c
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-02-25 10:40:14 -06:00
usermode_driver.c blob_to_mnt(): kern_unmount() is needed to undo kern_mount() 2022-05-19 23:25:47 -04:00
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: Free the page array when watch_queue is dismantled 2022-04-02 10:37:39 -07:00
watchdog.c Not a lot of material this cycle. Many singleton patches against various 2022-05-27 11:22:03 -07:00
watchdog_hld.c printk: add functions to prefer direct printing 2022-04-22 21:30:58 +02:00
workqueue.c workqueue: Wrap flush_workqueue() using a macro 2022-06-07 07:07:14 -10:00
workqueue_internal.h