linux-stable/kernel
Chen Lin 02e52d7daa ring-buffer: Do not swap cpu_buffer during resize process
[ Upstream commit 8a96c0288d ]

When ring_buffer_swap_cpu was called during resize process,
the cpu buffer was swapped in the middle, resulting in incorrect state.
Continuing to run in the wrong state will result in oops.

This issue can be easily reproduced using the following two scripts:
/tmp # cat test1.sh
//#! /bin/sh
for i in `seq 0 100000`
do
         echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
         echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
done
/tmp # cat test2.sh
//#! /bin/sh
for i in `seq 0 100000`
do
        echo irqsoff > /sys/kernel/debug/tracing/current_tracer
        sleep 1
        echo nop > /sys/kernel/debug/tracing/current_tracer
        sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &

A typical oops log is as follows, sometimes with other different oops logs.

[  231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
[  231.713375] Modules linked in:
[  231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
[  231.716750] Hardware name: linux,dummy-virt (DT)
[  231.718152] Workqueue: events update_pages_handler
[  231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  231.721171] pc : rb_update_pages+0x378/0x3f8
[  231.722212] lr : rb_update_pages+0x25c/0x3f8
[  231.723248] sp : ffff800082b9bd50
[  231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
[  231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
[  231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
[  231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
[  231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
[  231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
[  231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
[  231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
[  231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
[  231.744196] Call trace:
[  231.744892]  rb_update_pages+0x378/0x3f8
[  231.745893]  update_pages_handler+0x1c/0x38
[  231.746893]  process_one_work+0x1f0/0x468
[  231.747852]  worker_thread+0x54/0x410
[  231.748737]  kthread+0x124/0x138
[  231.749549]  ret_from_fork+0x10/0x20
[  231.750434] ---[ end trace 0000000000000000 ]---
[  233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  233.721696] Mem abort info:
[  233.721935]   ESR = 0x0000000096000004
[  233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
[  233.722596]   SET = 0, FnV = 0
[  233.722805]   EA = 0, S1PTW = 0
[  233.723026]   FSC = 0x04: level 0 translation fault
[  233.723458] Data abort info:
[  233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
[  233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  233.726720] Modules linked in:
[  233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
[  233.727777] Hardware name: linux,dummy-virt (DT)
[  233.728225] Workqueue: events update_pages_handler
[  233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  233.729054] pc : rb_update_pages+0x1a8/0x3f8
[  233.729334] lr : rb_update_pages+0x154/0x3f8
[  233.729592] sp : ffff800082b9bd50
[  233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
[  233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
[  233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
[  233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
[  233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
[  233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
[  233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
[  233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
[  233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
[  233.734418] Call trace:
[  233.734593]  rb_update_pages+0x1a8/0x3f8
[  233.734853]  update_pages_handler+0x1c/0x38
[  233.735148]  process_one_work+0x1f0/0x468
[  233.735525]  worker_thread+0x54/0x410
[  233.735852]  kthread+0x124/0x138
[  233.736064]  ret_from_fork+0x10/0x20
[  233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
[  233.736959] ---[ end trace 0000000000000000 ]---

After analysis, the seq of the error is as follows [1-5]:

int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
			int cpu_id)
{
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		//1. get cpu_buffer, aka cpu_buffer(A)
		...
		...
		schedule_work_on(cpu,
		 &cpu_buffer->update_pages_work);
		//2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to
		// update_pages_handler, do the update process, set 'update_done' in
		// complete(&cpu_buffer->update_done) and to wakeup resize process.
	//---->
		//3. Just at this moment, ring_buffer_swap_cpu is triggered,
		//cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer.
		//ring_buffer_swap_cpu is called as the 'Call trace' below.

		Call trace:
		 dump_backtrace+0x0/0x2f8
		 show_stack+0x18/0x28
		 dump_stack+0x12c/0x188
		 ring_buffer_swap_cpu+0x2f8/0x328
		 update_max_tr_single+0x180/0x210
		 check_critical_timing+0x2b4/0x2c8
		 tracer_hardirqs_on+0x1c0/0x200
		 trace_hardirqs_on+0xec/0x378
		 el0_svc_common+0x64/0x260
		 do_el0_svc+0x90/0xf8
		 el0_svc+0x20/0x30
		 el0_sync_handler+0xb0/0xb8
		 el0_sync+0x180/0x1c0
	//<----

	/* wait for all the updates to complete */
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		//4. get cpu_buffer, cpu_buffer(B) is used in the following process,
		//the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
		//for example, cpu_buffer(A)->update_done will leave be set 1, and will
		//not 'wait_for_completion' at the next resize round.
		  if (!cpu_buffer->nr_pages_to_update)
			continue;

		if (cpu_online(cpu))
			wait_for_completion(&cpu_buffer->update_done);
		cpu_buffer->nr_pages_to_update = 0;
	}
	...
}
	//5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong,
	//Continuing to run in the wrong state, then oops occurs.

Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn

Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23 17:32:35 +02:00
..
bpf bpf, cpumap: Handle skb as well when clean up ptr_ring 2023-08-11 12:14:15 +02:00
cgroup sched/psi: use kernfs polling functions for PSI trigger polling 2023-07-27 08:56:53 +02:00
configs Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
debug kdb: use srcu console list iterator 2022-12-02 11:25:00 +01:00
dma dma-remap: use kvmalloc_array/kvfree for larger dma memory remap 2023-08-23 17:32:27 +02:00
entry ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
events perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event() 2023-05-08 10:58:26 +02:00
futex - Prevent the leaking of a debug timer in futex_waitv() 2023-01-01 11:15:05 -08:00
gcov gcov: add support for checksum field 2022-12-21 14:31:52 -08:00
irq xen: branch for v6.4-rc4 2023-05-27 09:42:56 -07:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-19 16:36:13 +02:00
livepatch Scheduler changes for v6.4: 2023-04-28 14:53:30 -07:00
locking locking/rtmutex: Fix task->pi_waiters integrity 2023-08-03 10:26:09 +02:00
module module/decompress: Fix error checking on zstd decompression 2023-06-01 14:36:46 -07:00
power PM: QoS: Restore support for default value on frequency QoS 2023-07-23 13:54:11 +02:00
printk - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
rcu rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp 2023-07-27 08:56:48 +02:00
sched sched/psi: use kernfs polling functions for PSI trigger polling 2023-07-27 08:56:53 +02:00
time posix-timers: Ensure timer ID search-loop limit is valid 2023-07-27 08:56:45 +02:00
trace ring-buffer: Do not swap cpu_buffer during resize process 2023-08-23 17:32:35 +02:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-11-30 16:13:18 -08:00
async.c
audit.c audit: use time_after to compare time 2022-08-29 19:47:03 -04:00
audit.h audit: remove selinux_audit_rule_update() declaration 2022-09-07 11:30:15 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c
audit_watch.c audit_init_parent(): constify path 2022-09-01 17:39:30 -04:00
auditfilter.c
auditsc.c capability: just use a 'u64' instead of a 'u32[2]' array 2023-03-01 10:01:22 -08:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c capability: just use a 'u64' instead of a 'u32[2]' array 2023-03-01 10:01:22 -08:00
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-01-13 11:48:18 +01:00
cpu.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
crash_core.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
crash_dump.c
cred.c cred: Do not default to init_cred in prepare_kernel_cred() 2022-11-01 10:04:52 -07:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c fork: lock VMAs of the parent process when forking 2023-07-11 06:31:05 +02:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh kheaders: use standard naming for the temporary directory 2023-01-22 23:43:34 +09:00
groups.c
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms.c kallsyms: strip LTO-only suffixes from promoted global functions 2023-07-27 08:56:54 +02:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2022-11-12 18:47:36 -08:00
kallsyms_selftest.c kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol() 2023-03-19 13:27:19 -07:00
kallsyms_selftest.h kallsyms: Add self-test facility 2022-11-15 00:42:02 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
kexec.c kexec: introduce sysctl parameters kexec_load_limit_* 2023-02-02 22:50:05 -08:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-19 16:35:28 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-12 11:31:50 -07:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-02-21 08:49:16 +09:00
ksysfs.c kernel/ksysfs.c: use sysfs_emit for sysfs show handlers 2023-03-24 17:09:14 +01:00
kthread.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile modules-6.4-rc1 2023-04-27 16:36:55 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c convert setns(2) to fdget()/fdput() 2023-04-20 22:55:35 -04:00
padata.c padata: use alignment when calculating the number of worker threads 2023-03-14 17:06:44 +08:00
panic.c cpu: Mark nmi_panic_self_stop() __noreturn 2023-04-14 17:31:26 +02:00
params.c module: make module_ktype structure constant 2023-03-09 12:55:15 -08:00
pid.c pid: add pidfd_prepare() 2023-04-03 11:16:56 +02:00
pid_namespace.c kernel: pid_namespace: simplify sysctls with register_sysctl() 2023-05-02 19:23:29 -07:00
pid_sysctl.h kernel: pid_namespace: simplify sysctls with register_sysctl() 2023-05-02 19:23:29 -07:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
range.c
reboot.c kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode 2022-10-04 15:59:36 +02:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-02 17:23:27 -07:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-02-17 14:58:01 -08:00
resource_kunit.c
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c
scs.c scs: add support for dynamic shadow call stacks 2022-11-09 18:06:35 +00:00
seccomp.c seccomp: simplify sysctls with register_sysctl_init() 2023-04-13 11:49:20 -07:00
signal.c mm: suppress mm fault logging if fatal signal already pending 2023-08-03 10:25:53 +02:00
smp.c trace,smp: Trace all smp_function_call*() invocations 2023-03-24 11:01:30 +01:00
smpboot.c smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death 2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c softirq: Add trace points for tasklet entry/exit 2023-04-15 10:17:16 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c
static_call.c
static_call_inline.c static_call: Add call depth tracking support 2022-10-17 16:41:16 +02:00
stop_machine.c
sys.c prctl: move PR_GET_AUXV out of PR_MCE_KILL 2023-07-27 08:56:33 +02:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c mm: compaction: move compaction sysctl to its own file 2023-04-13 11:49:35 -07:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c torture: Fix hang during kthread shutdown phase 2023-01-05 12:10:35 -08:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c umh: simplify the capability pointer logic 2023-03-03 16:18:19 -08:00
up.c
user-return-notifier.c
user.c kernel/user: Allow user_struct::locked_vm to be usable for iommufd 2022-11-30 20:16:49 -04:00
user_namespace.c userns: fix a struct's kernel-doc notation 2023-02-02 22:50:04 -08:00
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c vhost: Fix worker hangs due to missed wake up calls 2023-06-08 15:43:09 -04:00
watch_queue.c watch_queue: prevent dangling pipe pointer 2023-07-19 16:36:53 +02:00
watchdog.c watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails 2023-07-19 16:35:33 +02:00
watchdog_hld.c watchdog/hardlockup: rename some "NMI watchdog" constants/function 2023-07-19 16:35:32 +02:00
workqueue.c workqueue: clean up WORK_* constant types, clarify masking 2023-06-23 12:08:14 -07:00
workqueue_internal.h