linux-stable/kernel/trace
Petr Pavlu c68b7a442e ring-buffer: Fix a race between readers and resize checks
commit c2274b908d upstream.

The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.

The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:

[  190.271762] ------------[ cut here ]------------
[  190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[  190.271789] Modules linked in: [...]
[  190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[  190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G            E      6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[  190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[  190.272023] Code: [...]
[  190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[  190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[  190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[  190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[  190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[  190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[  190.272053] FS:  00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[  190.272057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[  190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.272077] Call Trace:
[  190.272098]  <TASK>
[  190.272189]  ring_buffer_resize+0x2ab/0x460
[  190.272199]  __tracing_resize_ring_buffer.part.0+0x23/0xa0
[  190.272206]  tracing_resize_ring_buffer+0x65/0x90
[  190.272216]  tracing_entries_write+0x74/0xc0
[  190.272225]  vfs_write+0xf5/0x420
[  190.272248]  ksys_write+0x67/0xe0
[  190.272256]  do_syscall_64+0x82/0x170
[  190.272363]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  190.272373] RIP: 0033:0x7f1bd657d263
[  190.272381] Code: [...]
[  190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[  190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[  190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[  190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[  190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[  190.272412]  </TASK>
[  190.272414] ---[ end trace 0000000000000000 ]---

Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab79270
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.

The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():

 ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
 if (!ret)
 	goto spin;
 for (unsigned i = 0; i < 1U << 26; i++)  /* inserted delay loop */
 	__asm__ __volatile__ ("" : : : "memory");
 rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;

.. and then run the following commands on the target system:

 echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
 while true; do
 	echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 	echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 done &
 while true; do
 	for i in /sys/kernel/tracing/per_cpu/*; do
 		timeout 0.1 cat $i/trace_pipe; sleep 0.2
 	done
 done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.

Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff2 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16 13:28:30 +02:00
..
Kconfig tracing/kprobes: Do the notrace functions check without kprobes on ftrace 2021-01-19 18:26:12 +01:00
Makefile
blktrace.c blktrace: Fix output non-blktrace event when blk_classic option enabled 2023-01-18 11:41:13 +01:00
bpf_trace.c bpf: Clear the probe_addr for uprobe 2023-09-23 10:59:41 +02:00
fgraph.c fgraph: Initialize tracing_graph_pause at task creation 2021-02-10 09:25:29 +01:00
ftrace.c ftrace: Fix possible warning on checking all pages used in ftrace_process_locs() 2023-08-11 11:53:46 +02:00
ftrace_internal.h
power-traces.c
preemptirq_delay_test.c
ring_buffer.c ring-buffer: Fix a race between readers and resize checks 2024-06-16 13:28:30 +02:00
ring_buffer_benchmark.c
rpm-traces.c
trace.c tracing: Inform kmemleak of saved_cmdlines allocation 2024-02-23 08:25:13 +01:00
trace.h tracing: Have trace_event_file have ref counters 2023-11-28 16:50:22 +00:00
trace_benchmark.c
trace_benchmark.h
trace_branch.c
trace_clock.c tracing: Do no increment trace_clock_global() by one 2021-06-23 14:41:28 +02:00
trace_dynevent.c tracing: Free buffers when a used dynamic event is removed 2022-12-08 11:23:04 +01:00
trace_dynevent.h
trace_entries.h tracing: Set kernel_stack's caller size properly 2020-10-01 13:17:29 +02:00
trace_event_perf.c tracing: Show size of requested perf buffer 2024-05-02 16:18:35 +02:00
trace_events.c tracing: Have trace_event_file have ref counters 2023-11-28 16:50:22 +00:00
trace_events_filter.c tracing: Have trace_event_file have ref counters 2023-11-28 16:50:22 +00:00
trace_events_filter_test.h
trace_events_hist.c tracing/histograms: Return an error if we fail to add histogram to hist_vars list 2023-07-27 08:37:45 +02:00
trace_events_trigger.c Revert "tracing/trigger: Fix to return error if failed to alloc snapshot" 2024-05-02 16:18:30 +02:00
trace_export.c
trace_functions.c tracing: Have all levels of checks prevent recursion 2021-10-27 09:54:29 +02:00
trace_functions_graph.c
trace_hwlat.c tracing: Remove WARN_ON in start_thread() 2020-12-08 10:40:28 +01:00
trace_irqsoff.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-30 16:27:23 +02:00
trace_kdb.c
trace_kprobe.c tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs 2023-08-30 16:27:14 +02:00
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_mmiotrace.c
trace_nop.c
trace_output.c tracing: Add size check when printing trace_marker output 2024-01-25 14:34:20 -08:00
trace_output.h
trace_preemptirq.c tracing: hold caller_addr to hardirq_{enable,disable}_ip 2022-09-28 11:03:57 +02:00
trace_printk.c
trace_probe.c tracing/probes: Have kprobes and uprobes use $COMM too 2022-08-25 11:18:39 +02:00
trace_probe.h tracing/probe: trace_probe_primary_from_call(): checked list_first_entry 2023-06-09 10:29:02 +02:00
trace_probe_tmpl.h tracing/probes: Fix to update dynamic data counter if fetcharg uses it 2023-08-30 16:27:14 +02:00
trace_sched_switch.c tracing: Fix sched switch start/stop refcount racy updates 2020-02-11 04:35:07 -08:00
trace_sched_wakeup.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-30 16:27:23 +02:00
trace_selftest.c ftrace: Handle tracing when switching between context 2020-11-10 12:37:28 +01:00
trace_selftest_dynamic.c
trace_seq.c
trace_stack.c tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined 2020-01-14 20:08:22 +01:00
trace_stat.c tracing: Fix very unlikely race of registering two stat tracers 2020-02-24 08:36:30 +01:00
trace_stat.h
trace_syscalls.c
trace_uprobe.c bpf: Clear the probe_addr for uprobe 2023-09-23 10:59:41 +02:00
tracing_map.c tracing: Ensure visibility when inserting an element into tracing_map 2024-02-23 08:24:50 +01:00
tracing_map.h