linux-stable/kernel/trace
Petr Pavlu 595363182f ring-buffer: Fix a race between readers and resize checks
commit c2274b908d upstream.

The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.

The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:

[  190.271762] ------------[ cut here ]------------
[  190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[  190.271789] Modules linked in: [...]
[  190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[  190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G            E      6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[  190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[  190.272023] Code: [...]
[  190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[  190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[  190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[  190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[  190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[  190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[  190.272053] FS:  00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[  190.272057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[  190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.272077] Call Trace:
[  190.272098]  <TASK>
[  190.272189]  ring_buffer_resize+0x2ab/0x460
[  190.272199]  __tracing_resize_ring_buffer.part.0+0x23/0xa0
[  190.272206]  tracing_resize_ring_buffer+0x65/0x90
[  190.272216]  tracing_entries_write+0x74/0xc0
[  190.272225]  vfs_write+0xf5/0x420
[  190.272248]  ksys_write+0x67/0xe0
[  190.272256]  do_syscall_64+0x82/0x170
[  190.272363]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  190.272373] RIP: 0033:0x7f1bd657d263
[  190.272381] Code: [...]
[  190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[  190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[  190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[  190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[  190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[  190.272412]  </TASK>
[  190.272414] ---[ end trace 0000000000000000 ]---

Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab79270
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.

The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():

 ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
 if (!ret)
 	goto spin;
 for (unsigned i = 0; i < 1U << 26; i++)  /* inserted delay loop */
 	__asm__ __volatile__ ("" : : : "memory");
 rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;

.. and then run the following commands on the target system:

 echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
 while true; do
 	echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 	echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 done &
 while true; do
 	for i in /sys/kernel/tracing/per_cpu/*; do
 		timeout 0.1 cat $i/trace_pipe; sleep 0.2
 	done
 done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.

Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff2 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16 13:39:12 +02:00
..
Kconfig tracing: Fix complicated dependency of CONFIG_TRACER_MAX_TRACE 2023-01-12 11:58:55 +01:00
Makefile tracing: Place trace_pid_list logic into abstract functions 2022-07-29 17:25:29 +02:00
blktrace.c trace/blktrace: fix memory leak with using debugfs_lookup() 2023-03-10 09:39:47 +01:00
bpf_trace.c bpf: Remove trace_printk_lock 2024-03-01 13:21:43 +01:00
bpf_trace.h
error_report-traces.c
fgraph.c
ftrace.c ftrace: Fix possible warning on checking all pages used in ftrace_process_locs() 2023-07-23 13:47:56 +02:00
ftrace_internal.h
kprobe_event_gen_test.c tracing: Fix wrong return in kprobe_event_gen_test.c 2023-04-05 11:24:54 +02:00
pid_list.c tracing: Place trace_pid_list logic into abstract functions 2022-07-29 17:25:29 +02:00
pid_list.h tracing: Place trace_pid_list logic into abstract functions 2022-07-29 17:25:29 +02:00
power-traces.c
preemptirq_delay_test.c
ring_buffer.c ring-buffer: Fix a race between readers and resize checks 2024-06-16 13:39:12 +02:00
ring_buffer_benchmark.c
rpm-traces.c
synth_event_gen_test.c tracing / synthetic: Disable events after testing in synth_event_gen_test_init() 2024-01-05 15:13:35 +01:00
trace.c tracing: Use .flush() call to wake up readers 2024-04-10 16:18:45 +02:00
trace.h tracing: Fix uaf issue when open the hist or hist_debug file 2024-01-25 14:52:29 -08:00
trace_benchmark.c
trace_benchmark.h
trace_boot.c tracing: Initialize integer variable to prevent garbage return value 2022-06-09 10:23:21 +02:00
trace_branch.c
trace_clock.c
trace_dynevent.c tracing: Free buffers when a used dynamic event is removed 2022-12-08 11:28:43 +01:00
trace_dynevent.h tracing: Add DYNAMIC flag for dynamic events 2021-08-18 18:10:32 -04:00
trace_entries.h
trace_eprobe.c kernel/trace: Fix cleanup logic of enable_trace_eprobe 2023-07-23 13:47:43 +02:00
trace_event_perf.c tracing: Show size of requested perf buffer 2024-05-02 16:24:47 +02:00
trace_events.c tracing: hide unused ftrace_event_id_fops 2024-04-17 11:15:16 +02:00
trace_events_filter.c tracing: Have trace_event_file have ref counters 2023-11-28 16:56:36 +00:00
trace_events_filter_test.h
trace_events_hist.c tracing: Fix uaf issue when open the hist or hist_debug file 2024-01-25 14:52:29 -08:00
trace_events_inject.c tracing: Have event inject files inc the trace array ref count 2023-10-06 13:18:02 +02:00
trace_events_synth.c tracing: Have the user copy of synthetic event address use correct context 2023-11-28 16:56:31 +00:00
trace_events_trigger.c Revert "tracing/trigger: Fix to return error if failed to alloc snapshot" 2024-04-27 17:05:23 +02:00
trace_export.c
trace_functions.c
trace_functions_graph.c tracing: Disable "other" permission bits in the tracefs files 2021-11-18 19:16:15 +01:00
trace_hwlat.c tracing: Remove extra space at the end of hwlat_detector/mode 2023-09-19 12:22:48 +02:00
trace_irqsoff.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-30 16:18:13 +02:00
trace_kdb.c kdb: Rename members of struct kdbtab_t 2021-07-27 17:05:06 +01:00
trace_kprobe.c tracing/kprobes: Fix symbol counting logic by looking at modules as well 2024-01-15 18:51:26 +01:00
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c tracing/osnoise: Fix duration type 2022-12-08 11:28:43 +01:00
trace_output.c tracing: Add size check when printing trace_marker output 2024-01-25 14:52:29 -08:00
trace_output.h
trace_preemptirq.c tracing: hold caller_addr to hardirq_{enable,disable}_ip 2022-09-20 12:39:43 +02:00
trace_printk.c tracing: Disable "other" permission bits in the tracefs files 2021-11-18 19:16:15 +01:00
trace_probe.c Revert "tracing: Add "(fault)" name injection to kernel probes" 2023-08-03 10:22:31 +02:00
trace_probe.h tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols 2023-12-23 10:42:00 +01:00
trace_probe_kernel.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-08-03 10:22:31 +02:00
trace_probe_tmpl.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-08-03 10:22:31 +02:00
trace_recursion_record.c tracing: Disable "other" permission bits in the tracefs files 2021-11-18 19:16:15 +01:00
trace_sched_switch.c
trace_sched_wakeup.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-30 16:18:13 +02:00
trace_selftest.c
trace_selftest_dynamic.c
trace_seq.c
trace_stack.c tracing: Disable "other" permission bits in the tracefs files 2021-11-18 19:16:15 +01:00
trace_stat.c tracing: Disable "other" permission bits in the tracefs files 2021-11-18 19:16:15 +01:00
trace_stat.h
trace_synth.h tracing: Allow synthetic events to pass around stacktraces 2023-08-03 10:22:30 +02:00
trace_syscalls.c tracing: Make tp_printk work on syscall tracepoints 2022-06-14 18:36:14 +02:00
trace_uprobe.c bpf: Clear the probe_addr for uprobe 2023-09-19 12:22:32 +02:00
tracing_map.c tracing: Ensure visibility when inserting an element into tracing_map 2024-02-23 08:54:28 +01:00
tracing_map.h