linux-stable/kernel/trace
Steven Rostedt (Google) bddd8b50bf ring-buffer: Fix slowpath of interrupted event
[ Upstream commit b803d7c664 ]

To synchronize the timestamps with the ring buffer reservation, there are
two timestamps that are saved in the buffer meta data.

1. before_stamp
2. write_stamp

When the two are equal, the write_stamp is considered valid, as in, it may
be used to calculate the delta of the next event as the write_stamp is the
timestamp of the previous reserved event on the buffer.

This is done by the following:

 /*A*/	w = current position on the ring buffer
	before = before_stamp
	after = write_stamp
	ts = read current timestamp

	if (before != after) {
		write_stamp is not valid, force adding an absolute
		timestamp.
	}

 /*B*/	before_stamp = ts

 /*C*/	write = local_add_return(event length, position on ring buffer)

	if (w == write - event length) {
		/* Nothing interrupted between A and C */
 /*E*/		write_stamp = ts;
		delta = ts - after
		/*
		 * If nothing interrupted again,
		 * before_stamp == write_stamp and write_stamp
		 * can be used to calculate the delta for
		 * events that come in after this one.
		 */
	} else {

		/*
		 * The slow path!
		 * Was interrupted between A and C.
		 */

This is the place that there's a bug. We currently have:

		after = write_stamp
		ts = read current timestamp

 /*F*/		if (write == current position on the ring buffer &&
		    after < ts && cmpxchg(write_stamp, after, ts)) {

			delta = ts - after;

		} else {
			delta = 0;
		}

The assumption is that if the current position on the ring buffer hasn't
moved between C and F, then it also was not interrupted, and that the last
event written has a timestamp that matches the write_stamp. That is the
write_stamp is valid.

But this may not be the case:

If a task context event was interrupted by softirq between B and C.

And the softirq wrote an event that got interrupted by a hard irq between
C and E.

and the hard irq wrote an event (does not need to be interrupted)

We have:

 /*B*/ before_stamp = ts of normal context

   ---> interrupted by softirq

	/*B*/ before_stamp = ts of softirq context

	  ---> interrupted by hardirq

		/*B*/ before_stamp = ts of hard irq context
		/*E*/ write_stamp = ts of hard irq context

		/* matches and write_stamp valid */
	  <----

	/*E*/ write_stamp = ts of softirq context

	/* No longer matches before_stamp, write_stamp is not valid! */

   <---

 w != write - length, go to slow path

// Right now the order of events in the ring buffer is:
//
// |-- softirq event --|-- hard irq event --|-- normal context event --|
//

 after = write_stamp (this is the ts of softirq)
 ts = read current timestamp

 if (write == current position on the ring buffer [true] &&
     after < ts [true] && cmpxchg(write_stamp, after, ts) [true]) {

	delta = ts - after  [Wrong!]

The delta is to be between the hard irq event and the normal context
event, but the above logic made the delta between the softirq event and
the normal context event, where the hard irq event is between the two. This
will shift all the remaining event timestamps on the sub-buffer
incorrectly.

The write_stamp is only valid if it matches the before_stamp. The cmpxchg
does nothing to help this.

Instead, the following logic can be done to fix this:

	before = before_stamp
	ts = read current timestamp
	before_stamp = ts

	after = write_stamp

	if (write == current position on the ring buffer &&
	    after == before && after < ts) {

		delta = ts - after

	} else {
		delta = 0;
	}

The above will only use the write_stamp if it still matches before_stamp
and was tested to not have changed since C.

As a bonus, with this logic we do not need any 64-bit cmpxchg() at all!

This means the 32-bit rb_time_t workaround can finally be removed. But
that's for a later time.

Link: https://lore.kernel.org/linux-trace-kernel/20231218175229.58ec3daf@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231218230712.3a76b081@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: dd93942570 ("ring-buffer: Do not try to put back write_stamp")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01 12:42:44 +00:00
..
rv tracing/tools: Updates for 6.4 2023-04-28 16:11:26 -07:00
blktrace.c block: remove more NULL checks after bdev_get_queue() 2023-02-21 09:23:22 -07:00
bpf_trace.c bpf: Fix uprobe_multi get_pid_task error path 2023-09-15 10:32:40 -07:00
bpf_trace.h
error_report-traces.c
fgraph.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
fprobe.c fprobe: Fix to ensure the number of active retprobes is not zero 2023-10-17 10:22:42 +09:00
ftrace.c ftrace: Use LIST_HEAD to initialize clear_hash 2023-09-01 21:18:38 -04:00
ftrace_internal.h tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
Kconfig Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
kprobe_event_gen_test.c tracing: Fix wrong return in kprobe_event_gen_test.c 2023-03-19 12:20:48 -04:00
Makefile tracing/probes: Move finding func-proto API and getting func-param API to trace_btf 2023-08-23 09:39:45 +09:00
pid_list.c tracing: Cleanup double word in comment 2022-04-26 17:58:50 -04:00
pid_list.h
power-traces.c
preemptirq_delay_test.c
rethook.c rethook: Use __rcu pointer for rethook::handler 2023-12-13 18:45:19 +01:00
ring_buffer.c ring-buffer: Fix slowpath of interrupted event 2024-01-01 12:42:44 +00:00
ring_buffer_benchmark.c ring_buffer: Remove unused "event" parameter 2022-11-23 19:08:30 -05:00
rpm-traces.c
synth_event_gen_test.c tracing: Always use canonical ftrace path 2023-02-18 14:34:09 -05:00
trace.c tracing: Update snapshot buffer on resize if it is allocated 2023-12-20 17:02:05 +01:00
trace.h tracing: Have trace_event_file have ref counters 2023-11-08 11:56:21 +01:00
trace_benchmark.c tracing: Add numeric delta time to the trace event benchmark 2022-09-26 13:01:09 -04:00
trace_benchmark.h tracing: Add numeric delta time to the trace event benchmark 2022-09-26 13:01:09 -04:00
trace_boot.c tracing/boot: Test strscpy() against less than zero for error 2023-07-05 10:30:49 -04:00
trace_branch.c
trace_btf.c tracing/probes: Add a function to search a member of a struct/union 2023-08-23 09:40:16 +09:00
trace_btf.h tracing/probes: Add a function to search a member of a struct/union 2023-08-23 09:40:16 +09:00
trace_clock.c
trace_dynevent.c tracing: Free buffers when a used dynamic event is removed 2022-11-23 19:07:12 -05:00
trace_dynevent.h
trace_entries.h tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_eprobe.c tracing/probes: Support BTF argument on module functions 2023-08-23 09:39:15 +09:00
trace_event_perf.c tracing/perf: Use strndup_user instead of kzalloc/strncpy_from_user 2022-11-23 19:08:31 -05:00
trace_events.c tracing: Have trace_event_file have ref counters 2023-11-08 11:56:21 +01:00
trace_events_filter.c tracing: Have trace_event_file have ref counters 2023-11-08 11:56:21 +01:00
trace_events_filter_test.h
trace_events_hist.c tracing/histograms: Return an error if we fail to add histogram to hist_vars list 2023-07-23 11:18:52 -04:00
trace_events_inject.c tracing: Have event inject files inc the trace array ref count 2023-09-07 16:38:54 -04:00
trace_events_synth.c tracing: Have the user copy of synthetic event address use correct context 2023-11-28 17:20:05 +00:00
trace_events_trigger.c tracing: Fix kernel-doc warnings in trace_events_trigger.c 2023-07-28 19:59:03 -04:00
trace_events_user.c tracing/user_events: Align set_bit() address for all archs 2023-09-30 16:25:41 -04:00
trace_export.c tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_fprobe.c tracing: fprobe-event: Fix to check tracepoint event and return 2023-11-28 17:20:13 +00:00
trace_functions.c
trace_functions_graph.c function_graph: Support recording and printing the return value of function 2023-06-20 18:38:37 -04:00
trace_hwlat.c tracing: Remove extra space at the end of hwlat_detector/mode 2023-09-01 21:00:00 -04:00
trace_irqsoff.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_kdb.c
trace_kprobe.c tracing/kprobes: Fix the order of argument descriptions 2023-11-20 11:59:38 +01:00
trace_kprobe_selftest.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
trace_kprobe_selftest.h
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c tracing/timerlat: Add user-space interface 2023-06-22 10:39:56 -04:00
trace_output.c Tracing updates for 6.5: 2023-06-30 10:33:17 -07:00
trace_output.h tracing: Add "fields" option to show raw trace event fields 2023-03-29 06:52:08 -04:00
trace_preemptirq.c cpuidle: tracing, preempt: Squash _rcuidle tracing 2023-01-31 15:01:46 +01:00
trace_printk.c
trace_probe.c tracing/probes: Add string type check with BTF 2023-08-23 09:41:13 +09:00
trace_probe.h tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols 2023-10-20 22:10:41 +09:00
trace_probe_kernel.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_probe_tmpl.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_recursion_record.c tracing: Use trace_create_file() to simplify creation of tracefs entries 2022-05-26 21:12:52 -04:00
trace_sched_switch.c sched/tracing: Append prev_state to tp args instead 2022-05-12 00:37:11 +02:00
trace_sched_wakeup.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_selftest.c tracing: Have function_graph selftest call cond_resched() 2023-05-28 21:15:46 -04:00
trace_selftest_dynamic.c
trace_seq.c tracing: Fix kernel-doc warnings in trace_seq.c 2023-07-28 19:59:04 -04:00
trace_stack.c
trace_stat.c
trace_stat.h
trace_synth.h tracing: Allow synthetic events to pass around stacktraces 2023-01-25 10:31:24 -05:00
trace_syscalls.c tracing: bpf: use struct trace_entry in struct syscall_tp_t 2023-08-01 10:53:28 -07:00
trace_uprobe.c Probes updates for v6.6: 2023-09-02 11:10:50 -07:00
tracing_map.c tracing: Remove unused variable 'dups' 2022-10-03 12:20:31 -04:00
tracing_map.h tracing: Remove unused extern declaration tracing_map_set_field_descr() 2023-07-23 11:08:14 -04:00