linux-stable/kernel
Steven Rostedt (Google) 2d5f12de4c ring-buffer: Only update pages_touched when a new page is touched
commit ffe3986fec upstream.

The "buffer_percent" logic is used by the ring buffer splice code so that,
when there's no data, waiting tasks are only woken up once the buffer is
filled to the percentage in the "buffer_percent" file. It depends on three
variables that determine the amount of data that is in the ring buffer:

 1) pages_read - incremented whenever a new sub-buffer is consumed
 2) pages_lost - incremented every time a writer overwrites a sub-buffer
 3) pages_touched - incremented when a write goes to a new sub-buffer

The percentage is the calculation of:

  (pages_touched - (pages_lost + pages_read)) / nr_pages

Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.
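
The calculation above can be sketched in plain C. This is a minimal
userspace illustration, not the kernel's code: struct rb_counters,
rb_dirty_pages() and rb_should_wake() are hypothetical names, and the
strict "greater than" comparison follows the wording of this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters described above. */
struct rb_counters {
	size_t pages_touched;	/* writes that moved to a new sub-buffer */
	size_t pages_lost;	/* sub-buffers overwritten by a writer */
	size_t pages_read;	/* sub-buffers consumed by a reader */
	size_t nr_pages;	/* total sub-buffers in the ring buffer */
};

/* Sub-buffers that currently hold unread data. */
static size_t rb_dirty_pages(const struct rb_counters *c)
{
	size_t gone = c->pages_lost + c->pages_read;

	/* Guard against underflow if the counters are sampled mid-update. */
	if (gone > c->pages_touched)
		return 0;
	return c->pages_touched - gone;
}

/* True when the fill level exceeds buffer_percent (0..100). */
static bool rb_should_wake(const struct rb_counters *c,
			   unsigned int buffer_percent)
{
	if (c->nr_pages == 0)
		return false;
	/* Compare dirty/nr_pages against percent/100 without dividing. */
	return rb_dirty_pages(c) * 100 > (size_t)buffer_percent * c->nr_pages;
}
```

With 10 sub-buffers, 10 touched, 2 lost and 2 read, 6 sub-buffers are
dirty, so a waiter asking for 50% would be woken while one asking for 60%
would not.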

It was observed that the amount read from the splice steadily decreased
the longer the trace was running. That is, if one asked for 60%, it would
read over 60% when tracing first started, but later it would be woken up
at under 60%, and the amount of data read after each wakeup would slowly
shrink until it was much less than the buffer percent.

This was due to an accounting error in the pages_touched increment. This
value is incremented whenever a writer transfers to a new sub-buffer. But
the place where it was incremented was incorrect. If a writer overflowed
the current sub-buffer it would go to the next one. If it gets preempted
by an interrupt at that time, and the interrupt performs a trace, it too
will end up going to the next sub-buffer. But only one should increment
the counter. Unfortunately, that was not the case.
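
A simplified model of the over-count, written as userspace C. It is only
a sketch under stated assumptions: struct touch_model, buggy_switch() and
overcount() are hypothetical, and the real race happens inside the ring
buffer's tail-page handling rather than in a helper like this.

```c
#include <stddef.h>

/* Hypothetical model of the accounting, not kernel code. */
struct touch_model {
	size_t real_pages;	/* sub-buffers the tail really moved onto */
	size_t pages_touched;	/* what the counter claims */
};

/*
 * One move of the tail to a new sub-buffer. nested_writers is the number
 * of preempting writers that tried to make the same move at that moment.
 */
static void buggy_switch(struct touch_model *m, size_t nested_writers)
{
	m->real_pages += 1;			/* the tail advances once */
	m->pages_touched += 1 + nested_writers;	/* every contender counts */
}

/* How far the counter has drifted ahead of reality. */
static size_t overcount(const struct touch_model *m)
{
	return m->pages_touched - m->real_pages;
}
```

In this model the drift never shrinks, so the computed fill level is
inflated by a growing amount, which matches the observation that waiters
were woken with less and less data over time.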

Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
increments the counter exactly once when the writer moves to a new
sub-buffer, and not a second time when a writer and its preempting writer
race to move to the same new sub-buffer.
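
The pattern can be illustrated with C11 atomics in userspace. This is a
sketch of the idea rather than the actual patch: the kernel uses its own
try_cmpxchg() and local counters, and struct sub_buffer, struct
cpu_buffer_model and move_tail() below are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sub_buffer {
	int id;
};

/* Hypothetical stand-in for the per-CPU buffer state. */
struct cpu_buffer_model {
	_Atomic(struct sub_buffer *) tail_page;
	atomic_size_t pages_touched;
};

/*
 * Try to move the tail from 'tail' to 'next'. Only the caller whose
 * compare-and-exchange succeeds counts the new sub-buffer; a caller that
 * lost the race sees the exchange fail and leaves the counter alone.
 */
static bool move_tail(struct cpu_buffer_model *b,
		      struct sub_buffer *tail, struct sub_buffer *next)
{
	struct sub_buffer *expected = tail;

	if (atomic_compare_exchange_strong(&b->tail_page, &expected, next)) {
		atomic_fetch_add(&b->pages_touched, 1);
		return true;
	}
	return false;
}
```

If a writer and a preempting writer both call move_tail() with the same
'tail' and 'next', the second call finds tail_page already changed and
returns false, so pages_touched goes up by one rather than two.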

Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 2c2b0a78b3 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-17 11:18:22 +02:00
..
bpf bpf: Protect against int overflow for stack access size 2024-04-10 16:28:23 +02:00
cgroup cgroup_freezer: cgroup_freezing: Check if not frozen 2023-12-13 18:39:19 +01:00
configs Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-25 15:27:51 -08:00
dma dma-direct: Leak pages on dma_set_decrypted() failure 2024-04-13 13:04:59 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-03 15:19:44 +02:00
events perf: Fix the nr_addr_filters fix 2024-02-05 20:13:00 +00:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:51:50 +01:00
gcov gcov: add support for checksum field 2022-12-31 13:33:11 +01:00
irq genirq/affinity: Move group_cpus_evenly() into lib/ 2024-01-10 17:10:33 +01:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-19 16:21:37 +02:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:52:10 +01:00
locking lockdep: Fix block chain corruption 2023-12-03 07:32:09 +01:00
module modules: wait do_free_init correctly 2024-03-26 18:20:52 -04:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-03 15:19:28 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-03 15:19:44 +02:00
rcu rcu/exp: Handle RCU expedited grace period kworker allocation failure 2024-03-26 18:20:28 -04:00
sched sched/fair: Take the scheduling domain into account in select_idle_core() 2024-03-26 18:20:30 -04:00
time timers: Rename del_timer_sync() to timer_delete_sync() 2024-04-03 15:19:23 +02:00
trace ring-buffer: Only update pages_touched when a new page is touched 2024-04-17 11:18:22 +02:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:32:58 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-01-31 16:17:00 -08:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-05 20:12:47 +00:00
audit.h audit: remove selinux_audit_rule_update() declaration 2022-09-07 11:30:15 -04:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 17:07:08 +00:00
auditfilter.c
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-25 12:03:04 +02:00
backtracetest.c
bounds.c bounds: support non-power-of-two CONFIG_NR_CPUS 2024-04-03 15:19:27 +02:00
capability.c
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-06 12:10:40 +02:00
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-03-10 09:33:45 +01:00
cpu.c cpu/SMT: Make SMT control more robust against enumeration failures 2024-01-10 17:10:26 +01:00
cpu_pm.c
crash_core.c vmcoreinfo: add kallsyms_num_syms symbol 2022-08-28 14:02:44 -07:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 17:00:20 +01:00
delayacct.c delayacct: support re-entrance detection of thrashing accounting 2022-09-26 19:46:07 -07:00
dma.c
exec_domain.c
exit.c exit: Detect and fix irq disabled state in oops 2023-03-10 09:33:45 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:55:39 +01:00
fork.c kernel/fork: beware of __put_task_struct() calling context 2023-09-23 11:11:00 +02:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh kbuild: build init/built-in.a just once 2022-09-29 04:40:15 +09:00
groups.c
hung_task.c sched: Fix more TASK_state comparisons 2022-09-30 16:50:39 +02:00
iomem.c
irq_work.c
jump_label.c
kallsyms.c kallsyms: Add helper kallsyms_on_each_match_symbol() 2023-10-25 12:03:16 +02:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2023-10-25 12:03:16 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: kmsan: unpoison area->list in kcov_remote_area_put() 2022-10-03 14:03:23 -07:00
kexec.c kernel: kexec: copy user-array safely 2023-11-28 17:06:57 +00:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-19 16:21:08 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 16:00:55 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-11 23:03:02 +09:00
kmod.c
kprobes.c kprobes: consistent rcu api usage for kretprobe holder 2023-12-13 18:39:17 +01:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2022-09-11 21:55:06 -07:00
kthread.c signal: break out of wait loops on kthread_stop() 2022-10-09 16:01:59 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile cfi: Fix CFI failure with KASAN 2022-12-31 13:33:08 +01:00
module_signature.c
notifier.c
nsproxy.c Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:38:43 -07:00
padata.c crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 17:06:58 +00:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:04:54 +02:00
params.c
pid.c
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-03-10 09:32:52 +01:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 17:07:13 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-11 23:03:03 +09:00
resource.c PCI: Allow drivers to request exclusive config regions 2023-09-13 09:42:46 +02:00
resource_kunit.c
rseq.c rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered 2022-11-14 09:58:32 +01:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:11:00 +02:00
scs.c
seccomp.c
signal.c mm: suppress mm fault logging if fatal signal already pending 2023-08-03 10:24:01 +02:00
smp.c smp,csd: Throw an error if a CSD lock is stuck for too long 2023-11-28 17:06:55 +00:00
smpboot.c smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death 2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c
sys.c getrusage: use sig->stats_lock rather than lock_task_sighand() 2024-03-15 10:48:22 -04:00
sys_ni.c
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-05 12:09:06 -08:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c torture: Fix hang during kthread shutdown phase 2023-03-10 09:34:07 +01:00
tracepoint.c tracepoint: Optimize the critical region of mutex_lock in tracepoint_module_coming() 2022-09-26 13:01:18 -04:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c freezer,umh: Fix call_usermode_helper_exec() vs SIGKILL 2023-02-22 12:59:50 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c ucounts: Split rlimit and ucount values and max values 2022-10-09 16:24:05 -07:00
usermode_driver.c
utsname.c
utsname_sysctl.c kernel/utsname_sysctl.c: Fix hostname polling 2022-10-23 12:01:01 -07:00
watch_queue.c kernel: watch_queue: copy user-array safely 2023-11-28 17:06:57 +00:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 17:07:09 +00:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-19 16:21:08 +02:00
workqueue.c workqueue: Provide one lock class key per work_on_cpu() callsite 2023-11-28 17:06:55 +00:00
workqueue_internal.h