linux-stable/kernel
Steven Rostedt (Google) 047aa3dc0a tracing: Fix wasted memory in saved_cmdlines logic
commit 44dc5c41b5 upstream.

While looking at improving the saved_cmdlines cache I found a huge amount
of wasted memory that should be used for the cmdlines.

The tracing data saves pids during the trace. At sched switch, if a trace
occurred, it will save the comm of the task that did the trace. This is
saved in a "cache" that maps pids to comms and exposed to user space via
the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by
default 128 comms.

The structure that uses this creates an array to store the pids using
PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure
to be of the size of 131104 bytes on 64 bit machines.

In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
this structure. That leaves 131040 bytes of wasted space.

Worse, the structure points to an allocated array to store the comm names,
which is 16 bytes times the amount of names to save (currently 128), which
is 2048 bytes. Instead of allocating a separate array, make the structure
end with a variable length string and use the extra space for that.

This is similar to a recommendation that Linus had made about eventfs_inode names:

  https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/

Instead of allocating a separate string array to hold the saved comms,
have the structure end with: char saved_cmdlines[]; and round up to the
next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN
It will use this extra space for the saved_cmdline portion.

Now, instead of saving only 128 comms by default, by using this wasted
space at the end of the structure it can save over 8000 comms and even
saves space by removing the need for allocating the other array.

Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Fixes: 939c7a4f04 ("tracing: Introduce saved_cmdlines_size file")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23 08:55:06 +01:00
..
bpf bpf: Set uattr->batch.count as zero before batched update or deletion 2024-02-23 08:54:42 +01:00
cgroup cgroup: Remove duplicates in cgroup v1 tasks file 2023-10-19 23:05:37 +02:00
configs
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-25 14:52:54 -08:00
dma dma-mapping: clear dev->dma_mem to NULL after freeing it 2024-01-25 14:52:38 -08:00
entry entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up 2023-03-30 12:47:51 +02:00
events perf: Fix the nr_addr_filters fix 2024-02-23 08:54:52 +01:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:08:13 +01:00
gcov gcov: add support for checksum field 2022-12-31 13:14:47 +01:00
irq genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware 2023-11-28 16:56:30 +00:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-23 13:47:12 +02:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:08:25 +01:00
locking lockdep: Fix block chain corruption 2023-12-03 07:31:23 +01:00
power PM: hibernate: Enforce ordering during image compression/decompression 2024-02-23 08:54:23 +01:00
printk printk: Consolidate console deferred printing 2023-09-23 11:09:59 +02:00
rcu rcu: Avoid tracing a few functions executed in stop machine 2023-12-08 08:48:02 +01:00
sched sched: Fix stop_one_cpu_nowait() vs hotplug 2023-11-20 11:08:13 +01:00
time clocksource: Skip watchdog check for large watchdog intervals 2024-02-23 08:55:01 +01:00
trace tracing: Fix wasted memory in saved_cmdlines logic 2024-02-23 08:55:06 +01:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:14:40 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:54:25 +01:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:54:37 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-31 17:16:33 +02:00
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 16:56:27 +00:00
auditfilter.c
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:22:39 +02:00
backtracetest.c
bounds.c
capability.c
cfi.c cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle 2022-06-22 14:22:04 +02:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:24:53 +02:00
configs.c
context_tracking.c
cpu.c hrtimers: Push pending hrtimers away from outgoing CPU earlier 2023-12-13 18:36:31 +01:00
cpu_pm.c
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-29 12:28:49 +01:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:17:37 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:27:22 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:57:38 +01:00
fork.c kernel/fork: beware of __put_task_struct() calling context 2023-09-23 11:09:55 +02:00
freezer.c
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c kallsyms: Make kallsyms_on_each_symbol generally available 2023-12-13 18:36:45 +01:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec.c kernel: kexec: copy user-array safely 2023-11-28 16:56:16 +00:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-23 13:46:52 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:59:14 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:13:57 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-11 23:00:17 +09:00
kmod.c
kprobes.c kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list 2024-01-25 14:52:31 -08:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:13:57 +02:00
kthread.c kthread: add the helper function kthread_run_on_cpu() 2023-03-30 12:47:42 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2022-12-31 13:14:04 +01:00
module-internal.h
module.c kallsyms: Make module_kallsyms_on_each_symbol generally available 2024-01-15 18:51:26 +01:00
module_signature.c
module_signing.c
notifier.c
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 16:56:18 +00:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:27:22 +01:00
params.c
pid.c
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-03-10 09:39:09 +01:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-17 14:24:04 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:22:29 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:56:31 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-11 23:00:18 +09:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-10 09:40:08 +01:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:23:10 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:09:55 +02:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-11-18 19:16:29 +01:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:56:38 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 21:24:42 +02:00
smp.c locking/csd_lock: Change csdlock_debug from early_param to __setup 2022-08-17 14:24:24 +02:00
smpboot.c
smpboot.h
softirq.c timers/nohz: Last resort update jiffies on nohz_full IRQ entry 2023-08-16 18:22:04 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:03:07 +01:00
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2022-04-13 20:59:28 +02:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
stop_machine.c
sys.c kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() 2023-04-26 13:51:52 +02:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-31 17:16:33 +02:00
sysctl-test.c
sysctl.c kernel/panic: move panic sysctls to its own file 2023-02-01 08:27:20 +01:00
task_work.c
taskstats.c
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:18:19 +02:00
tracepoint.c
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 11:05:35 +01:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-23 12:03:20 +01:00
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-03-08 19:12:42 +01:00
usermode_driver.c
utsname.c
utsname_sysctl.c
watch_queue.c kernel: watch_queue: copy user-array safely 2023-11-28 16:56:16 +00:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:56:28 +00:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-23 13:46:52 +02:00
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:48:03 +01:00
workqueue_internal.h