linux-stable/kernel
Beau Belgrave 7f5a08c79d user_events: Add minimal support for trace_event into ftrace
Minimal support for interacting with dynamic events, trace_event and
ftrace. Core outline of flow between user process, ioctl and trace_event
APIs.

User mode processes that wish to use trace events to get data into
ftrace, perf, eBPF, etc are limited to uprobes today. The user events
features enables an ABI for user mode processes to create and write to
trace events that are isolated from kernel level trace events. This
enables a faster path for tracing from user mode data as well as opens
managed code to participate in trace events, where stub locations are
dynamic.

User processes often want to trace only when it's useful. To enable this
a set of pages are mapped into the user process space that indicate the
current state of the user events that have been registered. User
processes can check if their event is hooked to a trace/probe, and if it
is, emit the event data out via the write() syscall.

Two new files are introduced into tracefs to accomplish this:
user_events_status - This file is mmap'd into participating user mode
processes to indicate event status.

user_events_data - This file is opened and register/delete ioctl's are
issued to create/open/delete trace events that can be used for tracing.

The typical scenario is on process start to mmap user_events_status. Processes
then register the events they plan to use via the REG ioctl. The ioctl reads
and updates the passed in user_reg struct. The status_index of the struct is
used to know the byte in the status page to check for that event. The
write_index of the struct is used to describe that event when writing out to
the fd that was used for the ioctl call. The data must always include this
index first when writing out data for an event. Data can be written either by
write() or by writev().

For example, in memory:
int index;
char data[];

Psuedo code example of typical usage:
struct user_reg reg;

int page_fd = open("user_events_status", O_RDWR);
char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
close(page_fd);

int data_fd = open("user_events_data", O_RDWR);

reg.size = sizeof(reg);
reg.name_args = (__u64)"test";

ioctl(data_fd, DIAG_IOCSREG, &reg);
int status_id = reg.status_index;
int write_id = reg.write_index;

struct iovec io[2];
io[0].iov_base = &write_id;
io[0].iov_len = sizeof(write_id);
io[1].iov_base = payload;
io[1].iov_len = sizeof(payload);

if (page_data[status_id])
	writev(data_fd, io, 2);

User events are also exposed via the dynamic_events tracefs file for
both create and delete. Current status is exposed via the user_events_status
tracefs file.

Simple example to register a user event via dynamic_events:
	echo u:test >> dynamic_events
	cat dynamic_events
	u:test

If an event is hooked to a probe, the probe hooked shows up:
	echo 1 > events/user_events/test/enable
	cat user_events_status
	1:test # Used by ftrace

	Active: 1
	Busy: 1
	Max: 4096

If an event is not hooked to a probe, no probe status shows up:
	echo 0 > events/user_events/test/enable
	cat user_events_status
	1:test

	Active: 1
	Busy: 0
	Max: 4096

Users can describe the trace event format via the following format:
	name[:FLAG1[,FLAG2...] [field1[;field2...]]

Each field has the following format:
	type name

Example for char array with a size of 20 named msg:
	echo 'u:detailed char[20] msg' >> dynamic_events
	cat dynamic_events
	u:detailed char[20] msg

Data offsets are based on the data written out via write() and will be
updated to reflect the correct offset in the trace_event fields. For dynamic
data it is recommended to use the new __rel_loc data type. This type will be
the same as __data_loc, but the offset is relative to this entry. This allows
user_events to not worry about what common fields are being inserted before
the data.

The above format is valid for both the ioctl and the dynamic_events file.

Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10 22:36:46 -05:00
..
bpf bpf: Fix ringbuf memory type confusion when passing to helpers 2022-01-19 01:21:46 +01:00
cgroup Merge branch 'for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2022-01-11 09:14:37 -08:00
configs configs: introduce debug.config for CI-like setup 2022-01-20 08:52:55 +02:00
debug kdb: Adopt scheduler's task classification 2021-11-03 17:21:37 +00:00
dma hyperv-next for 5.17 2022-01-16 15:53:00 +02:00
entry entry: Snapshot thread flags 2021-12-01 00:06:43 +01:00
events Peter Zijlstra says: 2022-01-12 16:26:58 -08:00
futex Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
gcov gcov: Remove compiler version check 2021-12-02 17:25:21 +09:00
irq proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
kcsan KCSAN updates for v5.17 2022-01-11 09:51:26 -08:00
livepatch Livepatching changes for 5.17 2022-01-16 10:08:13 +02:00
locking locking/rwlocks: introduce write_lock_nested 2022-01-22 08:33:37 +02:00
power PM: hibernate: Allow ACPI hardware signature to be honoured 2021-12-08 16:06:10 +01:00
printk printk: fix build warning when CONFIG_PRINTK=n 2022-01-22 08:33:36 +02:00
rcu Merge branch 'akpm' (patches from Andrew) 2022-01-15 20:37:06 +02:00
sched Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
time bitmap patches for 5.17-rc1 2022-01-23 06:20:44 +02:00
trace user_events: Add minimal support for trace_event into ftrace 2022-02-10 22:36:46 -05:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
acct.c kernel: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
async.c kernel/async.c: remove async_unregister_domain() 2021-05-07 00:26:33 -07:00
audit.c audit/stable-5.17 PR 20220110 2022-01-11 13:08:21 -08:00
audit.h audit/stable-5.16 PR 20211101 2021-11-01 21:17:39 -07:00
audit_fsnotify.c fsnotify: clarify contract for create event hooks 2021-10-27 12:32:34 +02:00
audit_tree.c audit: use struct_size() helper in kmalloc() 2021-12-14 17:39:42 -05:00
audit_watch.c \n 2021-11-06 16:43:20 -07:00
auditfilter.c audit/stable-5.17 PR 20220110 2022-01-11 13:08:21 -08:00
auditsc.c lsm: security_task_getsecid_subj() -> security_current_getsecid_subj() 2021-11-22 17:52:47 -05:00
backtracetest.c
bounds.c
capability.c
cfi.c cfi: Use rcu_read_{un}lock_sched_notrace 2021-08-11 13:11:12 -07:00
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu.c sched/scs: Reset task stack state in bringup_cpu() 2021-11-24 12:20:27 +01:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-25 12:20:55 -08:00
crash_dump.c
cred.c ucounts: In set_cred_ucounts assume new->ucounts is non-NULL 2021-10-20 10:45:34 -05:00
delayacct.c delayacct: track delays from memory compact 2022-01-20 08:52:55 +02:00
dma.c
exec_domain.c
exit.c exit: Fix the exit_code for wait_task_zombie 2022-01-08 12:43:57 -06:00
extable.c extable: use is_kernel_text() helper 2021-11-09 10:02:51 -08:00
fail_function.c
fork.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
groups.c
hung_task.c hung_task: move hung_task sysctl interface to hung_task.c 2022-01-22 08:33:34 +02:00
iomem.c
irq_work.c irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT 2021-10-15 11:25:18 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c Livepatching changes for 5.17 2022-01-16 10:08:13 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt preempt: Restore preemption model selection configs 2021-11-11 13:09:33 +01:00
kcov.c kcov: replace local_irq_save() with a local_lock_t 2021-11-09 10:02:52 -08:00
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kexec_core.c exit: Move oops specific logic from do_exit into make_task_dead 2021-12-13 12:04:45 -06:00
kexec_elf.c
kexec_file.c memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED 2021-11-06 13:30:42 -07:00
kexec_internal.h
kheaders.c
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c kprobe: move sysctl_kprobes_optimization to kprobes.c 2022-01-22 08:33:36 +02:00
ksysfs.c
kthread.c Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
latencytop.c
Makefile module: add in-kernel support for decompressing 2022-01-11 18:45:02 -08:00
module-internal.h module: add in-kernel support for decompressing 2022-01-11 18:45:02 -08:00
module.c Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux 2022-01-17 07:32:51 +02:00
module_decompress.c kernel: Fix spelling mistake "compresser" -> "compressor" 2022-01-13 07:17:47 -08:00
module_signature.c
module_signing.c
notifier.c notifier: Return an error when a callback has already been registered 2021-12-29 10:37:33 +01:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c panic: remove oops_id 2022-01-20 08:52:55 +02:00
params.c kobject: remove kset from struct kset_uevent_ops callbacks 2021-12-28 11:26:18 +01:00
pid.c pid: add pidfd_get_task() helper 2021-10-14 13:29:18 +02:00
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
profile.c exit: Remove profile_handoff_task 2022-01-08 12:43:57 -06:00
ptrace.c ptrace: Remove second setting of PT_SEIZED in ptrace_attach 2022-01-08 12:43:57 -06:00
range.c
reboot.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-11-12 11:53:16 -08:00
regset.c
relay.c
resource.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
resource_kunit.c
rseq.c KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-09-22 10:24:01 -04:00
scftorture.c scftorture: Always log error message 2021-12-07 16:36:17 -08:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-09-30 09:37:27 +01:00
seccomp.c Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-09-01 14:52:05 -07:00
signal.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
smp.c sched: Improve wake_up_all_idle_cpus() take #2 2021-10-22 15:32:46 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c timers/nohz: Last resort update jiffies on nohz_full IRQ entry 2021-12-02 15:07:22 +01:00
stackleak.c stackleak: move stack_erasing sysctl to stackleak.c 2022-01-22 08:33:35 +02:00
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2021-11-06 13:30:43 -07:00
static_call.c static_call: Fix static_call_text_reserved() vs __init 2021-07-05 10:46:33 +02:00
stop_machine.c
sys.c Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
sys_ni.c mm/mempolicy: wire up syscall set_mempolicy_home_node 2022-01-15 16:30:30 +02:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c sysctl: returns -EINVAL when a negative value is passed to proc_doulongvec_minmax 2022-01-22 08:33:37 +02:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c
torture.c locktorture,rcutorture,torture: Always log error message 2021-12-07 16:36:17 -08:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-08 12:43:57 -06:00
ucount.c ucounts: Fix rlimit max values check 2021-12-09 15:37:18 -06:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname.c
utsname_sysctl.c
watch_queue.c
watchdog.c watchdog: move watchdog sysctl interface to watchdog.c 2022-01-22 08:33:34 +02:00
watchdog_hld.c
workqueue.c Merge branch 'workqueue/for-5.16-fixes' into workqueue/for-5.17 2022-01-10 07:54:04 -10:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00