Commit Graph

105 Commits

Author SHA1 Message Date
Steven Rostedt (Google) cbc7c29dff tracing: Have trace_event_file have ref counters
commit bb32500fb9 upstream.

The following can crash the kernel:

 # cd /sys/kernel/tracing
 # echo 'p:sched schedule' > kprobe_events
 # exec 5>>events/kprobes/sched/enable
 # > kprobe_events
 # exec 5>&-

The above commands:

 1. Change directory to the tracefs directory
 2. Create a kprobe event (doesn't matter what one)
 3. Open bash file descriptor 5 on the enable file of the kprobe event
 4. Delete the kprobe event (removes the files too)
 5. Close the bash file descriptor 5

The above causes a crash!

 BUG: kernel NULL pointer dereference, address: 0000000000000028
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:tracing_release_file_tr+0xc/0x50

What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.

But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.

To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.

Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: f5ca233e2e ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <beaub@linux.microsoft.com>
Tested-by: Beau Belgrave <beaub@linux.microsoft.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28 16:56:36 +00:00
Yafang Shao 7b75b4c90a bpf: Clear the probe_addr for uprobe
[ Upstream commit 5125e757e6 ]

To avoid returning uninitialized or random values when querying the file
descriptor (fd) and accessing probe_addr, it is necessary to clear the
variable prior to its use.

Fixes: 41bdc4b40e ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19 12:22:32 +02:00
Steven Rostedt (Google) 2475de2bc0 tracing: Wake up ring buffer waiters on closing of the file
commit f3ddb74ad0 upstream.

When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.

Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: e30f53aad2 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26 12:34:31 +02:00
Steven Rostedt (VMware) 85f4a563c9 tracing: Add length protection to histogram string copies
[ Upstream commit 938aa33f14 ]

The string copies to the histogram storage has a max size of 256 bytes
(defined by MAX_FILTER_STR_VAL). Only the string size of the event field
needs to be copied to the event storage, but no more than what is in the
event storage. Although nothing should be bigger than 256 bytes, there's
no protection against overwriting of the storage if one day there is.

Copy no more than the destination size, and enforce it.

Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the
min() comparison of the string sizes of comparable types.

Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 63f84ae6b8 ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:34 +01:00
Linus Torvalds 58ca241587 Tracing updates for 5.15:
- Simplifying the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT
 
  - bootconfig now can start histograms
 
  - bootconfig supports group/all enabling
 
  - histograms now can put values in linear size buckets
 
  - execnames can be passed to synthetic events
 
  - Introduction of "event probes" that attach to other events and
    can retrieve data from pointers of fields, or record fields
    as different types (a pointer to a string as a string instead
    of just a hex number)
 
  - Various fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYTJDixQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnPLAP9XviWrZD27uFj6LU/Vp2umbq8la1aC
 oW8o9itUGpLoHQD+OtsMpQXsWrxoNw/JD1OWCH4J0YN+TnZAUUG2E9e0twA=
 =OZXG
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT

 - bootconfig can now start histograms

 - bootconfig supports group/all enabling

 - histograms now can put values in linear size buckets

 - execnames can be passed to synthetic events

 - introduce "event probes" that attach to other events and can retrieve
   data from pointers of fields, or record fields as different types (a
   pointer to a string as a string instead of just a hex number)

 - various fixes and clean ups

* tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits)
  tracing/doc: Fix table format in histogram code
  selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes
  selftests/ftrace: Add selftest for testing eprobe events on synthetic events
  selftests/ftrace: Add test case to test adding and removing of event probe
  selftests/ftrace: Fix requirement check of README file
  selftests/ftrace: Add clear_dynamic_events() to test cases
  tracing: Add a probe that attaches to trace events
  tracing/probes: Reject events which have the same name of existing one
  tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs
  tracing/probe: Change traceprobe_set_print_fmt() to take a type
  tracing/probes: Use struct_size() instead of defining custom macros
  tracing/probes: Allow for dot delimiter as well as slash for system names
  tracing/probe: Have traceprobe_parse_probe_arg() take a const arg
  tracing: Have dynamic events have a ref counter
  tracing: Add DYNAMIC flag for dynamic events
  tracing: Replace deprecated CPU-hotplug functions.
  MAINTAINERS: Add an entry for os noise/latency
  tracepoint: Fix kerneldoc comments
  bootconfig/tracing/ktest: Update ktest example for boot-time tracing
  tools/bootconfig: Use per-group/all enable option in ftrace2bconf script
  ...
2021-09-05 11:50:41 -07:00
Tzvetomir Stoyanov (VMware) 7491e2c442 tracing: Add a probe that attaches to trace events
A new dynamic event is introduced: event probe. The event is attached
to an existing tracepoint and uses its fields as arguments. The user
can specify custom format string of the new event, select what tracepoint
arguments will be printed and how to print them.
An event probe is created by writing configuration string in
'dynamic_events' ftrace file:
 e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS]	- Set an event probe
 -:SNAME/ENAME					- Delete an event probe

Where:
 SNAME	- System name, if omitted 'eprobes' is used.
 ENAME	- Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used.
 SYSTEM	- Name of the system, where the tracepoint is defined, mandatory.
 EVENT	- Name of the tracepoint event in SYSTEM, mandatory.
 FETCHARGS - Arguments:
  <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print
			   it as given TYPE with given name. Supported
			   types are:
	                    (u8/u16/u32/u64/s8/s16/s32/s64), basic type
        	            (x8/x16/x32/x64), hexadecimal types
			    "string", "ustring" and bitfield.

Example, attach an event probe on openat system call and print name of the
file that will be opened:
 echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events
A new dynamic event is created in events/esys/eopen/ directory. It
can be deleted with:
 echo "-:esys/eopen" >> dynamic_events

Filters, triggers and histograms can be attached to the new event, it can
be matched in synthetic events. There is one limitation - an event probe
can not be attached to kprobe, uprobe or another event probe.

Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com
Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-20 14:18:40 -04:00
Steven Rostedt (VMware) 1d18538e6a tracing: Have dynamic events have a ref counter
As dynamic events are not created by modules, if something is attached to
one, calling "try_module_get()" on its "mod" field, is not going to keep
the dynamic event from going away.

Since dynamic events do not need the "mod" pointer of the event structure,
make a union out of it in order to save memory (there's one structure for
each of the thousand+ events in the kernel), and have any event with the
DYNAMIC flag set to use a ref counter instead.

Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-18 18:13:47 -04:00
Steven Rostedt (VMware) 8b0e6c744f tracing: Add DYNAMIC flag for dynamic events
To differentiate between static and dynamic events, add a new flag
DYNAMIC to the event flags that all dynamic events have set. This will
allow to differentiate when attaching to a dynamic event from a static
event.

Static events have a mod pointer that references the module they were
created in (or NULL for core kernel). This can be incremented when the
event has something attached to it. But there exists no such mechanism for
dynamic events. This is dangerous as the dynamic events may now disappear
without the "attachment" knowing that it no longer exists.

To enforce the dynamic flag, change dyn_event_add() to pass the event that
is being created such that it can set the DYNAMIC flag of the event. This
helps make sure that no location that creates a dynamic event misses
setting this flag.

Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/
Link: https://lkml.kernel.org/r/20210817035026.936958254@goodmis.org

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-18 18:10:32 -04:00
Andrii Nakryiko 82e6b1eee6 bpf: Allow to specify user-provided bpf_cookie for BPF perf links
Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).

This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).

This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.

Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Andrii Nakryiko b89fbfbb85 bpf: Implement minimal BPF perf link
Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.

BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.

Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.

One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.

The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
2021-08-17 00:45:07 +02:00
Wan Jiabing 1decdb335c tracing: Remove duplicate struct declaration in trace_events.h
struct trace_array is declared twice. One has been declared
at forward declaration. Remove the duplicate.

Link: https://lkml.kernel.org/r/20210330034056.2266969-1-wanjiabing@vivo.com

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-01 14:18:32 -04:00
Ingo Molnar f2cc020d78 tracing: Fix various typos in comments
Fix ~59 single-word typos in the tracing code comments, and fix
the grammar in a handful of places.

Link: https://lore.kernel.org/r/20210322224546.GA1981273@gmail.com
Link: https://lkml.kernel.org/r/20210323174935.GA4176821@gmail.com

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-23 14:08:18 -04:00
Steven Rostedt (VMware) b47e330231 tracing: Pass buffer of event to trigger operations
The ring_buffer_event_time_stamp() is going to be updated to extract the
time stamp for the event without needing it to be set to have absolute
values for all events. But to do so, it needs the buffer that the event is
on as the buffer saves information for the event before it is committed to
the buffer.

If the trace buffer is disabled, a temporary buffer is used, and there's
no access to this buffer from the current histogram triggers, even though
it is passed to the trace event code.

Pass the buffer that the event is on all the way down to the histogram
triggers.

Link: https://lkml.kernel.org/r/20210316164113.542448131@goodmis.org

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18 12:58:26 -04:00
Steven Rostedt (VMware) f9f344479d tracing: Fix comment about the trace_event_call flags
In the declaration of the struct trace_event_call, the flags has the bits
defined in the comment above it. But these bits are also defined by the
TRACE_EVENT_FL_* enums just above the declaration of the struct. As the
comment about the flags in the struct has become stale and incorrect, just
replace it with a reference to the TRACE_EVENT_FL_* enum above.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04 09:51:50 -05:00
Masami Hiramatsu efbbdaa22b tracing: Show real address for trace event arguments
To help debugging kernel, show real address for trace event arguments
in tracefs/trace{,pipe} instead of hashed pointer value.

Since ftrace human-readable format uses vsprintf(), all %p are
translated to hash values instead of pointer address.

However, when debugging the kernel, raw address value gives a
hint when comparing with the memory mapping in the kernel.
(Those are sometimes used with crash log, which is not hashed too)
So converting %p with %px when calling trace_seq_printf().

Moreover, this is not improving the security because the tracefs
can be used only by root user and the raw address values are readable
from tracefs/percpu/cpu*/trace_pipe_raw file.

Link: https://lkml.kernel.org/r/160277370703.29307.5134475491761971203.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-11 16:31:57 -05:00
Sebastian Andrzej Siewior 5817708493 tracing: Remove NULL check from current in tracing_generic_entry_update().
I can't imagine when or why `current' would return a NULL pointer. This
check was added in commit
      72829bc3d6 ("ftrace: move enums to ftrace.h and make helper function global")

but it doesn't give me hint why it was needed.

Assume `current' never returns a NULL pointer and remove the check.

Link: https://lkml.kernel.org/r/20210125194511.3924915-5-bigeasy@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02 17:02:06 -05:00
Sebastian Andrzej Siewior 0c02006e6f tracing: Inline tracing_gen_ctx_flags()
Inline tracing_gen_ctx_flags(). This allows to have one ifdef
CONFIG_TRACE_IRQFLAGS_SUPPORT.

This requires to move `trace_flag_type' so tracing_gen_ctx_flags() can
use it.

Link: https://lkml.kernel.org/r/20210125194511.3924915-3-bigeasy@linutronix.de

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20210125140323.6b1ff20c@gandalf.local.home
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02 17:02:06 -05:00
Sebastian Andrzej Siewior 36590c50b2 tracing: Merge irqflags + preempt counter.
The state of the interrupts (irqflags) and the preemption counter are
both passed down to tracing_generic_entry_update(). Only one bit of
irqflags is actually required: The on/off state. The complete 32bit
of the preemption counter isn't needed. Just whether of the upper bits
(softirq, hardirq and NMI) are set and the preemption depth is needed.

The irqflags and the preemption counter could be evaluated early and the
information stored in an integer `trace_ctx'.
tracing_generic_entry_update() would use the upper bits as the
TRACE_FLAG_* and the lower 8bit as the disabled-preemption depth
(considering that one must be substracted from the counter in one
special cases).

The actual preemption value is not used except for the tracing record.
The `irqflags' variable is mostly used only for the tracing record. An
exception here is for instance wakeup_tracer_call() or
probe_wakeup_sched_switch() which explicilty disable interrupts and use
that `irqflags' to save (and restore) the IRQ state and to record the
state.

Struct trace_event_buffer has also the `pc' and flags' members which can
be replaced with `trace_ctx' since their actual value is not used
outside of trace recording.

This will reduce tracing_generic_entry_update() to simply assign values
to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved
to _tracing_gen_ctx_flags() which replaces preempt_count() and
local_save_flags() invocations.

As an example, ftrace_syscall_enter() may invoke:
- trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update()
- event_trigger_unlock_commit()
  -> ftrace_trace_stack() -> … -> tracing_generic_entry_update()
  -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()

In this case the TRACE_FLAG_* bits were evaluated three times. By using
the `trace_ctx' they are evaluated once and assigned three times.

A build with all tracers enabled on x86-64 with and without the patch:

    text     data      bss      dec      hex    filename
21970669 17084168  7639260 46694097  2c87ed1 vmlinux.old
21970293 17084168  7639260 46693721  2c87d59 vmlinux.new

text shrank by 379 bytes, data remained constant.

Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02 17:02:06 -05:00
Joe Perches 33def8498f treewide: Convert macro and uses of __section(foo) to __section("foo")
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25 14:51:49 -07:00
Steven Rostedt (VMware) ff895103a8 tracing: Save off entry when peeking at next entry
In order to have the iterator read the buffer even when it's still updating,
it requires that the ring buffer iterator saves each event in a separate
location outside the ring buffer such that its use is immutable.

There's one use case that saves off the event returned from the ring buffer
interator and calls it again to look at the next event, before going back to
use the first event. As the ring buffer iterator will only have a single
copy, this use case will no longer be supported.

Instead, have the one use case create its own buffer to store the first
event when looking at the next event. This way, when looking at the first
event again, it wont be corrupted by the second read.

Link: http://lkml.kernel.org/r/20200317213415.722539921@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 17:48:36 -04:00
Linus Torvalds 61a7595403 Various fixes:
- Fix an uninitialized variable
 
  - Fix compile bug to bootconfig userspace tool (in tools directory)
 
  - Suppress some error messages of bootconfig userspace tool
 
  - Remove unneded CONFIG_LIBXBC from bootconfig
 
  - Allocate bootconfig xbc_nodes dynamically.
    To ease complaints about taking up static memory at boot up
 
  - Use of parse_args() to parse bootconfig instead of strstr() usage
    Prevents issues of double quotes containing the interested string
 
  - Fix missing ring_buffer_nest_end() on synthetic event error path
 
  - Return zero not -EINVAL on soft disabled synthetic event
    (soft disabling must be the same as hard disabling, which returns zero)
 
  - Consolidate synthetic event code (remove duplicate code)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXkMTwRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnJQAQD5aKiD6jx+zGtLsFTuZMcEGvhhUuJ6
 oUaSvhKO3UqezwD/V5avKuuC0wWt//gOWDY+0+4QNjmvn1GLnaNYohs6fg0=
 =NLT9
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various fixes:

   - Fix an uninitialized variable

   - Fix compile bug to bootconfig userspace tool (in tools directory)

   - Suppress some error messages of bootconfig userspace tool

   - Remove unneded CONFIG_LIBXBC from bootconfig

   - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
     taking up static memory at boot up

   - Use of parse_args() to parse bootconfig instead of strstr() usage
     Prevents issues of double quotes containing the interested string

   - Fix missing ring_buffer_nest_end() on synthetic event error path

   - Return zero not -EINVAL on soft disabled synthetic event (soft
     disabling must be the same as hard disabling, which returns zero)

   - Consolidate synthetic event code (remove duplicate code)"

* tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Consolidate trace() functions
  tracing: Don't return -EINVAL when tracing soft disabled synth events
  tracing: Add missing nest end to synth_event_trace_start() error case
  tools/bootconfig: Suppress non-error messages
  bootconfig: Allocate xbc_nodes array dynamically
  bootconfig: Use parse_args() to find bootconfig and '--'
  tracing/kprobe: Fix uninitialized variable bug
  bootconfig: Remove unneeded CONFIG_LIBXBC
  tools/bootconfig: Fix wrong __VA_ARGS__ usage
2020-02-11 16:39:18 -08:00
Tom Zanussi 7276531d40 tracing: Consolidate trace() functions
Move the checking, buffer reserve and buffer commit code in
synth_event_trace_start/end() into inline functions
__synth_event_trace_start/end() so they can also be used by
synth_event_trace() and synth_event_trace_array(), and then have all
those functions use them.

Also, change synth_event_trace_state.enabled to disabled so it only
needs to be set if the event is disabled, which is not normally the
case.

Link: http://lkml.kernel.org/r/b1f3108d0f450e58192955a300e31d0405ab4149.1581374549.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 22:00:21 -05:00
Linus Torvalds e310396bb8 Tracing updates:
- Added new "bootconfig".
    Looks for a file appended to initrd to add boot config options.
    This has been discussed thoroughly at Linux Plumbers.
    Very useful for adding kprobes at bootup.
    Only enabled if "bootconfig" is on the real kernel command line.
 
  - Created dynamic event creation.
    Merges common code between creating synthetic events and
      kprobe events.
 
  - Rename perf "ring_buffer" structure to "perf_buffer"
 
  - Rename ftrace "ring_buffer" structure to "trace_buffer"
    Had to rename existing "trace_buffer" to "array_buffer"
 
  - Allow trace_printk() to work withing (some) tracing code.
 
  - Sort of tracing configs to be a little better organized
 
  - Fixed bug where ftrace_graph hash was not being protected properly
 
  - Various other small fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXjtAURQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qshOAQDzopQmvAVrrI6oogghr8JQA30Z2yqT
 i+Ld7vPWL2MV9wEA1S+zLGDSYrj8f/vsCq6BxRYT1ApO+YtmY6LTXiUejwg=
 =WNds
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Added new "bootconfig".

   This looks for a file appended to initrd to add boot config options,
   and has been discussed thoroughly at Linux Plumbers.

   Very useful for adding kprobes at bootup.

   Only enabled if "bootconfig" is on the real kernel command line.

 - Created dynamic event creation.

   Merges common code between creating synthetic events and kprobe
   events.

 - Rename perf "ring_buffer" structure to "perf_buffer"

 - Rename ftrace "ring_buffer" structure to "trace_buffer"

   Had to rename existing "trace_buffer" to "array_buffer"

 - Allow trace_printk() to work withing (some) tracing code.

 - Sort of tracing configs to be a little better organized

 - Fixed bug where ftrace_graph hash was not being protected properly

 - Various other small fixes and clean ups

* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
  bootconfig: Show the number of nodes on boot message
  tools/bootconfig: Show the number of bootconfig nodes
  bootconfig: Add more parse error messages
  bootconfig: Use bootconfig instead of boot config
  ftrace: Protect ftrace_graph_hash with ftrace_sync
  ftrace: Add comment to why rcu_dereference_sched() is open coded
  tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
  tracing: Annotate ftrace_graph_hash pointer with __rcu
  bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
  tracing: Use seq_buf for building dynevent_cmd string
  tracing: Remove useless code in dynevent_arg_pair_add()
  tracing: Remove check_arg() callbacks from dynevent args
  tracing: Consolidate some synth_event_trace code
  tracing: Fix now invalid var_ref_vals assumption in trace action
  tracing: Change trace_boot to use synth_event interface
  tracing: Move tracing selftests to bottom of menu
  tracing: Move mmio tracer config up with the other tracers
  tracing: Move tracing test module configs together
  tracing: Move all function tracing configs together
  tracing: Documentation for in-kernel synthetic event API
  ...
2020-02-06 07:12:11 +00:00
Tom Zanussi 2b90927c77 tracing: Use seq_buf for building dynevent_cmd string
The dynevent_cmd commands that build up the command string don't need
to do that themselves - there's a seq_buf facility that does pretty
much the same thing those command are doing manually, so use it
instead.

Link: http://lkml.kernel.org/r/eb8a6e835c964d0ab8a38cbf5ffa60746b54a465.1580506712.git.zanussi@kernel.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-01 13:10:15 -05:00
Tom Zanussi 2a588dd1d5 tracing: Add kprobe event command generation functions
Add functions used to generate kprobe event commands, built on top of
the dynevent_cmd interface.

kprobe_event_gen_cmd_start() is used to create a kprobe event command
using a variable arg list, and kretprobe_event_gen_cmd_start() does
the same for kretprobe event commands.  kprobe_event_add_fields() can
be used to add single fields one by one or as a group.  Once all
desired fields are added, kprobe_event_gen_cmd_end() or
kretprobe_event_gen_cmd_end() respectively are used to actually
execute the command and create the event.

Link: http://lkml.kernel.org/r/95cc4696502bb6017f9126f306a45ad19b4cc14f.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30 09:46:28 -05:00
Tom Zanussi 8dcc53ad95 tracing: Add synth_event_trace() and related functions
Add an exported function named synth_event_trace(), allowing modules
or other kernel code to trace synthetic events.

Also added are several functions that allow the same functionality to
be broken out in a piecewise fashion, which are useful in situations
where tracing an event from a full array of values would be
cumbersome.  Those functions are synth_event_trace_start/end() and
synth_event_add_(next)_val().

Link: http://lkml.kernel.org/r/7a84de5f1854acf4144b57efe835ca645afa764f.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30 09:46:28 -05:00
Tom Zanussi 35ca5207c2 tracing: Add synthetic event command generation functions
Add functions used to generate synthetic event commands, built on top
of the dynevent_cmd interface.

synth_event_gen_cmd_start() is used to create a synthetic event
command using a variable arg list and
synth_event_gen_cmd_array_start() does the same thing but using an
array of field descriptors.  synth_event_add_field(),
synth_event_add_field_str() and synth_event_add_fields() can be used
to add single fields one by one or as a group.  Once all desired
fields are added, synth_event_gen_cmd_end() is used to actually
execute the command and create the event.

synth_event_create() does everything, including creating the event, in
a single call.

Link: http://lkml.kernel.org/r/38fef702fad5ef208009f459552f34a94befd860.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30 09:46:28 -05:00
Tom Zanussi 86c5426bad tracing: Add dynamic event command creation interface
Add an interface used to build up dynamic event creation commands,
such as synthetic and kprobe events.  Interfaces specific to those
particular types of events and others can be built on top of this
interface.

Command creation is started by first using the dynevent_cmd_init()
function to initialize the dynevent_cmd object.  Following that, args
are appended and optionally checked by the dynevent_arg_add() and
dynevent_arg_pair_add() functions, which use objects representing
arguments and pairs of arguments, initialized respectively by
dynevent_arg_init() and dynevent_arg_pair_init().  Finally, once all
args have been successfully added, the command is finalized and
actually created using dynevent_create().

The code here for actually printing into the dyn_event->cmd buffer
using snprintf() etc was adapted from v4 of Masami's 'tracing/boot:
Add synthetic event support' patch.

Link: http://lkml.kernel.org/r/1f65fa44390b6f238f6036777c3784ced1dcc6a0.1580323897.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30 09:46:28 -05:00
Tom Zanussi f5f6b255a2 tracing: Add synth_event_delete()
create_or_delete_synth_event() contains code to delete a synthetic
event, which would be useful on its own - specifically, it would be
useful to allow event-creating modules to call it separately.

Separate out the delete code from that function and create an exported
function named synth_event_delete().

Link: http://lkml.kernel.org/r/050db3b06df7f0a4b8a2922da602d1d879c7c1c2.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30 09:46:28 -05:00
Tom Zanussi e3e2a2cc9c tracing: Add trace_get/put_event_file()
Add a function to get an event file and prevent it from going away on
module or instance removal.

trace_get_event_file() will find an event file in a given instance (if
instance is NULL, it assumes the top trace array) and return it,
pinning the instance's trace array as well as the event's module, if
applicable, so they won't go away while in use.

trace_put_event_file() does the matching release.

Link: http://lkml.kernel.org/r/bb31ac4bdda168d5ed3c4b5f5a4c8f633e8d9118.1580323897.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
[ Moved trace_array_put() to end of trace_put_event_file() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30 09:46:28 -05:00
Masami Hiramatsu 8cfcf15503 tracing: kprobes: Output kprobe event to printk buffer
Since kprobe-events use event_trigger_unlock_commit_regs() directly,
that events doesn't show up in printk buffer if "tp_printk" is set.

Use trace_event_buffer_commit() in kprobe events so that it can
invoke output_printk() as same as other trace events.

Link: http://lkml.kernel.org/r/157867233085.17873.5210928676787339604.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Adjusted data var declaration placement in __kretprobe_trace_func() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:40 -05:00
Steven Rostedt (VMware) 1329249437 tracing: Make struct ring_buffer less ambiguous
As there's two struct ring_buffers in the kernel, it causes some confusion.
The other one being the perf ring buffer. It was agreed upon that as neither
of the ring buffers are generic enough to be used globally, they should be
renamed as:

   perf's ring_buffer -> perf_buffer
   ftrace's ring_buffer -> trace_buffer

This implements the changes to the ring buffer that ftrace uses.

Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:38 -05:00
Steven Rostedt (VMware) 1c5eb4481e tracing: Rename trace_buffer to array_buffer
As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.

Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13 13:19:38 -05:00
Ingo Molnar 2040cf9f59 Linux 5.5-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3tf/0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlKwH/3fTToujuJfTx5E5
 mrARAP65J1L/DxpEKvKRt2bNZo6w13mNd8g7ZPmYChz90bYGvXQSG8hYTU9iAw3O
 yimSTJlNXDhVAluB53XnDdUxIWC4HUZsNxWJNCeXMuiMcGNsTGX+v3f+x7oHCT0P
 jI1RSIsFGjgr0RWqZ8U5aJckQo2xABC1TfYw53K66Oc/JLZpSFJFwMgjf1fD5diU
 HGDA8E2p0u1TQIyNzr86iqMvnlSRYBQwBQn6OgEKCG4Z0NLtXfDF4mqnxsXgLmIH
 oQoFfxaMKXyGWds7ZxwcGWntALCF41ThfpiJWDIyxjWxFEty4bqTCbDPwwyp7ip0
 iuASmTI=
 =YqO2
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc1' into core/kprobes, to resolve conflicts

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:11:00 +01:00
Peter Zijlstra 04ae87a520 ftrace: Rework event_create_dir()
Rework event_create_dir() to use an array of static data instead of
function pointers where possible.

The problem is that it would call the function pointer on module load
before parse_args(), possibly even before jump_labels were initialized.
Luckily the generated functions don't use jump_labels but it still seems
fragile. It also gets in the way of changing when we make the module map
executable.

The generated function are basically calling trace_define_field() with a
bunch of static arguments. So instead of a function, capture these
arguments in a static array, avoiding the function call.

Now there are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes), in which case a static array does not
work, for these we preserve the function call. Luckily all these cases
are not related to modules and so we can retain the function call for
them.

Also fix up all broken tracepoint definitions that now generate a
compile error.

Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27 07:44:25 +01:00
Divya Indi 2887978714 tracing: Adding new functions for kernel access to Ftrace instances
Adding 2 new functions -
1) struct trace_array *trace_array_get_by_name(const char *name);

Return pointer to a trace array with given name. If it does not exist,
create and return pointer to the new trace array.

2) int trace_array_set_clr_event(struct trace_array *tr,
const char *system ,const char *event, bool enable);

Enable/Disable events to this trace array.

Additionally,
- To handle reference counters, export trace_array_put()
- Due to introduction of the above 2 new functions, we no longer need to
  export - ftrace_set_clr_event & trace_array_create APIs.

Link: http://lkml.kernel.org/r/1574276919-11119-2-git-send-email-divya.indi@oracle.com

Signed-off-by: Divya Indi <divya.indi@oracle.com>
Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-22 19:41:08 -05:00
Piotr Maziarz ef56e047b2 tracing: Use seq_buf_hex_dump() to dump buffers
Without this, buffers can be printed with __print_array macro that has
no formatting options and can be hard to read. The other way is to
mimic formatting capability with multiple calls of trace event with one
call per row which gives performance impact and different timestamp in
each row.

Link: http://lkml.kernel.org/r/1573130738-29390-2-git-send-email-piotrx.maziarz@linux.intel.com

Signed-off-by: Piotr Maziarz <piotrx.maziarz@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-14 13:15:12 -05:00
Denis Efremov 595a438c78 tracing: Make exported ftrace_set_clr_event non-static
The function ftrace_set_clr_event is declared static and marked
EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the
function was decided to be a part of API, this commit removes the static
attribute and adds the declaration to the header.

Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com

Fixes: f45d1225ad ("tracing: Kernel access to Ftrace instances")
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31 06:51:49 -04:00
Cong Wang 0aeb1def44 tracing: Make trace_get_fields() global
trace_get_fields() is the only way to read tracepoint fields at
run time, as their fields are defined at compile-time with macros.
Make this function visible to all users and it will be used by
trace event injection code to calculate the size of a tracepoint
entry.
Link: http://lkml.kernel.org/r/20190525165802.25944-4-xiyou.wangcong@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16 15:14:48 -04:00
Cong Wang 46710f3a34 tracing: Pass type into tracing_generic_entry_update()
All callers of tracing_generic_entry_update() have to initialize
entry->type, so let's just simply move it inside.
Link: http://lkml.kernel.org/r/20190525165802.25944-2-xiyou.wangcong@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-16 15:14:48 -04:00
Matt Mullins a38d1107f9 bpf: support raw tracepoints in modules
Distributions build drivers as modules, including network and filesystem
drivers which export numerous tracepoints.  This enables
bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-18 14:08:12 -08:00
Song Liu a6ca88b241 trace_uprobe: support reference counter in fd-based uprobe
This patch enables uprobes with reference counter in fd-based uprobe.
Highest 32 bits of perf_event_attr.config is used to stored offset
of the reference count (semaphore).

Format information in /sys/bus/event_source/devices/uprobe/format/ is
updated to reflect this new feature.

Link: http://lkml.kernel.org/r/20181002053636.1896903-1-songliubraving@fb.com

Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-10 22:14:17 -04:00
Linus Torvalds 1c8c5a9d38 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add Maglev hashing scheduler to IPVS, from Inju Song.

 2) Lots of new TC subsystem tests from Roman Mashak.

 3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

 5) Add ttl inherit support to vxlan, from Hangbin Liu.

 6) Properly separate ipv6 routes into their logically independant
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

 8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

11) Plumb extack down into fib_rules, from Roopa Prabhu.

12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

13) Add UDP GSO support, from Willem de Bruijn.

14) Add documentation for eBPF helpers, from Quentin Monnet.

15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

24) Add TCP SACK compression, from Eric Dumazet.

25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

28) Add lots of forwarding selftests, from Petr Machata.

29) Add generic network device failover driver, from Sridhar Samudrala.

* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
  strparser: Add __strp_unpause and use it in ktls.
  rxrpc: Fix terminal retransmission connection ID to include the channel
  net: hns3: Optimize PF CMDQ interrupt switching process
  net: hns3: Fix for VF mailbox receiving unknown message
  net: hns3: Fix for VF mailbox cannot receiving PF response
  bnx2x: use the right constant
  Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
  net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
  enic: fix UDP rss bits
  netdev-FAQ: clarify DaveM's position for stable backports
  rtnetlink: validate attributes in do_setlink()
  mlxsw: Add extack messages for port_{un, }split failures
  netdevsim: Add extack error message for devlink reload
  devlink: Add extack to reload and port_{un, }split operations
  net: metrics: add proper netlink validation
  ipmr: fix error path when ipmr_new_table fails
  ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
  net: hns3: remove unused hclgevf_cfg_func_mta_filter
  netfilter: provide udp*_lib_lookup for nf_tproxy
  qed*: Utilize FW 8.37.2.0
  ...
2018-06-06 18:39:49 -07:00
Steven Rostedt (VMware) c94e45bc38 tracing: Do not reference event data in post call triggers
Trace event triggers can be called before or after the event has been
committed. If it has been called after the commit, there's a possibility
that the event no longer exists. Currently, the two post callers is the
trigger to disable tracing (traceoff) and the one that will record a stack
dump (stacktrace). Neither of them reference the trace event entry record,
as that would lead to a race condition that could pass in corrupted data.

To prevent any other users of the post data triggers from using the trace
event record, pass in NULL to the post call trigger functions for the event
record, as they should never need to use them in the first place.

This does not fix any bug, but prevents bugs from happening by new post call
trigger users.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-29 08:28:02 -04:00
Yonghong Song 41bdc4b40e bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.

There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
for this approaches. First, bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.

This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
   . prog_id
   . tracepoint name, or
   . k[ret]probe funcname + offset or kernel addr, or
   . u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24 18:18:19 -07:00
Linus Torvalds 2a56bb596b New features:
- Tom Zanussi's extended histogram work
    This adds the synthetic events to have histograms from multiple event data
    Adds triggers "onmatch" and "onmax" to call the synthetic events
    Several updates to the histogram code from this
 
  - Allow way to nest ring buffer calls in the same context
 
  - Allow absolute time stamps in ring buffer
 
  - Rewrite of filter code parsing based on Al Viro's suggestions
 
  - Setting of trace_clock to global if TSC is unstable (on boot)
 
  - Better OOM handling when allocating large ring buffers
 
  - Added initcall tracepoints (consolidated initcall_debug code with them)
 
 And other various fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iQHIBAABCgAyFiEEPm6V/WuN2kyArTUe1a05Y9njSUkFAlrLoCAUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQ1a05Y9njSUks/QwAn/ky8WgfjcRdjKmBYuEwDedvm9iI
 V9G5kpv5JMw5dLz4l1pS3tA3M9Lyuc5z3Shw92FTy36vdU1wxEjQgHa7viB1xk9x
 KsiTyNjTsgrRd7GVHMy/8Be2RRiTRLaXKAsLCoj/c7QWzagV1P8XWlWK5mojYkh/
 DrSXyg9Avkp30+sU1bvcLWnmmZUFqMxs+bWipD9uFc98USMMyeP25nrnhrj0gDTg
 Q93cjXUuyVRC4lJ2YTW0GCSKhMKEw5f/ltEOT1hwScqYkCJj1EubKqS53R/9h21z
 IPUrYcqLnMRu0j2ejR+UAy5Vsy3gJUrPMQb0F6hlu1DwbMd0d/9SGh1c+Sm+zorh
 yftWTdCZsYrXkaOuB6V5M30X+KBwbWO0Xc9VCvgJ/IU5vMlgLSt5itTWbT/Fmfhb
 ll5/RXP7zhSXRv5sdl/BP3/4dd6F8jpyKyaR2Rk2+XjBOGIq5mvqNGr4Vj9AzxW8
 E0nvq7l7e0dbxZNM42gEm3cht1VUg7Zz0Y0+
 =91oN
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New features:

   - Tom Zanussi's extended histogram work.

     This adds the synthetic events to have histograms from multiple
     event data Adds triggers "onmatch" and "onmax" to call the
     synthetic events Several updates to the histogram code from this

   - Allow way to nest ring buffer calls in the same context

   - Allow absolute time stamps in ring buffer

   - Rewrite of filter code parsing based on Al Viro's suggestions

   - Setting of trace_clock to global if TSC is unstable (on boot)

   - Better OOM handling when allocating large ring buffers

   - Added initcall tracepoints (consolidated initcall_debug code with
     them)

  And other various fixes and clean ups"

* tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
  init: Have initcall_debug still work without CONFIG_TRACEPOINTS
  init, tracing: Have printk come through the trace events for initcall_debug
  init, tracing: instrument security and console initcall trace events
  init, tracing: Add initcall trace events
  tracing: Add rcu dereference annotation for test func that touches filter->prog
  tracing: Add rcu dereference annotation for filter->prog
  tracing: Fixup logic inversion on setting trace_global_clock defaults
  tracing: Hide global trace clock from lockdep
  ring-buffer: Add set/clear_current_oom_origin() during allocations
  ring-buffer: Check if memory is available before allocation
  lockdep: Add print_irqtrace_events() to __warn
  vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK)
  tracing: Uninitialized variable in create_tracing_map_fields()
  tracing: Make sure variable string fields are NULL-terminated
  tracing: Add action comparisons when testing matching hist triggers
  tracing: Don't add flag strings when displaying variable references
  tracing: Fix display of hist trigger expressions containing timestamps
  ftrace: Drop a VLA in module_exists()
  tracing: Mention trace_clock=global when warning about unstable clocks
  tracing: Default to using trace_global_clock if sched_clock is unstable
  ...
2018-04-10 11:27:30 -07:00
Linus Torvalds 5bb053bef8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Support offloading wireless authentication to userspace via
    NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.

 2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
    Setup and cleanup of namespaces now all run asynchronously and thus
    performance is significantly increased.

 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
    Streiff.

 4) Support zerocopy on RDS sockets, from Sowmini Varadhan.

 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
    Borkmann.

 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
    Chevallier.

 7) Support grafting of child qdiscs in mlxsw driver, from Nogah
    Frankel.

 8) Add packet forwarding tests to selftests, from Ido Schimmel.

 9) Deal with sub-optimal GSO packets better in BBR congestion control,
    from Eric Dumazet.

10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.

11) Add path MTU tests to selftests, from Stefano Brivio.

12) Various bits of IPSEC offloading support for mlx5, from Aviad
    Yehezkel, Yossi Kuperman, and Saeed Mahameed.

13) Support RSS spreading on ntuple filters in SFC driver, from Edward
    Cree.

14) Lots of sockmap work from John Fastabend. Applications can use eBPF
    to filter sendmsg and sendpage operations.

15) In-kernel receive TLS support, from Dave Watson.

16) Add XDP support to ixgbevf, this is significant because it should
    allow optimized XDP usage in various cloud environments. From Tony
    Nguyen.

17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
    Venkataramanan et al.

18) IP fragmentation match offload support in nfp driver, from Pieter
    Jansen van Vuuren.

19) Support XDP redirect in i40e driver, from Björn Töpel.

20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
    tracepoints in their raw form, from Alexei Starovoitov.

21) Lots of striding RQ improvements to mlx5 driver with many
    performance improvements, from Tariq Toukan.

22) Use rhashtable for inet frag reassembly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
  net: mvneta: improve suspend/resume
  net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
  ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
  net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
  net: bgmac: Correctly annotate register space
  route: check sysctl_fib_multipath_use_neigh earlier than hash
  fix typo in command value in drivers/net/phy/mdio-bitbang.
  sky2: Increase D3 delay to sky2 stops working after suspend
  net/mlx5e: Set EQE based as default TX interrupt moderation mode
  ibmvnic: Disable irqs before exiting reset from closed state
  net: sched: do not emit messages while holding spinlock
  vlan: also check phy_driver ts_info for vlan's real device
  Bluetooth: Mark expected switch fall-throughs
  Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
  Bluetooth: btrsi: remove unused including <linux/version.h>
  Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
  sh_eth: kill useless check in __sh_eth_get_regs()
  sh_eth: add sh_eth_cpu_data::no_xdfar flag
  ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
  ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
  ...
2018-04-03 14:04:18 -07:00
Alexei Starovoitov c4f6699dfc bpf: introduce BPF_RAW_TRACEPOINT
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // program can read args[N] where N depends on tracepoint
  // and statically verified at program load+attach time
}

kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.

Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:19 +02:00
Tom Zanussi 1ac4f51c0e tracing: Give event triggers access to ring_buffer_event
The ring_buffer event can provide a timestamp that may be useful to
various triggers - pass it into the handlers for that purpose.

Link: http://lkml.kernel.org/r/6de592683b59fa70ffa5d43d0109896623fc1367.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:05:51 -05:00
Ingo Molnar 7057bb975d Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17 11:39:28 +01:00