When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.
Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ftrace function hook utility has several internal checks to make sure
that whatever it modifies is exactly what it expects to be modifying. This
is essential as modifying running code can be extremely dangerous to the
system.
When an anomaly is detected, ftrace_bug() is called which sends a splat to
the console and disables function tracing. There's some extra information
that is printed to help diagnose the issue.
One thing that is missing though is output of what ftrace was doing at the
time of the crash. Was it updating a call site or perhaps converting a call
site to a nop? A new global enum variable is created to state what ftrace
was doing at the time of the anomaly, and this is reported in ftrace_bug().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Now that max_stack_lock is a global variable, it requires a naming
convention that is unlikely to collide. Rename it to the same naming
convention that the other stack_trace variables have.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
A stack frame may be used in a different way depending on cpu architecture.
Thus it is not always appropriate to slurp the stack contents, as current
check_stack() does, in order to calcurate a stack index (height) at a given
function call. At least not on arm64.
In addition, there is a possibility that we will mistakenly detect a stale
stack frame which has not been overwritten.
This patch makes check_stack() a weak function so as to later implement
arch-specific version.
Link: http://lkml.kernel.org/r/1446182741-31019-5-git-send-email-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Commit 4104d326b6 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.
Add back the proper hooks to have pid filtering working once again.
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add the kernel command line tp_printk option that will have tracepoints
that are active sent to printk() as well as to the trace buffer.
Passing "tp_printk" will activate this. To turn it off, the sysctl
/proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note,
this only works if the cmdline option is used. Echoing 1 into the sysctl
file without the cmdline option will have no affect.
Note, this is a dangerous option. Having high frequency tracepoints send
their data to printk() can possibly cause a live lock. This is another
reason why this is only active if the command line option is used.
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Enabling tracepoints at boot up can be very useful. The tracepoint
can be initialized right after RCU has been. There's no need to
wait for the early_initcall() to be called. That's too late for some
things that can use tracepoints for debugging. Move the logic to
enable tracepoints out of the initcalls and into init/main.c to
right after rcu_init().
This also allows trace_printk() to be used early too.
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among
ftrace users who may modify regs->ip to change the execution
path. If two or more users modify the regs->ip on the same
function entry, one of them will be broken. So they must add
IPMODIFY flag and make sure that ftrace_set_filter_ip() succeeds.
Note that ftrace doesn't allow ftrace_ops which has IPMODIFY
flag to have notrace hash, and the ftrace_ops must have a
filter hash (so that the ftrace_ops can hook only specific
entries), because it strongly depends on the address and
must be allowed for only few selected functions.
Link: http://lkml.kernel.org/r/20141121102516.11844.27829.stgit@localhost.localdomain
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed up some of the comments ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.
But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.
Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.
Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.
Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Allowing function callbacks to declare their own trampolines requires
that each ftrace_ops that has a trampoline must have some sort of
accounting that keeps track of which ops has a trampoline attached
to a record.
The easy way to solve this was to add a "tramp_hash" that created a
hash entry for every function that a ops uses with a trampoline.
But since we can have literally tens of thousands of functions being
traced, that means we need tens of thousands of descriptors to map
the ops to the function in the hash. This is quite expensive and
can cause enabling and disabling the function graph tracer to take
some time to start and stop. It can take up to several seconds to
disable or enable all functions in the function graph tracer for this
reason.
The better approach albeit more complex, is to keep track of how ops
are being enabled and disabled, and use that along with the counting
of the number of ops attached to records, to determive what ops has
a trampoline attached to a record at enabling and disabling of
tracing.
To do this, the tramp_hash has been replaced with an old_filter_hash
and old_notrace_hash, which get the copy of the ops filter_hash and
notrace_hash respectively. The old hashes is kept until the ops has
been modified or removed and the old hashes are used with the logic
of the accounting to determine the ops that have the trampoline of
a record. The reason this has less of a footprint is due to the trick
that an "empty" hash in the filter_hash means "all functions" and
an empty hash in the notrace hash means "no functions" in the hash.
This is much more efficienct, doesn't have the delay, and takes up
much less memory, as we do not need to map all the functions but
just figure out which functions are mapped at the time it is
enabled or disabled.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add three new flags for ftrace_ops:
FTRACE_OPS_FL_ADDING
FTRACE_OPS_FL_REMOVING
FTRACE_OPS_FL_MODIFYING
These will be set for the ftrace_ops when they are first added
to the function tracing, being removed from function tracing
or just having their functions changed from function tracing,
respectively.
This will be needed to remove the tramp_hash, which can grow quite
big. The tramp_hash is used to note what functions a ftrace_ops
is using a trampoline for. Denoting which ftrace_ops is being
modified, will allow us to use the ftrace_ops hashes themselves,
which are much smaller as they have a global flag to denote if
a ftrace_ops is tracing all functions, as well as a notrace hash
if the ftrace_ops is tracing all but a few. The tramp_hash just
creates a hash item for every function, which can go into the 10s
of thousands if all functions are using the ftrace_ops trampoline.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add the helper function to what the mcount trampoline is to call
for a ftrace_ops function. This helper will be used by arch code
in the future to set up dynamic trampolines. But as this does the
same tests that are performed in choosing what function to call for
the default mcount trampoline, might as well use it to clean up
the existing code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the top level debug file system function tracer shares its
ftrace_ops with the function graph tracer. This was thought to be fine
because the tracers are not used together, as one can only enable
function or function_graph tracer in the current_tracer file.
But that assumption proved to be incorrect. The function profiler
can use the function graph tracer when function tracing is enabled.
Since all function graph users uses the function tracing ftrace_ops
this causes a conflict and when a user enables both function profiling
as well as the function tracer it will crash ftrace and disable it.
The quick solution so far is to move them as separate ftrace_ops like
it was earlier. The problem though is to synchronize the functions that
are traced because both function and function_graph tracer are limited
by the selections made in the set_ftrace_filter and set_ftrace_notrace
files.
To handle this, a new structure is made called ftrace_ops_hash. This
structure will now hold the filter_hash and notrace_hash, and the
ftrace_ops will point to this structure. That will allow two ftrace_ops
to share the same hashes.
Since most ftrace_ops do not share the hashes, and to keep allocation
simple, the ftrace_ops structure will include both a pointer to the
ftrace_ops_hash called func_hash, as well as the structure itself,
called local_hash. When the ops are registered, the func_hash pointer
will be initialized to point to the local_hash within the ftrace_ops
structure. Some of the ftrace internal ftrace_ops will be initialized
statically. This will allow for the function and function_graph tracer
to have separate ops but still share the same hash tables that determine
what functions they trace.
Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Having two fields within the same struct that is off by one character
can be confusing and error prone. Rename the counter "trampolines"
to "nr_trampolines" to explicitly show it is a counter and not to
be confused by the "trampoline" field.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
All users of function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST have
been removed. We can safely remove them from the kernel.
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
function_trace_stop is no longer used to disable function tracing.
This means that archs are no longer limited if it does not support
checking this variable in the mcount trampoline.
No need to use the list_func for archs that do not support this
obsolete method.
Acked-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are no more kernel users of ftrace_stop() and ftrace_start().
Remove them.
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_stop() is going away as it disables parts of function tracing
that affects users that should not be affected. But ftrace_graph_stop()
is built on ftrace_stop(). Here's another example of killing all of
function tracing because something went wrong with function graph
tracing.
Instead of disabling all users of function tracing on function graph
error, disable only function graph tracing.
A new function is created called ftrace_graph_is_dead(). This is called
in strategic paths to prevent function graph from doing more harm and
allowing at least a warning to be printed before the system crashes.
NOTE: ftrace_stop() is still used until all the archs are converted over
to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed.
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently if an arch supports function graph tracing, the core code will
just assign the function graph trampoline to the function graph addr that
gets called.
But as the old method for function graph tracing always calls the function
trampoline first and that calls the function graph trampoline, some
archs may have the function graph trampoline dependent on operations that
were done in the function trampoline. This causes function graph tracer
to break on those archs.
Instead of having the default be to set the function graph ftrace_ops
to the function graph trampoline, have it instead just set it to zero
which will keep it from jumping to a trampoline that is not set up
to be jumped directly too.
Link: http://lkml.kernel.org/r/53BED155.9040607@nvidia.com
Reported-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Tested-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Function graph tracing is a bit different than the function tracers, as
it is processed after either the ftrace_caller or ftrace_regs_caller
and we only have one place to modify the jump to ftrace_graph_caller,
the jump needs to happen after the restore of registeres.
The function graph tracer is dependent on the function tracer, where
even if the function graph tracing is going on by itself, the save and
restore of registers is still done for function tracing regardless of
if function tracing is happening, before it calls the function graph
code.
If there's no function tracing happening, it is possible to just call
the function graph tracer directly, and avoid the wasted effort to save
and restore regs for function tracing.
This requires adding new flags to the dyn_ftrace records:
FTRACE_FL_TRAMP
FTRACE_FL_TRAMP_EN
The first is set if the count for the record is one, and the ftrace_ops
associated to that record has its own trampoline. That way the mcount code
can call that trampoline directly.
In the future, trampolines can be added to arbitrary ftrace_ops, where you
can have two or more ftrace_ops registered to ftrace (like kprobes and perf)
and if they are not tracing the same functions, then instead of doing a
loop to check all registered ftrace_ops against their hashes, just call the
ftrace_ops trampoline directly, which would call the registered ftrace_ops
function directly.
Without this patch perf showed:
0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] preempt_trace
0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
See that the ftrace_caller took up more time than the ftrace_graph_caller
did.
With this patch:
0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
0.04% hackbench [kernel.kallsyms] [k] sched_clock
The ftrace_caller is no where to be found and ftrace_graph_caller still
takes up the same percentage.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ftrace dynamic record has a flags element that also has a counter.
Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the
place. Use a macro instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As new flags will be added to the ftrace dynamic record, and since
the flags field is also a counter, converting the numbers used to
do the shifting and masking into a set of macros where we only need
to deal with the max bit count of the counter and the number of bits
for the flags will prevent mistakes in the future.
Dealing with only two numbers is much easier than updating all the
macros that deal with shifting and masking.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
to help out the rest of the kernel to ease their use of trace events.
The big change for this release is the allowing of other tracers,
such as the latency tracers, to be used in the trace instances and allow
for function or function graph tracing to be in the top level
simultaneously.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTlbUqAAoJEKQekfcNnQGuP+8H+wTBG06beHsqe6XcaeXcKNkt
Mimm0O04oQdw89CBWeJvXyOwRTtiN4M/4hxHXBTDtChxM9oUyWw1o0IpSMMuQ16O
w9r3DfC8e1air+ufEuYWM0QNtyzHi8EfDSNia55ON5jvtkCZTXOEKZD+n8M9w28p
I7PVgr0PDztsCpethCpg0M8beK9zuQPWMzsHAQCsKI06Xl5z33kPIJR15Exh+Kr1
uVVTZW7JFVAPuSnteLSIx9pN6OjsVGzOZCljg+O+9/v/02u5nkMiS2nURxae86kg
RTSiRYT6Hvl/MCBhdss/w5kgSk6BYiZ0hXbLtwetvre+vQrOR5CnDw2DxZ7e+gU=
=oudH
-----END PGP SIGNATURE-----
Merge tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Lots of tweaks, small fixes, optimizations, and some helper functions
to help out the rest of the kernel to ease their use of trace events.
The big change for this release is the allowing of other tracers, such
as the latency tracers, to be used in the trace instances and allow
for function or function graph tracing to be in the top level
simultaneously"
* tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
tracing: Fix memory leak on instance deletion
tracing: Fix leak of ring buffer data when new instances creation fails
tracing/kprobes: Avoid self tests if tracing is disabled on boot up
tracing: Return error if ftrace_trace_arrays list is empty
tracing: Only calculate stats of tracepoint benchmarks for 2^32 times
tracing: Convert stddev into u64 in tracepoint benchmark
tracing: Introduce saved_cmdlines_size file
tracing: Add __get_dynamic_array_len() macro for trace events
tracing: Remove unused variable in trace_benchmark
tracing: Eliminate double free on failure of allocation on boot up
ftrace/x86: Call text_ip_addr() instead of the duplicated code
tracing: Print max callstack on stacktrace bug
tracing: Move locking of trace_cmdline_lock into start/stop seq calls
tracing: Try again for saved cmdline if failed due to locking
tracing: Have saved_cmdlines use the seq_read infrastructure
tracing: Add tracepoint benchmark tracepoint
tracing: Print nasty banner when trace_printk() is in use
tracing: Add funcgraph_tail option to print function name after closing braces
tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
...
Most archs with HAVE_ARCH_CALLER_ADDR have pretty much the same
definitions of CALLER_ADDRx(n). Instead of duplicating the code for all
the archs, define a ftrace_return_address0() and
ftrace_return_address(n) that can be overwritten by the archs if they
need to do something different. Instead of 7 macros in every arch, we
now only have at most 2 (and actually only 1 as
ftrace_return_address0() should be the same for all archs).
The CALLER_ADDRx(n) will now be defined in linux/ftrace.h and use the
ftrace_return_address*(n?) macros. This removes a lot of the duplicate
code.
Link: http://lkml.kernel.org/p/1400585464-30333-1-git-send-email-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.
This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.
ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.
ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
A race exists between module loading and enabling of function tracer.
CPU 1 CPU 2
----- -----
load_module()
module->state = MODULE_STATE_COMING
register_ftrace_function()
mutex_lock(&ftrace_lock);
ftrace_startup()
update_ftrace_function();
ftrace_arch_code_modify_prepare()
set_all_module_text_rw();
<enables-ftrace>
ftrace_arch_code_modify_post_process()
set_all_module_text_ro();
[ here all module text is set to RO,
including the module that is
loading!! ]
blocking_notifier_call_chain(MODULE_STATE_COMING);
ftrace_init_module()
[ tries to modify code, but it's RO, and fails!
ftrace_bug() is called]
When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.
The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.
The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.
Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com
Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.
Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.
This removes a lot of code as well as the complexity involved with it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The 'freelist' member was introduced to 'struct dyn_ftrace' in commit
ee000b7f9f (tracing: use union for
multi-usages field), but the use of this member was later removed in
3208230983 (ftrace: Remove usage of
"freed" records). Remove also the 'freelist' member now.
Link: http://lkml.kernel.org/r/1393268401-24379-5-git-send-email-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As the data parameter is not really used by any ftrace_dyn_arch_init,
remove that from ftrace_dyn_arch_init. This also removes the addr
local variable from ftrace_init which is now unused.
Note the documentation was imprecise as it did not suggest to set
(*data) to 0.
Link: http://lkml.kernel.org/r/1393268401-24379-4-git-send-email-jslaby@suse.cz
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Create a "set_ftrace_filter" and "set_ftrace_notrace" files in the instance
directories to let users filter of functions to trace for the given instance.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Passing data to the function callback was originally done by adding the
ftrace_ops in another structure, and using the container_of() to get
the field. But this adds a bit more complexity than it is worth, and
adding a simple .private field to ftrace_ops makes things a lot easier.
But be warned, the .private data should not be freed once it is used
unless the ftrace_ops itself has gone through the necessary freeing
routines. A simple synchronize_sched() is not enough as functions
can be traced that are called outside the view of RCU and all its
concoctions.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Trace event triggers added a lseek that uses the ftrace_filter_lseek()
function. Unfortunately, when function tracing is not configured in
that function is not defined and the kernel fails to build.
This is the second time that function was added to a file ops and
it broke the build due to requiring special config dependencies.
Make a generic tracing_lseek() that all the tracing utilities may
use.
Also, modify the old ftrace_filter_lseek() to return 0 instead of
1 on WRONLY. Not sure why it was a 1 as that does not make sense.
This also changes the old tracing_seek() to modify the file pos
pointer on WRONLY as well.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.
This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This is a leftover after ftrace_regex_lseek() was renamed to
ftrace_filter_lseek() and then ftrace_filter_lseek() was moved
out side of CONFIG_DYNAMIC_FTRACE.
Link: http://lkml.kernel.org/r/51B1A1BD.40905@huawei.com
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Fix a deadlock on ftrace_regex_lock which happens when setting
an enable_event trigger on dynamic kprobe event as below.
----
sh-2.05b# echo p vfs_symlink > kprobe_events
sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter
=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #35 Not tainted
---------------------------------------------
sh/72 is trying to acquire lock:
(ftrace_regex_lock){+.+.+.}, at: [<ffffffff810ba6c1>] ftrace_set_hash+0x81/0x1f0
but task is already holding lock:
(ftrace_regex_lock){+.+.+.}, at: [<ffffffff810b7cbd>] ftrace_regex_write.isra.29.part.30+0x3d/0x220
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(ftrace_regex_lock);
lock(ftrace_regex_lock);
*** DEADLOCK ***
----
To fix that, this introduces a finer regex_lock for each ftrace_ops.
ftrace_regex_lock is too big of a lock which protects all
filter/notrace_hash operations, but it doesn't need to be a global
lock after supporting multiple ftrace_ops because each ftrace_ops
has its own filter/notrace_hash.
Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Added initialization flag and automate mutex initialization for
non ftrace.c ftrace_probes. ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few years.
I even heard that Google was about to implement it themselves. I finally
had time and cleaned up the code such that you can now create multiple
instances of the ftrace buffer and have different events go to different
buffers. This way, a low frequency event will not be lost in the noise
of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie. function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will make
it easier to interleave the perf and ftrace data for analysis.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJRfnTPAAoJEOdOSU1xswtMqYYH/1WIdrwXmxHflErnYkCIr3sU
QtYae2K5A1HcgiqOvRJrdWMOt016iMx5CaQQyBFM1vvMiPY0sTWRmwNxDfZzz9LN
10jRvWEzZSLtzl+a9mkFWLEpr5nR/QODOxkWFCnRWscp46sp04LSTxGDYsOnPQZB
sam/AQ1h4xA+DqDBChm9BDEUEPorGleTlN54LBaCGgSFGvrbF+eAg2s4vHNAQAvQ
8d5xjSE9zC7J+FqbVxvJTbKI3+EqKL6hMsJKsKfi0SI+FuxBaFMSltXck5zKyTI4
HpNJzXCmw+v90Tju7oMkPHh6RTbESPCHoGU+wqE52fM6m7oScVeuI/kfc6USwU4=
=W1n+
-----END PGP SIGNATURE-----
Merge tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few
years. I even heard that Google was about to implement it themselves.
I finally had time and cleaned up the code such that you can now
create multiple instances of the ftrace buffer and have different
events go to different buffers. This way, a low frequency event will
not be lost in the noise of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will
make it easier to interleave the perf and ftrace data for analysis."
* tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
tracepoints: Prevent null probe from being added
tracing: Compare to 1 instead of zero for is_signed_type()
tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
ftrace: Get rid of ftrace_profile_bits
tracing: Check return value of tracing_init_dentry()
tracing: Get rid of unneeded key calculation in ftrace_hash_move()
tracing: Reset ftrace_graph_filter_enabled if count is zero
tracing: Fix off-by-one on allocating stat->pages
kernel: tracing: Use strlcpy instead of strncpy
tracing: Update debugfs README file
tracing: Fix ftrace_dump()
tracing: Rename trace_event_mutex to trace_event_sem
tracing: Fix comment about prefix in arch_syscall_match_sym_name()
tracing: Convert trace_destroy_fields() to static
tracing: Move find_event_field() into trace_events.c
tracing: Use TRACE_MAX_PRINT instead of constant
tracing: Use pr_warn_once instead of open coded implementation
ring-buffer: Add ring buffer startup selftest
tracing: Bring Documentation/trace/ftrace.txt up to date
tracing: Add "perf" trace_clock
...
Conflicts:
kernel/trace/ftrace.c
kernel/trace/trace.c
As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to
be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the
ftrace_pid_fops is defined when DYNAMIC_FTRACE is not.
Cc: stable@vger.kernel.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently set_ftrace_pid and set_graph_function files use seq_lseek
for their fops. However seq_open() is called only for FMODE_READ in
the fops->open() so that if an user tries to seek one of those file
when she open it for writing, it sees NULL seq_file and then panic.
It can be easily reproduced with following command:
$ cd /sys/kernel/debug/tracing
$ echo 1234 | sudo tee -a set_ftrace_pid
In this example, GNU coreutils' tee opens the file with fopen(, "a")
and then the fopen() internally calls lseek().
Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The function tracing control loop used by perf spits out a warning
if the called function is not a control function. This is because
the control function references a per cpu allocated data structure
on struct ftrace_ops that is not allocated for other types of
functions.
commit 0a016409e4 "ftrace: Optimize the function tracer list loop"
Had an optimization done to all function tracing loops to optimize
for a single registered ops. Unfortunately, this allows for a slight
race when tracing starts or ends, where the stub function might be
called after the current registered ops is removed. In this case we
get the following dump:
root# perf stat -e ftrace:function sleep 1
[ 74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
[ 74.349522] Hardware name: PRIMERGY RX200 S6
[ 74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk
r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li
bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 74.446233] Pid: 1377, comm: perf Tainted: G W 3.9.0-rc1 #1
[ 74.453458] Call Trace:
[ 74.456233] [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
[ 74.462997] [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
[ 74.470272] [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
[ 74.478117] [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
[ 74.484681] [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
[ 74.491760] [<ffffffff8162f400>] ftrace_call+0x5/0x2f
[ 74.497511] [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
[ 74.503486] [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
[ 74.509500] [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
[ 74.516088] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.522268] [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
[ 74.528837] [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
[ 74.536696] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.542878] [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
[ 74.548869] [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
[ 74.556243] [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
[ 74.563709] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.569887] [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
[ 74.575898] [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
[ 74.582505] [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
[ 74.589298] [<ffffffff811295d0>] free_event+0x70/0x1a0
[ 74.595208] [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
[ 74.602460] [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
[ 74.608667] [<ffffffff8112a640>] put_event+0x90/0xc0
[ 74.614373] [<ffffffff8112a740>] perf_release+0x10/0x20
[ 74.620367] [<ffffffff811a3044>] __fput+0xf4/0x280
[ 74.625894] [<ffffffff811a31de>] ____fput+0xe/0x10
[ 74.631387] [<ffffffff81083697>] task_work_run+0xa7/0xe0
[ 74.637452] [<ffffffff81014981>] do_notify_resume+0x71/0xb0
[ 74.643843] [<ffffffff8162fa92>] int_signal+0x12/0x17
To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
ftrace_ops stub as just that, a stub. This flag is now checked in the
control loop and the function is not called if the flag is set.
Thanks to Jovi for not just reporting the bug, but also pointing out
where the bug was in the code.
Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com
Tested-by: WANG Chao <chaowang@redhat.com>
Reported-by: WANG Chao <chaowang@redhat.com>
Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When a function probe is created, each function that the probe is
attached to, a "callback" method is called. On release of the probe,
each function entry calls the "free" method.
First, "callback" is a confusing name and does not really match what
it does. Callback sounds like it will be called when the probe
triggers. But that's not the case. This is really an "init" function,
so lets rename it as such.
Secondly, both "init" and "free" do not pass enough information back
to the handlers. Pass back the ops, ip and data for each time the
method is called. We have the information, might as well use it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.
Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
But the kernel decided to call it "origin" instead. Fix most of the
sites.
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_DYNAMIC_FTRACE is not set, ftrace_location() is not defined.
If a user (like kprobes) references this function, it will break
the compile when CONFIG_DYNAMIC_FTRACE is not set.
Add ftrace_location() as a nop (return 0) when DYNAMIC_FTRACE
is not defined.
Link: http://lkml.kernel.org/r/20120612225426.961092717@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a new filter update interface ftrace_set_filter_ip()
to set ftrace filter by ip address, not only glob pattern.
Link: http://lkml.kernel.org/r/20120605102808.27845.67952.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add selftests to test the function tracing recursion protection actually
does work. It also tests if a ftrace_ops states it will perform its own
protection. Although, even if the ftrace_ops states it will protect itself,
the ftrace infrastructure may still provide protection if the arch does
not support all features or another ftrace_ops is registered.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As more users of the function tracer utility are being added, they do
not always add the necessary recursion protection. To protect from
function recursion due to tracing, if the callback ftrace_ops does not
specifically specify that it protects against recursion (by setting
the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
called by the mcount trampoline which adds recursion protection.
If the flag is set, then the function will be called directly with no
extra protection.
Note, the list operation is called if more than one function callback
is registered, or if the arch does not support all of the function
tracer features.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a way to have different functions calling different trampolines.
If a ftrace_ops wants regs saved on the return, then have only the
functions with ops registered to save regs. Functions registered by
other ops would not be affected, unless the functions overlap.
If one ftrace_ops registered functions A, B and C and another ops
registered fucntions to save regs on A, and D, then only functions
A and D would be saving regs. Function B and C would work as normal.
Although A is registered by both ops: normal and saves regs; this is fine
as saving the regs is needed to satisfy one of the ops that calls it
but the regs are ignored by the other ops function.
x86_64 implements the full regs saving, and i386 just passes a NULL
for regs to satisfy the ftrace_ops passing. Where an arch must supply
both regs and ftrace_ops parameters, even if regs is just NULL.
It is OK for an arch to pass NULL regs. All function trace users that
require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
registering the ftrace_ops. If the arch does not support saving regs
then the ftrace_ops will fail to register. The flag
FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
ftrace_ops from failing to register. In this case, the handler may
either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
If the arch supports passing regs it will set this macro and pass regs
for ops that request them. All other archs will just pass NULL.
Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Return as the 4th paramater to the function tracer callback the pt_regs.
Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.
If an arch can pass full regs, then it should define:
ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1
Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As the function tracer starts to get more features, the support for
theses features will spread out throughout the different architectures
over time. These features boil down to what each arch does in the
mcount trampoline (the ftrace_caller).
Currently there's two features that are not the same throughout the
archs.
1) Support to stop function tracing before the callback
2) passing of the ftrace ops
Both of these require placing an indirect function to support the
features if the mcount trampoline does not.
On a side note, for all architectures, when more than one callback
is registered to the function tracer, an intermediate 'list' function
is called by the mcount trampoline to iterate through the callbacks
that are registered.
Instead of making a separate function for each of these features,
and requiring several indirect calls, just use the single 'list' function
as the intermediate, to handle all cases. If an arch does not support
the 'stop function tracing' or the passing of ftrace ops, just force
it to use the list function that will handle the features required.
This makes the code cleaner and simpler and removes a lot of
#ifdefs in the code.
Link: http://lkml.kernel.org/r/20120612225424.495625483@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.
Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
To remove duplicate code, have the ftrace arch_ftrace_update_code()
use the generic ftrace_modify_all_code(). This requires that the
default ftrace_replace_code() becomes a weak function so that an
arch may override it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Rename __ftrace_modify_code() to ftrace_modify_all_code() and make
it global for all archs to use. This will remove the duplication
of code, as archs that can modify code without stop_machine()
can use it directly outside of the stop_machine() call.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_location() is passed an addr, and returns 1 if the addr is
on a ftrace nop (or caller to ftrace_caller), and 0 otherwise.
To let kprobes know if it should move a breakpoint or not, it
must return the actual addr that is the start of the ftrace nop.
This way a kprobe placed on the location of a ftrace nop, can
instead be placed on the instruction after the nop. Even if the
probe addr is on the second or later byte of the nop, it can
simply be moved forward.
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This fixes spending time for evaluating parameters in trace_preempt_on/off when
the tracer config is off.
The patch mainly inspired by Steven Rostedt, thanks Steven.
Link: http://lkml.kernel.org/r/4FA73510.7070705@samsung.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Minho Ban <mhban@samsung.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This method changes x86 to add a breakpoint to the mcount locations
instead of calling stop machine.
Now that iret can be handled by NMIs, we perform the following to
update code:
1) Add a breakpoint to all locations that will be modified
2) Sync all cores
3) Update all locations to be either a nop or call (except breakpoint
op)
4) Sync all cores
5) Remove the breakpoint with the new code.
6) Sync all cores
[
Added updates that Masami suggested:
Use unlikely(modifying_ftrace_code) in int3 trap to keep kprobes efficient.
Don't use NOTIFY_* in ftrace handler in int3 as it is not a notifier.
]
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding support to filter function trace event via perf
interface. It is now possible to use filter interface
in the perf tool like:
perf record -e ftrace:function --filter="(ip == mm_*)" ls
The filter syntax is restricted to the the 'ip' field only,
and following operators are accepted '==' '!=' '||', ending
up with the filter strings like:
ip == f1[, ]f2 ... || ip != f3[, ]f4 ...
with comma ',' or space ' ' as a function separator. If the
space ' ' is used as a separator, the right side of the
assignment needs to be enclosed in double quotes '"', e.g.:
perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls
perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls
perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls
The '==' operator adds trace filter with same effect as would
be added via set_ftrace_filter file.
The '!=' operator adds trace filter with same effect as would
be added via set_ftrace_notrace file.
The right side of the '!=', '==' operators is list of functions
or regexp. to be added to filter separated by space.
The '||' operator is used for connecting multiple filter definitions
together. It is possible to have more than one '==' and '!='
operators within one filter string.
Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same way as 'global' ftrace_ops are done.
Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
flag. In addition new per cpu flag called 'disabled' is also added to
ftrace_ops to provide the control information for each cpu.
When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.
The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides function which iterates ftrace_control_list
and does the check for 'disabled' flag on current cpu.
Adding 3 inline functions:
ftrace_function_local_disable/ftrace_function_local_enable
- enable/disable the ftrace_ops on current cpu
ftrace_function_local_disabled
- get disabled ftrace_ops::disabled value for current cpu
Link: http://lkml.kernel.org/r/1329317514-8131-2-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the ftrace_set_filter and ftrace_set_notrace functions
do not return any return code. So there's no way for ftrace_ops
user to tell wether the filter was correctly applied.
The set_ftrace_filter interface returns error in case the filter
did not match:
# echo krava > set_ftrace_filter
bash: echo: write error: Invalid argument
Changing both ftrace_set_filter and ftrace_set_notrace functions
to return zero if the filter was applied correctly or -E* values
in case of error.
Link: http://lkml.kernel.org/r/1325495060-6402-2-git-send-email-jolsa@redhat.com
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The stack tracer uses the call ftrace_set_early_filter() function
to allow the stack tracer to pick its own functions on boot.
But this function is not defined if dynamic ftrace is not set.
This causes a compiler error when stack tracer is enabled and
dynamic ftrace is not.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Change set_ftrace_early_filter() to ftrace_set_early_filter()
and make it a global function. This will allow other subsystems
in the kernel to be able to enable function tracing at start
up and reuse the ftrace function parsing code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The set_ftrace_filter shows "hashed" functions, which are functions
that are added with operations to them (like traceon and traceoff).
As other subsystems may be able to show what functions they are
using for function tracing, the hash items should no longer
be shown just because the FILTER flag is set. As they have nothing
to do with other subsystems filters.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The function tracer is set up to allow any other subsystem (like perf)
to use it. Ftrace already has a way to list what functions are enabled
by the global_ops. It would be very helpful to let other users of
the function tracer to be able to use the same code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As new functions come in to be initalized from mcount to nop,
they are done by groups of pages. Whether it is the core kernel
or a module. There's no need to keep track of these on a per record
basis.
At startup, and as any module is loaded, the functions to be
traced are stored in a group of pages and added to the function
list at the end. We just need to keep a pointer to the first
page of the list that was added, and use that to know where to
start on the list for initializing functions.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Records that are added to the function trace table are
permanently there, except for modules. By separating out the
modules to their own pages that can be freed in one shot
we can remove the "freed" flag and simplify some of the record
management.
Another benefit of doing this is that we can also move the
records around; sort them.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The stop machine method to modify all functions in the kernel
(some 20,000 of them) is the safest way to do so across all archs.
But some archs may not need this big hammer approach to modify code
on SMP machines, and can simply just update the code it needs.
Adding a weak function arch_ftrace_update_code() that now does the
stop machine, will also let any arch override this method.
If the arch needs to check the system and then decide if it can
avoid stop machine, it can still call ftrace_run_stop_machine() to
use the old method.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The <linux/module.h> pretty much brings in the kitchen sink along
with it, so it should be avoided wherever reasonably possible in
terms of being included from other commonly used <linux/something.h>
files, as it results in a measureable increase on compile times.
The worst culprit was probably device.h since it is used everywhere.
This file also had an implicit dependency/usage of mutex.h which was
masked by module.h, and is also fixed here at the same time.
There are over a dozen other headers that simply declare the
struct instead of pulling in the whole file, so follow their lead
and simply make it a few more.
Most of the implicit dependencies on module.h being present by
these headers pulling it in have been now weeded out, so we can
finally make this change with hopefully minimal breakage.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The struct ftrace_hash was declared within CONFIG_FUNCTION_TRACER
but was referenced outside of it.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since users of the function tracer can now pick and choose which
functions they want to trace agnostically from other users of the
function tracer, we need to pass the ops struct to the ftrace_set_filter()
functions.
The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
is added to keep the old filter functions which are used to modify
the generic function tracers.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Now that functions may be selected individually, it only makes sense
that we should allow dynamically allocated trace structures to
be traced. This will allow perf to allocate a ftrace_ops structure
at runtime and use it to pick and choose which functions that
structure will trace.
Note, a dynamically allocated ftrace_ops will always be called
indirectly instead of being called directly from the mcount in
entry.S. This is because there's no safe way to prevent mcount
from being preempted before calling the function, unless we
modify every entry.S to do so (not likely). Thus, dynamically allocated
functions will now be called by the ftrace_ops_list_func() that
loops through the ops that are allocated if there are more than
one op allocated at a time. This loop is protected with a
preempt_disable.
To determine if an ftrace_ops structure is allocated or not, a new
util function was added to the kernel/extable.c called
core_kernel_data(), which returns 1 if the address is between
_sdata and _edata.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_ops that are registered to trace functions can now be
agnostic to each other in respect to what functions they trace.
Each ops has their own hash of the functions they want to trace
and a hash to what they do not want to trace. A empty hash for
the functions they want to trace denotes all functions should
be traced that are not in the notrace hash.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Every function has its own record that stores the instruction
pointer and flags for the function to be traced. There are only
two flags: enabled and free. The enabled flag states that tracing
for the function has been enabled (actively traced), and the free
flag states that the record no longer points to a function and can
be used by new functions (loaded modules).
These flags are now moved to the MSB of the flags (actually just
the top 32bits). The rest of the bits (30 bits) are now used as
a ref counter. Everytime a tracer register functions to trace,
those functions will have its counter incremented.
When tracing is enabled, to determine if a function should be traced,
the counter is examined, and if it is non-zero it is set to trace.
When a ftrace_ops is registered to trace functions, its hashes
are examined. If the ftrace_ops filter_hash count is zero, then
all functions are set to be traced, otherwise only the functions
in the hash are to be traced. The exception to this is if a function
is also in the ftrace_ops notrace_hash. Then that function's counter
is not incremented for this ftrace_ops.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Combine the filter and notrace hashes to be accessed by a single entity,
the global_ops. The global_ops is a ftrace_ops structure that is passed
to different functions that can read or modify the filtering of the
function tracer.
The ftrace_ops structure was modified to hold a filter and notrace
hashes so that later patches may allow each ftrace_ops to have its own
set of rules to what functions may be filtered.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When multiple users are allowed to have their own set of functions
to trace, having the FTRACE_FL_FILTER flag will not be enough to
handle the accounting of those users. Each user will need their own
set of functions.
Replace the FTRACE_FL_FILTER with a filter_hash instead. This is
temporary until the rest of the function filtering accounting
gets in.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
To prepare for the accounting system that will allow multiple users of
the function tracer, having the FTRACE_FL_NOTRACE as a flag in the
dyn_trace record does not make sense.
All ftrace_ops will soon have a hash of functions they should trace
and not trace. By making a global hash of functions not to trace makes
this easier for the transition.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since we disable all function tracer processing if we detect
that a modification of a instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_CONVERTED flag is pointless.
The FTRACE_FL_CONVERTED flag was used to denote records that were
successfully converted from mcount calls into nops. But if a single
record fails, all of ftrace is disabled.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since we disable all function tracer processing if we detect
that a modification of a instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_FAILED flag is pointless.
Removing this flag simplifies some of the code, but some ftrace_disabled
checks needed to be added or move around a little.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.
On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.
ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.
The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add more details to the dynamic function tracing design implementation.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
LKML-Reference: <1279610015-10250-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix "integer as NULL pointer" warning.
tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header
tracing: Make the documentation clear on trace_event boot option
ring-buffer: Wrap open-coded WARN_ONCE
tracing: Convert nop macros to static inlines
tracing: Fix sleep time function profiling
tracing: Show sample std dev in function profiling
tracing: Add documentation for trace commands mod, traceon/traceoff
ring-buffer: Make benchmark handle missed events
ring-buffer: Make non-consuming read less expensive with lots of cpus.
tracing: Add graph output support for irqsoff tracer
tracing: Have graph flags passed in to ouput functions
tracing: Add ftrace events for graph tracer
tracing: Dump either the oops's cpu source or all cpus buffers
tracing: Fix uninitialized variable of tracing/trace output
The ftrace.h file contains several functions as macros when the
functions are disabled due to config options. This patch converts
most of them to static inlines.
There are two exceptions:
register_ftrace_function() and unregister_ftrace_function()
This is because their parameter "ops" must not be evaluated since
code using the function is allowed to #ifdef out the creation of
the parameter.
This also fixes an error caused by recent changes:
kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add function graph output to irqsoff tracer.
The graph output is enabled by setting new 'display-graph' trace option.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.
It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.
Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.
This patch modifies ftrace_dump_on_oops to handle this choice.
The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.
Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.
v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.
It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.
Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.
So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Most implementations of arch_syscall_addr() are the same, so create a
default version in common code and move the one piece that differs (the
syscall table) to asm/syscall.h. New arch ports don't have to waste
time copying & pasting this simple function.
The s390/sparc versions need to be different, so document why.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1264498803-17278-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Remove record freezing. Because kprobes never puts probe on
ftrace's mcount call anymore, it doesn't need ftrace to check
whether kprobes on it.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100202214925.4694.73469.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introducing *_text_reserved functions for checking the text
address range is partially reserved or not. This patch provides
checking routines for x86 smp alternatives and dynamic ftrace.
Since both functions modify fixed pieces of kernel text, they
should reserve and protect those from other dynamic text
modifier, like kprobes.
This will also be extended when introducing other subsystems
which modify fixed pieces of kernel text. Dynamic text modifiers
should avoid those.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the module is about the unload we release its call records.
The ftrace_release function was given wrong values representing
the module core boundaries, thus not releasing its call records.
Plus making ftrace_release function module specific.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the following 'make includecheck' warning:
include/linux/ftrace.h: linux/sched.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247068321.4382.102.camel@ht.satnam>
In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.
An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.
This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.
There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.
This test detected the problem and pointed out where the issue was.
This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.
Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The tracing infrastructure allows for recursion. That is, an interrupt
may interrupt the act of tracing an event, and that interrupt may very well
perform its own trace. This is a recursive trace, and is fine to do.
The problem arises when there is a bug, and the utility doing the trace
calls something that recurses back into the tracer. This recursion is not
caused by an external event like an interrupt, but by code that is not
expected to recurse. The result could be a lockup.
This patch adds a bitmask to the task structure that keeps track
of the trace recursion. To find the interrupt depth, the following
algorithm is used:
level = hardirq_count() + softirq_count() + in_nmi;
Here, level will be the depth of interrutps and softirqs, and even handles
the nmi. Then the corresponding bit is set in the recursion bitmask.
If the bit was already set, we know we had a recursion at the same level
and we warn about it and fail the writing to the buffer.
After the data has been committed to the buffer, we clear the bit.
No atomics are needed. The only races are with interrupts and they reset
the bitmask before returning anywy.
[ Impact: detect same irq level trace recursion ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The hooks in the module code for the function tracer must be called
before any of that module code runs. The function tracer hooks
modify the module (replacing calls to mcount to nops). If the code
is executed while the change occurs, then the CPU can take a GPF.
To handle the above with a bit of paranoia, I originally implemented
the hooks as calls directly from the module code.
After examining the notifier calls, it looks as though the start up
notify is called before any of the module's code is executed. This makes
the use of the notify safe with ftrace.
Only the startup notify is required to be "safe". The shutdown simply
removes the entries from the ftrace function list, and does not modify
any code.
This change has another benefit. It removes a issue with a reverse dependency
in the mutexes of ftrace_lock and module_mutex.
[ Impact: fix lock dependency bug, cleanup ]
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes]
and pick up the current lineup of tracing/urgent fixes as well
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix build warnings and possibe compat misbehavior on IA64
Building a kernel on ia64 might trigger these ugly build warnings:
CC arch/ia64/ia32/sys_ia32.o
In file included from arch/ia64/ia32/sys_ia32.c:55:
arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
In file included from include/linux/elf.h:7,
from include/linux/module.h:14,
from include/linux/ftrace.h:8,
from include/linux/syscalls.h:68,
from arch/ia64/ia32/sys_ia32.c:18:
arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
[...]
sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
to import the syscalls tracing prototypes.
But including ftrace.h can pull too much things for a low level file,
especially on ia64 where the ia32 private headers conflict with higher
level headers.
Now we isolate the syscall tracing headers in their own lightweight file.
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Davidson <md@google.com>
LKML-Reference: <20090408184058.GB6017@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: dont break future extensions of INIT_TASK
While not a problem right now, due to lack of a comma, build fails if
elements are appended to INIT_TASK() macro in development code:
arch/x86/kernel/init_task.c:33: error: request for member `XXXXXXXXXX' in something not a structure or union
arch/x86/kernel/init_task.c:33: error: initializer element is not constant
arch/x86/kernel/init_task.c:33: error: (near initialization for `init_task.ret_stack')
make[1]: *** [arch/x86/kernel/init_task.o] Error 1
make: *** [arch/x86/kernel] Error 2
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: srostedt@redhat.com
LKML-Reference: <200904080505.n3855hcn017109@www262.sakura.ne.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to crash going to kexec
The init task did not properly initialize the function graph pointers.
Altough these pointers are NULL, they can not be assumed to be NULL
for the init task, and must still be properly initialize.
This usually is not an issue since a problem only arises when a task
exits, and the init tasks do not usually exit. But when doing tests
with kexec, the init tasks do exit, and the bug appears.
This patch properly initializes the init tasks function graph data
structures.
Reported-and-Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <alpine.DEB.2.00.0903252053080.5675@gandalf.stny.rr.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: reduce size of memory in function profiler
The function profiler originally introduces its counters into the
function records itself. There is 20 thousand different functions on
a normal system, and that is adding 20 thousand counters for profiling
event when not needed.
A normal run of the profiler yields only a couple of thousand functions
executed, depending on what is being profiled. This means we have around
18 thousand useless counters.
This patch rectifies this by moving the data out of the function
records used by dynamic ftrace. Data is preallocated to hold the functions
when the profiling begins. Checks are made during profiling to see if
more recorcds should be allocated, and they are allocated if it is safe
to do so.
This also removes the dependency from using dynamic ftrace, and also
removes the overhead by having it enabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new profiling feature
This patch adds a function profiler. In debugfs/tracing/ two new
files are created.
function_profile_enabled - to enable or disable profiling
trace_stat/functions - the profiled functions.
For example:
echo 1 > /debugfs/tracing/function_profile_enabled
./hackbench 50
echo 0 > /debugfs/tracing/function_profile_enabled
yields:
cat /debugfs/tracing/trace_stat/functions
Function Hit
-------- ---
_spin_lock 10106442
_spin_unlock 10097492
kfree 6013704
_spin_unlock_irqrestore 4423941
_spin_lock_irqsave 4406825
__phys_addr 4181686
__slab_free 4038222
dput 4030130
path_put 4023387
unroll_tree_refs 4019532
[...]
The most hit functions are listed first. Functions that are not
hit are not listed.
This feature depends on and uses dynamic function tracing. When the
function profiling is disabled, no overhead occurs. But it still
takes up around 300KB to hold the data, thus it is not recomended
to keep it enabled for systems low on memory.
When a '1' is echoed into the function_profile_enabled file, the
counters for is function is reset back to zero. Thus you can see what
functions are hit most by different programs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: cleanup
struct dyn_ftrace::ip has different usages in his lifecycle,
we use union for it. And also for struct dyn_ftrace::flags.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C871BE.3080405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch move the timestamp from happening in the arch specific
code into the general code. This allows for better control by the tracer
to time manipulation.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new feature
This adds the generic support for syscalls tracing. This is
currently exploited through a devoted tracer but other tracing
engines can use it. (They just have to play with
{start,stop}_ftrace_syscalls() and use the display callbacks
unless they want to override them.)
The syscalls prototypes definitions are abused here to steal
some metadata informations:
- syscall name, param types, param names, number of params
The syscall addr is not directly saved during this definition
because we don't know if its prototype is available in the
namespace. But we don't really need it. The arch has just to
build a function able to resolve the syscall number to its
metadata struct.
The current tracer prints the syscall names, parameters names
and values (and their types optionally). Currently the value is
a raw hex but higher level values diplaying is on my TODO list.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1236955332-10133-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: save memory
The struct dyn_ftrace table is very large, this patch will save
about 50%.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49BA2C9F.8020009@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Provide basic callbacks to do syscall tracing.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1236401580-5758-2-git-send-email-fweisbec@gmail.com>
[ simplified it to a trace_printk() for now. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: faster and lighter tracing
Now that we have trace_bprintk() which is faster and consume lesser
memory than trace_printk() and has the same purpose, we can now drop
the old implementation in favour of the binary one from trace_bprintk(),
which means we move all the implementation of trace_bprintk() to
trace_printk(), so the Api doesn't change except that we must now use
trace_seq_bprintk() to print the TRACE_PRINT entries.
Some changes result of this:
- Previously, trace_bprintk depended of a single tracer and couldn't
work without. This tracer has been dropped and the whole implementation
of trace_printk() (like the module formats management) is now integrated
in the tracing core (comes with CONFIG_TRACING), though we keep the file
trace_printk (previously trace_bprintk.c) where we can find the module
management. Thus we don't overflow trace.c
- changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.
- change a bit trace_printk/trace_vprintk macros to support non-builtin formats
constants, and fix 'const' qualifiers warnings. But this is all transparent for
developers.
- etc...
V2:
- Rebase against last changes
- Fix mispell on the changelog
V3:
- Rebase against last changes (moving trace_printk() to kernel.h)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add a generic printk() for tracing, like trace_printk()
trace_bprintk() uses the infrastructure to record events on ring_buffer.
[ fweisbec@gmail.com: ported to latest -tip, made it work if
!CONFIG_MODULES, never free the format strings from modules
because we can't keep track of them and conditionnaly create
the ftrace format strings section (reported by Steven Rostedt) ]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: save on memory for tracing
Current tracers are typically using a struct(like struct ftrace_entry,
struct ctx_switch_entry, struct special_entr etc...)to record a binary
event. These structs can only record a their own kind of events.
A new kind of tracer need a new struct and a lot of code too handle it.
So we need a generic binary record for events. This infrastructure
is for this purpose.
[fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
tracing as reported by Steven Rostedt]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: decrease hangs risks with the graph tracer on slow systems
Since the function graph tracer can spend too much time on timer
interrupts, it's better now to use the more lightweight local
clock. Anyway, the function graph traces are more reliable on a
per cpu trace.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make common utility functions such as trace_printk() and
tracing_start()/tracing_stop() generally available to kernel
code.
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Use a more generic name - this also allows the prototype to move
to kernel.h and be generally available to kernel developers who
want to do some quick tracing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current definition of CALLER_ADDRx isn't suitable for all platforms.
E.g. for ARM __builtin_return_address(N) doesn't work for N > 0 and
AFAIK for powerpc there are no frame pointers needed to have a working
__builtin_return_address. This patch allows defining the CALLER_ADDRx
macros in <asm/ftrace.h> and let these take precedence.
Because now <asm/ftrace.h> is included unconditionally in
<linux/ftrace.h> all archs that don't already had this include get an
empty one for free.
Signed-off-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch creates the weak functions: ftrace_arch_code_modify_prepare
and ftrace_arch_code_modify_post_process that are called before and
after the stop machine is called to modify the kernel text.
If the arch needs to do pre or post processing, it only needs to define
these functions.
[ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
over using ftrace_arch_modify_* ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
There is nothing really arch specific of the push and pop functions
used by the function graph tracer. This patch moves them to generic
code.
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Ingo Molnar did not like the _hook naming convention used by the
select function tracer. Luis Claudio R. Goncalves suggested using
the "_probe" extension. This patch implements the change of
calling the functions and variables "_hook" and replacing them
with "_probe".
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch adds a call back for the tracers that have hooks to
selected functions. This allows the tracer to show better output
in the set_ftrace_filter file.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: new feature
Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.
This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.
The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.
To register a function:
struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};
int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data);
glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced
ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
That is, if the tracer needs to do something special for each
function, that is being traced, and wants to give each function
its own data. The address of the entry data is passed to this
callback, so that the callback may wish to update the entry to
whatever it would like.
free is a callback for when the entry is freed. In case the tracer
allocated any data, it is give the chance to free it.
To unregister we have three functions:
void
unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)
This will unregister all hooks that match glob, point to ops, and
have its data matching data. (note, if glob is NULL, blank or '*',
all functions will be tested).
void
unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)
This will unregister all functions matching glob that has an entry
pointing to ops.
void unregister_ftrace_function_hook_all(char *glob)
This simply unregisters all funcs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Allow for other tracers to add their own commands for function
selection. This interface gives a trace the ability to name a
command for function selection. Right now it is pretty limited
in what it offers, but this is a building step for more features.
The :mod: command is converted to this interface and also serves
as a template for other implementations.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: cleanup
Move the power tracer headers to trace/power.h to keep ftrace.h and power bits
more easy to maintain as separated topics.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
Fixed several typos in the comments.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Dump the branch trace on an oops (based on ftrace_dump_on_oops).
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Define FTRACE_ADDR. In IA64, a function pointer isn't a 'unsigned long' but a
'struct {unsigned long ip, unsigned long gp}'.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: broaden gcc printf format checks for ftrace_printk()
format arguments checking for ftrace_printk() is __printf(1, 2),
not __printf(1, 0).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enhancement to stack tracer
The stack tracer currently is either on when configured in or
off when it is not. It can not be disabled when it is configured on.
(besides disabling the function tracer that it uses)
This patch adds a way to enable or disable the stack tracer at
run time. It defaults off on bootup, but a kernel parameter 'stacktrace'
has been added to enable it on bootup.
A new sysctl has been added "kernel.stack_tracer_enabled" to let
the user enable or disable the stack tracer at run time.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: move most important x86 irq entry-points to a separate subsection
Annotate do_IRQ and smp_apic_timer_interrupt to put them into the .irqentry.text
subsection. These function will so be recognized as hardirq entrypoints for the
function-graph-tracer. We could also annotate other irq entries but the others
are far less important but they can be added on request.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my 966c8c12dc sprint_symbol(): use
less stack exposing a bug in slub's list_locations() -
kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
beyond the end of page provided.
The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.
Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.
[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: Provide a way to pause the function graph tracer
As suggested by Steven Rostedt, the previous patch that prevented from
spinlock function tracing shouldn't use the raw_spinlock to fix it.
It's much better to follow lockdep with normal spinlock, so this patch
adds a new flag for each task to make the function graph tracer able
to be paused. We also can send an ftrace_printk whithout worrying of
the irrelevant traced spinlock during insertion.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: trace more functions
When the function graph tracer is configured, three more files are not
traced to prevent only four functions to be traced. And this impacts the
normal function tracer too.
arch/x86/kernel/process_64/32.c:
I had crashes when I let this file traced. After some debugging, I saw
that the "current" task point was changed inside__swtich_to(), ie:
"write_pda(pcurrent, next_p);" inside process_64.c Since the tracer store
the original return address of the function inside current, we had
crashes. Only __switch_to() has to be excluded from tracing.
kernel/module.c and kernel/extable.c:
Because of a function used internally by the function graph tracer:
__kernel_text_address()
To let the other functions inside these files to be traced, this patch
introduces the __notrace_funcgraph function prefix which is __notrace if
function graph tracer is configured and nothing if not.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
As suggested by Steven Rostedt, this patch provide a new macro
task_curr_ret_stack() to move the cpp conditionnal CONFIG into
the linux/ftrace.h headers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the file:
/debugfs/tracing/set_graph_function
which can be used along with the function graph tracer.
When this file is empty, the function graph tracer will act as
usual. When the file has a function in it, the function graph
tracer will only trace that function.
For example:
# echo blk_unplug > /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/trace
[...]
------------------------------------------
| 2) make-19003 => kjournald-2219
------------------------------------------
2) | blk_unplug() {
2) | dm_unplug_all() {
2) | dm_get_table() {
2) 1.381 us | _read_lock();
2) 0.911 us | dm_table_get();
2) 1. 76 us | _read_unlock();
2) + 12.912 us | }
2) | dm_table_unplug_all() {
2) | blk_unplug() {
2) 0.778 us | generic_unplug_device();
2) 2.409 us | }
2) 5.992 us | }
2) 0.813 us | dm_table_put();
2) + 29. 90 us | }
2) + 34.532 us | }
You can add up to 32 functions into this file. Currently we limit it
to 32, but this may change with later improvements.
To add another function, use the append '>>':
# echo sys_read >> /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/set_graph_function
blk_unplug
sys_read
Using the '>' will clear out the function and write anew:
# echo sys_write > /debug/tracing/set_graph_function
# cat /debug/tracing/set_graph_function
sys_write
Note, if you have function graph running while doing this, the small
time between clearing it and updating it will cause the graph to
record all functions. This should not be an issue because after
it sets the filter, only those functions will be recorded from then on.
If you need to only record a particular function then set this
file first before starting the function graph tracer. In the future
this side effect may be corrected.
The set_graph_function file is similar to the set_ftrace_filter but
it does not take wild cards nor does it allow for more than one
function to be set with a single write. There is no technical reason why
this is the case, I just do not have the time yet to implement that.
Note, dynamic ftrace must be enabled for this to appear because it
uses the dynamic ftrace records to match the name to the mcount
call sites.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: feature, let entry function decide to trace or not
This patch lets the graph tracer entry function decide if the tracing
should be done at the end as well. This requires all function graph
entry functions return 1 if it should trace, or 0 if the return should
not be traced.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new ftrace_graph_stop function
While developing more features of function graph, I hit a bug that
caused the WARN_ON to trigger in the prepare_ftrace_return function.
Well, it was hard for me to find out that was happening because the
bug would not print, it would just cause a hard lockup or reboot.
The reason is that it is not safe to call printk from this function.
Looking further, I also found that it calls unregister_ftrace_graph,
which grabs a mutex and calls kstop machine. This would definitely
lock the box up if it were to trigger.
This patch adds a fast and safe ftrace_graph_stop() which will
stop the function tracer. Then it is safe to call the WARN ON.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new "power-tracer" ftrace plugin
This patch adds a C/P-state ftrace plugin that will generate
detailed statistics about the C/P-states that are being used,
so that we can look at detailed decisions that the C/P-state
code is making, rather than the too high level "average"
that we have today.
An example way of using this is:
mount -t debugfs none /sys/kernel/debug
echo cstate > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_enabled
sleep 1
echo 0 > /sys/kernel/debug/tracing/tracing_enabled
cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: more efficient code for ftrace graph tracer
This patch uses the dynamic patching, when available, to patch
the function graph code into the kernel.
This patch will ease the way for letting both function tracing
and function graph tracing run together.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>