Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
  ring-buffer: only enable ring_buffer_swap_cpu when needed
  ring-buffer: check for swapped buffers in start of committing
  tracing: report error in trace if we fail to swap latency buffer
  tracing: add trace_array_printk for internal tracers to use
  tracing: pass around ring buffer instead of tracer
  tracing: make tracing_reset safe for external use
  tracing: use timestamp to determine start of latency traces
  tracing: Remove mentioning of legacy latency_trace file from documentation
  tracing/filters: Defer pred allocation, fix memory leak
  tracing: remove users of tracing_reset
  tracing: disable buffers and synchronize_sched before resetting
  tracing: disable update max tracer while reading trace
  tracing: print out start and stop in latency traces
  ring-buffer: disable all cpu buffers when one finds a problem
  ring-buffer: do not count discarded events
  ring-buffer: remove ring_buffer_event_discard
  ring-buffer: fix ring_buffer_read crossing pages
  ring-buffer: remove unnecessary cpu_relax
  ring-buffer: do not swap buffers during a commit
  ring-buffer: do not reset while in a commit
  ...
Linus Torvalds 2009-09-11 13:24:03 -07:00
commit 483e3cd6a3
65 changed files with 4129 additions and 1309 deletions


@ -2480,6 +2480,11 @@ and is between 256 and 4096 characters. It is defined in the file
trace_buf_size=nn[KMG]
[FTRACE] will set tracing buffer size.
trace_event=[event-list]
[FTRACE] Set and start specified trace events in order
to facilitate early boot debugging.
See also Documentation/trace/events.txt
trix= [HW,OSS] MediaTrix AudioTrix Pro
Format:
<io>,<irq>,<dma>,<dma2>,<sb_io>,<sb_irq>,<sb_dma>,<mpu_io>,<mpu_irq>


@ -83,6 +83,15 @@ When reading one of these enable files, there are four results:
X - there is a mixture of events enabled and disabled
? - this file does not affect any event
2.3 Boot option
---------------
In order to facilitate early boot debugging, use boot option:
trace_event=[event-list]
The format of this boot option is the same as described in section 2.1.
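For example, a command line carrying something like the following would
enable two scheduler events at boot (the event names here are just an
illustration; any events written in the section 2.1 format may be listed,
comma separated):

    trace_event=sched:sched_wakeup,sched:sched_switch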
3. Defining an event-enabled tracepoint
=======================================


@ -85,26 +85,19 @@ of ftrace. Here is a list of some of the key files:
This file holds the output of the trace in a human
readable format (described below).
latency_trace:
This file shows the same trace but the information
is organized more to display possible latencies
in the system (described below).
trace_pipe:
The output is the same as the "trace" file but this
file is meant to be streamed with live tracing.
Reads from this file will block until new data
is retrieved. Unlike the "trace" and "latency_trace"
files, this file is a consumer. This means reading
from this file causes sequential reads to display
more current data. Once data is read from this
file, it is consumed, and will not be read
again with a sequential read. The "trace" and
"latency_trace" files are static, and if the
tracer is not adding more data, they will display
the same information every time they are read.
Reads from this file will block until new data is
retrieved. Unlike the "trace" file, this file is a
consumer. This means reading from this file causes
sequential reads to display more current data. Once
data is read from this file, it is consumed, and
will not be read again with a sequential read. The
"trace" file is static, and if the tracer is not
adding more data, it will display the same
information every time it is read.
trace_options:
@ -117,10 +110,10 @@ of ftrace. Here is a list of some of the key files:
Some of the tracers record the max latency.
For example, the time interrupts are disabled.
This time is saved in this file. The max trace
will also be stored, and displayed by either
"trace" or "latency_trace". A new max trace will
only be recorded if the latency is greater than
the value in this file. (in microseconds)
will also be stored, and displayed by "trace".
A new max trace will only be recorded if the
latency is greater than the value in this
file. (in microseconds)
buffer_size_kb:
@ -210,7 +203,7 @@ Here is the list of current tracers that may be configured.
the trace with the longest max latency.
See tracing_max_latency. When a new max is recorded,
it replaces the old trace. It is best to view this
trace via the latency_trace file.
trace with the latency-format option enabled.
"preemptoff"
@ -307,8 +300,8 @@ the lowest priority thread (pid 0).
Latency trace format
--------------------
For traces that display latency times, the latency_trace file
gives somewhat more information to see why a latency happened.
When the latency-format option is enabled, the trace file gives
somewhat more information to see why a latency happened.
Here is a typical trace.
# tracer: irqsoff
@ -380,9 +373,10 @@ explains which is which.
The above is mostly meaningful for kernel developers.
time: This differs from the trace file output. The trace file output
includes an absolute timestamp. The timestamp used by the
latency_trace file is relative to the start of the trace.
time: When the latency-format option is enabled, the trace file
output includes a timestamp relative to the start of the
trace. This differs from the output when latency-format
is disabled, which includes an absolute timestamp.
delay: This is just to help catch your eye a bit better. And
needs to be fixed to be only relative to the same CPU.
@ -440,7 +434,8 @@ Here are the available options:
sym-addr:
bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
verbose - This deals with the latency_trace file.
verbose - This deals with the trace file when the
latency-format option is enabled.
bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
(+0.000ms): simple_strtoul (strict_strtoul)
@ -472,7 +467,7 @@ Here are the available options:
the app is no longer running
The lookup is performed when you read
trace,trace_pipe,latency_trace. Example:
trace,trace_pipe. Example:
a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
@ -481,6 +476,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
every scheduling event. Will add overhead if
there's a lot of tasks running at once.
latency-format - This option changes the trace. When
it is enabled, the trace displays
additional information about the
latencies, as described in "Latency
trace format".
sched_switch
------------
@ -596,12 +596,13 @@ To reset the maximum, echo 0 into tracing_max_latency. Here is
an example:
# echo irqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: irqsoff
#
irqsoff latency trace v1.1.5 on 2.6.26
@ -703,12 +704,13 @@ which preemption was disabled. The control of preemptoff tracer
is much like the irqsoff tracer.
# echo preemptoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: preemptoff
#
preemptoff latency trace v1.1.5 on 2.6.26-rc8
@ -850,12 +852,13 @@ Again, using this trace is much like the irqsoff and preemptoff
tracers.
# echo preemptirqsoff > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# ls -ltr
[...]
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: preemptirqsoff
#
preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
@ -1012,11 +1015,12 @@ Instead of performing an 'ls', we will run 'sleep 1' under
'chrt' which changes the priority of the task.
# echo wakeup > current_tracer
# echo latency-format > trace_options
# echo 0 > tracing_max_latency
# echo 1 > tracing_enabled
# chrt -f 5 sleep 1
# echo 0 > tracing_enabled
# cat latency_trace
# cat trace
# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.26-rc8


@ -0,0 +1,42 @@
" Enable folding for ftrace function_graph traces.
"
" To use, :source this file while viewing a function_graph trace, or use vim's
" -S option to load from the command-line together with a trace. You can then
" use the usual vim fold commands, such as "za", to open and close nested
" functions. While closed, a fold will show the total time taken for a call,
" as would normally appear on the line with the closing brace. Folded
" functions will not include finish_task_switch(), so folding should remain
" relatively sane even through a context switch.
"
" Note that this will almost certainly only work well with a
" single-CPU trace (e.g. trace-cmd report --cpu 1).
function! FunctionGraphFoldExpr(lnum)
  let line = getline(a:lnum)
  if line[-1:] == '{'
    if line =~ 'finish_task_switch() {$'
      return '>1'
    endif
    return 'a1'
  elseif line[-1:] == '}'
    return 's1'
  else
    return '='
  endif
endfunction

function! FunctionGraphFoldText()
  let s = split(getline(v:foldstart), '|', 1)
  if getline(v:foldend+1) =~ 'finish_task_switch() {$'
    let s[2] = ' task switch '
  else
    let e = split(getline(v:foldend), '|', 1)
    let s[2] = e[2]
  endif
  return join(s, '|')
endfunction

setlocal foldexpr=FunctionGraphFoldExpr(v:lnum)
setlocal foldtext=FunctionGraphFoldText()
setlocal foldcolumn=12
setlocal foldmethod=expr


@ -0,0 +1,955 @@
Lockless Ring Buffer Design
===========================
Copyright 2009 Red Hat Inc.
Author: Steven Rostedt <srostedt@redhat.com>
License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2)
Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
and Frederic Weisbecker.
Written for: 2.6.31
Terminology used in this Document
---------------------------------
tail - where new writes happen in the ring buffer.
head - where new reads happen in the ring buffer.
producer - the task that writes into the ring buffer (same as writer)
writer - same as producer
consumer - the task that reads from the buffer (same as reader)
reader - same as consumer.
reader_page - A page outside the ring buffer used solely (for the most part)
by the reader.
head_page - a pointer to the page that the reader will use next
tail_page - a pointer to the page that will be written to next
commit_page - a pointer to the page with the last finished non-nested write.
cmpxchg - hardware-assisted atomic transaction that performs the following:
A = B iff previous A == C
R = cmpxchg(A, C, B) says that we replace A with B if and only if the
current A is equal to C, and we put the old (current) A into R.
R gets the previous A regardless of whether A was updated with B or not.
To see if the update was successful, a compare of R == C may be used.
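Expressed as C, those semantics look roughly like the helper below (a
sketch using a compiler builtin; the kernel itself uses its arch-specific
cmpxchg() macro, and "my_cmpxchg" is purely an illustrative name):

        /*
         * Illustration only: a model of the cmpxchg semantics described
         * above.  The kernel uses its own arch-specific cmpxchg(), not
         * this helper.
         */
        static unsigned long my_cmpxchg(unsigned long *A, unsigned long C,
                                        unsigned long B)
        {
                /* *A becomes B only if the current *A equals C; the old
                 * value of *A is returned either way. */
                return __sync_val_compare_and_swap(A, C, B);
        }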
The Generic Ring Buffer
-----------------------
The ring buffer can be used in either an overwrite mode or in
producer/consumer mode.
In producer/consumer mode, if the producer were to fill up the
buffer before the consumer could free up anything, the producer
will stop writing to the buffer. This will lose the most recent events.
In overwrite mode, if the producer were to fill up the buffer
before the consumer could free up anything, the producer will
overwrite the older data. This will lose the oldest events.
No two writers can write at the same time (on the same per-cpu buffer),
but a writer may interrupt another writer; the interrupting writer must
finish writing before the previous writer may continue. This is very
important to the algorithm. The writers act like a "stack". The way
interrupts work enforces this behavior.
writer1 start
<preempted> writer2 start
<preempted> writer3 start
writer3 finishes
writer2 finishes
writer1 finishes
This is very much like a writer being preempted by an interrupt and
the interrupt doing a write as well.
Readers can happen at any time. But no two readers may run at the
same time, nor can a reader preempt/interrupt another reader. A reader
cannot preempt/interrupt a writer, but it may read/consume from the
buffer at the same time as a writer is writing, provided the reader
is on another processor. A reader may read on its own processor
and can be preempted by a writer.
A writer can preempt a reader, but a reader cannot preempt a writer.
A reader can, however, read the buffer at the same time (on another
processor) as a writer.
The ring buffer is made up of a list of pages held together by a linked list.
At initialization a reader page is allocated for the reader that is not
part of the ring buffer.
The head_page, tail_page and commit_page are all initialized to point
to the same page.
The reader page is initialized to have its next pointer pointing to
the head page, and its previous pointer pointing to a page before
the head page.
The reader has its own page to use. At start up time, this page is
allocated but is not attached to the list. When the reader wants
to read from the buffer, if its page is empty (like it is on start up)
it will swap its page with the head_page. The old reader page will
become part of the ring buffer and the head_page will be removed.
The page after the inserted page (old reader_page) will become the
new head page.
Once the new page is given to the reader, the reader can do what
it wants with it, as long as a writer has left that page.
A sample of how the reader page is swapped: note this does not
show the head page in the buffer; it is only for demonstrating a
swap.
+------+
|reader| RING BUFFER
|page |
+------+
+---+ +---+ +---+
| |-->| |-->| |
| |<--| |<--| |
+---+ +---+ +---+
^ | ^ |
| +-------------+ |
+-----------------+
+------+
|reader| RING BUFFER
|page |-------------------+
+------+ v
| +---+ +---+ +---+
| | |-->| |-->| |
| | |<--| |<--| |<-+
| +---+ +---+ +---+ |
| ^ | ^ | |
| | +-------------+ | |
| +-----------------+ |
+------------------------------------+
+------+
|reader| RING BUFFER
|page |-------------------+
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | |-->| |-->| |
| | | | | |<--| |<-+
| | +---+ +---+ +---+ |
| | | ^ | |
| | +-------------+ | |
| +-----------------------------+ |
+------------------------------------+
+------+
|buffer| RING BUFFER
|page |-------------------+
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | | | |-->| |
| | New | | | |<--| |<-+
| | Reader +---+ +---+ +---+ |
| | page ----^ | |
| | | |
| +-----------------------------+ |
+------------------------------------+
It is possible that the page swapped is the commit page and the tail page,
if what is in the ring buffer is less than what is held in a buffer page.
reader page commit page tail page
| | |
v | |
+---+ | |
| |<----------+ |
| |<------------------------+
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
This case is still valid for this algorithm.
When the writer leaves the page, it simply goes into the ring buffer
since the reader page still points to the next location in the ring
buffer.
The main pointers:
reader page - The page used solely by the reader and is not part
of the ring buffer (may be swapped in)
head page - the next page in the ring buffer that will be swapped
with the reader page.
tail page - the page where the next write will take place.
commit page - the page that last finished a write.
The commit page is only updated by the outermost writer in the
writer stack. A writer that preempts another writer will not move the
commit page.
When data is written into the ring buffer, a position is reserved
in the ring buffer and passed back to the writer. When the writer
is finished writing data into that position, it commits the write.
Another write (or a read) may take place at any time during this
transaction. If another write happens, it must finish before the
previous write may continue.
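The public ring buffer API mirrors this two-step transaction. Here is a
minimal writer-side sketch (error handling trimmed; the payload layout is
made up for the example):

        #include <linux/ring_buffer.h>

        /* Reserve space, fill it in, then commit -- the reserve/commit
         * transaction described above.  Sketch only. */
        static void write_sample(struct ring_buffer *buffer, u64 value)
        {
                struct ring_buffer_event *event;
                u64 *body;

                event = ring_buffer_lock_reserve(buffer, sizeof(*body));
                if (!event)
                        return; /* e.g. full buffer in producer/consumer mode */

                body = ring_buffer_event_data(event);
                *body = value;  /* write into the reserved position */

                ring_buffer_unlock_commit(buffer, event);
        }

The diagrams below show what the reserve and the commit look like inside
a buffer page.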
Write reserve:
Buffer page
+---------+
|written |
+---------+ <--- given back to writer (current commit)
|reserved |
+---------+ <--- tail pointer
| empty |
+---------+
Write commit:
Buffer page
+---------+
|written |
+---------+
|written |
+---------+ <--- next position for write (current commit)
| empty |
+---------+
If a write happens after the first reserve:
Buffer page
+---------+
|written |
+---------+ <-- current commit
|reserved |
+---------+ <--- given back to second writer
|reserved |
+---------+ <--- tail pointer
After second writer commits:
Buffer page
+---------+
|written |
+---------+ <--(last full commit)
|reserved |
+---------+
|pending |
|commit |
+---------+ <--- tail pointer
When the first writer commits:
Buffer page
+---------+
|written |
+---------+
|written |
+---------+
|written |
+---------+ <--(last full commit and tail pointer)
The commit pointer points to the last write location that was
committed without preempting another write. When a write that
preempted another write is committed, it only becomes a pending commit
and will not be a full commit until all writes have been committed.
The commit page points to the page that has the last full commit.
The tail page points to the page with the last write (before
committing).
The tail page is always equal to or after the commit page. It may
be several pages ahead. If the tail page catches up to the commit
page then no more writes may take place (regardless of the mode
of the ring buffer: overwrite or producer/consumer).
The order of pages is:
head page
commit page
tail page
Possible scenario:
tail page
head page commit page |
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
There is a special case where the head page is after the commit page
and possibly after the tail page. That is when the commit (and tail) page
has been swapped with the reader page. This is because the head page is
always part of the ring buffer, but the reader page is not. Whenever less
than a full page has been committed inside the ring buffer and a reader
swaps out a page, it will be swapping out the commit page.
reader page commit page tail page
| | |
v | |
+---+ | |
| |<----------+ |
| |<------------------------+
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
In this case, the head page will not move when the tail and commit
move back into the ring buffer.
The reader cannot swap a page into the ring buffer if the commit page
is still on that page. If the reader meets the last commit (a real commit,
not pending or reserved), then there is nothing more to read.
The buffer is considered empty until another full commit finishes.
When the tail meets the head page, if the buffer is in overwrite mode,
the head page will be pushed ahead one. If the buffer is in producer/consumer
mode, the write will fail.
Overwrite mode:
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
head page
Note, the reader page will still point to the previous head page.
But when a swap takes place, it will use the most recent head page.
Making the Ring Buffer Lockless:
--------------------------------
The main idea behind the lockless algorithm is to combine the moving
of the head_page pointer with the swapping of pages with the reader.
State flags are placed inside the pointer to the page. To do this,
each page must be aligned in memory to 4 bytes. This allows the 2
least significant bits of the address to be used as flags, since
they will always be zero for the address. To get the address,
simply mask out the flags.
MASK = ~3
address & MASK
Two flags will be kept by these two bits:
HEADER - the page being pointed to is a head page
UPDATE - the page being pointed to is being updated by a writer
and was or is about to be a head page.
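To make the flag encoding concrete, here is a minimal sketch (the names
and the page struct are illustrative, not the ring buffer's actual
definitions); the diagrams that follow draw the HEADER flag as "-H->"
and the UPDATE flag as "-U->":

        /* Illustrative page descriptor; the real buffer page carries
         * more state. */
        struct buffer_page {
                struct buffer_page *next;
                struct buffer_page *prev;
                /* ... pointer to the actual data page, indexes, etc. ... */
        };

        #define RB_FLAG_MASK    3UL     /* the two least significant bits */
        #define RB_PAGE_HEADER  1UL     /* pointed-to page is the head page */
        #define RB_PAGE_UPDATE  2UL     /* a writer is updating the head page */

        /* Strip the flag bits to recover the real page address. */
        static struct buffer_page *rb_page_ptr(unsigned long val)
        {
                return (struct buffer_page *)(val & ~RB_FLAG_MASK);
        }

        /* Extract just the state flags. */
        static unsigned long rb_page_flags(unsigned long val)
        {
                return val & RB_FLAG_MASK;
        }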
reader page
|
v
+---+
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above pointer "-H->" would have the HEADER flag set. That is,
the page it points to is the next page to be swapped out by the reader.
This pointer means the next page is the head page.
When the tail page meets the head pointer, it will use cmpxchg to
change the pointer to the UPDATE state:
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
"-U->" represents a pointer in the UPDATE state.
Any access to the reader will need to take some sort of lock to serialize
the readers. But the writers will never take a lock to write to the
ring buffer. This means we only need to worry about a single reader,
and writes only preempt in "stack" formation.
When the reader tries to swap the page with the ring buffer, it
will also use cmpxchg. If the flag bit in the pointer to the
head page does not have the HEADER flag set, the compare will fail
and the reader will need to look for the new head page and try again.
Note, the flag UPDATE and HEADER are never set at the same time.
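A rough sketch of the reader-side retry loop, reusing the helpers from the
earlier sketches (find_head_slot() is a hypothetical stand-in for the walk
that locates the link carrying the HEADER flag); the step-by-step diagrams
below show the additional next/prev fix-ups the real code performs:

        /* Hypothetical helper: returns the link field that should carry
         * the HEADER flag, i.e. the pointer to the current head page. */
        unsigned long *find_head_slot(void *cpu_buffer);

        /* Reader side: swap the empty reader page in for the head page. */
        static struct buffer_page *reader_swap(void *cpu_buffer,
                                               struct buffer_page *reader_page)
        {
                for (;;) {
                        unsigned long *slot = find_head_slot(cpu_buffer);
                        unsigned long old = *slot;

                        if (rb_page_flags(old) != RB_PAGE_HEADER)
                                continue; /* a writer is moving the head; retry */

                        /* Only succeeds while the HEADER flag is still set;
                         * an UPDATE in progress makes the cmpxchg fail. */
                        if (my_cmpxchg(slot, old,
                                       (unsigned long)reader_page) == old)
                                return rb_page_ptr(old); /* old head is ours */
                }
        }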
The reader swaps the reader page as follows:
+------+
|reader| RING BUFFER
|page |
+------+
+---+ +---+ +---+
| |--->| |--->| |
| |<---| |<---| |
+---+ +---+ +---+
^ | ^ |
| +---------------+ |
+-----H-------------+
The reader sets the reader page's next pointer, with the HEADER flag, to
the page after the head page.
+------+
|reader| RING BUFFER
|page |-------H-----------+
+------+ v
| +---+ +---+ +---+
| | |--->| |--->| |
| | |<---| |<---| |<-+
| +---+ +---+ +---+ |
| ^ | ^ | |
| | +---------------+ | |
| +-----H-------------+ |
+--------------------------------------+
It does a cmpxchg with the pointer to the previous head page to make it
point to the reader page. Note that the new pointer does not have the HEADER
flag set. This action atomically moves the head page forward.
+------+
|reader| RING BUFFER
|page |-------H-----------+
+------+ v
| ^ +---+ +---+ +---+
| | | |-->| |-->| |
| | | |<--| |<--| |<-+
| | +---+ +---+ +---+ |
| | | ^ | |
| | +-------------+ | |
| +-----------------------------+ |
+------------------------------------+
After the new head page is set, the previous pointer of the head page is
updated to the reader page.
+------+
|reader| RING BUFFER
|page |-------H-----------+
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | |-->| |-->| |
| | | | | |<--| |<-+
| | +---+ +---+ +---+ |
| | | ^ | |
| | +-------------+ | |
| +-----------------------------+ |
+------------------------------------+
+------+
|buffer| RING BUFFER
|page |-------H-----------+ <--- New head page
+------+ <---------------+ v
| ^ +---+ +---+ +---+
| | | | | |-->| |
| | New | | | |<--| |<-+
| | Reader +---+ +---+ +---+ |
| | page ----^ | |
| | | |
| +-----------------------------+ |
+------------------------------------+
Another important point: the page that the reader page points back to
by its previous pointer (the one that now points to the new head page)
never points back to the reader page. That is because the reader page is
not part of the ring buffer. Traversing the ring buffer via the next pointers
will always stay in the ring buffer. Traversing the ring buffer via the
prev pointers may not.
Note, the way to determine a reader page is simply by examining the previous
pointer of the page. If the next pointer of the previous page does not
point back to the original page, then the original page is a reader page:
+--------+
| reader | next +----+
| page |-------->| |<====== (buffer page)
+--------+ +----+
| | ^
| v | next
prev | +----+
+------------->| |
+----+
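In terms of the illustrative struct from the flag-encoding sketch above,
that test is a one-line predicate:

        /* True if "page" is the reader page: its previous page's next
         * pointer does not point back at it.  Sketch only. */
        static int is_reader_page(struct buffer_page *page)
        {
                return rb_page_ptr((unsigned long)page->prev->next) != page;
        }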
The way the head page moves forward:
When the tail page meets the head page and the buffer is in overwrite mode
and more writes take place, the head page must be moved forward before the
writer may move the tail page. The way this is done is that the writer
performs a cmpxchg to convert the pointer to the head page from the HEADER
flag to have the UPDATE flag set. Once this is done, the reader will
not be able to swap the head page from the buffer, nor will it be able to
move the head page, until the writer is finished with the move.
This eliminates any races that the reader can have on the writer. The reader
must spin, and this is why the reader cannot preempt the writer.
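In rough code form, that first HEADER-to-UPDATE transition is a single
cmpxchg on the link that points at the head page (a sketch built on the
earlier helpers; the real writer path does considerably more). The
diagrams that follow show the same transition pictorially.

        /* Writer side: claim the head page for moving by flipping its
         * link from HEADER to UPDATE.  While in UPDATE, readers refuse
         * to swap the page out.  Illustration only. */
        static int claim_head_for_move(unsigned long *head_link)
        {
                unsigned long old = *head_link;
                unsigned long upd;

                if (rb_page_flags(old) != RB_PAGE_HEADER)
                        return 0;       /* someone else is already moving it */

                upd = (unsigned long)rb_page_ptr(old) | RB_PAGE_UPDATE;
                return my_cmpxchg(head_link, old, upd) == old;
        }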
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The following page will be made into the new head page.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the new head page has been set, we can set the old head page
pointer back to NORMAL.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the head page has been moved, the tail page may now move forward.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above are the trivial updates. Now for the more complex scenarios.
As stated before, if enough writes preempt the first write, the
tail page may make it all the way around the buffer and meet the commit
page. At this time, we must start dropping writes (usually with some kind
of warning to the user). But what happens if the commit was still on the
reader page? The commit page is not part of the ring buffer. The tail page
must account for this.
reader page commit page
| |
v |
+---+ |
| |<----------+
| |
| |------+
+---+ |
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
^
|
tail page
If the tail page were to simply push the head page forward, the commit when
leaving the reader page would not be pointing to the correct page.
The solution to this is to test if the commit page is on the reader page
before pushing the head page. If it is, then it can be assumed that the
tail page wrapped the buffer, and we must drop new writes.
This is not a race condition, because the commit page can only be moved
by the outermost writer (the writer that was preempted).
This means that the commit will not move while a writer is moving the
tail page. The reader cannot swap the reader page if it is also being
used as the commit page. The reader can simply check that the commit
is off the reader page. Once the commit page leaves the reader page
it will never go back on it unless a reader does another swap with the
buffer page that is also the commit page.
Nested writes
-------------
In the pushing forward of the tail page we must first push forward
the head page if the head page is the next page. If the head page
is not the next page, the tail page is simply updated with a cmpxchg.
Only writers move the tail page. This must be done atomically to protect
against nested writers.
temp_page = tail_page
next_page = temp_page->next
cmpxchg(tail_page, temp_page, next_page)
The above will update the tail page if it is still pointing to the expected
page. If this fails, a nested write pushed it forward, and the current write
does not need to push it.
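The same idea expressed with the earlier helpers (illustrative; it ignores
the head-page interaction described later in this section). The diagrams
below show the nested-writer case in which the cmpxchg fails.

        /* Writer side: try to advance the tail page by one page.  A
         * failed cmpxchg simply means a nested writer already advanced
         * it, and the current writer will reserve space on the new tail
         * page instead. */
        static void advance_tail(unsigned long *tail_link)
        {
                unsigned long temp_page = *tail_link;
                unsigned long next_page =
                                (unsigned long)rb_page_ptr(temp_page)->next;

                /* Succeeds only if the tail link still holds the page
                 * we sampled. */
                my_cmpxchg(tail_link, temp_page, next_page);
        }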
temp page
|
v
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Nested write comes in and moves the tail page forward:
tail page (moved by nested writer)
temp page |
| |
v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The above would fail the cmpxchg, but since the tail page has already
been moved forward, the writer will just try again to reserve storage
on the new tail page.
But the moving of the head page is a bit more complex.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But if a nested writer preempts here, it will see that the next
page is a head page, but it is also nested. It will detect that
it is nested and will save that information. The detection is the
fact that it sees the UPDATE flag instead of a HEADER or NORMAL
pointer.
The nested writer will set the new head page pointer.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But it will not reset the UPDATE back to NORMAL. Only the writer
that converted a pointer from HEADER to UPDATE will convert it back
to NORMAL.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
After the nested writer finishes, the outermost writer will convert
the UPDATE pointer to NORMAL.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
It can be even more complex if several nested writes came in and moved
the tail page ahead several pages:
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The next writer comes in, sees the UPDATE, and sets up the new
head page.
(second writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The nested writer moves the tail page forward, but does not set the old
UPDATE page to NORMAL because it is not the outermost writer.
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Another writer preempts and sees the page after the tail page is a head page.
It changes it from HEADER to UPDATE.
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |--->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The writer will move the head page forward:
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
But now that the third writer did change the HEADER flag to UPDATE it
will convert it back to NORMAL:
(third writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Then it will move the tail page and return to the second writer.
(second writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The second writer will fail to move the tail page because it was already
moved, so it will try again and add its data to the new tail page.
It will return to the first writer.
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
The first writer cannot atomically test whether the tail page moved
while it updates the HEAD page. It will then update the head page to
what it thinks is the new head page.
(first writer)
tail page
|
v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Since the cmpxchg returns the old value of the pointer, the first writer
will see that it succeeded in updating the pointer from NORMAL to HEAD.
But as we can see, this is not good enough. It must also check to see
if the tail page is either where it used to be or on the next page:
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
If the tail page is neither A nor B, then it must reset the
pointer back to NORMAL. Because it only needs to worry about
nested writers, it only needs to check this after setting the HEAD page.
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+
Now the writer can update the head page. This is also why the head page must
remain in UPDATE and only be reset by the outermost writer. This prevents
the reader from seeing the incorrect head page.
(first writer)
A B tail page
| | |
v v v
+---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |-H->
--->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+


@ -84,7 +84,7 @@ config S390
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FTRACE_SYSCALLS
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_DYNAMIC_FTRACE
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_DEFAULT_NO_SPIN_MUTEXES


@ -900,7 +900,7 @@ CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set


@ -92,7 +92,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_SYSCALL_TRACE 8 /* syscall trace active */
#define TIF_SYSCALL_AUDIT 9 /* syscall auditing active */
#define TIF_SECCOMP 10 /* secure computing */
#define TIF_SYSCALL_FTRACE 11 /* ftrace syscall instrumentation */
#define TIF_SYSCALL_TRACEPOINT 11 /* syscall tracepoint instrumentation */
#define TIF_USEDFPU 16 /* FPU was used by this task this quantum (SMP) */
#define TIF_POLLING_NRFLAG 17 /* true if poll_idle() is polling
TIF_NEED_RESCHED */
@ -111,7 +111,7 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_SYSCALL_AUDIT (1<<TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1<<TIF_SECCOMP)
#define _TIF_SYSCALL_FTRACE (1<<TIF_SYSCALL_FTRACE)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_USEDFPU (1<<TIF_USEDFPU)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_31BIT (1<<TIF_31BIT)


@ -54,7 +54,7 @@ _TIF_WORK_SVC = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_WORK_INT = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_MCCK_PENDING)
_TIF_SYSCALL = (_TIF_SYSCALL_TRACE>>8 | _TIF_SYSCALL_AUDIT>>8 | \
_TIF_SECCOMP>>8 | _TIF_SYSCALL_FTRACE>>8)
_TIF_SECCOMP>>8 | _TIF_SYSCALL_TRACEPOINT>>8)
STACK_SHIFT = PAGE_SHIFT + THREAD_ORDER
STACK_SIZE = 1 << STACK_SHIFT


@ -57,7 +57,7 @@ _TIF_WORK_SVC = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_WORK_INT = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_MCCK_PENDING)
_TIF_SYSCALL = (_TIF_SYSCALL_TRACE>>8 | _TIF_SYSCALL_AUDIT>>8 | \
_TIF_SECCOMP>>8 | _TIF_SYSCALL_FTRACE>>8)
_TIF_SECCOMP>>8 | _TIF_SYSCALL_TRACEPOINT>>8)
#define BASED(name) name-system_call(%r13)


@ -220,6 +220,29 @@ struct syscall_metadata *syscall_nr_to_meta(int nr)
return syscalls_metadata[nr];
}
int syscall_name_to_nr(char *name)
{
int i;
if (!syscalls_metadata)
return -1;
for (i = 0; i < NR_syscalls; i++)
if (syscalls_metadata[i])
if (!strcmp(syscalls_metadata[i]->name, name))
return i;
return -1;
}
void set_syscall_enter_id(int num, int id)
{
syscalls_metadata[num]->enter_id = id;
}
void set_syscall_exit_id(int num, int id)
{
syscalls_metadata[num]->exit_id = id;
}
static struct syscall_metadata *find_syscall_meta(unsigned long syscall)
{
struct syscall_metadata *start;
@ -237,24 +260,19 @@ static struct syscall_metadata *find_syscall_meta(unsigned long syscall)
return NULL;
}
void arch_init_ftrace_syscalls(void)
static int __init arch_init_ftrace_syscalls(void)
{
struct syscall_metadata *meta;
int i;
static atomic_t refs;
if (atomic_inc_return(&refs) != 1)
goto out;
syscalls_metadata = kzalloc(sizeof(*syscalls_metadata) * NR_syscalls,
GFP_KERNEL);
if (!syscalls_metadata)
goto out;
return -ENOMEM;
for (i = 0; i < NR_syscalls; i++) {
meta = find_syscall_meta((unsigned long)sys_call_table[i]);
syscalls_metadata[i] = meta;
}
return;
out:
atomic_dec(&refs);
return 0;
}
arch_initcall(arch_init_ftrace_syscalls);
#endif


@ -51,6 +51,9 @@
#include "compat_ptrace.h"
#endif
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
enum s390_regset {
REGSET_GENERAL,
REGSET_FP,
@ -661,8 +664,8 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
ret = -1;
}
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_enter(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->gprs[2]);
if (unlikely(current->audit_context))
audit_syscall_entry(is_compat_task() ?
@ -679,8 +682,8 @@ asmlinkage void do_syscall_trace_exit(struct pt_regs *regs)
audit_syscall_exit(AUDITSC_RESULT(regs->gprs[2]),
regs->gprs[2]);
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_exit(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_exit(regs, regs->gprs[2]);
if (test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, 0);


@ -38,7 +38,7 @@ config X86
select HAVE_FUNCTION_GRAPH_FP_TEST
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
select HAVE_FTRACE_SYSCALLS
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KVM
select HAVE_ARCH_KGDB
select HAVE_ARCH_TRACEHOOK

View file

@ -2355,7 +2355,7 @@ CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y


@ -2329,7 +2329,7 @@ CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y


@ -28,13 +28,6 @@
#endif
/* FIXME: I don't want to stay hardcoded */
#ifdef CONFIG_X86_64
# define FTRACE_SYSCALL_MAX 296
#else
# define FTRACE_SYSCALL_MAX 333
#endif
#ifdef CONFIG_FUNCTION_TRACER
#define MCOUNT_ADDR ((long)(mcount))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */


@ -95,7 +95,7 @@ struct thread_info {
#define TIF_DEBUGCTLMSR 25 /* uses thread_struct.debugctlmsr */
#define TIF_DS_AREA_MSR 26 /* uses thread_struct.ds_area_msr */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
#define TIF_SYSCALL_FTRACE 28 /* for ftrace syscall instrumentation */
#define TIF_SYSCALL_TRACEPOINT 28 /* syscall tracepoint instrumentation */
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
@ -118,17 +118,17 @@ struct thread_info {
#define _TIF_DEBUGCTLMSR (1 << TIF_DEBUGCTLMSR)
#define _TIF_DS_AREA_MSR (1 << TIF_DS_AREA_MSR)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
#define _TIF_SYSCALL_FTRACE (1 << TIF_SYSCALL_FTRACE)
#define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
/* work to do in syscall_trace_enter() */
#define _TIF_WORK_SYSCALL_ENTRY \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_FTRACE | \
_TIF_SYSCALL_AUDIT | _TIF_SECCOMP | _TIF_SINGLESTEP)
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \
_TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
/* work to do in syscall_trace_leave() */
#define _TIF_WORK_SYSCALL_EXIT \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP | \
_TIF_SYSCALL_FTRACE)
_TIF_SYSCALL_TRACEPOINT)
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK \
@ -137,7 +137,8 @@ struct thread_info {
_TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))
/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_FTRACE)
#define _TIF_ALLWORK_MASK \
((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT)
/* Only used for 64 bit */
#define _TIF_DO_NOTIFY_MASK \


@ -345,6 +345,8 @@
#ifdef __KERNEL__
#define NR_syscalls 337
#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
#define __ARCH_WANT_OLD_STAT


@ -688,6 +688,12 @@ __SYSCALL(__NR_perf_counter_open, sys_perf_counter_open)
#endif /* __NO_STUBS */
#ifdef __KERNEL__
#ifndef COMPILE_OFFSETS
#include <asm/asm-offsets.h>
#define NR_syscalls (__NR_syscall_max + 1)
#endif
/*
* "Conditional" syscalls
*


@ -3,6 +3,7 @@
* This code generates raw asm output which is post-processed to extract
* and format the required data.
*/
#define COMPILE_OFFSETS
#include <linux/crypto.h>
#include <linux/sched.h>


@ -417,10 +417,6 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
unsigned long return_hooker = (unsigned long)
&return_to_handler;
/* Nmi's are currently unsupported */
if (unlikely(in_nmi()))
return;
if (unlikely(atomic_read(&current->tracing_graph_pause)))
return;
@ -498,37 +494,56 @@ static struct syscall_metadata *find_syscall_meta(unsigned long *syscall)
struct syscall_metadata *syscall_nr_to_meta(int nr)
{
if (!syscalls_metadata || nr >= FTRACE_SYSCALL_MAX || nr < 0)
if (!syscalls_metadata || nr >= NR_syscalls || nr < 0)
return NULL;
return syscalls_metadata[nr];
}
void arch_init_ftrace_syscalls(void)
int syscall_name_to_nr(char *name)
{
int i;
if (!syscalls_metadata)
return -1;
for (i = 0; i < NR_syscalls; i++) {
if (syscalls_metadata[i]) {
if (!strcmp(syscalls_metadata[i]->name, name))
return i;
}
}
return -1;
}
void set_syscall_enter_id(int num, int id)
{
syscalls_metadata[num]->enter_id = id;
}
void set_syscall_exit_id(int num, int id)
{
syscalls_metadata[num]->exit_id = id;
}
static int __init arch_init_ftrace_syscalls(void)
{
int i;
struct syscall_metadata *meta;
unsigned long **psys_syscall_table = &sys_call_table;
static atomic_t refs;
if (atomic_inc_return(&refs) != 1)
goto end;
syscalls_metadata = kzalloc(sizeof(*syscalls_metadata) *
FTRACE_SYSCALL_MAX, GFP_KERNEL);
NR_syscalls, GFP_KERNEL);
if (!syscalls_metadata) {
WARN_ON(1);
return;
return -ENOMEM;
}
for (i = 0; i < FTRACE_SYSCALL_MAX; i++) {
for (i = 0; i < NR_syscalls; i++) {
meta = find_syscall_meta(psys_syscall_table[i]);
syscalls_metadata[i] = meta;
}
return;
/* Paranoid: avoid overflow */
end:
atomic_dec(&refs);
return 0;
}
arch_initcall(arch_init_ftrace_syscalls);
#endif


@ -35,10 +35,11 @@
#include <asm/proto.h>
#include <asm/ds.h>
#include <trace/syscall.h>
#include "tls.h"
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
enum x86_regset {
REGSET_GENERAL,
REGSET_FP,
@ -1497,8 +1498,8 @@ asmregparm long syscall_trace_enter(struct pt_regs *regs)
tracehook_report_syscall_entry(regs))
ret = -1L;
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_enter(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->orig_ax);
if (unlikely(current->audit_context)) {
if (IS_IA32)
@ -1523,8 +1524,8 @@ asmregparm void syscall_trace_leave(struct pt_regs *regs)
if (unlikely(current->audit_context))
audit_syscall_exit(AUDITSC_RESULT(regs->ax), regs->ax);
if (unlikely(test_thread_flag(TIF_SYSCALL_FTRACE)))
ftrace_syscall_exit(regs);
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_exit(regs, regs->ax);
if (test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, 0);


@ -18,9 +18,9 @@
#include <asm/ia32.h>
#include <asm/syscalls.h>
asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long off)
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, off)
{
long error;
struct file *file;
@ -226,7 +226,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
}
asmlinkage long sys_uname(struct new_utsname __user *name)
SYSCALL_DEFINE1(uname, struct new_utsname __user *, name)
{
int err;
down_read(&uts_sem);


@ -93,16 +93,22 @@ void tracing_generic_entry_update(struct trace_entry *entry,
unsigned long flags,
int pc);
struct ring_buffer_event *
trace_current_buffer_lock_reserve(int type, unsigned long len,
trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
int type, unsigned long len,
unsigned long flags, int pc);
void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
void trace_current_buffer_discard_commit(struct ring_buffer_event *event);
void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event);
void tracing_record_cmdline(struct task_struct *tsk);
struct event_filter;
struct ftrace_event_call {
struct list_head list;
char *name;
@ -110,16 +116,18 @@ struct ftrace_event_call {
struct dentry *dir;
struct trace_event *event;
int enabled;
int (*regfunc)(void);
void (*unregfunc)(void);
int (*regfunc)(void *);
void (*unregfunc)(void *);
int id;
int (*raw_init)(void);
int (*show_format)(struct trace_seq *s);
int (*define_fields)(void);
int (*show_format)(struct ftrace_event_call *call,
struct trace_seq *s);
int (*define_fields)(struct ftrace_event_call *);
struct list_head fields;
int filter_active;
void *filter;
struct event_filter *filter;
void *mod;
void *data;
atomic_t profile_count;
int (*profile_enable)(struct ftrace_event_call *);
@ -129,15 +137,25 @@ struct ftrace_event_call {
#define MAX_FILTER_PRED 32
#define MAX_FILTER_STR_VAL 128
extern int init_preds(struct ftrace_event_call *call);
extern void destroy_preds(struct ftrace_event_call *call);
extern int filter_match_preds(struct ftrace_event_call *call, void *rec);
extern int filter_current_check_discard(struct ftrace_event_call *call,
extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
extern int trace_define_field(struct ftrace_event_call *call, char *type,
char *name, int offset, int size, int is_signed);
enum {
FILTER_OTHER = 0,
FILTER_STATIC_STRING,
FILTER_DYN_STRING,
FILTER_PTR_STRING,
};
extern int trace_define_field(struct ftrace_event_call *call,
const char *type, const char *name,
int offset, int size, int is_signed,
int filter_type);
extern int trace_define_common_fields(struct ftrace_event_call *call);
#define is_signed_type(type) (((type)(-1)) < 0)
@ -162,11 +180,4 @@ do { \
__trace_printk(ip, fmt, ##args); \
} while (0)
#define __common_field(type, item, is_signed) \
ret = trace_define_field(event_call, #type, "common_" #item, \
offsetof(typeof(field.ent), item), \
sizeof(field.ent.item), is_signed); \
if (ret) \
return ret;
#endif /* _LINUX_FTRACE_EVENT_H */


@ -17,10 +17,12 @@
#include <linux/moduleparam.h>
#include <linux/marker.h>
#include <linux/tracepoint.h>
#include <asm/local.h>
#include <asm/local.h>
#include <asm/module.h>
#include <trace/events/module.h>
/* Not Yet Implemented */
#define MODULE_SUPPORTED_DEVICE(name)
@ -462,7 +464,10 @@ static inline local_t *__module_ref_addr(struct module *mod, int cpu)
static inline void __module_get(struct module *module)
{
if (module) {
local_inc(__module_ref_addr(module, get_cpu()));
unsigned int cpu = get_cpu();
local_inc(__module_ref_addr(module, cpu));
trace_module_get(module, _THIS_IP_,
local_read(__module_ref_addr(module, cpu)));
put_cpu();
}
}
@ -473,8 +478,11 @@ static inline int try_module_get(struct module *module)
if (module) {
unsigned int cpu = get_cpu();
if (likely(module_is_live(module)))
if (likely(module_is_live(module))) {
local_inc(__module_ref_addr(module, cpu));
trace_module_get(module, _THIS_IP_,
local_read(__module_ref_addr(module, cpu)));
}
else
ret = 0;
put_cpu();


@ -766,6 +766,8 @@ extern int sysctl_perf_counter_mlock;
extern int sysctl_perf_counter_sample_rate;
extern void perf_counter_init(void);
extern void perf_tpcounter_event(int event_id, u64 addr, u64 count,
void *record, int entry_size);
#ifndef perf_misc_flags
#define perf_misc_flags(regs) (user_mode(regs) ? PERF_EVENT_MISC_USER : \


@ -74,20 +74,6 @@ ring_buffer_event_time_delta(struct ring_buffer_event *event)
return event->time_delta;
}
/*
* ring_buffer_event_discard can discard any event in the ring buffer.
* it is up to the caller to protect against a reader from
* consuming it or a writer from wrapping and replacing it.
*
* No external protection is needed if this is called before
* the event is commited. But in that case it would be better to
* use ring_buffer_discard_commit.
*
* Note, if an event that has not been committed is discarded
* with ring_buffer_event_discard, it must still be committed.
*/
void ring_buffer_event_discard(struct ring_buffer_event *event);
/*
* ring_buffer_discard_commit will remove an event that has not
* ben committed yet. If this is used, then ring_buffer_unlock_commit
@ -154,8 +140,17 @@ unsigned long ring_buffer_size(struct ring_buffer *buffer);
void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
void ring_buffer_reset(struct ring_buffer *buffer);
#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
struct ring_buffer *buffer_b, int cpu);
#else
static inline int
ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
struct ring_buffer *buffer_b, int cpu)
{
return -ENODEV;
}
#endif
int ring_buffer_empty(struct ring_buffer *buffer);
int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu);
@ -170,7 +165,6 @@ unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_commit_overrun_cpu(struct ring_buffer *buffer, int cpu);
unsigned long ring_buffer_nmi_dropped_cpu(struct ring_buffer *buffer, int cpu);
u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu);
void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,


@ -64,6 +64,7 @@ struct perf_counter_attr;
#include <linux/sem.h>
#include <asm/siginfo.h>
#include <asm/signal.h>
#include <linux/unistd.h>
#include <linux/quota.h>
#include <linux/key.h>
#include <trace/syscall.h>
@ -97,6 +98,53 @@ struct perf_counter_attr;
#define __SC_TEST5(t5, a5, ...) __SC_TEST(t5); __SC_TEST4(__VA_ARGS__)
#define __SC_TEST6(t6, a6, ...) __SC_TEST(t6); __SC_TEST5(__VA_ARGS__)
#ifdef CONFIG_EVENT_PROFILE
#define TRACE_SYS_ENTER_PROFILE(sname) \
static int prof_sysenter_enable_##sname(struct ftrace_event_call *event_call) \
{ \
int ret = 0; \
if (!atomic_inc_return(&event_enter_##sname.profile_count)) \
ret = reg_prof_syscall_enter("sys"#sname); \
return ret; \
} \
\
static void prof_sysenter_disable_##sname(struct ftrace_event_call *event_call)\
{ \
if (atomic_add_negative(-1, &event_enter_##sname.profile_count)) \
unreg_prof_syscall_enter("sys"#sname); \
}
#define TRACE_SYS_EXIT_PROFILE(sname) \
static int prof_sysexit_enable_##sname(struct ftrace_event_call *event_call) \
{ \
int ret = 0; \
if (!atomic_inc_return(&event_exit_##sname.profile_count)) \
ret = reg_prof_syscall_exit("sys"#sname); \
return ret; \
} \
\
static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
{ \
if (atomic_add_negative(-1, &event_exit_##sname.profile_count)) \
unreg_prof_syscall_exit("sys"#sname); \
}
#define TRACE_SYS_ENTER_PROFILE_INIT(sname) \
.profile_count = ATOMIC_INIT(-1), \
.profile_enable = prof_sysenter_enable_##sname, \
.profile_disable = prof_sysenter_disable_##sname,
#define TRACE_SYS_EXIT_PROFILE_INIT(sname) \
.profile_count = ATOMIC_INIT(-1), \
.profile_enable = prof_sysexit_enable_##sname, \
.profile_disable = prof_sysexit_disable_##sname,
#else
#define TRACE_SYS_ENTER_PROFILE(sname)
#define TRACE_SYS_ENTER_PROFILE_INIT(sname)
#define TRACE_SYS_EXIT_PROFILE(sname)
#define TRACE_SYS_EXIT_PROFILE_INIT(sname)
#endif
#ifdef CONFIG_FTRACE_SYSCALLS
#define __SC_STR_ADECL1(t, a) #a
#define __SC_STR_ADECL2(t, a, ...) #a, __SC_STR_ADECL1(__VA_ARGS__)
@ -112,7 +160,81 @@ struct perf_counter_attr;
#define __SC_STR_TDECL5(t, a, ...) #t, __SC_STR_TDECL4(__VA_ARGS__)
#define __SC_STR_TDECL6(t, a, ...) #t, __SC_STR_TDECL5(__VA_ARGS__)
#define SYSCALL_TRACE_ENTER_EVENT(sname) \
static struct ftrace_event_call event_enter_##sname; \
struct trace_event enter_syscall_print_##sname = { \
.trace = print_syscall_enter, \
}; \
static int init_enter_##sname(void) \
{ \
int num, id; \
num = syscall_name_to_nr("sys"#sname); \
if (num < 0) \
return -ENOSYS; \
id = register_ftrace_event(&enter_syscall_print_##sname);\
if (!id) \
return -ENODEV; \
event_enter_##sname.id = id; \
set_syscall_enter_id(num, id); \
INIT_LIST_HEAD(&event_enter_##sname.fields); \
return 0; \
} \
TRACE_SYS_ENTER_PROFILE(sname); \
static struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) \
event_enter_##sname = { \
.name = "sys_enter"#sname, \
.system = "syscalls", \
.event = &event_syscall_enter, \
.raw_init = init_enter_##sname, \
.show_format = syscall_enter_format, \
.define_fields = syscall_enter_define_fields, \
.regfunc = reg_event_syscall_enter, \
.unregfunc = unreg_event_syscall_enter, \
.data = "sys"#sname, \
TRACE_SYS_ENTER_PROFILE_INIT(sname) \
}
#define SYSCALL_TRACE_EXIT_EVENT(sname) \
static struct ftrace_event_call event_exit_##sname; \
struct trace_event exit_syscall_print_##sname = { \
.trace = print_syscall_exit, \
}; \
static int init_exit_##sname(void) \
{ \
int num, id; \
num = syscall_name_to_nr("sys"#sname); \
if (num < 0) \
return -ENOSYS; \
id = register_ftrace_event(&exit_syscall_print_##sname);\
if (!id) \
return -ENODEV; \
event_exit_##sname.id = id; \
set_syscall_exit_id(num, id); \
INIT_LIST_HEAD(&event_exit_##sname.fields); \
return 0; \
} \
TRACE_SYS_EXIT_PROFILE(sname); \
static struct ftrace_event_call __used \
__attribute__((__aligned__(4))) \
__attribute__((section("_ftrace_events"))) \
event_exit_##sname = { \
.name = "sys_exit"#sname, \
.system = "syscalls", \
.event = &event_syscall_exit, \
.raw_init = init_exit_##sname, \
.show_format = syscall_exit_format, \
.define_fields = syscall_exit_define_fields, \
.regfunc = reg_event_syscall_exit, \
.unregfunc = unreg_event_syscall_exit, \
.data = "sys"#sname, \
TRACE_SYS_EXIT_PROFILE_INIT(sname) \
}
#define SYSCALL_METADATA(sname, nb) \
SYSCALL_TRACE_ENTER_EVENT(sname); \
SYSCALL_TRACE_EXIT_EVENT(sname); \
static const struct syscall_metadata __used \
__attribute__((__aligned__(4))) \
__attribute__((section("__syscalls_metadata"))) \
@ -121,18 +243,23 @@ struct perf_counter_attr;
.nb_args = nb, \
.types = types_##sname, \
.args = args_##sname, \
}
.enter_event = &event_enter_##sname, \
.exit_event = &event_exit_##sname, \
};
#define SYSCALL_DEFINE0(sname) \
SYSCALL_TRACE_ENTER_EVENT(_##sname); \
SYSCALL_TRACE_EXIT_EVENT(_##sname); \
static const struct syscall_metadata __used \
__attribute__((__aligned__(4))) \
__attribute__((section("__syscalls_metadata"))) \
__syscall_meta_##sname = { \
.name = "sys_"#sname, \
.nb_args = 0, \
.enter_event = &event_enter__##sname, \
.exit_event = &event_exit__##sname, \
}; \
asmlinkage long sys_##sname(void)
#else
#define SYSCALL_DEFINE0(name) asmlinkage long sys_##name(void)
#endif


@ -23,6 +23,8 @@ struct tracepoint;
struct tracepoint {
const char *name; /* Tracepoint name */
int state; /* State. */
void (*regfunc)(void);
void (*unregfunc)(void);
void **funcs;
} __attribute__((aligned(32))); /*
* Aligned on 32 bytes because it is
@ -78,12 +80,16 @@ struct tracepoint {
return tracepoint_probe_unregister(#name, (void *)probe);\
}
#define DEFINE_TRACE(name) \
#define DEFINE_TRACE_FN(name, reg, unreg) \
static const char __tpstrtab_##name[] \
__attribute__((section("__tracepoints_strings"))) = #name; \
struct tracepoint __tracepoint_##name \
__attribute__((section("__tracepoints"), aligned(32))) = \
{ __tpstrtab_##name, 0, NULL }
{ __tpstrtab_##name, 0, reg, unreg, NULL }
#define DEFINE_TRACE(name) \
DEFINE_TRACE_FN(name, NULL, NULL);
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
EXPORT_SYMBOL_GPL(__tracepoint_##name)
@ -108,6 +114,7 @@ extern void tracepoint_update_probe_range(struct tracepoint *begin,
return -ENOSYS; \
}
#define DEFINE_TRACE_FN(name, reg, unreg)
#define DEFINE_TRACE(name)
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
#define EXPORT_TRACEPOINT_SYMBOL(name)
@ -158,6 +165,15 @@ static inline void tracepoint_synchronize_unregister(void)
#define PARAMS(args...) args
#endif /* _LINUX_TRACEPOINT_H */
/*
* Note: we keep the TRACE_EVENT outside the include file ifdef protection.
* This is due to the way trace events work. If a file includes two
* trace event headers under one "CREATE_TRACE_POINTS" the first include
* will override the TRACE_EVENT and break the second include.
*/
#ifndef TRACE_EVENT
/*
* For use with the TRACE_EVENT macro:
@ -259,10 +275,15 @@ static inline void tracepoint_synchronize_unregister(void)
* can also be used by generic instrumentation like SystemTap), and
* it is also used to expose a structured trace record in
* /sys/kernel/debug/tracing/events/.
*
* A set of (un)registration functions can be passed to the variant
* TRACE_EVENT_FN to perform any (un)registration work.
*/
#define TRACE_EVENT(name, proto, args, struct, assign, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#endif
#define TRACE_EVENT_FN(name, proto, args, struct, \
assign, print, reg, unreg) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#endif
#endif /* ifdef TRACE_EVENT (see note above) */

View file

@ -26,6 +26,11 @@
#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DEFINE_TRACE(name)
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
DEFINE_TRACE_FN(name, reg, unreg)
#undef DECLARE_TRACE
#define DECLARE_TRACE(name, proto, args) \
DEFINE_TRACE(name)
@ -56,6 +61,8 @@
#include <trace/ftrace.h>
#endif
#undef TRACE_EVENT
#undef TRACE_EVENT_FN
#undef TRACE_HEADER_MULTI_READ
/* Only undef what we defined in this file */

View file

@ -0,0 +1,126 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM module
#if !defined(_TRACE_MODULE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_MODULE_H
#include <linux/tracepoint.h>
#ifdef CONFIG_MODULES
struct module;
#define show_module_flags(flags) __print_flags(flags, "", \
{ (1UL << TAINT_PROPRIETARY_MODULE), "P" }, \
{ (1UL << TAINT_FORCED_MODULE), "F" }, \
{ (1UL << TAINT_CRAP), "C" })
TRACE_EVENT(module_load,
TP_PROTO(struct module *mod),
TP_ARGS(mod),
TP_STRUCT__entry(
__field( unsigned int, taints )
__string( name, mod->name )
),
TP_fast_assign(
__entry->taints = mod->taints;
__assign_str(name, mod->name);
),
TP_printk("%s %s", __get_str(name), show_module_flags(__entry->taints))
);
TRACE_EVENT(module_free,
TP_PROTO(struct module *mod),
TP_ARGS(mod),
TP_STRUCT__entry(
__string( name, mod->name )
),
TP_fast_assign(
__assign_str(name, mod->name);
),
TP_printk("%s", __get_str(name))
);
TRACE_EVENT(module_get,
TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
TP_ARGS(mod, ip, refcnt),
TP_STRUCT__entry(
__field( unsigned long, ip )
__field( int, refcnt )
__string( name, mod->name )
),
TP_fast_assign(
__entry->ip = ip;
__entry->refcnt = refcnt;
__assign_str(name, mod->name);
),
TP_printk("%s call_site=%pf refcnt=%d",
__get_str(name), (void *)__entry->ip, __entry->refcnt)
);
TRACE_EVENT(module_put,
TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
TP_ARGS(mod, ip, refcnt),
TP_STRUCT__entry(
__field( unsigned long, ip )
__field( int, refcnt )
__string( name, mod->name )
),
TP_fast_assign(
__entry->ip = ip;
__entry->refcnt = refcnt;
__assign_str(name, mod->name);
),
TP_printk("%s call_site=%pf refcnt=%d",
__get_str(name), (void *)__entry->ip, __entry->refcnt)
);
TRACE_EVENT(module_request,
TP_PROTO(char *name, bool wait, unsigned long ip),
TP_ARGS(name, wait, ip),
TP_STRUCT__entry(
__field( bool, wait )
__field( unsigned long, ip )
__string( name, name )
),
TP_fast_assign(
__entry->wait = wait;
__entry->ip = ip;
__assign_str(name, name);
),
TP_printk("%s wait=%d call_site=%pf",
__get_str(name), (int)__entry->wait, (void *)__entry->ip)
);
#endif /* CONFIG_MODULES */
#endif /* _TRACE_MODULE_H */
/* This part must be outside protection */
#include <trace/define_trace.h>
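
The module_load event above renders the taint bits with __print_flags(). As a rough standalone illustration of that bitmask-to-letters mapping (plain userspace C; the bit positions below are invented stand-ins for the kernel's TAINT_* constants, not the real values):

#include <stdio.h>

/* Illustrative bit positions only -- not the kernel's real TAINT_* values. */
#define EX_TAINT_PROPRIETARY	(1UL << 0)
#define EX_TAINT_FORCED		(1UL << 1)
#define EX_TAINT_CRAP		(1UL << 2)

/* Build a "PFC"-style string from a taint bitmask, like __print_flags(). */
static const char *show_flags(unsigned long taints, char buf[8])
{
	int n = 0;

	if (taints & EX_TAINT_PROPRIETARY)
		buf[n++] = 'P';
	if (taints & EX_TAINT_FORCED)
		buf[n++] = 'F';
	if (taints & EX_TAINT_CRAP)
		buf[n++] = 'C';
	buf[n] = '\0';
	return buf;
}

int main(void)
{
	char buf[8];

	printf("mymodule %s\n",
	       show_flags(EX_TAINT_PROPRIETARY | EX_TAINT_CRAP, buf));
	return 0;
}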

View file

@ -94,6 +94,7 @@ TRACE_EVENT(sched_wakeup,
__field( pid_t, pid )
__field( int, prio )
__field( int, success )
__field( int, cpu )
),
TP_fast_assign(
@ -101,11 +102,12 @@ TRACE_EVENT(sched_wakeup,
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->success = success;
__entry->cpu = task_cpu(p);
),
TP_printk("task %s:%d [%d] success=%d",
TP_printk("task %s:%d [%d] success=%d [%03d]",
__entry->comm, __entry->pid, __entry->prio,
__entry->success)
__entry->success, __entry->cpu)
);
/*
@ -125,6 +127,7 @@ TRACE_EVENT(sched_wakeup_new,
__field( pid_t, pid )
__field( int, prio )
__field( int, success )
__field( int, cpu )
),
TP_fast_assign(
@ -132,11 +135,12 @@ TRACE_EVENT(sched_wakeup_new,
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->success = success;
__entry->cpu = task_cpu(p);
),
TP_printk("task %s:%d [%d] success=%d",
TP_printk("task %s:%d [%d] success=%d [%03d]",
__entry->comm, __entry->pid, __entry->prio,
__entry->success)
__entry->success, __entry->cpu)
);
/*

View file

@ -0,0 +1,70 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM syscalls
#if !defined(_TRACE_EVENTS_SYSCALLS_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_EVENTS_SYSCALLS_H
#include <linux/tracepoint.h>
#include <asm/ptrace.h>
#include <asm/syscall.h>
#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
extern void syscall_regfunc(void);
extern void syscall_unregfunc(void);
TRACE_EVENT_FN(sys_enter,
TP_PROTO(struct pt_regs *regs, long id),
TP_ARGS(regs, id),
TP_STRUCT__entry(
__field( long, id )
__array( unsigned long, args, 6 )
),
TP_fast_assign(
__entry->id = id;
syscall_get_arguments(current, regs, 0, 6, __entry->args);
),
TP_printk("NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)",
__entry->id,
__entry->args[0], __entry->args[1], __entry->args[2],
__entry->args[3], __entry->args[4], __entry->args[5]),
syscall_regfunc, syscall_unregfunc
);
TRACE_EVENT_FN(sys_exit,
TP_PROTO(struct pt_regs *regs, long ret),
TP_ARGS(regs, ret),
TP_STRUCT__entry(
__field( long, id )
__field( long, ret )
),
TP_fast_assign(
__entry->id = syscall_get_nr(current, regs);
__entry->ret = ret;
),
TP_printk("NR %ld = %ld",
__entry->id, __entry->ret),
syscall_regfunc, syscall_unregfunc
);
#endif /* CONFIG_HAVE_SYSCALL_TRACEPOINTS */
#endif /* _TRACE_EVENTS_SYSCALLS_H */
/* This part must be outside protection */
#include <trace/define_trace.h>
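
TRACE_EVENT_FN exists so that syscall_regfunc()/syscall_unregfunc() only run while someone is actually using these events. A minimal standalone sketch of that first-user/last-user callback idea (plain C; every name here is illustrative, none of it is a kernel API):

#include <stdio.h>

static int nr_users;            /* how many consumers are attached */
static int tracing_hook_armed;  /* stands in for per-task syscall-trace state */

static void regfunc(void)   { tracing_hook_armed = 1; }  /* first user arrived */
static void unregfunc(void) { tracing_hook_armed = 0; }  /* last user gone */

static void attach_consumer(void)
{
	if (nr_users++ == 0)
		regfunc();
}

static void detach_consumer(void)
{
	if (--nr_users == 0)
		unregfunc();
}

int main(void)
{
	attach_consumer();
	attach_consumer();
	printf("armed=%d users=%d\n", tracing_hook_armed, nr_users); /* armed=1 users=2 */
	detach_consumer();
	detach_consumer();
	printf("armed=%d users=%d\n", tracing_hook_armed, nr_users); /* armed=0 users=0 */
	return 0;
}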

View file

@ -21,11 +21,14 @@
#undef __field
#define __field(type, item) type item;
#undef __field_ext
#define __field_ext(type, item, filter_type) type item;
#undef __array
#define __array(type, item, len) type item[len];
#undef __dynamic_array
#define __dynamic_array(type, item, len) unsigned short __data_loc_##item;
#define __dynamic_array(type, item, len) u32 __data_loc_##item;
#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
@ -42,6 +45,16 @@
}; \
static struct ftrace_event_call event_##name
#undef __cpparg
#define __cpparg(arg...) arg
/* Callbacks are meaningless to ftrace. */
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
TRACE_EVENT(name, __cpparg(proto), __cpparg(args), \
__cpparg(tstruct), __cpparg(assign), __cpparg(print)) \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
@ -51,23 +64,27 @@
* Include the following:
*
* struct ftrace_data_offsets_<call> {
* int <item1>;
* int <item2>;
* u32 <item1>;
* u32 <item2>;
* [...]
* };
*
* The __dynamic_array() macro will create each int <item>, this is
* The __dynamic_array() macro will create each u32 <item>, this is
* to keep the offset of each array from the beginning of the event.
* The size of an array is also encoded, in the higher 16 bits of <item>.
*/
#undef __field
#define __field(type, item);
#define __field(type, item)
#undef __field_ext
#define __field_ext(type, item, filter_type)
#undef __array
#define __array(type, item, len)
#undef __dynamic_array
#define __dynamic_array(type, item, len) int item;
#define __dynamic_array(type, item, len) u32 item;
#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
@ -109,6 +126,9 @@
if (!ret) \
return 0;
#undef __field_ext
#define __field_ext(type, item, filter_type) __field(type, item)
#undef __array
#define __array(type, item, len) \
ret = trace_seq_printf(s, "\tfield:" #type " " #item "[" #len "];\t" \
@ -120,7 +140,7 @@
#undef __dynamic_array
#define __dynamic_array(type, item, len) \
ret = trace_seq_printf(s, "\tfield:__data_loc " #item ";\t" \
ret = trace_seq_printf(s, "\tfield:__data_loc " #type "[] " #item ";\t"\
"offset:%u;\tsize:%u;\n", \
(unsigned int)offsetof(typeof(field), \
__data_loc_##item), \
@ -150,7 +170,8 @@
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, func, print) \
static int \
ftrace_format_##call(struct trace_seq *s) \
ftrace_format_##call(struct ftrace_event_call *unused, \
struct trace_seq *s) \
{ \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0; \
@ -210,7 +231,7 @@ ftrace_format_##call(struct trace_seq *s) \
#undef __get_dynamic_array
#define __get_dynamic_array(field) \
((void *)__entry + __entry->__data_loc_##field)
((void *)__entry + (__entry->__data_loc_##field & 0xffff))
#undef __get_str
#define __get_str(field) (char *)__get_dynamic_array(field)
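
The masking above works because a __data_loc word now packs two values into one u32: the array's offset within the event in the low 16 bits and its length in the high 16 bits (set by the "<< 16" in the __dynamic_array definition later in this header). A self-contained C sketch of that packing and unpacking:

#include <stdio.h>
#include <stdint.h>

/* Pack a dynamic-array descriptor: offset in bits 0-15, length in bits 16-31. */
static uint32_t pack_data_loc(uint16_t offset, uint16_t len)
{
	return (uint32_t)offset | ((uint32_t)len << 16);
}

int main(void)
{
	uint32_t loc = pack_data_loc(40, 12);	/* array starts 40 bytes in, 12 bytes long */

	printf("offset=%u len=%u\n",
	       (unsigned)(loc & 0xffff), (unsigned)(loc >> 16));	/* offset=40 len=12 */
	return 0;
}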
@ -263,28 +284,33 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
#undef __field
#define __field(type, item) \
#undef __field_ext
#define __field_ext(type, item, filter_type) \
ret = trace_define_field(event_call, #type, #item, \
offsetof(typeof(field), item), \
sizeof(field.item), is_signed_type(type)); \
sizeof(field.item), \
is_signed_type(type), filter_type); \
if (ret) \
return ret;
#undef __field
#define __field(type, item) __field_ext(type, item, FILTER_OTHER)
#undef __array
#define __array(type, item, len) \
BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
ret = trace_define_field(event_call, #type "[" #len "]", #item, \
offsetof(typeof(field), item), \
sizeof(field.item), 0); \
sizeof(field.item), 0, FILTER_OTHER); \
if (ret) \
return ret;
#undef __dynamic_array
#define __dynamic_array(type, item, len) \
ret = trace_define_field(event_call, "__data_loc" "[" #type "]", #item,\
offsetof(typeof(field), __data_loc_##item), \
sizeof(field.__data_loc_##item), 0);
ret = trace_define_field(event_call, "__data_loc " #type "[]", #item, \
offsetof(typeof(field), __data_loc_##item), \
sizeof(field.__data_loc_##item), 0, \
FILTER_OTHER);
#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
@ -292,17 +318,14 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \
#undef TRACE_EVENT
#define TRACE_EVENT(call, proto, args, tstruct, func, print) \
int \
ftrace_define_fields_##call(void) \
ftrace_define_fields_##call(struct ftrace_event_call *event_call) \
{ \
struct ftrace_raw_##call field; \
struct ftrace_event_call *event_call = &event_##call; \
int ret; \
\
__common_field(int, type, 1); \
__common_field(unsigned char, flags, 0); \
__common_field(unsigned char, preempt_count, 0); \
__common_field(int, pid, 1); \
__common_field(int, tgid, 1); \
ret = trace_define_common_fields(event_call); \
if (ret) \
return ret; \
\
tstruct; \
\
@ -321,6 +344,9 @@ ftrace_define_fields_##call(void) \
#undef __field
#define __field(type, item)
#undef __field_ext
#define __field_ext(type, item, filter_type)
#undef __array
#define __array(type, item, len)
@ -328,6 +354,7 @@ ftrace_define_fields_##call(void) \
#define __dynamic_array(type, item, len) \
__data_offsets->item = __data_size + \
offsetof(typeof(*entry), __data); \
__data_offsets->item |= (len * sizeof(type)) << 16; \
__data_size += (len) * sizeof(type);
#undef __string
@ -433,13 +460,15 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
* {
* struct ring_buffer_event *event;
* struct ftrace_raw_<call> *entry; <-- defined in stage 1
* struct ring_buffer *buffer;
* unsigned long irq_flags;
* int pc;
*
* local_save_flags(irq_flags);
* pc = preempt_count();
*
* event = trace_current_buffer_lock_reserve(event_<call>.id,
* event = trace_current_buffer_lock_reserve(&buffer,
* event_<call>.id,
* sizeof(struct ftrace_raw_<call>),
* irq_flags, pc);
* if (!event)
@ -449,7 +478,7 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
* <assign>; <-- Here we assign the entries by the __field and
* __array macros.
*
* trace_current_buffer_unlock_commit(event, irq_flags, pc);
* trace_current_buffer_unlock_commit(buffer, event, irq_flags, pc);
* }
*
* static int ftrace_raw_reg_event_<call>(void)
@ -541,6 +570,7 @@ static void ftrace_raw_event_##call(proto) \
struct ftrace_event_call *event_call = &event_##call; \
struct ring_buffer_event *event; \
struct ftrace_raw_##call *entry; \
struct ring_buffer *buffer; \
unsigned long irq_flags; \
int __data_size; \
int pc; \
@ -550,7 +580,8 @@ static void ftrace_raw_event_##call(proto) \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
\
event = trace_current_buffer_lock_reserve(event_##call.id, \
event = trace_current_buffer_lock_reserve(&buffer, \
event_##call.id, \
sizeof(*entry) + __data_size, \
irq_flags, pc); \
if (!event) \
@ -562,11 +593,12 @@ static void ftrace_raw_event_##call(proto) \
\
{ assign; } \
\
if (!filter_current_check_discard(event_call, entry, event)) \
trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
if (!filter_current_check_discard(buffer, event_call, entry, event)) \
trace_nowake_buffer_unlock_commit(buffer, \
event, irq_flags, pc); \
} \
\
static int ftrace_raw_reg_event_##call(void) \
static int ftrace_raw_reg_event_##call(void *ptr) \
{ \
int ret; \
\
@ -577,7 +609,7 @@ static int ftrace_raw_reg_event_##call(void) \
return ret; \
} \
\
static void ftrace_raw_unreg_event_##call(void) \
static void ftrace_raw_unreg_event_##call(void *ptr) \
{ \
unregister_trace_##call(ftrace_raw_event_##call); \
} \
@ -595,7 +627,6 @@ static int ftrace_raw_init_event_##call(void) \
return -ENODEV; \
event_##call.id = id; \
INIT_LIST_HEAD(&event_##call.fields); \
init_preds(&event_##call); \
return 0; \
} \
\

View file

@ -1,8 +1,13 @@
#ifndef _TRACE_SYSCALL_H
#define _TRACE_SYSCALL_H
#include <linux/tracepoint.h>
#include <linux/unistd.h>
#include <linux/ftrace_event.h>
#include <asm/ptrace.h>
/*
* A syscall entry in the ftrace syscalls array.
*
@ -10,26 +15,49 @@
* @nb_args: number of parameters it takes
* @types: list of types as strings
* @args: list of args as strings (args[i] matches types[i])
* @enter_id: associated ftrace enter event id
* @exit_id: associated ftrace exit event id
* @enter_event: associated syscall_enter trace event
* @exit_event: associated syscall_exit trace event
*/
struct syscall_metadata {
const char *name;
int nb_args;
const char **types;
const char **args;
int enter_id;
int exit_id;
struct ftrace_event_call *enter_event;
struct ftrace_event_call *exit_event;
};
#ifdef CONFIG_FTRACE_SYSCALLS
extern void arch_init_ftrace_syscalls(void);
extern struct syscall_metadata *syscall_nr_to_meta(int nr);
extern void start_ftrace_syscalls(void);
extern void stop_ftrace_syscalls(void);
extern void ftrace_syscall_enter(struct pt_regs *regs);
extern void ftrace_syscall_exit(struct pt_regs *regs);
#else
static inline void start_ftrace_syscalls(void) { }
static inline void stop_ftrace_syscalls(void) { }
static inline void ftrace_syscall_enter(struct pt_regs *regs) { }
static inline void ftrace_syscall_exit(struct pt_regs *regs) { }
extern int syscall_name_to_nr(char *name);
void set_syscall_enter_id(int num, int id);
void set_syscall_exit_id(int num, int id);
extern struct trace_event event_syscall_enter;
extern struct trace_event event_syscall_exit;
extern int reg_event_syscall_enter(void *ptr);
extern void unreg_event_syscall_enter(void *ptr);
extern int reg_event_syscall_exit(void *ptr);
extern void unreg_event_syscall_exit(void *ptr);
extern int syscall_enter_format(struct ftrace_event_call *call,
struct trace_seq *s);
extern int syscall_exit_format(struct ftrace_event_call *call,
struct trace_seq *s);
extern int syscall_enter_define_fields(struct ftrace_event_call *call);
extern int syscall_exit_define_fields(struct ftrace_event_call *call);
enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags);
enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags);
#endif
#ifdef CONFIG_EVENT_PROFILE
int reg_prof_syscall_enter(char *name);
void unreg_prof_syscall_enter(char *name);
int reg_prof_syscall_exit(char *name);
void unreg_prof_syscall_exit(char *name);
#endif
#endif /* _TRACE_SYSCALL_H */

View file

@ -37,6 +37,8 @@
#include <linux/suspend.h>
#include <asm/uaccess.h>
#include <trace/events/module.h>
extern int max_threads;
static struct workqueue_struct *khelper_wq;
@ -112,6 +114,8 @@ int __request_module(bool wait, const char *fmt, ...)
return -ENOMEM;
}
trace_module_request(module_name, wait, _RET_IP_);
ret = call_usermodehelper(modprobe_path, argv, envp,
wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
atomic_dec(&kmod_concurrent);

View file

@ -103,7 +103,7 @@ static struct kprobe_blackpoint kprobe_blacklist[] = {
#define INSNS_PER_PAGE (PAGE_SIZE/(MAX_INSN_SIZE * sizeof(kprobe_opcode_t)))
struct kprobe_insn_page {
struct hlist_node hlist;
struct list_head list;
kprobe_opcode_t *insns; /* Page of instruction slots */
char slot_used[INSNS_PER_PAGE];
int nused;
@ -117,7 +117,7 @@ enum kprobe_slot_state {
};
static DEFINE_MUTEX(kprobe_insn_mutex); /* Protects kprobe_insn_pages */
static struct hlist_head kprobe_insn_pages;
static LIST_HEAD(kprobe_insn_pages);
static int kprobe_garbage_slots;
static int collect_garbage_slots(void);
@ -152,10 +152,9 @@ static int __kprobes check_safety(void)
static kprobe_opcode_t __kprobes *__get_insn_slot(void)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos;
retry:
hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
list_for_each_entry(kip, &kprobe_insn_pages, list) {
if (kip->nused < INSNS_PER_PAGE) {
int i;
for (i = 0; i < INSNS_PER_PAGE; i++) {
@ -189,8 +188,8 @@ static kprobe_opcode_t __kprobes *__get_insn_slot(void)
kfree(kip);
return NULL;
}
INIT_HLIST_NODE(&kip->hlist);
hlist_add_head(&kip->hlist, &kprobe_insn_pages);
INIT_LIST_HEAD(&kip->list);
list_add(&kip->list, &kprobe_insn_pages);
memset(kip->slot_used, SLOT_CLEAN, INSNS_PER_PAGE);
kip->slot_used[0] = SLOT_USED;
kip->nused = 1;
@ -219,12 +218,8 @@ static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
* so as not to have to set it up again the
* next time somebody inserts a probe.
*/
hlist_del(&kip->hlist);
if (hlist_empty(&kprobe_insn_pages)) {
INIT_HLIST_NODE(&kip->hlist);
hlist_add_head(&kip->hlist,
&kprobe_insn_pages);
} else {
if (!list_is_singular(&kprobe_insn_pages)) {
list_del(&kip->list);
module_free(NULL, kip->insns);
kfree(kip);
}
@ -235,14 +230,13 @@ static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
static int __kprobes collect_garbage_slots(void)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos, *next;
struct kprobe_insn_page *kip, *next;
/* Ensure no-one is preempted on the garbages */
if (check_safety())
return -EAGAIN;
hlist_for_each_entry_safe(kip, pos, next, &kprobe_insn_pages, hlist) {
list_for_each_entry_safe(kip, next, &kprobe_insn_pages, list) {
int i;
if (kip->ngarbage == 0)
continue;
@ -260,19 +254,17 @@ static int __kprobes collect_garbage_slots(void)
void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
{
struct kprobe_insn_page *kip;
struct hlist_node *pos;
mutex_lock(&kprobe_insn_mutex);
hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
list_for_each_entry(kip, &kprobe_insn_pages, list) {
if (kip->insns <= slot &&
slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
int i = (slot - kip->insns) / MAX_INSN_SIZE;
if (dirty) {
kip->slot_used[i] = SLOT_DIRTY;
kip->ngarbage++;
} else {
} else
collect_one_slot(kip, i);
}
break;
}
}
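
The hlist-to-list conversion lets collect_garbage_slots() walk the page list with list_for_each_entry_safe(), which matters because the loop body may free the node it is visiting. A standalone C illustration of the same idea on a toy singly linked list (not the kernel list API): the next pointer is saved before the node is freed.

#include <stdio.h>
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

int main(void)
{
	struct node *head = NULL, *n, *next;
	int i;

	/* Build a small list: 4 -> 3 -> 2 -> 1 -> 0 */
	for (i = 0; i < 5; i++) {
		n = malloc(sizeof(*n));
		if (!n)
			return 1;
		n->val = i;
		n->next = head;
		head = n;
	}

	/*
	 * "Safe" traversal: grab ->next before the body runs, because
	 * free(n) makes n->next unusable afterwards.
	 */
	for (n = head; n; n = next) {
		next = n->next;
		printf("freeing slot %d\n", n->val);
		free(n);
	}
	head = NULL;
	return 0;
}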

View file

@ -55,6 +55,11 @@
#include <linux/percpu.h>
#include <linux/kmemleak.h>
#define CREATE_TRACE_POINTS
#include <trace/events/module.h>
EXPORT_TRACEPOINT_SYMBOL(module_get);
#if 0
#define DEBUGP printk
#else
@ -942,6 +947,8 @@ void module_put(struct module *module)
if (module) {
unsigned int cpu = get_cpu();
local_dec(__module_ref_addr(module, cpu));
trace_module_put(module, _RET_IP_,
local_read(__module_ref_addr(module, cpu)));
/* Maybe they're waiting for us to drop reference? */
if (unlikely(!module_is_live(module)))
wake_up_process(module->waiter);
@ -1497,6 +1504,8 @@ static int __unlink_module(void *_mod)
/* Free a module, remove from lists, etc (must hold module_mutex). */
static void free_module(struct module *mod)
{
trace_module_free(mod);
/* Delete from various lists */
stop_machine(__unlink_module, mod, NULL);
remove_notes_attrs(mod);
@ -2364,6 +2373,8 @@ static noinline struct module *load_module(void __user *umod,
/* Get rid of temporary copy */
vfree(hdr);
trace_module_load(mod);
/* Done! */
return mod;

View file

@ -41,7 +41,7 @@ config HAVE_FTRACE_MCOUNT_RECORD
config HAVE_HW_BRANCH_TRACER
bool
config HAVE_FTRACE_SYSCALLS
config HAVE_SYSCALL_TRACEPOINTS
bool
config TRACER_MAX_TRACE
@ -60,9 +60,14 @@ config EVENT_TRACING
bool
config CONTEXT_SWITCH_TRACER
select MARKERS
bool
config RING_BUFFER_ALLOW_SWAP
bool
help
Allow the use of ring_buffer_swap_cpu.
Adds a very slight overhead to tracing when enabled.
# All tracer options should select GENERIC_TRACER. For those options that are
# enabled by all tracers (context switch and event tracer) they select TRACING.
# This allows those options to appear when no other tracer is selected. But the
@ -147,6 +152,7 @@ config IRQSOFF_TRACER
select TRACE_IRQFLAGS
select GENERIC_TRACER
select TRACER_MAX_TRACE
select RING_BUFFER_ALLOW_SWAP
help
This option measures the time spent in irqs-off critical
sections, with microsecond accuracy.
@ -168,6 +174,7 @@ config PREEMPT_TRACER
depends on PREEMPT
select GENERIC_TRACER
select TRACER_MAX_TRACE
select RING_BUFFER_ALLOW_SWAP
help
This option measures the time spent in preemption off critical
sections, with microsecond accuracy.
@ -211,7 +218,7 @@ config ENABLE_DEFAULT_TRACERS
config FTRACE_SYSCALLS
bool "Trace syscalls"
depends on HAVE_FTRACE_SYSCALLS
depends on HAVE_SYSCALL_TRACEPOINTS
select GENERIC_TRACER
select KALLSYMS
help

View file

@ -65,13 +65,15 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
{
struct blk_io_trace *t;
struct ring_buffer_event *event = NULL;
struct ring_buffer *buffer = NULL;
int pc = 0;
int cpu = smp_processor_id();
bool blk_tracer = blk_tracer_enabled;
if (blk_tracer) {
buffer = blk_tr->buffer;
pc = preempt_count();
event = trace_buffer_lock_reserve(blk_tr, TRACE_BLK,
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
sizeof(*t) + len,
0, pc);
if (!event)
@ -96,7 +98,7 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
memcpy((void *) t + sizeof(*t), data, len);
if (blk_tracer)
trace_buffer_unlock_commit(blk_tr, event, 0, pc);
trace_buffer_unlock_commit(buffer, event, 0, pc);
}
}
@ -179,6 +181,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
{
struct task_struct *tsk = current;
struct ring_buffer_event *event = NULL;
struct ring_buffer *buffer = NULL;
struct blk_io_trace *t;
unsigned long flags = 0;
unsigned long *sequence;
@ -204,8 +207,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
if (blk_tracer) {
tracing_record_cmdline(current);
buffer = blk_tr->buffer;
pc = preempt_count();
event = trace_buffer_lock_reserve(blk_tr, TRACE_BLK,
event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
sizeof(*t) + pdu_len,
0, pc);
if (!event)
@ -252,7 +256,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
memcpy((void *) t + sizeof(*t), pdu_data, pdu_len);
if (blk_tracer) {
trace_buffer_unlock_commit(blk_tr, event, 0, pc);
trace_buffer_unlock_commit(buffer, event, 0, pc);
return;
}
}
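
The call sites above now resolve the ring buffer once (buffer = blk_tr->buffer) and hand that same buffer to both the reserve and the commit. A toy, single-threaded reserve/commit pair in standalone C, purely to show the shape of that API; nothing here is the real ring buffer implementation:

#include <stdio.h>
#include <stddef.h>
#include <string.h>

struct toy_buffer {
	char data[256];
	size_t head;		/* next free byte */
};

/* Reserve len bytes; returns NULL when the buffer is full. */
static void *toy_reserve(struct toy_buffer *b, size_t len)
{
	void *p;

	if (b->head + len > sizeof(b->data))
		return NULL;
	p = b->data + b->head;
	b->head += len;
	return p;
}

/* Commit is a no-op here; a real ring buffer would publish the event. */
static void toy_commit(struct toy_buffer *b, void *event)
{
	(void)b;
	(void)event;
}

int main(void)
{
	static struct toy_buffer buf;
	char *entry = toy_reserve(&buf, 5);

	if (entry) {
		memcpy(entry, "ping", 5);
		toy_commit(&buf, entry);
	}
	printf("used %zu bytes\n", buf.head);
	return 0;
}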

View file

@ -1016,71 +1016,35 @@ static int
__ftrace_replace_code(struct dyn_ftrace *rec, int enable)
{
unsigned long ftrace_addr;
unsigned long ip, fl;
unsigned long flag = 0UL;
ftrace_addr = (unsigned long)FTRACE_ADDR;
ip = rec->ip;
/*
* If this record is not to be traced and
* it is not enabled then do nothing.
* If this record is not to be traced or we want to disable it,
* then disable it.
*
* If this record is not to be traced and
* it is enabled then disable it.
* If we want to enable it and filtering is off, then enable it.
*
* If we want to enable it and filtering is on, enable it only if
* it's filtered
*/
if (rec->flags & FTRACE_FL_NOTRACE) {
if (rec->flags & FTRACE_FL_ENABLED)
rec->flags &= ~FTRACE_FL_ENABLED;
else
return 0;
} else if (ftrace_filtered && enable) {
/*
* Filtering is on:
*/
fl = rec->flags & (FTRACE_FL_FILTER | FTRACE_FL_ENABLED);
/* Record is filtered and enabled, do nothing */
if (fl == (FTRACE_FL_FILTER | FTRACE_FL_ENABLED))
return 0;
/* Record is not filtered or enabled, do nothing */
if (!fl)
return 0;
/* Record is not filtered but enabled, disable it */
if (fl == FTRACE_FL_ENABLED)
rec->flags &= ~FTRACE_FL_ENABLED;
else
/* Otherwise record is filtered but not enabled, enable it */
rec->flags |= FTRACE_FL_ENABLED;
} else {
/* Disable or not filtered */
if (enable) {
/* if record is enabled, do nothing */
if (rec->flags & FTRACE_FL_ENABLED)
return 0;
rec->flags |= FTRACE_FL_ENABLED;
} else {
/* if record is not enabled, do nothing */
if (!(rec->flags & FTRACE_FL_ENABLED))
return 0;
rec->flags &= ~FTRACE_FL_ENABLED;
}
if (enable && !(rec->flags & FTRACE_FL_NOTRACE)) {
if (!ftrace_filtered || (rec->flags & FTRACE_FL_FILTER))
flag = FTRACE_FL_ENABLED;
}
if (rec->flags & FTRACE_FL_ENABLED)
/* If the state of this record hasn't changed, then do nothing */
if ((rec->flags & FTRACE_FL_ENABLED) == flag)
return 0;
if (flag) {
rec->flags |= FTRACE_FL_ENABLED;
return ftrace_make_call(rec, ftrace_addr);
else
return ftrace_make_nop(NULL, rec, ftrace_addr);
}
rec->flags &= ~FTRACE_FL_ENABLED;
return ftrace_make_nop(NULL, rec, ftrace_addr);
}
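
The rewritten __ftrace_replace_code() reduces to "compute the desired FTRACE_FL_ENABLED state, then act only if it differs from the current one". A standalone C model of that decision; the flag values and return codes are illustrative, not the kernel's:

#include <stdio.h>

#define FL_FILTER	0x1
#define FL_ENABLED	0x2
#define FL_NOTRACE	0x4

/* Returns +1 to patch in a call, -1 to patch in a nop, 0 to leave alone. */
static int replace_code(unsigned *flags, int enable, int filtered)
{
	unsigned want = 0;

	if (enable && !(*flags & FL_NOTRACE)) {
		if (!filtered || (*flags & FL_FILTER))
			want = FL_ENABLED;
	}

	if ((*flags & FL_ENABLED) == want)
		return 0;			/* state unchanged, nothing to do */

	if (want) {
		*flags |= FL_ENABLED;
		return 1;			/* would call ftrace_make_call() */
	}
	*flags &= ~FL_ENABLED;
	return -1;				/* would call ftrace_make_nop() */
}

int main(void)
{
	unsigned rec = FL_FILTER;

	printf("%d\n", replace_code(&rec, 1, 1));	/* 1: filtered in, enable it  */
	printf("%d\n", replace_code(&rec, 1, 1));	/* 0: already enabled         */
	printf("%d\n", replace_code(&rec, 0, 1));	/* -1: disabling, patch a nop */
	return 0;
}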
static void ftrace_replace_code(int enable)
@ -1375,7 +1339,6 @@ struct ftrace_iterator {
unsigned flags;
unsigned char buffer[FTRACE_BUFF_MAX+1];
unsigned buffer_idx;
unsigned filtered;
};
static void *
@ -1438,18 +1401,13 @@ static int t_hash_show(struct seq_file *m, void *v)
{
struct ftrace_func_probe *rec;
struct hlist_node *hnd = v;
char str[KSYM_SYMBOL_LEN];
rec = hlist_entry(hnd, struct ftrace_func_probe, node);
if (rec->ops->print)
return rec->ops->print(m, rec->ip, rec->ops, rec->data);
kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
seq_printf(m, "%s:", str);
kallsyms_lookup((unsigned long)rec->ops->func, NULL, NULL, NULL, str);
seq_printf(m, "%s", str);
seq_printf(m, "%pf:%pf", (void *)rec->ip, (void *)rec->ops->func);
if (rec->data)
seq_printf(m, ":%p", rec->data);
@ -1547,7 +1505,6 @@ static int t_show(struct seq_file *m, void *v)
{
struct ftrace_iterator *iter = m->private;
struct dyn_ftrace *rec = v;
char str[KSYM_SYMBOL_LEN];
if (iter->flags & FTRACE_ITER_HASH)
return t_hash_show(m, v);
@ -1560,9 +1517,7 @@ static int t_show(struct seq_file *m, void *v)
if (!rec)
return 0;
kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
seq_printf(m, "%s\n", str);
seq_printf(m, "%pf\n", (void *)rec->ip);
return 0;
}
@ -1601,17 +1556,6 @@ ftrace_avail_open(struct inode *inode, struct file *file)
return ret;
}
int ftrace_avail_release(struct inode *inode, struct file *file)
{
struct seq_file *m = (struct seq_file *)file->private_data;
struct ftrace_iterator *iter = m->private;
seq_release(inode, file);
kfree(iter);
return 0;
}
static int
ftrace_failures_open(struct inode *inode, struct file *file)
{
@ -2317,7 +2261,6 @@ ftrace_regex_write(struct file *file, const char __user *ubuf,
}
if (isspace(ch)) {
iter->filtered++;
iter->buffer[iter->buffer_idx] = 0;
ret = ftrace_process_regex(iter->buffer,
iter->buffer_idx, enable);
@ -2448,7 +2391,6 @@ ftrace_regex_release(struct inode *inode, struct file *file, int enable)
iter = file->private_data;
if (iter->buffer_idx) {
iter->filtered++;
iter->buffer[iter->buffer_idx] = 0;
ftrace_match_records(iter->buffer, iter->buffer_idx, enable);
}
@ -2479,14 +2421,14 @@ static const struct file_operations ftrace_avail_fops = {
.open = ftrace_avail_open,
.read = seq_read,
.llseek = seq_lseek,
.release = ftrace_avail_release,
.release = seq_release_private,
};
static const struct file_operations ftrace_failures_fops = {
.open = ftrace_failures_open,
.read = seq_read,
.llseek = seq_lseek,
.release = ftrace_avail_release,
.release = seq_release_private,
};
static const struct file_operations ftrace_filter_fops = {
@ -2548,7 +2490,6 @@ static void g_stop(struct seq_file *m, void *p)
static int g_show(struct seq_file *m, void *v)
{
unsigned long *ptr = v;
char str[KSYM_SYMBOL_LEN];
if (!ptr)
return 0;
@ -2558,9 +2499,7 @@ static int g_show(struct seq_file *m, void *v)
return 0;
}
kallsyms_lookup(*ptr, NULL, NULL, NULL, str);
seq_printf(m, "%s\n", str);
seq_printf(m, "%pf\n", v);
return 0;
}

View file

@ -183,11 +183,9 @@ static void kmemtrace_stop_probes(void)
static int kmem_trace_init(struct trace_array *tr)
{
int cpu;
kmemtrace_array = tr;
for_each_cpu(cpu, cpu_possible_mask)
tracing_reset(tr, cpu);
tracing_reset_online_cpus(tr);
kmemtrace_start_probes();
@ -239,12 +237,52 @@ struct kmemtrace_user_event_alloc {
};
static enum print_line_t
kmemtrace_print_alloc_user(struct trace_iterator *iter,
struct kmemtrace_alloc_entry *entry)
kmemtrace_print_alloc(struct trace_iterator *iter, int flags)
{
struct kmemtrace_user_event_alloc *ev_alloc;
struct trace_seq *s = &iter->seq;
struct kmemtrace_alloc_entry *entry;
int ret;
trace_assign_type(entry, iter->ent);
ret = trace_seq_printf(s, "type_id %d call_site %pF ptr %lu "
"bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d\n",
entry->type_id, (void *)entry->call_site, (unsigned long)entry->ptr,
(unsigned long)entry->bytes_req, (unsigned long)entry->bytes_alloc,
(unsigned long)entry->gfp_flags, entry->node);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
kmemtrace_print_free(struct trace_iterator *iter, int flags)
{
struct trace_seq *s = &iter->seq;
struct kmemtrace_free_entry *entry;
int ret;
trace_assign_type(entry, iter->ent);
ret = trace_seq_printf(s, "type_id %d call_site %pF ptr %lu\n",
entry->type_id, (void *)entry->call_site,
(unsigned long)entry->ptr);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
kmemtrace_print_alloc_user(struct trace_iterator *iter, int flags)
{
struct trace_seq *s = &iter->seq;
struct kmemtrace_alloc_entry *entry;
struct kmemtrace_user_event *ev;
struct kmemtrace_user_event_alloc *ev_alloc;
trace_assign_type(entry, iter->ent);
ev = trace_seq_reserve(s, sizeof(*ev));
if (!ev)
@ -271,12 +309,14 @@ kmemtrace_print_alloc_user(struct trace_iterator *iter,
}
static enum print_line_t
kmemtrace_print_free_user(struct trace_iterator *iter,
struct kmemtrace_free_entry *entry)
kmemtrace_print_free_user(struct trace_iterator *iter, int flags)
{
struct trace_seq *s = &iter->seq;
struct kmemtrace_free_entry *entry;
struct kmemtrace_user_event *ev;
trace_assign_type(entry, iter->ent);
ev = trace_seq_reserve(s, sizeof(*ev));
if (!ev)
return TRACE_TYPE_PARTIAL_LINE;
@ -294,12 +334,14 @@ kmemtrace_print_free_user(struct trace_iterator *iter,
/* The two following provide a more minimalistic output */
static enum print_line_t
kmemtrace_print_alloc_compress(struct trace_iterator *iter,
struct kmemtrace_alloc_entry *entry)
kmemtrace_print_alloc_compress(struct trace_iterator *iter)
{
struct kmemtrace_alloc_entry *entry;
struct trace_seq *s = &iter->seq;
int ret;
trace_assign_type(entry, iter->ent);
/* Alloc entry */
ret = trace_seq_printf(s, " + ");
if (!ret)
@ -345,29 +387,24 @@ kmemtrace_print_alloc_compress(struct trace_iterator *iter,
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Node */
ret = trace_seq_printf(s, "%4d ", entry->node);
/* Node and call site */
ret = trace_seq_printf(s, "%4d %pf\n", entry->node,
(void *)entry->call_site);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Call site */
ret = seq_print_ip_sym(s, entry->call_site, 0);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
if (!trace_seq_printf(s, "\n"))
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
static enum print_line_t
kmemtrace_print_free_compress(struct trace_iterator *iter,
struct kmemtrace_free_entry *entry)
kmemtrace_print_free_compress(struct trace_iterator *iter)
{
struct kmemtrace_free_entry *entry;
struct trace_seq *s = &iter->seq;
int ret;
trace_assign_type(entry, iter->ent);
/* Free entry */
ret = trace_seq_printf(s, " - ");
if (!ret)
@ -401,19 +438,11 @@ kmemtrace_print_free_compress(struct trace_iterator *iter,
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Skip node */
ret = trace_seq_printf(s, " ");
/* Skip node and print call site */
ret = trace_seq_printf(s, " %pf\n", (void *)entry->call_site);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
/* Call site */
ret = seq_print_ip_sym(s, entry->call_site, 0);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
if (!trace_seq_printf(s, "\n"))
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
@ -421,32 +450,31 @@ static enum print_line_t kmemtrace_print_line(struct trace_iterator *iter)
{
struct trace_entry *entry = iter->ent;
if (!(kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL))
return TRACE_TYPE_UNHANDLED;
switch (entry->type) {
case TRACE_KMEM_ALLOC: {
struct kmemtrace_alloc_entry *field;
trace_assign_type(field, entry);
if (kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL)
return kmemtrace_print_alloc_compress(iter, field);
else
return kmemtrace_print_alloc_user(iter, field);
}
case TRACE_KMEM_FREE: {
struct kmemtrace_free_entry *field;
trace_assign_type(field, entry);
if (kmem_tracer_flags.val & TRACE_KMEM_OPT_MINIMAL)
return kmemtrace_print_free_compress(iter, field);
else
return kmemtrace_print_free_user(iter, field);
}
case TRACE_KMEM_ALLOC:
return kmemtrace_print_alloc_compress(iter);
case TRACE_KMEM_FREE:
return kmemtrace_print_free_compress(iter);
default:
return TRACE_TYPE_UNHANDLED;
}
}
static struct trace_event kmem_trace_alloc = {
.type = TRACE_KMEM_ALLOC,
.trace = kmemtrace_print_alloc,
.binary = kmemtrace_print_alloc_user,
};
static struct trace_event kmem_trace_free = {
.type = TRACE_KMEM_FREE,
.trace = kmemtrace_print_free,
.binary = kmemtrace_print_free_user,
};
static struct tracer kmem_tracer __read_mostly = {
.name = "kmemtrace",
.init = kmem_trace_init,
@ -463,6 +491,21 @@ void kmemtrace_init(void)
static int __init init_kmem_tracer(void)
{
return register_tracer(&kmem_tracer);
if (!register_ftrace_event(&kmem_trace_alloc)) {
pr_warning("Warning: could not register kmem events\n");
return 1;
}
if (!register_ftrace_event(&kmem_trace_free)) {
pr_warning("Warning: could not register kmem events\n");
return 1;
}
if (!register_tracer(&kmem_tracer)) {
pr_warning("Warning: could not register the kmem tracer\n");
return 1;
}
return 0;
}
device_initcall(init_kmem_tracer);

File diff suppressed because it is too large

File diff suppressed because it is too large

View file

@ -34,8 +34,6 @@ enum trace_type {
TRACE_GRAPH_ENT,
TRACE_USER_STACK,
TRACE_HW_BRANCHES,
TRACE_SYSCALL_ENTER,
TRACE_SYSCALL_EXIT,
TRACE_KMEM_ALLOC,
TRACE_KMEM_FREE,
TRACE_POWER,
@ -236,9 +234,6 @@ struct trace_array_cpu {
atomic_t disabled;
void *buffer_page; /* ring buffer spare */
/* these fields get copied into max-trace: */
unsigned long trace_idx;
unsigned long overrun;
unsigned long saved_latency;
unsigned long critical_start;
unsigned long critical_end;
@ -246,6 +241,7 @@ struct trace_array_cpu {
unsigned long nice;
unsigned long policy;
unsigned long rt_priority;
unsigned long skipped_entries;
cycle_t preempt_timestamp;
pid_t pid;
uid_t uid;
@ -319,10 +315,6 @@ extern void __ftrace_bad_type(void);
TRACE_KMEM_ALLOC); \
IF_ASSIGN(var, ent, struct kmemtrace_free_entry, \
TRACE_KMEM_FREE); \
IF_ASSIGN(var, ent, struct syscall_trace_enter, \
TRACE_SYSCALL_ENTER); \
IF_ASSIGN(var, ent, struct syscall_trace_exit, \
TRACE_SYSCALL_EXIT); \
__ftrace_bad_type(); \
} while (0)
@ -423,12 +415,13 @@ void init_tracer_sysprof_debugfs(struct dentry *d_tracer);
struct ring_buffer_event;
struct ring_buffer_event *trace_buffer_lock_reserve(struct trace_array *tr,
int type,
unsigned long len,
unsigned long flags,
int pc);
void trace_buffer_unlock_commit(struct trace_array *tr,
struct ring_buffer_event *
trace_buffer_lock_reserve(struct ring_buffer *buffer,
int type,
unsigned long len,
unsigned long flags,
int pc);
void trace_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
@ -467,6 +460,7 @@ void trace_function(struct trace_array *tr,
void trace_graph_return(struct ftrace_graph_ret *trace);
int trace_graph_entry(struct ftrace_graph_ent *trace);
void set_graph_array(struct trace_array *tr);
void tracing_start_cmdline_record(void);
void tracing_stop_cmdline_record(void);
@ -478,16 +472,40 @@ void unregister_tracer(struct tracer *type);
extern unsigned long nsecs_to_usecs(unsigned long nsecs);
#ifdef CONFIG_TRACER_MAX_TRACE
extern unsigned long tracing_max_latency;
extern unsigned long tracing_thresh;
void update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu);
void update_max_tr_single(struct trace_array *tr,
struct task_struct *tsk, int cpu);
#endif /* CONFIG_TRACER_MAX_TRACE */
void __trace_stack(struct trace_array *tr,
unsigned long flags,
int skip, int pc);
#ifdef CONFIG_STACKTRACE
void ftrace_trace_stack(struct ring_buffer *buffer, unsigned long flags,
int skip, int pc);
void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags,
int pc);
void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
int pc);
#else
static inline void ftrace_trace_stack(struct trace_array *tr,
unsigned long flags, int skip, int pc)
{
}
static inline void ftrace_trace_userstack(struct trace_array *tr,
unsigned long flags, int pc)
{
}
static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
int skip, int pc)
{
}
#endif /* CONFIG_STACKTRACE */
extern cycle_t ftrace_now(int cpu);
@ -513,6 +531,10 @@ extern unsigned long ftrace_update_tot_cnt;
extern int DYN_FTRACE_TEST_NAME(void);
#endif
extern int ring_buffer_expanded;
extern bool tracing_selftest_disabled;
DECLARE_PER_CPU(local_t, ftrace_cpu_disabled);
#ifdef CONFIG_FTRACE_STARTUP_TEST
extern int trace_selftest_startup_function(struct tracer *trace,
struct trace_array *tr);
@ -544,9 +566,16 @@ extern int
trace_vbprintk(unsigned long ip, const char *fmt, va_list args);
extern int
trace_vprintk(unsigned long ip, const char *fmt, va_list args);
extern int
trace_array_vprintk(struct trace_array *tr,
unsigned long ip, const char *fmt, va_list args);
int trace_array_printk(struct trace_array *tr,
unsigned long ip, const char *fmt, ...);
extern unsigned long trace_flags;
extern int trace_clock_id;
/* Standard output formatting function used for function return traces */
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
extern enum print_line_t print_graph_function(struct trace_iterator *iter);
@ -635,9 +664,8 @@ enum trace_iterator_flags {
TRACE_ITER_PRINTK_MSGONLY = 0x10000,
TRACE_ITER_CONTEXT_INFO = 0x20000, /* Print pid/cpu/time */
TRACE_ITER_LATENCY_FMT = 0x40000,
TRACE_ITER_GLOBAL_CLK = 0x80000,
TRACE_ITER_SLEEP_TIME = 0x100000,
TRACE_ITER_GRAPH_TIME = 0x200000,
TRACE_ITER_SLEEP_TIME = 0x80000,
TRACE_ITER_GRAPH_TIME = 0x100000,
};
/*
@ -734,6 +762,7 @@ struct ftrace_event_field {
struct list_head link;
char *name;
char *type;
int filter_type;
int offset;
int size;
int is_signed;
@ -743,13 +772,15 @@ struct event_filter {
int n_preds;
struct filter_pred **preds;
char *filter_string;
bool no_reset;
};
struct event_subsystem {
struct list_head list;
const char *name;
struct dentry *entry;
void *filter;
struct event_filter *filter;
int nr_events;
};
struct filter_pred;
@ -777,6 +808,7 @@ extern int apply_subsystem_event_filter(struct event_subsystem *system,
char *filter_string);
extern void print_subsystem_event_filter(struct event_subsystem *system,
struct trace_seq *s);
extern int filter_assign_type(const char *type);
static inline int
filter_check_discard(struct ftrace_event_call *call, void *rec,

View file

@ -41,14 +41,12 @@ void disable_boot_trace(void)
static int boot_trace_init(struct trace_array *tr)
{
int cpu;
boot_trace = tr;
if (!tr)
return 0;
for_each_cpu(cpu, cpu_possible_mask)
tracing_reset(tr, cpu);
tracing_reset_online_cpus(tr);
tracing_sched_switch_assign_trace(tr);
return 0;
@ -132,6 +130,7 @@ struct tracer boot_tracer __read_mostly =
void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
{
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct trace_boot_call *entry;
struct trace_array *tr = boot_trace;
@ -144,13 +143,14 @@ void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
event = trace_buffer_lock_reserve(tr, TRACE_BOOT_CALL,
buffer = tr->buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_BOOT_CALL,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->boot_call = *bt;
trace_buffer_unlock_commit(tr, event, 0, 0);
trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
@ -158,6 +158,7 @@ void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn)
{
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct trace_boot_ret *entry;
struct trace_array *tr = boot_trace;
@ -167,13 +168,14 @@ void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn)
sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
event = trace_buffer_lock_reserve(tr, TRACE_BOOT_RET,
buffer = tr->buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_BOOT_RET,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->boot_ret = *bt;
trace_buffer_unlock_commit(tr, event, 0, 0);
trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}

View file

@ -17,6 +17,8 @@
#include <linux/ctype.h>
#include <linux/delay.h>
#include <asm/setup.h>
#include "trace_output.h"
#define TRACE_SYSTEM "TRACE_SYSTEM"
@ -25,8 +27,9 @@ DEFINE_MUTEX(event_mutex);
LIST_HEAD(ftrace_events);
int trace_define_field(struct ftrace_event_call *call, char *type,
char *name, int offset, int size, int is_signed)
int trace_define_field(struct ftrace_event_call *call, const char *type,
const char *name, int offset, int size, int is_signed,
int filter_type)
{
struct ftrace_event_field *field;
@ -42,9 +45,15 @@ int trace_define_field(struct ftrace_event_call *call, char *type,
if (!field->type)
goto err;
if (filter_type == FILTER_OTHER)
field->filter_type = filter_assign_type(type);
else
field->filter_type = filter_type;
field->offset = offset;
field->size = size;
field->is_signed = is_signed;
list_add(&field->link, &call->fields);
return 0;
@ -60,6 +69,29 @@ int trace_define_field(struct ftrace_event_call *call, char *type,
}
EXPORT_SYMBOL_GPL(trace_define_field);
#define __common_field(type, item) \
ret = trace_define_field(call, #type, "common_" #item, \
offsetof(typeof(ent), item), \
sizeof(ent.item), \
is_signed_type(type), FILTER_OTHER); \
if (ret) \
return ret;
int trace_define_common_fields(struct ftrace_event_call *call)
{
int ret;
struct trace_entry ent;
__common_field(unsigned short, type);
__common_field(unsigned char, flags);
__common_field(unsigned char, preempt_count);
__common_field(int, pid);
__common_field(int, tgid);
return ret;
}
EXPORT_SYMBOL_GPL(trace_define_common_fields);
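
trace_define_common_fields() describes the header fields shared by every event by taking offsetof()/sizeof() against a local struct trace_entry. A standalone C illustration of describing a struct layout the same way (the struct and the macro here are invented for the example):

#include <stdio.h>
#include <stddef.h>

struct example_entry {
	unsigned short	type;
	unsigned char	flags;
	int		pid;
};

#define DESCRIBE_FIELD(st, f) \
	printf("field %-6s offset=%zu size=%zu\n", #f, \
	       offsetof(struct st, f), sizeof(((struct st *)0)->f))

int main(void)
{
	DESCRIBE_FIELD(example_entry, type);
	DESCRIBE_FIELD(example_entry, flags);
	DESCRIBE_FIELD(example_entry, pid);
	return 0;
}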
#ifdef CONFIG_MODULES
static void trace_destroy_fields(struct ftrace_event_call *call)
@ -84,14 +116,14 @@ static void ftrace_event_enable_disable(struct ftrace_event_call *call,
if (call->enabled) {
call->enabled = 0;
tracing_stop_cmdline_record();
call->unregfunc();
call->unregfunc(call->data);
}
break;
case 1:
if (!call->enabled) {
call->enabled = 1;
tracing_start_cmdline_record();
call->regfunc();
call->regfunc(call->data);
}
break;
}
@ -574,7 +606,7 @@ event_format_read(struct file *filp, char __user *ubuf, size_t cnt,
trace_seq_printf(s, "format:\n");
trace_write_header(s);
r = call->show_format(s);
r = call->show_format(call, s);
if (!r) {
/*
* ug! The format output is bigger than a PAGE!!
@ -849,8 +881,10 @@ event_subsystem_dir(const char *name, struct dentry *d_events)
/* First see if we did not already create this dir */
list_for_each_entry(system, &event_subsystems, list) {
if (strcmp(system->name, name) == 0)
if (strcmp(system->name, name) == 0) {
system->nr_events++;
return system->entry;
}
}
/* need to create new entry */
@ -869,6 +903,7 @@ event_subsystem_dir(const char *name, struct dentry *d_events)
return d_events;
}
system->nr_events = 1;
system->name = kstrdup(name, GFP_KERNEL);
if (!system->name) {
debugfs_remove(system->entry);
@ -920,15 +955,6 @@ event_create_dir(struct ftrace_event_call *call, struct dentry *d_events,
if (strcmp(call->system, TRACE_SYSTEM) != 0)
d_events = event_subsystem_dir(call->system, d_events);
if (call->raw_init) {
ret = call->raw_init();
if (ret < 0) {
pr_warning("Could not initialize trace point"
" events/%s\n", call->name);
return ret;
}
}
call->dir = debugfs_create_dir(call->name, d_events);
if (!call->dir) {
pr_warning("Could not create debugfs "
@ -945,7 +971,7 @@ event_create_dir(struct ftrace_event_call *call, struct dentry *d_events,
id);
if (call->define_fields) {
ret = call->define_fields();
ret = call->define_fields(call);
if (ret < 0) {
pr_warning("Could not initialize trace point"
" events/%s\n", call->name);
@ -987,6 +1013,32 @@ struct ftrace_module_file_ops {
struct file_operations filter;
};
static void remove_subsystem_dir(const char *name)
{
struct event_subsystem *system;
if (strcmp(name, TRACE_SYSTEM) == 0)
return;
list_for_each_entry(system, &event_subsystems, list) {
if (strcmp(system->name, name) == 0) {
if (!--system->nr_events) {
struct event_filter *filter = system->filter;
debugfs_remove_recursive(system->entry);
list_del(&system->list);
if (filter) {
kfree(filter->filter_string);
kfree(filter);
}
kfree(system->name);
kfree(system);
}
break;
}
}
}
static struct ftrace_module_file_ops *
trace_create_file_ops(struct module *mod)
{
@ -1027,6 +1079,7 @@ static void trace_module_add_events(struct module *mod)
struct ftrace_module_file_ops *file_ops = NULL;
struct ftrace_event_call *call, *start, *end;
struct dentry *d_events;
int ret;
start = mod->trace_events;
end = mod->trace_events + mod->num_trace_events;
@ -1042,7 +1095,15 @@ static void trace_module_add_events(struct module *mod)
/* The linker may leave blanks */
if (!call->name)
continue;
if (call->raw_init) {
ret = call->raw_init();
if (ret < 0) {
if (ret != -ENOSYS)
pr_warning("Could not initialize trace "
"point events/%s\n", call->name);
continue;
}
}
/*
* This module has events, create file ops for this module
* if not already done.
@ -1077,6 +1138,7 @@ static void trace_module_remove_events(struct module *mod)
list_del(&call->list);
trace_destroy_fields(call);
destroy_preds(call);
remove_subsystem_dir(call->system);
}
}
@ -1133,6 +1195,18 @@ struct notifier_block trace_module_nb = {
extern struct ftrace_event_call __start_ftrace_events[];
extern struct ftrace_event_call __stop_ftrace_events[];
static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;
static __init int setup_trace_event(char *str)
{
strlcpy(bootup_event_buf, str, COMMAND_LINE_SIZE);
ring_buffer_expanded = 1;
tracing_selftest_disabled = 1;
return 1;
}
__setup("trace_event=", setup_trace_event);
static __init int event_trace_init(void)
{
struct ftrace_event_call *call;
@ -1140,6 +1214,8 @@ static __init int event_trace_init(void)
struct dentry *entry;
struct dentry *d_events;
int ret;
char *buf = bootup_event_buf;
char *token;
d_tracer = tracing_init_dentry();
if (!d_tracer)
@ -1179,12 +1255,34 @@ static __init int event_trace_init(void)
/* The linker may leave blanks */
if (!call->name)
continue;
if (call->raw_init) {
ret = call->raw_init();
if (ret < 0) {
if (ret != -ENOSYS)
pr_warning("Could not initialize trace "
"point events/%s\n", call->name);
continue;
}
}
list_add(&call->list, &ftrace_events);
event_create_dir(call, d_events, &ftrace_event_id_fops,
&ftrace_enable_fops, &ftrace_event_filter_fops,
&ftrace_event_format_fops);
}
while (true) {
token = strsep(&buf, ",");
if (!token)
break;
if (!*token)
continue;
ret = ftrace_set_clr_event(token, 1);
if (ret)
pr_warning("Failed to enable trace event: %s\n", token);
}
ret = register_module_notifier(&trace_module_nb);
if (ret)
pr_warning("Failed to register trace events module notifier\n");
@ -1340,6 +1438,7 @@ static void
function_test_events_call(unsigned long ip, unsigned long parent_ip)
{
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct ftrace_entry *entry;
unsigned long flags;
long disabled;
@ -1357,7 +1456,8 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip)
local_save_flags(flags);
event = trace_current_buffer_lock_reserve(TRACE_FN, sizeof(*entry),
event = trace_current_buffer_lock_reserve(&buffer,
TRACE_FN, sizeof(*entry),
flags, pc);
if (!event)
goto out;
@ -1365,7 +1465,7 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip)
entry->ip = ip;
entry->parent_ip = parent_ip;
trace_nowake_buffer_unlock_commit(event, flags, pc);
trace_nowake_buffer_unlock_commit(buffer, event, flags, pc);
out:
atomic_dec(&per_cpu(test_event_disable, cpu));
@ -1392,10 +1492,10 @@ static __init void event_trace_self_test_with_function(void)
static __init int event_trace_self_tests_init(void)
{
event_trace_self_tests();
event_trace_self_test_with_function();
if (!tracing_selftest_disabled) {
event_trace_self_tests();
event_trace_self_test_with_function();
}
return 0;
}

View file

@ -163,6 +163,20 @@ static int filter_pred_string(struct filter_pred *pred, void *event,
return match;
}
/* Filter predicate for char * pointers */
static int filter_pred_pchar(struct filter_pred *pred, void *event,
int val1, int val2)
{
char **addr = (char **)(event + pred->offset);
int cmp, match;
cmp = strncmp(*addr, pred->str_val, pred->str_len);
match = (!cmp) ^ pred->not;
return match;
}
/*
* Filter predicate for dynamic sized arrays of characters.
* These are implemented through a list of strings at the end
@ -176,11 +190,13 @@ static int filter_pred_string(struct filter_pred *pred, void *event,
static int filter_pred_strloc(struct filter_pred *pred, void *event,
int val1, int val2)
{
unsigned short str_loc = *(unsigned short *)(event + pred->offset);
u32 str_item = *(u32 *)(event + pred->offset);
int str_loc = str_item & 0xffff;
int str_len = str_item >> 16;
char *addr = (char *)(event + str_loc);
int cmp, match;
cmp = strncmp(addr, pred->str_val, pred->str_len);
cmp = strncmp(addr, pred->str_val, str_len);
match = (!cmp) ^ pred->not;
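
Both string predicates report their result as (!cmp) ^ pred->not, so one comparison covers both == and !=. A tiny standalone demonstration of that XOR trick:

#include <stdio.h>
#include <string.h>

static int matches(const char *field, const char *want, int negated)
{
	int cmp = strcmp(field, want);

	return (!cmp) ^ negated;	/* equal^0 -> 1, equal^1 -> 0, and so on */
}

int main(void)
{
	printf("%d\n", matches("bash", "bash", 0));	/* 1: "==" and equal     */
	printf("%d\n", matches("bash", "bash", 1));	/* 0: "!=" and equal     */
	printf("%d\n", matches("sshd", "bash", 1));	/* 1: "!=" and not equal */
	return 0;
}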
@ -293,7 +309,7 @@ void print_event_filter(struct ftrace_event_call *call, struct trace_seq *s)
struct event_filter *filter = call->filter;
mutex_lock(&event_mutex);
if (filter->filter_string)
if (filter && filter->filter_string)
trace_seq_printf(s, "%s\n", filter->filter_string);
else
trace_seq_printf(s, "none\n");
@ -306,7 +322,7 @@ void print_subsystem_event_filter(struct event_subsystem *system,
struct event_filter *filter = system->filter;
mutex_lock(&event_mutex);
if (filter->filter_string)
if (filter && filter->filter_string)
trace_seq_printf(s, "%s\n", filter->filter_string);
else
trace_seq_printf(s, "none\n");
@ -374,6 +390,9 @@ void destroy_preds(struct ftrace_event_call *call)
struct event_filter *filter = call->filter;
int i;
if (!filter)
return;
for (i = 0; i < MAX_FILTER_PRED; i++) {
if (filter->preds[i])
filter_free_pred(filter->preds[i]);
@ -384,17 +403,19 @@ void destroy_preds(struct ftrace_event_call *call)
call->filter = NULL;
}
int init_preds(struct ftrace_event_call *call)
static int init_preds(struct ftrace_event_call *call)
{
struct event_filter *filter;
struct filter_pred *pred;
int i;
if (call->filter)
return 0;
filter = call->filter = kzalloc(sizeof(*filter), GFP_KERNEL);
if (!call->filter)
return -ENOMEM;
call->filter_active = 0;
filter->n_preds = 0;
filter->preds = kzalloc(MAX_FILTER_PRED * sizeof(pred), GFP_KERNEL);
@ -416,30 +437,55 @@ int init_preds(struct ftrace_event_call *call)
return -ENOMEM;
}
EXPORT_SYMBOL_GPL(init_preds);
static void filter_free_subsystem_preds(struct event_subsystem *system)
static int init_subsystem_preds(struct event_subsystem *system)
{
struct event_filter *filter = system->filter;
struct ftrace_event_call *call;
int i;
if (filter->n_preds) {
for (i = 0; i < filter->n_preds; i++)
filter_free_pred(filter->preds[i]);
kfree(filter->preds);
filter->preds = NULL;
filter->n_preds = 0;
}
int err;
list_for_each_entry(call, &ftrace_events, list) {
if (!call->define_fields)
continue;
if (!strcmp(call->system, system->name)) {
filter_disable_preds(call);
remove_filter_string(call->filter);
if (strcmp(call->system, system->name) != 0)
continue;
err = init_preds(call);
if (err)
return err;
}
return 0;
}
enum {
FILTER_DISABLE_ALL,
FILTER_INIT_NO_RESET,
FILTER_SKIP_NO_RESET,
};
static void filter_free_subsystem_preds(struct event_subsystem *system,
int flag)
{
struct ftrace_event_call *call;
list_for_each_entry(call, &ftrace_events, list) {
if (!call->define_fields)
continue;
if (strcmp(call->system, system->name) != 0)
continue;
if (flag == FILTER_INIT_NO_RESET) {
call->filter->no_reset = false;
continue;
}
if (flag == FILTER_SKIP_NO_RESET && call->filter->no_reset)
continue;
filter_disable_preds(call);
remove_filter_string(call->filter);
}
}
@ -468,12 +514,7 @@ static int filter_add_pred_fn(struct filter_parse_state *ps,
return 0;
}
enum {
FILTER_STATIC_STRING = 1,
FILTER_DYN_STRING
};
static int is_string_field(const char *type)
int filter_assign_type(const char *type)
{
if (strstr(type, "__data_loc") && strstr(type, "char"))
return FILTER_DYN_STRING;
@ -481,12 +522,19 @@ static int is_string_field(const char *type)
if (strchr(type, '[') && strstr(type, "char"))
return FILTER_STATIC_STRING;
return 0;
return FILTER_OTHER;
}
static bool is_string_field(struct ftrace_event_field *field)
{
return field->filter_type == FILTER_DYN_STRING ||
field->filter_type == FILTER_STATIC_STRING ||
field->filter_type == FILTER_PTR_STRING;
}
static int is_legal_op(struct ftrace_event_field *field, int op)
{
if (is_string_field(field->type) && (op != OP_EQ && op != OP_NE))
if (is_string_field(field) && (op != OP_EQ && op != OP_NE))
return 0;
return 1;
@ -537,22 +585,24 @@ static filter_pred_fn_t select_comparison_fn(int op, int field_size,
static int filter_add_pred(struct filter_parse_state *ps,
struct ftrace_event_call *call,
struct filter_pred *pred)
struct filter_pred *pred,
bool dry_run)
{
struct ftrace_event_field *field;
filter_pred_fn_t fn;
unsigned long long val;
int string_type;
int ret;
pred->fn = filter_pred_none;
if (pred->op == OP_AND) {
pred->pop_n = 2;
return filter_add_pred_fn(ps, call, pred, filter_pred_and);
fn = filter_pred_and;
goto add_pred_fn;
} else if (pred->op == OP_OR) {
pred->pop_n = 2;
return filter_add_pred_fn(ps, call, pred, filter_pred_or);
fn = filter_pred_or;
goto add_pred_fn;
}
field = find_event_field(call, pred->field_name);
@ -568,16 +618,17 @@ static int filter_add_pred(struct filter_parse_state *ps,
return -EINVAL;
}
string_type = is_string_field(field->type);
if (string_type) {
if (string_type == FILTER_STATIC_STRING)
fn = filter_pred_string;
else
fn = filter_pred_strloc;
if (is_string_field(field)) {
pred->str_len = field->size;
if (pred->op == OP_NE)
pred->not = 1;
return filter_add_pred_fn(ps, call, pred, fn);
if (field->filter_type == FILTER_STATIC_STRING)
fn = filter_pred_string;
else if (field->filter_type == FILTER_DYN_STRING)
fn = filter_pred_strloc;
else {
fn = filter_pred_pchar;
pred->str_len = strlen(pred->str_val);
}
} else {
if (field->is_signed)
ret = strict_strtoll(pred->str_val, 0, &val);
@ -588,41 +639,33 @@ static int filter_add_pred(struct filter_parse_state *ps,
return -EINVAL;
}
pred->val = val;
}
fn = select_comparison_fn(pred->op, field->size, field->is_signed);
if (!fn) {
parse_error(ps, FILT_ERR_INVALID_OP, 0);
return -EINVAL;
fn = select_comparison_fn(pred->op, field->size,
field->is_signed);
if (!fn) {
parse_error(ps, FILT_ERR_INVALID_OP, 0);
return -EINVAL;
}
}
if (pred->op == OP_NE)
pred->not = 1;
return filter_add_pred_fn(ps, call, pred, fn);
add_pred_fn:
if (!dry_run)
return filter_add_pred_fn(ps, call, pred, fn);
return 0;
}
static int filter_add_subsystem_pred(struct filter_parse_state *ps,
struct event_subsystem *system,
struct filter_pred *pred,
char *filter_string)
char *filter_string,
bool dry_run)
{
struct event_filter *filter = system->filter;
struct ftrace_event_call *call;
int err = 0;
if (!filter->preds) {
filter->preds = kzalloc(MAX_FILTER_PRED * sizeof(pred),
GFP_KERNEL);
if (!filter->preds)
return -ENOMEM;
}
if (filter->n_preds == MAX_FILTER_PRED) {
parse_error(ps, FILT_ERR_TOO_MANY_PREDS, 0);
return -ENOSPC;
}
bool fail = true;
list_for_each_entry(call, &ftrace_events, list) {
@ -632,19 +675,24 @@ static int filter_add_subsystem_pred(struct filter_parse_state *ps,
if (strcmp(call->system, system->name))
continue;
err = filter_add_pred(ps, call, pred);
if (err) {
filter_free_subsystem_preds(system);
parse_error(ps, FILT_ERR_BAD_SUBSYS_FILTER, 0);
goto out;
}
replace_filter_string(call->filter, filter_string);
if (call->filter->no_reset)
continue;
err = filter_add_pred(ps, call, pred, dry_run);
if (err)
call->filter->no_reset = true;
else
fail = false;
if (!dry_run)
replace_filter_string(call->filter, filter_string);
}
filter->preds[filter->n_preds] = pred;
filter->n_preds++;
out:
return err;
if (fail) {
parse_error(ps, FILT_ERR_BAD_SUBSYS_FILTER, 0);
return err;
}
return 0;
}
static void parse_init(struct filter_parse_state *ps,
@ -1003,12 +1051,14 @@ static int check_preds(struct filter_parse_state *ps)
static int replace_preds(struct event_subsystem *system,
struct ftrace_event_call *call,
struct filter_parse_state *ps,
char *filter_string)
char *filter_string,
bool dry_run)
{
char *operand1 = NULL, *operand2 = NULL;
struct filter_pred *pred;
struct postfix_elt *elt;
int err;
int n_preds = 0;
err = check_preds(ps);
if (err)
@ -1027,24 +1077,14 @@ static int replace_preds(struct event_subsystem *system,
continue;
}
if (n_preds++ == MAX_FILTER_PRED) {
parse_error(ps, FILT_ERR_TOO_MANY_PREDS, 0);
return -ENOSPC;
}
if (elt->op == OP_AND || elt->op == OP_OR) {
pred = create_logical_pred(elt->op);
if (!pred)
return -ENOMEM;
if (call) {
err = filter_add_pred(ps, call, pred);
filter_free_pred(pred);
} else {
err = filter_add_subsystem_pred(ps, system,
pred, filter_string);
if (err)
filter_free_pred(pred);
}
if (err)
return err;
operand1 = operand2 = NULL;
continue;
goto add_pred;
}
if (!operand1 || !operand2) {
@ -1053,17 +1093,15 @@ static int replace_preds(struct event_subsystem *system,
}
pred = create_pred(elt->op, operand1, operand2);
add_pred:
if (!pred)
return -ENOMEM;
if (call) {
err = filter_add_pred(ps, call, pred);
filter_free_pred(pred);
} else {
if (call)
err = filter_add_pred(ps, call, pred, false);
else
err = filter_add_subsystem_pred(ps, system, pred,
filter_string);
if (err)
filter_free_pred(pred);
}
filter_string, dry_run);
filter_free_pred(pred);
if (err)
return err;
@ -1081,6 +1119,10 @@ int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
mutex_lock(&event_mutex);
err = init_preds(call);
if (err)
goto out_unlock;
if (!strcmp(strstrip(filter_string), "0")) {
filter_disable_preds(call);
remove_filter_string(call->filter);
@ -1103,7 +1145,7 @@ int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
goto out;
}
err = replace_preds(NULL, call, ps, filter_string);
err = replace_preds(NULL, call, ps, filter_string, false);
if (err)
append_filter_err(ps, call->filter);
@ -1126,8 +1168,12 @@ int apply_subsystem_event_filter(struct event_subsystem *system,
mutex_lock(&event_mutex);
err = init_subsystem_preds(system);
if (err)
goto out_unlock;
if (!strcmp(strstrip(filter_string), "0")) {
filter_free_subsystem_preds(system);
filter_free_subsystem_preds(system, FILTER_DISABLE_ALL);
remove_filter_string(system->filter);
mutex_unlock(&event_mutex);
return 0;
@ -1138,7 +1184,6 @@ int apply_subsystem_event_filter(struct event_subsystem *system,
if (!ps)
goto out_unlock;
filter_free_subsystem_preds(system);
replace_filter_string(system->filter, filter_string);
parse_init(ps, filter_ops, filter_string);
@ -1148,9 +1193,23 @@ int apply_subsystem_event_filter(struct event_subsystem *system,
goto out;
}
err = replace_preds(system, NULL, ps, filter_string);
if (err)
filter_free_subsystem_preds(system, FILTER_INIT_NO_RESET);
/* dry run: see which events the filter can be applied to */
err = replace_preds(system, NULL, ps, filter_string, true);
if (err) {
append_filter_err(ps, system->filter);
goto out;
}
filter_free_subsystem_preds(system, FILTER_SKIP_NO_RESET);
/* really apply the filter to the events */
err = replace_preds(system, NULL, ps, filter_string, false);
if (err) {
append_filter_err(ps, system->filter);
filter_free_subsystem_preds(system, 2);
}
out:
filter_opstack_clear(ps);
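
The reworked apply_subsystem_event_filter() above now works in two passes: a dry run (replace_preds(..., true)) that only records, via no_reset, which events cannot take the filter, and a second, real pass that installs the predicates on the events that survived. A rough userspace sketch of that validate-then-apply shape; the item struct and the apply_one()/apply_all() helpers are invented for illustration and are not the kernel API:

#include <stdbool.h>
#include <stdio.h>

struct item { const char *name; bool skip; };

/* Toy "apply": succeeds only for items whose name starts with 's'. */
static int apply_one(struct item *it, bool dry_run)
{
        if (it->name[0] != 's')
                return -1;
        if (!dry_run)
                printf("applied filter to %s\n", it->name);
        return 0;
}

static int apply_all(struct item *items, int n)
{
        bool all_failed = true;
        int i;

        /* Pass 1: dry run, remember which items must be skipped later. */
        for (i = 0; i < n; i++) {
                if (apply_one(&items[i], true))
                        items[i].skip = true;
                else
                        all_failed = false;
        }
        if (all_failed)
                return -1;              /* no event accepted the filter */

        /* Pass 2: really apply, skipping the items that failed pass 1. */
        for (i = 0; i < n; i++) {
                if (items[i].skip)
                        continue;
                apply_one(&items[i], false);
        }
        return 0;
}

int main(void)
{
        struct item ev[] = { { "sched_switch", false },
                             { "irq_handler_entry", false },
                             { "sched_wakeup", false } };

        return apply_all(ev, 3) ? 1 : 0;
}

In the kernel the per-event no_reset flag plays the role of the skip field, and the FILTER_INIT_NO_RESET/FILTER_SKIP_NO_RESET passes of filter_free_subsystem_preds() clear and honour it.
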


@ -60,7 +60,8 @@ extern void __bad_type_size(void);
#undef TRACE_EVENT_FORMAT
#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
static int \
ftrace_format_##call(struct trace_seq *s) \
ftrace_format_##call(struct ftrace_event_call *unused, \
struct trace_seq *s) \
{ \
struct args field; \
int ret; \
@ -76,7 +77,8 @@ ftrace_format_##call(struct trace_seq *s) \
#define TRACE_EVENT_FORMAT_NOFILTER(call, proto, args, fmt, tstruct, \
tpfmt) \
static int \
ftrace_format_##call(struct trace_seq *s) \
ftrace_format_##call(struct ftrace_event_call *unused, \
struct trace_seq *s) \
{ \
struct args field; \
int ret; \
@ -117,7 +119,7 @@ ftrace_format_##call(struct trace_seq *s) \
#undef TRACE_EVENT_FORMAT
#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
int ftrace_define_fields_##call(void); \
int ftrace_define_fields_##call(struct ftrace_event_call *event_call); \
static int ftrace_raw_init_event_##call(void); \
\
struct ftrace_event_call __used \
@ -133,7 +135,6 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
static int ftrace_raw_init_event_##call(void) \
{ \
INIT_LIST_HEAD(&event_##call.fields); \
init_preds(&event_##call); \
return 0; \
} \
@ -156,7 +157,8 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
#define TRACE_FIELD(type, item, assign) \
ret = trace_define_field(event_call, #type, #item, \
offsetof(typeof(field), item), \
sizeof(field.item), is_signed_type(type)); \
sizeof(field.item), \
is_signed_type(type), FILTER_OTHER); \
if (ret) \
return ret;
@ -164,7 +166,7 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
#define TRACE_FIELD_SPECIAL(type, item, len, cmd) \
ret = trace_define_field(event_call, #type "[" #len "]", #item, \
offsetof(typeof(field), item), \
sizeof(field.item), 0); \
sizeof(field.item), 0, FILTER_OTHER); \
if (ret) \
return ret;
@ -172,7 +174,8 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
#define TRACE_FIELD_SIGN(type, item, assign, is_signed) \
ret = trace_define_field(event_call, #type, #item, \
offsetof(typeof(field), item), \
sizeof(field.item), is_signed); \
sizeof(field.item), is_signed, \
FILTER_OTHER); \
if (ret) \
return ret;
@ -182,17 +185,14 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
#undef TRACE_EVENT_FORMAT
#define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \
int \
ftrace_define_fields_##call(void) \
ftrace_define_fields_##call(struct ftrace_event_call *event_call) \
{ \
struct ftrace_event_call *event_call = &event_##call; \
struct args field; \
int ret; \
\
__common_field(unsigned char, type, 0); \
__common_field(unsigned char, flags, 0); \
__common_field(unsigned char, preempt_count, 0); \
__common_field(int, pid, 1); \
__common_field(int, tgid, 1); \
ret = trace_define_common_fields(event_call); \
if (ret) \
return ret; \
\
tstruct; \
\


@ -288,11 +288,9 @@ static int
ftrace_trace_onoff_print(struct seq_file *m, unsigned long ip,
struct ftrace_probe_ops *ops, void *data)
{
char str[KSYM_SYMBOL_LEN];
long count = (long)data;
kallsyms_lookup(ip, NULL, NULL, NULL, str);
seq_printf(m, "%s:", str);
seq_printf(m, "%pf:", (void *)ip);
if (ops == &traceon_probe_ops)
seq_printf(m, "traceon");


@ -52,7 +52,7 @@ static struct tracer_flags tracer_flags = {
.opts = trace_opts
};
/* pid on the last trace processed */
static struct trace_array *graph_array;
/* Add a function return address to the trace stack on thread info.*/
@ -166,10 +166,123 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
return ret;
}
static int __trace_graph_entry(struct trace_array *tr,
struct ftrace_graph_ent *trace,
unsigned long flags,
int pc)
{
struct ftrace_event_call *call = &event_funcgraph_entry;
struct ring_buffer_event *event;
struct ring_buffer *buffer = tr->buffer;
struct ftrace_graph_ent_entry *entry;
if (unlikely(local_read(&__get_cpu_var(ftrace_cpu_disabled))))
return 0;
event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_ENT,
sizeof(*entry), flags, pc);
if (!event)
return 0;
entry = ring_buffer_event_data(event);
entry->graph_ent = *trace;
if (!filter_current_check_discard(buffer, call, entry, event))
ring_buffer_unlock_commit(buffer, event);
return 1;
}
int trace_graph_entry(struct ftrace_graph_ent *trace)
{
struct trace_array *tr = graph_array;
struct trace_array_cpu *data;
unsigned long flags;
long disabled;
int ret;
int cpu;
int pc;
if (unlikely(!tr))
return 0;
if (!ftrace_trace_task(current))
return 0;
if (!ftrace_graph_addr(trace->func))
return 0;
local_irq_save(flags);
cpu = raw_smp_processor_id();
data = tr->data[cpu];
disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) {
pc = preempt_count();
ret = __trace_graph_entry(tr, trace, flags, pc);
} else {
ret = 0;
}
/* Only do the atomic if it is not already set */
if (!test_tsk_trace_graph(current))
set_tsk_trace_graph(current);
atomic_dec(&data->disabled);
local_irq_restore(flags);
return ret;
}
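
trace_graph_entry() above uses the per-CPU "disabled" counter as a re-entrancy guard: the event is written only when atomic_inc_return() reports that this is the first level of nesting on the CPU, and the counter is dropped again on the way out. A compressed userspace sketch of that guard; a single C11 atomic stands in for the per-CPU counter, and record_event() is an invented stand-in for __trace_graph_entry():

#include <stdatomic.h>
#include <stdio.h>

/* The kernel keeps one such counter per CPU; one is enough to show the shape. */
static atomic_int disabled;

static void record_event(const char *what)
{
        /* Record only at the first (and only) level of nesting. */
        if (atomic_fetch_add(&disabled, 1) + 1 == 1)
                printf("traced: %s\n", what);
        atomic_fetch_sub(&disabled, 1);
}

int main(void)
{
        record_event("outer");                  /* recorded           */
        atomic_fetch_add(&disabled, 1);         /* tracing "disabled" */
        record_event("nested");                 /* silently dropped   */
        atomic_fetch_sub(&disabled, 1);
        return 0;
}
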
static void __trace_graph_return(struct trace_array *tr,
struct ftrace_graph_ret *trace,
unsigned long flags,
int pc)
{
struct ftrace_event_call *call = &event_funcgraph_exit;
struct ring_buffer_event *event;
struct ring_buffer *buffer = tr->buffer;
struct ftrace_graph_ret_entry *entry;
if (unlikely(local_read(&__get_cpu_var(ftrace_cpu_disabled))))
return;
event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_RET,
sizeof(*entry), flags, pc);
if (!event)
return;
entry = ring_buffer_event_data(event);
entry->ret = *trace;
if (!filter_current_check_discard(buffer, call, entry, event))
ring_buffer_unlock_commit(buffer, event);
}
void trace_graph_return(struct ftrace_graph_ret *trace)
{
struct trace_array *tr = graph_array;
struct trace_array_cpu *data;
unsigned long flags;
long disabled;
int cpu;
int pc;
local_irq_save(flags);
cpu = raw_smp_processor_id();
data = tr->data[cpu];
disabled = atomic_inc_return(&data->disabled);
if (likely(disabled == 1)) {
pc = preempt_count();
__trace_graph_return(tr, trace, flags, pc);
}
if (!trace->depth)
clear_tsk_trace_graph(current);
atomic_dec(&data->disabled);
local_irq_restore(flags);
}
static int graph_trace_init(struct trace_array *tr)
{
int ret = register_ftrace_graph(&trace_graph_return,
&trace_graph_entry);
int ret;
graph_array = tr;
ret = register_ftrace_graph(&trace_graph_return,
&trace_graph_entry);
if (ret)
return ret;
tracing_start_cmdline_record();
@ -177,49 +290,30 @@ static int graph_trace_init(struct trace_array *tr)
return 0;
}
void set_graph_array(struct trace_array *tr)
{
graph_array = tr;
}
static void graph_trace_reset(struct trace_array *tr)
{
tracing_stop_cmdline_record();
unregister_ftrace_graph();
}
static inline int log10_cpu(int nb)
{
if (nb / 100)
return 3;
if (nb / 10)
return 2;
return 1;
}
static int max_bytes_for_cpu;
static enum print_line_t
print_graph_cpu(struct trace_seq *s, int cpu)
{
int i;
int ret;
int log10_this = log10_cpu(cpu);
int log10_all = log10_cpu(cpumask_weight(cpu_online_mask));
/*
* Start with a space character - to make it stand out
* to the right a bit when trace output is pasted into
* email:
*/
ret = trace_seq_printf(s, " ");
/*
* Tricky - we space the CPU field according to the max
* number of online CPUs. On a 2-cpu system it would take
* a maximum of 1 digit - on a 128 cpu system it would
* take up to 3 digits:
*/
for (i = 0; i < log10_all - log10_this; i++) {
ret = trace_seq_printf(s, " ");
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
}
ret = trace_seq_printf(s, "%d) ", cpu);
ret = trace_seq_printf(s, " %*d) ", max_bytes_for_cpu, cpu);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
@ -565,11 +659,7 @@ print_graph_entry_leaf(struct trace_iterator *iter,
return TRACE_TYPE_PARTIAL_LINE;
}
ret = seq_print_ip_sym(s, call->func, 0);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
ret = trace_seq_printf(s, "();\n");
ret = trace_seq_printf(s, "%pf();\n", (void *)call->func);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
@ -612,11 +702,7 @@ print_graph_entry_nested(struct trace_iterator *iter,
return TRACE_TYPE_PARTIAL_LINE;
}
ret = seq_print_ip_sym(s, call->func, 0);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
ret = trace_seq_printf(s, "() {\n");
ret = trace_seq_printf(s, "%pf() {\n", (void *)call->func);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
@ -934,6 +1020,8 @@ static struct tracer graph_trace __read_mostly = {
static __init int init_graph_trace(void)
{
max_bytes_for_cpu = snprintf(NULL, 0, "%d", nr_cpu_ids - 1);
return register_tracer(&graph_trace);
}
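
init_graph_trace() now sizes the CPU column once, using the fact that snprintf() with a NULL buffer and zero size returns the number of characters the formatted value would need; print_graph_cpu() then simply pads with " %*d) ". This replaces the old log10_cpu() arithmetic. A minimal sketch of the idiom, with nr_cpu_ids replaced by an arbitrary illustrative constant:

#include <stdio.h>

int main(void)
{
        int nr_cpu_ids = 128;   /* illustrative value */

        /* snprintf() with a NULL buffer returns the would-be length. */
        int max_bytes_for_cpu = snprintf(NULL, 0, "%d", nr_cpu_ids - 1);

        printf("width = %d\n", max_bytes_for_cpu);      /* 3, for "127" */
        printf(" %*d) \n", max_bytes_for_cpu, 5);       /* "   5) ", as in print_graph_cpu() */
        return 0;
}
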


@ -178,7 +178,6 @@ check_critical_timing(struct trace_array *tr,
out:
data->critical_sequence = max_sequence;
data->preempt_timestamp = ftrace_now(cpu);
tracing_reset(tr, cpu);
trace_function(tr, CALLER_ADDR0, parent_ip, flags, pc);
}
@ -208,7 +207,6 @@ start_critical_timing(unsigned long ip, unsigned long parent_ip)
data->critical_sequence = max_sequence;
data->preempt_timestamp = ftrace_now(cpu);
data->critical_start = parent_ip ? : ip;
tracing_reset(tr, cpu);
local_save_flags(flags);
@ -379,6 +377,7 @@ static void __irqsoff_tracer_init(struct trace_array *tr)
irqsoff_trace = tr;
/* make sure that the tracer is visible */
smp_wmb();
tracing_reset_online_cpus(tr);
start_irqsoff_tracer(tr);
}


@ -307,11 +307,12 @@ static void __trace_mmiotrace_rw(struct trace_array *tr,
struct trace_array_cpu *data,
struct mmiotrace_rw *rw)
{
struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct trace_mmiotrace_rw *entry;
int pc = preempt_count();
event = trace_buffer_lock_reserve(tr, TRACE_MMIO_RW,
event = trace_buffer_lock_reserve(buffer, TRACE_MMIO_RW,
sizeof(*entry), 0, pc);
if (!event) {
atomic_inc(&dropped_count);
@ -319,7 +320,7 @@ static void __trace_mmiotrace_rw(struct trace_array *tr,
}
entry = ring_buffer_event_data(event);
entry->rw = *rw;
trace_buffer_unlock_commit(tr, event, 0, pc);
trace_buffer_unlock_commit(buffer, event, 0, pc);
}
void mmio_trace_rw(struct mmiotrace_rw *rw)
@ -333,11 +334,12 @@ static void __trace_mmiotrace_map(struct trace_array *tr,
struct trace_array_cpu *data,
struct mmiotrace_map *map)
{
struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct trace_mmiotrace_map *entry;
int pc = preempt_count();
event = trace_buffer_lock_reserve(tr, TRACE_MMIO_MAP,
event = trace_buffer_lock_reserve(buffer, TRACE_MMIO_MAP,
sizeof(*entry), 0, pc);
if (!event) {
atomic_inc(&dropped_count);
@ -345,7 +347,7 @@ static void __trace_mmiotrace_map(struct trace_array *tr,
}
entry = ring_buffer_event_data(event);
entry->map = *map;
trace_buffer_unlock_commit(tr, event, 0, pc);
trace_buffer_unlock_commit(buffer, event, 0, pc);
}
void mmio_trace_mapping(struct mmiotrace_map *map)


@ -38,6 +38,7 @@ static void probe_power_end(struct power_trace *it)
{
struct ftrace_event_call *call = &event_power;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct trace_power *entry;
struct trace_array_cpu *data;
struct trace_array *tr = power_trace;
@ -45,18 +46,20 @@ static void probe_power_end(struct power_trace *it)
if (!trace_power_enabled)
return;
buffer = tr->buffer;
preempt_disable();
it->end = ktime_get();
data = tr->data[smp_processor_id()];
event = trace_buffer_lock_reserve(tr, TRACE_POWER,
event = trace_buffer_lock_reserve(buffer, TRACE_POWER,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
if (!filter_check_discard(call, entry, tr->buffer, event))
trace_buffer_unlock_commit(tr, event, 0, 0);
if (!filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
@ -66,6 +69,7 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
{
struct ftrace_event_call *call = &event_power;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct trace_power *entry;
struct trace_array_cpu *data;
struct trace_array *tr = power_trace;
@ -73,6 +77,8 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
if (!trace_power_enabled)
return;
buffer = tr->buffer;
memset(it, 0, sizeof(struct power_trace));
it->state = level;
it->type = type;
@ -81,14 +87,14 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
it->end = it->stamp;
data = tr->data[smp_processor_id()];
event = trace_buffer_lock_reserve(tr, TRACE_POWER,
event = trace_buffer_lock_reserve(buffer, TRACE_POWER,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
if (!filter_check_discard(call, entry, tr->buffer, event))
trace_buffer_unlock_commit(tr, event, 0, 0);
if (!filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
@ -144,14 +150,12 @@ static void power_trace_reset(struct trace_array *tr)
static int power_trace_init(struct trace_array *tr)
{
int cpu;
power_trace = tr;
trace_power_enabled = 1;
tracing_power_register();
for_each_cpu(cpu, cpu_possible_mask)
tracing_reset(tr, cpu);
tracing_reset_online_cpus(tr);
return 0;
}


@ -20,6 +20,35 @@ static int sched_ref;
static DEFINE_MUTEX(sched_register_mutex);
static int sched_stopped;
void
tracing_sched_switch_trace(struct trace_array *tr,
struct task_struct *prev,
struct task_struct *next,
unsigned long flags, int pc)
{
struct ftrace_event_call *call = &event_context_switch;
struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct ctx_switch_entry *entry;
event = trace_buffer_lock_reserve(buffer, TRACE_CTX,
sizeof(*entry), flags, pc);
if (!event)
return;
entry = ring_buffer_event_data(event);
entry->prev_pid = prev->pid;
entry->prev_prio = prev->prio;
entry->prev_state = prev->state;
entry->next_pid = next->pid;
entry->next_prio = next->prio;
entry->next_state = next->state;
entry->next_cpu = task_cpu(next);
if (!filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, flags, pc);
}
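
tracing_sched_switch_trace(), moved next to the sched probes, follows the sequence used throughout this series: reserve an event on the ring buffer, fill the entry in place, and commit it only if the event filter did not discard it. A toy userspace sketch of that reserve/fill/filter/commit shape; my_reserve(), my_filter_discard() and my_commit() are invented stand-ins, not the real ring-buffer or filter API:

#include <stdbool.h>
#include <stdio.h>

struct entry { int prev_pid, next_pid; };

static struct entry slot;                       /* one reserved "event" */

static struct entry *my_reserve(void)           { return &slot; }
static bool my_filter_discard(struct entry *e)  { return e->next_pid == 0; }
static void my_commit(struct entry *e)
{
        printf("committed: %d -> %d\n", e->prev_pid, e->next_pid);
}

static void trace_switch(int prev_pid, int next_pid)
{
        struct entry *e = my_reserve();         /* trace_buffer_lock_reserve()  */

        if (!e)
                return;
        e->prev_pid = prev_pid;                 /* fill the entry in place      */
        e->next_pid = next_pid;
        if (!my_filter_discard(e))              /* filter_check_discard()       */
                my_commit(e);                   /* trace_buffer_unlock_commit() */
}

int main(void)
{
        trace_switch(1, 42);    /* committed                     */
        trace_switch(42, 0);    /* discarded by the toy "filter" */
        return 0;
}
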
static void
probe_sched_switch(struct rq *__rq, struct task_struct *prev,
struct task_struct *next)
@ -49,6 +78,36 @@ probe_sched_switch(struct rq *__rq, struct task_struct *prev,
local_irq_restore(flags);
}
void
tracing_sched_wakeup_trace(struct trace_array *tr,
struct task_struct *wakee,
struct task_struct *curr,
unsigned long flags, int pc)
{
struct ftrace_event_call *call = &event_wakeup;
struct ring_buffer_event *event;
struct ctx_switch_entry *entry;
struct ring_buffer *buffer = tr->buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_WAKE,
sizeof(*entry), flags, pc);
if (!event)
return;
entry = ring_buffer_event_data(event);
entry->prev_pid = curr->pid;
entry->prev_prio = curr->prio;
entry->prev_state = curr->state;
entry->next_pid = wakee->pid;
entry->next_prio = wakee->prio;
entry->next_state = wakee->state;
entry->next_cpu = task_cpu(wakee);
if (!filter_check_discard(call, entry, buffer, event))
ring_buffer_unlock_commit(buffer, event);
ftrace_trace_stack(tr->buffer, flags, 6, pc);
ftrace_trace_userstack(tr->buffer, flags, pc);
}
static void
probe_sched_wakeup(struct rq *__rq, struct task_struct *wakee, int success)
{


@ -186,11 +186,6 @@ probe_wakeup_sched_switch(struct rq *rq, struct task_struct *prev,
static void __wakeup_reset(struct trace_array *tr)
{
int cpu;
for_each_possible_cpu(cpu)
tracing_reset(tr, cpu);
wakeup_cpu = -1;
wakeup_prio = -1;
@ -204,6 +199,8 @@ static void wakeup_reset(struct trace_array *tr)
{
unsigned long flags;
tracing_reset_online_cpus(tr);
local_irq_save(flags);
__raw_spin_lock(&wakeup_lock);
__wakeup_reset(tr);


@ -288,6 +288,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
* to detect and recover from possible hangs
*/
tracing_reset_online_cpus(tr);
set_graph_array(tr);
ret = register_ftrace_graph(&trace_graph_return,
&trace_graph_entry_watchdog);
if (ret) {


@ -186,43 +186,33 @@ static const struct file_operations stack_max_size_fops = {
};
static void *
t_next(struct seq_file *m, void *v, loff_t *pos)
__next(struct seq_file *m, loff_t *pos)
{
long i;
long n = *pos - 1;
(*pos)++;
if (v == SEQ_START_TOKEN)
i = 0;
else {
i = *(long *)v;
i++;
}
if (i >= max_stack_trace.nr_entries ||
stack_dump_trace[i] == ULONG_MAX)
if (n >= max_stack_trace.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;
m->private = (void *)i;
m->private = (void *)n;
return &m->private;
}
static void *
t_next(struct seq_file *m, void *v, loff_t *pos)
{
(*pos)++;
return __next(m, pos);
}
static void *t_start(struct seq_file *m, loff_t *pos)
{
void *t = SEQ_START_TOKEN;
loff_t l = 0;
local_irq_disable();
__raw_spin_lock(&max_stack_lock);
if (*pos == 0)
return SEQ_START_TOKEN;
for (; t && l < *pos; t = t_next(m, t, &l))
;
return t;
return __next(m, pos);
}
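
The stack tracer's seq_file iterator is simplified above so that both t_start() and t_next() derive everything from *pos: position 0 produces the SEQ_START_TOKEN header, position n maps to stack_dump_trace[n - 1], and running off the end returns NULL. A stand-alone sketch of that start/next contract; the entries array and the iter_start()/iter_next() helpers are invented for illustration:

#include <stdio.h>
#include <stddef.h>

#define SEQ_START_TOKEN ((void *)1)

static long entries[] = { 111, 222, 333 };
static const long nr_entries = 3;

/* Everything is derived from *pos, as in the reworked __next()/t_start(). */
static void *iter_next(long *pos)
{
        long n = *pos - 1;      /* position 0 is reserved for the header */

        if (n >= nr_entries)
                return NULL;
        return &entries[n];
}

static void *iter_start(long *pos)
{
        if (*pos == 0)
                return SEQ_START_TOKEN;
        return iter_next(pos);
}

int main(void)
{
        long pos = 0;
        void *v = iter_start(&pos);

        for (; v; pos++, v = iter_next(&pos)) {
                if (v == SEQ_START_TOKEN)
                        printf("# header\n");
                else
                        printf("%ld\n", *(long *)v);
        }
        return 0;
}
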
static void t_stop(struct seq_file *m, void *p)
@ -234,15 +224,8 @@ static void t_stop(struct seq_file *m, void *p)
static int trace_lookup_stack(struct seq_file *m, long i)
{
unsigned long addr = stack_dump_trace[i];
#ifdef CONFIG_KALLSYMS
char str[KSYM_SYMBOL_LEN];
sprint_symbol(str, addr);
return seq_printf(m, "%s\n", str);
#else
return seq_printf(m, "%p\n", (void*)addr);
#endif
return seq_printf(m, "%pF\n", (void *)addr);
}
static void print_disabled(struct seq_file *m)


@ -49,7 +49,8 @@ static struct dentry *stat_dir;
* but it will at least advance closer to the next one
* to be released.
*/
static struct rb_node *release_next(struct rb_node *node)
static struct rb_node *release_next(struct tracer_stat *ts,
struct rb_node *node)
{
struct stat_node *snode;
struct rb_node *parent = rb_parent(node);
@ -67,6 +68,8 @@ static struct rb_node *release_next(struct rb_node *node)
parent->rb_right = NULL;
snode = container_of(node, struct stat_node, node);
if (ts->stat_release)
ts->stat_release(snode->stat);
kfree(snode);
return parent;
@ -78,7 +81,7 @@ static void __reset_stat_session(struct stat_session *session)
struct rb_node *node = session->stat_root.rb_node;
while (node)
node = release_next(node);
node = release_next(session->ts, node);
session->stat_root = RB_ROOT;
}
@ -200,17 +203,21 @@ static void *stat_seq_start(struct seq_file *s, loff_t *pos)
{
struct stat_session *session = s->private;
struct rb_node *node;
int n = *pos;
int i;
/* Prevent tracer switch or rbtree modification */
mutex_lock(&session->stat_mutex);
/* If we are at the beginning of the file, print the headers */
if (!*pos && session->ts->stat_headers)
return SEQ_START_TOKEN;
if (session->ts->stat_headers) {
if (n == 0)
return SEQ_START_TOKEN;
n--;
}
node = rb_first(&session->stat_root);
for (i = 0; node && i < *pos; i++)
for (i = 0; node && i < n; i++)
node = rb_next(node);
return node;


@ -18,6 +18,8 @@ struct tracer_stat {
int (*stat_cmp)(void *p1, void *p2);
/* Print a stat entry */
int (*stat_show)(struct seq_file *s, void *p);
/* Release an entry */
void (*stat_release)(void *stat);
/* Print the headers of your stat entries */
int (*stat_headers)(struct seq_file *s);
};


@ -1,30 +1,18 @@
#include <trace/syscall.h>
#include <trace/events/syscalls.h>
#include <linux/kernel.h>
#include <linux/ftrace.h>
#include <linux/perf_counter.h>
#include <asm/syscall.h>
#include "trace_output.h"
#include "trace.h"
/* Keep a counter of the syscall tracing users */
static int refcount;
/* Prevent races on thread flag toggling */
static DEFINE_MUTEX(syscall_trace_lock);
/* Option to display the parameters types */
enum {
TRACE_SYSCALLS_OPT_TYPES = 0x1,
};
static struct tracer_opt syscalls_opts[] = {
{ TRACER_OPT(syscall_arg_type, TRACE_SYSCALLS_OPT_TYPES) },
{ }
};
static struct tracer_flags syscalls_flags = {
.val = 0, /* By default: no parameters types */
.opts = syscalls_opts
};
static int sys_refcount_enter;
static int sys_refcount_exit;
static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
enum print_line_t
print_syscall_enter(struct trace_iterator *iter, int flags)
@ -35,35 +23,46 @@ print_syscall_enter(struct trace_iterator *iter, int flags)
struct syscall_metadata *entry;
int i, ret, syscall;
trace_assign_type(trace, ent);
trace = (typeof(trace))ent;
syscall = trace->nr;
entry = syscall_nr_to_meta(syscall);
if (!entry)
goto end;
if (entry->enter_id != ent->type) {
WARN_ON_ONCE(1);
goto end;
}
ret = trace_seq_printf(s, "%s(", entry->name);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
for (i = 0; i < entry->nb_args; i++) {
/* parameter types */
if (syscalls_flags.val & TRACE_SYSCALLS_OPT_TYPES) {
if (trace_flags & TRACE_ITER_VERBOSE) {
ret = trace_seq_printf(s, "%s ", entry->types[i]);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
}
/* parameter values */
ret = trace_seq_printf(s, "%s: %lx%s ", entry->args[i],
ret = trace_seq_printf(s, "%s: %lx%s", entry->args[i],
trace->args[i],
i == entry->nb_args - 1 ? ")" : ",");
i == entry->nb_args - 1 ? "" : ", ");
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
}
ret = trace_seq_putc(s, ')');
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
end:
trace_seq_printf(s, "\n");
ret = trace_seq_putc(s, '\n');
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
return TRACE_TYPE_HANDLED;
}
@ -77,16 +76,20 @@ print_syscall_exit(struct trace_iterator *iter, int flags)
struct syscall_metadata *entry;
int ret;
trace_assign_type(trace, ent);
trace = (typeof(trace))ent;
syscall = trace->nr;
entry = syscall_nr_to_meta(syscall);
if (!entry) {
trace_seq_printf(s, "\n");
return TRACE_TYPE_HANDLED;
}
if (entry->exit_id != ent->type) {
WARN_ON_ONCE(1);
return TRACE_TYPE_UNHANDLED;
}
ret = trace_seq_printf(s, "%s -> 0x%lx\n", entry->name,
trace->ret);
if (!ret)
@ -95,62 +98,140 @@ print_syscall_exit(struct trace_iterator *iter, int flags)
return TRACE_TYPE_HANDLED;
}
void start_ftrace_syscalls(void)
extern char *__bad_type_size(void);
#define SYSCALL_FIELD(type, name) \
sizeof(type) != sizeof(trace.name) ? \
__bad_type_size() : \
#type, #name, offsetof(typeof(trace), name), sizeof(trace.name)
int syscall_enter_format(struct ftrace_event_call *call, struct trace_seq *s)
{
unsigned long flags;
struct task_struct *g, *t;
int i;
int nr;
int ret;
struct syscall_metadata *entry;
struct syscall_trace_enter trace;
int offset = offsetof(struct syscall_trace_enter, args);
mutex_lock(&syscall_trace_lock);
nr = syscall_name_to_nr(call->data);
entry = syscall_nr_to_meta(nr);
/* Don't enable the flag on the tasks twice */
if (++refcount != 1)
goto unlock;
if (!entry)
return 0;
arch_init_ftrace_syscalls();
read_lock_irqsave(&tasklist_lock, flags);
ret = trace_seq_printf(s, "\tfield:%s %s;\toffset:%zu;\tsize:%zu;\n",
SYSCALL_FIELD(int, nr));
if (!ret)
return 0;
do_each_thread(g, t) {
set_tsk_thread_flag(t, TIF_SYSCALL_FTRACE);
} while_each_thread(g, t);
for (i = 0; i < entry->nb_args; i++) {
ret = trace_seq_printf(s, "\tfield:%s %s;", entry->types[i],
entry->args[i]);
if (!ret)
return 0;
ret = trace_seq_printf(s, "\toffset:%d;\tsize:%zu;\n", offset,
sizeof(unsigned long));
if (!ret)
return 0;
offset += sizeof(unsigned long);
}
read_unlock_irqrestore(&tasklist_lock, flags);
trace_seq_puts(s, "\nprint fmt: \"");
for (i = 0; i < entry->nb_args; i++) {
ret = trace_seq_printf(s, "%s: 0x%%0%zulx%s", entry->args[i],
sizeof(unsigned long),
i == entry->nb_args - 1 ? "" : ", ");
if (!ret)
return 0;
}
trace_seq_putc(s, '"');
unlock:
mutex_unlock(&syscall_trace_lock);
for (i = 0; i < entry->nb_args; i++) {
ret = trace_seq_printf(s, ", ((unsigned long)(REC->%s))",
entry->args[i]);
if (!ret)
return 0;
}
return trace_seq_putc(s, '\n');
}
void stop_ftrace_syscalls(void)
int syscall_exit_format(struct ftrace_event_call *call, struct trace_seq *s)
{
unsigned long flags;
struct task_struct *g, *t;
int ret;
struct syscall_trace_exit trace;
mutex_lock(&syscall_trace_lock);
ret = trace_seq_printf(s,
"\tfield:%s %s;\toffset:%zu;\tsize:%zu;\n"
"\tfield:%s %s;\toffset:%zu;\tsize:%zu;\n",
SYSCALL_FIELD(int, nr),
SYSCALL_FIELD(unsigned long, ret));
if (!ret)
return 0;
/* There are perhaps still some users */
if (--refcount)
goto unlock;
read_lock_irqsave(&tasklist_lock, flags);
do_each_thread(g, t) {
clear_tsk_thread_flag(t, TIF_SYSCALL_FTRACE);
} while_each_thread(g, t);
read_unlock_irqrestore(&tasklist_lock, flags);
unlock:
mutex_unlock(&syscall_trace_lock);
return trace_seq_printf(s, "\nprint fmt: \"0x%%lx\", REC->ret\n");
}
void ftrace_syscall_enter(struct pt_regs *regs)
int syscall_enter_define_fields(struct ftrace_event_call *call)
{
struct syscall_trace_enter trace;
struct syscall_metadata *meta;
int ret;
int nr;
int i;
int offset = offsetof(typeof(trace), args);
nr = syscall_name_to_nr(call->data);
meta = syscall_nr_to_meta(nr);
if (!meta)
return 0;
ret = trace_define_common_fields(call);
if (ret)
return ret;
for (i = 0; i < meta->nb_args; i++) {
ret = trace_define_field(call, meta->types[i],
meta->args[i], offset,
sizeof(unsigned long), 0,
FILTER_OTHER);
offset += sizeof(unsigned long);
}
return ret;
}
int syscall_exit_define_fields(struct ftrace_event_call *call)
{
struct syscall_trace_exit trace;
int ret;
ret = trace_define_common_fields(call);
if (ret)
return ret;
ret = trace_define_field(call, SYSCALL_FIELD(unsigned long, ret), 0,
FILTER_OTHER);
return ret;
}
void ftrace_syscall_enter(struct pt_regs *regs, long id)
{
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
int size;
int syscall_nr;
syscall_nr = syscall_get_nr(current, regs);
if (syscall_nr < 0)
return;
if (!test_bit(syscall_nr, enabled_enter_syscalls))
return;
sys_data = syscall_nr_to_meta(syscall_nr);
if (!sys_data)
@ -158,8 +239,8 @@ void ftrace_syscall_enter(struct pt_regs *regs)
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
event = trace_current_buffer_lock_reserve(TRACE_SYSCALL_ENTER, size,
0, 0);
event = trace_current_buffer_lock_reserve(&buffer, sys_data->enter_id,
size, 0, 0);
if (!event)
return;
@ -167,24 +248,30 @@ void ftrace_syscall_enter(struct pt_regs *regs)
entry->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
trace_current_buffer_unlock_commit(event, 0, 0);
trace_wake_up();
if (!filter_current_check_discard(buffer, sys_data->enter_event,
entry, event))
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
}
void ftrace_syscall_exit(struct pt_regs *regs)
void ftrace_syscall_exit(struct pt_regs *regs, long ret)
{
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
int syscall_nr;
syscall_nr = syscall_get_nr(current, regs);
if (syscall_nr < 0)
return;
if (!test_bit(syscall_nr, enabled_exit_syscalls))
return;
sys_data = syscall_nr_to_meta(syscall_nr);
if (!sys_data)
return;
event = trace_current_buffer_lock_reserve(TRACE_SYSCALL_EXIT,
event = trace_current_buffer_lock_reserve(&buffer, sys_data->exit_id,
sizeof(*entry), 0, 0);
if (!event)
return;
@ -193,58 +280,244 @@ void ftrace_syscall_exit(struct pt_regs *regs)
entry->nr = syscall_nr;
entry->ret = syscall_get_return_value(current, regs);
trace_current_buffer_unlock_commit(event, 0, 0);
trace_wake_up();
if (!filter_current_check_discard(buffer, sys_data->exit_event,
entry, event))
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
}
static int init_syscall_tracer(struct trace_array *tr)
int reg_event_syscall_enter(void *ptr)
{
start_ftrace_syscalls();
int ret = 0;
int num;
char *name;
return 0;
}
static void reset_syscall_tracer(struct trace_array *tr)
{
stop_ftrace_syscalls();
tracing_reset_online_cpus(tr);
}
static struct trace_event syscall_enter_event = {
.type = TRACE_SYSCALL_ENTER,
.trace = print_syscall_enter,
};
static struct trace_event syscall_exit_event = {
.type = TRACE_SYSCALL_EXIT,
.trace = print_syscall_exit,
};
static struct tracer syscall_tracer __read_mostly = {
.name = "syscall",
.init = init_syscall_tracer,
.reset = reset_syscall_tracer,
.flags = &syscalls_flags,
};
__init int register_ftrace_syscalls(void)
{
int ret;
ret = register_ftrace_event(&syscall_enter_event);
if (!ret) {
printk(KERN_WARNING "event %d failed to register\n",
syscall_enter_event.type);
WARN_ON_ONCE(1);
name = (char *)ptr;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
if (!sys_refcount_enter)
ret = register_trace_sys_enter(ftrace_syscall_enter);
if (ret) {
pr_info("event trace: Could not activate"
"syscall entry trace point");
} else {
set_bit(num, enabled_enter_syscalls);
sys_refcount_enter++;
}
ret = register_ftrace_event(&syscall_exit_event);
if (!ret) {
printk(KERN_WARNING "event %d failed to register\n",
syscall_exit_event.type);
WARN_ON_ONCE(1);
}
return register_tracer(&syscall_tracer);
mutex_unlock(&syscall_trace_lock);
return ret;
}
device_initcall(register_ftrace_syscalls);
void unreg_event_syscall_enter(void *ptr)
{
int num;
char *name;
name = (char *)ptr;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return;
mutex_lock(&syscall_trace_lock);
sys_refcount_enter--;
clear_bit(num, enabled_enter_syscalls);
if (!sys_refcount_enter)
unregister_trace_sys_enter(ftrace_syscall_enter);
mutex_unlock(&syscall_trace_lock);
}
int reg_event_syscall_exit(void *ptr)
{
int ret = 0;
int num;
char *name;
name = (char *)ptr;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
if (!sys_refcount_exit)
ret = register_trace_sys_exit(ftrace_syscall_exit);
if (ret) {
pr_info("event trace: Could not activate"
"syscall exit trace point");
} else {
set_bit(num, enabled_exit_syscalls);
sys_refcount_exit++;
}
mutex_unlock(&syscall_trace_lock);
return ret;
}
void unreg_event_syscall_exit(void *ptr)
{
int num;
char *name;
name = (char *)ptr;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return;
mutex_lock(&syscall_trace_lock);
sys_refcount_exit--;
clear_bit(num, enabled_exit_syscalls);
if (!sys_refcount_exit)
unregister_trace_sys_exit(ftrace_syscall_exit);
mutex_unlock(&syscall_trace_lock);
}
struct trace_event event_syscall_enter = {
.trace = print_syscall_enter,
};
struct trace_event event_syscall_exit = {
.trace = print_syscall_exit,
};
#ifdef CONFIG_EVENT_PROFILE
static DECLARE_BITMAP(enabled_prof_enter_syscalls, NR_syscalls);
static DECLARE_BITMAP(enabled_prof_exit_syscalls, NR_syscalls);
static int sys_prof_refcount_enter;
static int sys_prof_refcount_exit;
static void prof_syscall_enter(struct pt_regs *regs, long id)
{
struct syscall_trace_enter *rec;
struct syscall_metadata *sys_data;
int syscall_nr;
int size;
syscall_nr = syscall_get_nr(current, regs);
if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
return;
sys_data = syscall_nr_to_meta(syscall_nr);
if (!sys_data)
return;
/* get the size after alignment with the u32 buffer size field */
size = sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
size = ALIGN(size + sizeof(u32), sizeof(u64));
size -= sizeof(u32);
do {
char raw_data[size];
/* zero the dead bytes from the alignment to avoid leaking stack data to userspace */
*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
rec = (struct syscall_trace_enter *) raw_data;
tracing_generic_entry_update(&rec->ent, 0, 0);
rec->ent.type = sys_data->enter_id;
rec->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args,
(unsigned long *)&rec->args);
perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
} while(0);
}
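
The size computation in prof_syscall_enter() is worth a closer look: the buffer the record lands in is preceded by a u32 size field, so the record size is chosen such that the u32 header plus the payload ends on a u64 boundary, and the last eight bytes are zeroed first so the alignment padding cannot leak stack contents. A worked example of the arithmetic; the ALIGN macro is the usual round-up definition, and the payload size (six unsigned long arguments plus a 16-byte entry header) is only illustrative:

#include <stdio.h>
#include <stdint.h>

#define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        /* Illustrative sizes: six syscall args plus a 16-byte entry header. */
        size_t payload = sizeof(unsigned long) * 6 + 16;        /* 64 on LP64 */
        size_t size    = ALIGN(payload + sizeof(uint32_t), sizeof(uint64_t))
                         - sizeof(uint32_t);

        /* 64 + 4 = 68, rounded up to 72, minus the u32 header = 68 */
        printf("record = %zu bytes, with u32 header = %zu (u64 aligned: %s)\n",
               size, size + sizeof(uint32_t),
               (size + sizeof(uint32_t)) % sizeof(uint64_t) == 0 ? "yes" : "no");
        return 0;
}
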
int reg_prof_syscall_enter(char *name)
{
int ret = 0;
int num;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
if (!sys_prof_refcount_enter)
ret = register_trace_sys_enter(prof_syscall_enter);
if (ret) {
pr_info("event trace: Could not activate"
"syscall entry trace point");
} else {
set_bit(num, enabled_prof_enter_syscalls);
sys_prof_refcount_enter++;
}
mutex_unlock(&syscall_trace_lock);
return ret;
}
void unreg_prof_syscall_enter(char *name)
{
int num;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return;
mutex_lock(&syscall_trace_lock);
sys_prof_refcount_enter--;
clear_bit(num, enabled_prof_enter_syscalls);
if (!sys_prof_refcount_enter)
unregister_trace_sys_enter(prof_syscall_enter);
mutex_unlock(&syscall_trace_lock);
}
static void prof_syscall_exit(struct pt_regs *regs, long ret)
{
struct syscall_metadata *sys_data;
struct syscall_trace_exit rec;
int syscall_nr;
syscall_nr = syscall_get_nr(current, regs);
if (!test_bit(syscall_nr, enabled_prof_exit_syscalls))
return;
sys_data = syscall_nr_to_meta(syscall_nr);
if (!sys_data)
return;
tracing_generic_entry_update(&rec.ent, 0, 0);
rec.ent.type = sys_data->exit_id;
rec.nr = syscall_nr;
rec.ret = syscall_get_return_value(current, regs);
perf_tpcounter_event(sys_data->exit_id, 0, 1, &rec, sizeof(rec));
}
int reg_prof_syscall_exit(char *name)
{
int ret = 0;
int num;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
if (!sys_prof_refcount_exit)
ret = register_trace_sys_exit(prof_syscall_exit);
if (ret) {
pr_info("event trace: Could not activate"
"syscall entry trace point");
} else {
set_bit(num, enabled_prof_exit_syscalls);
sys_prof_refcount_exit++;
}
mutex_unlock(&syscall_trace_lock);
return ret;
}
void unreg_prof_syscall_exit(char *name)
{
int num;
num = syscall_name_to_nr(name);
if (num < 0 || num >= NR_syscalls)
return;
mutex_lock(&syscall_trace_lock);
sys_prof_refcount_exit--;
clear_bit(num, enabled_prof_exit_syscalls);
if (!sys_prof_refcount_exit)
unregister_trace_sys_exit(prof_syscall_exit);
mutex_unlock(&syscall_trace_lock);
}
#endif
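
reg_event_syscall_enter()/exit() and the profiling variants above all share one shape: the tracepoint probe itself is registered only when the first user appears (sys_refcount_* going from 0 to 1) and unregistered when the last one goes away, while a per-syscall bitmap decides, inside the probe, which syscalls are actually recorded. A stand-alone sketch of that shape; a plain bool array stands in for the bitmap, probe_registered for the real register_trace_sys_enter() call, and the locking the kernel does under syscall_trace_lock is omitted:

#include <stdbool.h>
#include <stdio.h>

#define NR_SYSCALLS 16

static bool enabled[NR_SYSCALLS];       /* stands in for the enabled_* bitmaps */
static int  refcount;                   /* stands in for sys_refcount_enter    */
static bool probe_registered;

static void probe(int nr)               /* conceptually called for every syscall */
{
        if (!probe_registered || !enabled[nr])
                return;                 /* early bail, as in ftrace_syscall_enter() */
        printf("traced syscall %d\n", nr);
}

static int enable_syscall(int nr)
{
        if (nr < 0 || nr >= NR_SYSCALLS)
                return -1;
        if (!refcount)
                probe_registered = true;        /* register_trace_sys_enter() */
        enabled[nr] = true;
        refcount++;
        return 0;
}

static void disable_syscall(int nr)
{
        if (nr < 0 || nr >= NR_SYSCALLS)
                return;
        refcount--;
        enabled[nr] = false;
        if (!refcount)
                probe_registered = false;       /* unregister_trace_sys_enter() */
}

int main(void)
{
        enable_syscall(3);
        probe(3);               /* traced                      */
        probe(5);               /* ignored: not in the bitmap  */
        disable_syscall(3);
        probe(3);               /* ignored: probe unregistered */
        return 0;
}
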


@ -9,6 +9,7 @@
#include <trace/events/workqueue.h>
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/kref.h>
#include "trace_stat.h"
#include "trace.h"
@ -16,6 +17,7 @@
/* A cpu workqueue thread */
struct cpu_workqueue_stats {
struct list_head list;
struct kref kref;
int cpu;
pid_t pid;
/* Can be inserted from interrupt or user context, need to be atomic */
@ -39,6 +41,11 @@ struct workqueue_global_stats {
static DEFINE_PER_CPU(struct workqueue_global_stats, all_workqueue_stat);
#define workqueue_cpu_stat(cpu) (&per_cpu(all_workqueue_stat, cpu))
static void cpu_workqueue_stat_free(struct kref *kref)
{
kfree(container_of(kref, struct cpu_workqueue_stats, kref));
}
/* Insertion of a work */
static void
probe_workqueue_insertion(struct task_struct *wq_thread,
@ -96,8 +103,8 @@ static void probe_workqueue_creation(struct task_struct *wq_thread, int cpu)
return;
}
INIT_LIST_HEAD(&cws->list);
kref_init(&cws->kref);
cws->cpu = cpu;
cws->pid = wq_thread->pid;
spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
@ -118,7 +125,7 @@ static void probe_workqueue_destruction(struct task_struct *wq_thread)
list) {
if (node->pid == wq_thread->pid) {
list_del(&node->list);
kfree(node);
kref_put(&node->kref, cpu_workqueue_stat_free);
goto found;
}
}
@ -137,9 +144,11 @@ static struct cpu_workqueue_stats *workqueue_stat_start_cpu(int cpu)
spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
if (!list_empty(&workqueue_cpu_stat(cpu)->list))
if (!list_empty(&workqueue_cpu_stat(cpu)->list)) {
ret = list_entry(workqueue_cpu_stat(cpu)->list.next,
struct cpu_workqueue_stats, list);
kref_get(&ret->kref);
}
spin_unlock_irqrestore(&workqueue_cpu_stat(cpu)->lock, flags);
@ -162,9 +171,9 @@ static void *workqueue_stat_start(struct tracer_stat *trace)
static void *workqueue_stat_next(void *prev, int idx)
{
struct cpu_workqueue_stats *prev_cws = prev;
struct cpu_workqueue_stats *ret;
int cpu = prev_cws->cpu;
unsigned long flags;
void *ret = NULL;
spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
if (list_is_last(&prev_cws->list, &workqueue_cpu_stat(cpu)->list)) {
@ -175,11 +184,14 @@ static void *workqueue_stat_next(void *prev, int idx)
return NULL;
} while (!(ret = workqueue_stat_start_cpu(cpu)));
return ret;
} else {
ret = list_entry(prev_cws->list.next,
struct cpu_workqueue_stats, list);
kref_get(&ret->kref);
}
spin_unlock_irqrestore(&workqueue_cpu_stat(cpu)->lock, flags);
return list_entry(prev_cws->list.next, struct cpu_workqueue_stats,
list);
return ret;
}
static int workqueue_stat_show(struct seq_file *s, void *p)
@ -203,6 +215,13 @@ static int workqueue_stat_show(struct seq_file *s, void *p)
return 0;
}
static void workqueue_stat_release(void *stat)
{
struct cpu_workqueue_stats *node = stat;
kref_put(&node->kref, cpu_workqueue_stat_free);
}
static int workqueue_stat_headers(struct seq_file *s)
{
seq_printf(s, "# CPU INSERTED EXECUTED NAME\n");
@ -215,6 +234,7 @@ struct tracer_stat workqueue_stats __read_mostly = {
.stat_start = workqueue_stat_start,
.stat_next = workqueue_stat_next,
.stat_show = workqueue_stat_show,
.stat_release = workqueue_stat_release,
.stat_headers = workqueue_stat_headers
};
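
The kref added to cpu_workqueue_stats, together with the new stat_release callback, lets a stat iterator keep an entry alive even if probe_workqueue_destruction() removes it from the list mid-walk: the list owns one reference from kref_init(), every workqueue_stat_start_cpu()/stat_next() takes another with kref_get(), and whoever does the final kref_put() frees the node. A minimal userspace sketch of that get/put/release discipline; node_get()/node_put() are invented stand-ins for kref_get()/kref_put(), without the kernel's atomicity:

#include <stdio.h>
#include <stdlib.h>

struct node {
        int refs;               /* stands in for struct kref */
        int cpu;
};

static void node_release(struct node *n)
{
        printf("freeing stats for cpu %d\n", n->cpu);
        free(n);
}

static struct node *node_get(struct node *n)    { n->refs++; return n; }

static void node_put(struct node *n)
{
        if (--n->refs == 0)
                node_release(n);
}

int main(void)
{
        struct node *n = malloc(sizeof(*n));

        if (!n)
                return 1;
        n->refs = 1;            /* kref_init(): the stat list's reference     */
        n->cpu = 0;

        node_get(n);            /* the iterator starts looking at this entry  */
        node_put(n);            /* "workqueue destruction" drops the list ref */
        /* Still valid here: the iterator's reference keeps it alive. */
        printf("still readable: cpu %d\n", n->cpu);
        node_put(n);            /* iterator finished: the last put frees it   */
        return 0;
}
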


@ -24,6 +24,7 @@
#include <linux/tracepoint.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/sched.h>
extern struct tracepoint __start___tracepoints[];
extern struct tracepoint __stop___tracepoints[];
@ -242,6 +243,11 @@ static void set_tracepoint(struct tracepoint_entry **entry,
{
WARN_ON(strcmp((*entry)->name, elem->name) != 0);
if (elem->regfunc && !elem->state && active)
elem->regfunc();
else if (elem->unregfunc && elem->state && !active)
elem->unregfunc();
/*
* rcu_assign_pointer has a smp_wmb() which makes sure that the new
* probe callbacks array is consistent before setting a pointer to it.
@ -261,6 +267,9 @@ static void set_tracepoint(struct tracepoint_entry **entry,
*/
static void disable_tracepoint(struct tracepoint *elem)
{
if (elem->unregfunc && elem->state)
elem->unregfunc();
elem->state = 0;
rcu_assign_pointer(elem->funcs, NULL);
}
@ -554,9 +563,6 @@ int tracepoint_module_notify(struct notifier_block *self,
switch (val) {
case MODULE_STATE_COMING:
tracepoint_update_probe_range(mod->tracepoints,
mod->tracepoints + mod->num_tracepoints);
break;
case MODULE_STATE_GOING:
tracepoint_update_probe_range(mod->tracepoints,
mod->tracepoints + mod->num_tracepoints);
@ -577,3 +583,41 @@ static int init_tracepoints(void)
__initcall(init_tracepoints);
#endif /* CONFIG_MODULES */
#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
/* NB: reg/unreg are called while guarded with the tracepoints_mutex */
static int sys_tracepoint_refcount;
void syscall_regfunc(void)
{
unsigned long flags;
struct task_struct *g, *t;
if (!sys_tracepoint_refcount) {
read_lock_irqsave(&tasklist_lock, flags);
do_each_thread(g, t) {
/* Skip kernel threads. */
if (t->mm)
set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
} while_each_thread(g, t);
read_unlock_irqrestore(&tasklist_lock, flags);
}
sys_tracepoint_refcount++;
}
void syscall_unregfunc(void)
{
unsigned long flags;
struct task_struct *g, *t;
sys_tracepoint_refcount--;
if (!sys_tracepoint_refcount) {
read_lock_irqsave(&tasklist_lock, flags);
do_each_thread(g, t) {
clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
} while_each_thread(g, t);
read_unlock_irqrestore(&tasklist_lock, flags);
}
}
#endif


@ -57,7 +57,6 @@
# call mcount (offset: 0x5)
# [...]
# ret
# .globl my_func
# other_func:
# [...]
# call mcount (offset: 0x1b)


@ -653,7 +653,7 @@ static void print_tracepoint_events(void)
for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
snprintf(evt_path, MAXPATHLEN, "%s:%s",
sys_dirent.d_name, evt_dirent.d_name);
fprintf(stderr, " %-40s [%s]\n", evt_path,
fprintf(stderr, " %-42s [%s]\n", evt_path,
event_type_descriptors[PERF_TYPE_TRACEPOINT+1]);
}
closedir(evt_dir);
@ -687,7 +687,7 @@ void print_events(void)
sprintf(name, "%s OR %s", syms->symbol, syms->alias);
else
strcpy(name, syms->symbol);
fprintf(stderr, " %-40s [%s]\n", name,
fprintf(stderr, " %-42s [%s]\n", name,
event_type_descriptors[type]);
prev_type = type;
@ -701,7 +701,7 @@ void print_events(void)
continue;
for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
fprintf(stderr, " %-40s [%s]\n",
fprintf(stderr, " %-42s [%s]\n",
event_cache_name(type, op, i),
event_type_descriptors[4]);
}
@ -709,7 +709,7 @@ void print_events(void)
}
fprintf(stderr, "\n");
fprintf(stderr, " %-40s [raw hardware event descriptor]\n",
fprintf(stderr, " %-42s [raw hardware event descriptor]\n",
"rNNN");
fprintf(stderr, "\n");