Commit graph

716 commits

Author SHA1 Message Date
Jiri Olsa
e6902d1b73 perf report: Add --percent-type option
Set annotation percent type from following choices:

  global-period, local-period, global-hits, local-hits

With following report option setup the percent type will be passed to
annotation browser:

  $ perf report --percent-type period-local

The local/global keywords set if the percentage is computed in the scope
of the function (local) or the whole data (global).  The period/hits
keywords set the base the percentage is computed on - the samples period
or the number of samples (hits).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-21-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:54 -03:00
Jiri Olsa
88c2119077 perf annotate: Add --percent-type option
Add --percent-type option to set annotation percent type from following
choices:

  global-period, local-period, global-hits, local-hits

Examples:

  $ perf annotate --percent-type period-local --stdio | head -1
   Percent         |      Source code ... es, percent: local period)
  $ perf annotate --percent-type hits-local --stdio | head -1
   Percent         |      Source code ... es, percent: local hits)
  $ perf annotate --percent-type hits-global --stdio | head -1
   Percent         |      Source code ... es, percent: global hits)
  $ perf annotate --percent-type period-global --stdio | head -1
   Percent         |      Source code ... es, percent: global period)

The local/global keywords set if the percentage is computed in the scope
of the function (local) or the whole data (global).

The period/hits keywords set the base the percentage is computed on -
the samples period or the number of samples (hits).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:53 -03:00
Sangwon Hong
6feb3fec51 perf list: Add missing documentation for --desc and --debug options
Add missing documentation for --desc and --debug options to the 'perf
list' man page.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20180717110738.10779-1-qpakzk@gmail.com
[ Clarify that --desc is by default active ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24 14:49:57 -03:00
Tobias Tefke
788faab70d perf, tools: Use correct articles in comments
Some of the comments in the perf events code use articles incorrectly,
using 'a' for words beginning with a vowel sound, where 'an' should be
used.

Signed-off-by: Tobias Tefke <tobias.tefke@tutanota.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: jolsa@redhat.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/20180709105715.22938-1-tobias.tefke@tutanota.com
[ Fix a few more perf related 'a event' typo fixes from all around the kernel and tooling tree. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-16 00:21:03 +02:00
Jiri Olsa
9660e08ee8 perf stat: Add --interval-clear option
Adding --interval-clear option to clear the screen before next interval.

Committer testing:

  # perf stat -I 1000 --interval-clear

And, as expected, it behaves almost like:

  # watch -n 0 perf stat -a sleep 1

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180606221513.11302-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-07 15:53:36 -03:00
Jin Yao
ac56aa4549 perf script python: Add dict fields introduction to Documentation
Add a brief introduction about fields to perf-script-python.txt.

It should help python script developers in easily finding what fields
are supported.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1527843663-32288-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06 15:40:10 -03:00
Jiri Olsa
0ce2da1483 perf stat: Display user and system time
Adding the support to read rusage data once the workload is finished and
display the system/user time values:

  $ perf stat --null perf bench sched pipe
  ...

   Performance counter stats for 'perf bench sched pipe':

       5.342599256 seconds time elapsed

       2.544434000 seconds user
       4.549691000 seconds sys

It works only in non -r mode and only for workload target.

So as of now, for workload targets, we display 3 types of timings. The
time we meassure in perf stat from enable to disable+period:

       5.342599256 seconds time elapsed

The time spent in user and system lands, displayed only for workload
session/target:

       2.544434000 seconds user
       4.549691000 seconds sys

Those times are the very same displayed by 'time' tool.  They are
returned by wait4 call via the getrusage struct interface.

Committer notes:

Had to rename some variables to avoid this on older systems such as
centos:6:

  builtin-stat.c: In function 'print_footer':
  builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration
  /usr/include/time.h:297: warning: shadowed declaration is here

Committer testing:

  # perf stat --null time perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

       Total time: 5.526 [sec]

         5.526534 usecs/op
           180945 ops/sec
  1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k
  0inputs+0outputs (0major+606minor)pagefaults 0swaps

   Performance counter stats for 'time perf bench sched pipe':

         5.530978744 seconds time elapsed

         1.004037000 seconds user
         6.259937000 seconds sys

  #

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06 12:52:04 -03:00
Alexey Budankov
f92da71280 perf record: Enable arbitrary event names thru name= modifier
Enable complex event names containing [.:=,] symbols to be encoded into Perf
trace using name= modifier e.g. like this:

  perf record -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',\
		  period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex

Below is how it looks like in the report output. Please note explicit escaped
quoting at cmdline string in the header so that thestring can be directly reused
for another collection in shell:

perf report --header

  # ========
  ...
  # cmdline : /root/abudanko/kernel/tip/tools/perf/perf record -v -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
  # event : name = OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM, , type = 4, size = 112, config = 0x100003c, { sample_period, sample_freq } = 3500000, sample_type = IP|TID|TIME, disabled = 1, inh
  ...
  # ========
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 24K of event 'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM'
  # Event count (approx.): 86492000000
  #
  # Overhead  Command  Shared Object     Symbol
  # ........  .......  ................  ..............................................
  #
      14.75%  futex    [kernel.vmlinux]  [k] __entry_trampoline_start
...

  perf stat -e cpu/name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex

  10000000 process context switches in 16678890291ns (1667.9ns/ctxsw)

   Performance counter stats for './futex':

      88,095,770,571      CPU_CLK_UNHALTED.THREAD:cmask=0x1

        16.679542407 seconds time elapsed

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c194b060-761d-0d50-3b21-bb4ed680002d@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06 12:52:04 -03:00
Arnaldo Carvalho de Melo
7869e58894 Merge remote-tracking branch 'tip/perf/urgent' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:20 -03:00
Arnaldo Carvalho de Melo
18a7057420 perf tools: Fix perf.data format description of NRCPUS header
In the perf.data HEADER_CPUDESC feadure header we store first the number
of available CPUs in the system, then the number of CPUs at the time of
writing the header, not the other way around.

Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Lakshman Annadorai <lakshmana@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-j7o92acm2vnxjv70y4o3swoc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-30 15:40:26 -03:00
Thomas Richter
0c711138fa perf data: Update documentation section on cpu topology
Add an explanation of each cpu's core and socket identifier to the
perf.data file format documentation.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180528074433.16652-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-30 15:39:13 -03:00
Takashi Iwai
ffef80ecf8 perf Documentation: Support for asciidoctor
The asciidoc package seems behind the recent big wave of python3
conversion, and we were advised to switch to asciidoctor instead.  It's
almost compatible but some extensions used for perf documentation don't
work with it.  Here is the patch to cover them, and add the proper
support for asciidoctor.

Pass USE_ASCIIDOCTOR=yes to make for using asciidoctor instead of
asciidoc.  The man source and manual attributes are passed via command
options.  The support for these attributes have been fixed in the
latest asciidoctor code.

Since asciidoctor can covert to a man page and an HTML directly, we
can omit the dependency on xmlto when USE_ASCIIDOCTOR is set.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180424150456.17353-1-tiwai@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26 13:47:10 -03:00
Jiri Olsa
abc60bad00 perf stat: Display length strings of each run for --table option
Adding support to display visual aid 'length strings' to easily spot the
biggest difference in time table.

  $ perf stat -r 10 --table perf bench sched pipe
  ...

   Performance counter stats for './perf bench sched pipe' (5 runs):

             # Table of individual measurements:
             5.189 (-0.293) #
             5.189 (-0.294) #
             5.186 (-0.296) #
             5.663 (+0.181) ##
             6.186 (+0.703) ####

             # Final result:
             5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-9-jolsa@kernel.org
[ Updated 'perf stat --table' man page entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26 09:30:27 -03:00
Jiri Olsa
e55c14af48 perf stat: Add --table option to display time of each run
Add --table option to display time for each run (-r option), like:

  $ perf stat --null -r 5 --table perf bench sched pipe

   Performance counter stats for './perf bench sched pipe' (5 runs):

             # Table of individual measurements:
             5.379 (-0.176)
             5.243 (-0.311)
             5.238 (-0.317)
             5.536 (-0.019)
             6.377 (+0.823)

             # Final result:
             5.555 +- 0.213 seconds time elapsed  ( +-  3.83% )

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-8-jolsa@kernel.org
[ Document the new option in 'perf stat's man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26 09:30:27 -03:00
Ravi Bangoria
9a73c30854 perf buildid-cache: Support --purge-all option
User can remove files from cache using --remove/--purge options but both
needs list of files as an argument. It's not convenient when you want to
flush out entire cache. Add an option to purge all files from cache.

Ex,
  # perf buildid-cache -l
    8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
    ebe71fdcf4b366518cc154d570a33cd461a51c36 /tmp/a.out.1
  # perf buildid-cache -P -v
    Removing /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c): Ok
    Removing /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36): Ok
    Purged all: Ok

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180417041346.5617-4-ravi.bangoria@linux.vnet.ibm.com
[ Initialize 'err' in build_id_cache__purge_all(), to fix build on debian:7, as it can be used uninitialized ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26 09:30:26 -03:00
Ravi Bangoria
8e1e0d7467 perf buildid-cache: Support --list option
'perf buildid-cache' allows to add/remove files into cache but there is
no option to list all cached files. Add --list option to list all
_valid_ cached files.

Ex,
  # perf buildid-cache --add /tmp/a.out
  # perf buildid-cache -l
    8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180417041346.5617-3-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26 09:30:26 -03:00
Sangwon Hong
3138a2ef62 perf mem: Document incorrect and missing options
Several options were incorrectly described, some lacked describing
required arguments while others were simply not documented, fix it.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1524382146-19609-1-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-23 11:59:18 -03:00
Andi Kleen
a7e9eab3db perf mem: Allow all record/report options
For perf mem report / perf mem record, pass all unknown options
through to the underlying report/record commands. This makes things
like

perf mem record -a sleep 1

work. Matches how c2c and other tools work.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20180406203812.3087-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-18 15:35:48 -03:00
Alexey Budankov
bf30cc1882 perf script: Extend misc field decoding with switch out event type
Append 'p' sign to 'S' tag designating the type of context switch out event so
'Sp' means preemption context switch. Documentation is extended to cover
new presentation changes.

  $ perf script --show-switch-events -F +misc -I -i perf.data:

          hdparm 4073 [004] U  762.198265:     380194 cycles:ppp:      7faf727f5a23 strchr (/usr/lib64/ld-2.26.so)
          hdparm 4073 [004] K  762.198366:     441572 cycles:ppp:  ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux)
          hdparm 4073 [004] S  762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0
         swapper    0 [004]    762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 4073/4073
         swapper    0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 4073/4073
          hdparm 4073 [004]    762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
         swapper    0 [007] K  762.198514:    2303073 cycles:ppp:  ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux)
         swapper    0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 1134/1134
  kworker/u16:18 1134 [007]    762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid:    0/0
  kworker/u16:18 1134 [007] S  762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    0/0

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5fc65ce7-8ca5-53ae-8858-8ddd27290575@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-17 09:47:39 -03:00
Arnaldo Carvalho de Melo
43c4023152 perf annotate: Allow setting the offset level in .perfconfig
The default is 1 (jump_target):

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26        nop
    4.61        push   %rbx
   19.33        pushfq
    7.97        pop    %rax
    0.32        nop
    0.06        mov    %rax,%rbx
   14.63        cli
    0.06        nop
                xor    %eax,%eax
                mov    $0x1,%edx
   49.94        lock   cmpxchg %edx,(%rdi)
    0.16        test   %eax,%eax
              ↓ jne    2b
    2.66        mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  *ffffffffb30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

But one can ask for showing offsets for call instructions by setting
this:

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26        nop
    4.61        push   %rbx
   19.33        pushfq
    7.97        pop    %rax
    0.32        nop
    0.06        mov    %rax,%rbx
   14.63        cli
    0.06        nop
                xor    %eax,%eax
                mov    $0x1,%edx
   49.94        lock   cmpxchg %edx,(%rdi)
    0.16        test   %eax,%eax
              ↓ jne    2b
    2.66        mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
          2d: → callq  *ffffffffb30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

Or using a big value to ask for all offsets to be shown:

  # cat ~/.perfconfig
  [annotate]

	offset_level = 100

	hide_src_code = true
  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26   0:   nop
    4.61   5:   push   %rbx
   19.33   6:   pushfq
    7.97   7:   pop    %rax
    0.32   8:   nop
    0.06   d:   mov    %rax,%rbx
   14.63  10:   cli
    0.06  11:   nop
          17:   xor    %eax,%eax
          19:   mov    $0x1,%edx
   49.94  1e:   lock   cmpxchg %edx,(%rdi)
    0.16  22:   test   %eax,%eax
          24: ↓ jne    2b
    2.66  26:   mov    %rbx,%rax
          29:   pop    %rbx
          2a: ← retq
          2b:   mov    %eax,%esi
          2d: → callq  *ffffffffb30eaed0
          32:   mov    %rbx,%rax
          35:   pop    %rbx
          36: ← retq
   #

This also affects the TUI, i.e. the default 'perf annotate' and 'perf
top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
in the build, i.e.:

  # perf version --build-options | grep slang
              libslang: [ on  ]  # HAVE_SLANG_SUPPORT
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-13 10:00:05 -03:00
Takuya Yamamoto
e8103e44ce perf sched: Fix documentation for timehist
Fixed a incorrect option and usage to those shown by "perf sched timehist -h",
i.e. the default is really --call-graph, which is equivalent to -g.

Signed-off-by: Takuya Yamamoto <tkydevel@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-8fzo0dlsi1mku5aqx8brep5s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-12 10:33:36 -03:00
Alexey Budankov
9dc9a95f03 perf stat: Enable 1ms interval for printing event counters values
Currently print count interval for performance counters values is
limited by 10ms so reading the values at frequencies higher than 100Hz
is restricted by the tool.

This change makes perf stat -I possible on frequencies up to 1KHz and,
to some extent, makes perf stat -I to be on-par with perf record
sampling profiling.

When running perf stat -I for monitoring e.g. PCIe uncore counters and
at the same time profiling some I/O workload by perf record e.g. for
cpu-cycles and context switches, it is then possible to observe
consolidated CPU/OS/IO(Uncore) performance picture for that workload.

Tool overhead warning printed when specifying -v option can be missed
due to screen scrolling in case you have output to the console
so message is moved into help available by running perf stat -h.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/b842ad6a-d606-32e4-afe5-974071b5198e@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-12 09:29:31 -03:00
Jin Yao
7098467256 perf version: Add man page
Since a new option '--build-options' is created for 'perf version', so
we need to document it.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1522402036-22915-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-02 13:52:23 -03:00
Arnaldo Carvalho de Melo
0a6545bda2 perf trace: Show only failing syscalls
For instance:

  # perf probe "vfs_getname=getname_flags:72 pathname=result->name:string"
  Added new event:
    probe:vfs_getname    (on getname_flags:72 with pathname=result->name:string)

  You can now use it in all perf tools, such as:

	  perf record -e probe:vfs_getname -aR sleep 1

  # perf trace --failure sleep 1
     0.043 ( 0.010 ms): sleep/10978 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory

For reference, here are all the syscalls in this case:

  # perf trace sleep 1
         ? (         ): sleep/10976  ... [continued]: execve()) = 0
       0.027 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000
       0.044 ( 0.010 ms): sleep/10976 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
       0.057 ( 0.006 ms): sleep/10976 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
       0.064 ( 0.002 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b370) = 0
       0.067 ( 0.003 ms): sleep/10976 mmap(len: 111457, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec8615000
       0.071 ( 0.001 ms): sleep/10976 close(fd: 3) = 0
       0.080 ( 0.007 ms): sleep/10976 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
       0.088 ( 0.002 ms): sleep/10976 read(fd: 3, buf: 0x7fffac22b538, count: 832) = 832
       0.092 ( 0.001 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b3d0) = 0
       0.094 ( 0.002 ms): sleep/10976 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7feec8613000
       0.099 ( 0.004 ms): sleep/10976 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7feec8057000
       0.104 ( 0.007 ms): sleep/10976 mprotect(start: 0x7feec8203000, len: 2097152) = 0
       0.112 ( 0.005 ms): sleep/10976 mmap(addr: 0x7feec8403000, len: 24576, prot: READ|WRITE, flags: PRIVATE|DENYWRITE|FIXED, fd: 3, off: 1753088) = 0x7feec8403000
       0.120 ( 0.003 ms): sleep/10976 mmap(addr: 0x7feec8409000, len: 14976, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS|FIXED) = 0x7feec8409000
       0.128 ( 0.001 ms): sleep/10976 close(fd: 3) = 0
       0.139 ( 0.001 ms): sleep/10976 arch_prctl(option: 4098, arg2: 140663540761856) = 0
       0.186 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8403000, len: 16384, prot: READ) = 0
       0.204 ( 0.003 ms): sleep/10976 mprotect(start: 0x55bdc0ec3000, len: 4096, prot: READ) = 0
       0.209 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8631000, len: 4096, prot: READ) = 0
       0.214 ( 0.010 ms): sleep/10976 munmap(addr: 0x7feec8615000, len: 111457) = 0
       0.269 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000
       0.271 ( 0.002 ms): sleep/10976 brk(brk: 0x55bdc2d25000) = 0x55bdc2d25000
       0.274 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d25000
       0.278 ( 0.007 ms): sleep/10976 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
       0.288 ( 0.001 ms): sleep/10976 fstat(fd: 3</usr/lib/locale/locale-archive>, statbuf: 0x7feec8408aa0) = 0
       0.290 ( 0.003 ms): sleep/10976 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec1488000
       0.297 ( 0.001 ms): sleep/10976 close(fd: 3</usr/lib/locale/locale-archive>) = 0
       0.325 (1000.193 ms): sleep/10976 nanosleep(rqtp: 0x7fffac22c0b0) = 0
    1000.560 ( 0.006 ms): sleep/10976 close(fd: 1) = 0
    1000.573 ( 0.005 ms): sleep/10976 close(fd: 2) = 0
    1000.596 (         ): sleep/10976 exit_group()
  #

And can be done systemwide, etc, with backtraces:

  # perf trace --max-stack=16 --failure sleep 1
     0.048 ( 0.015 ms): sleep/11092 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
                                       __access (inlined)
                                       dl_main (/usr/lib64/ld-2.26.so)
  #

Or for some specific syscalls:

  # perf trace --max-stack=16 -e openat --failure cat /tmp/rien
  cat: /tmp/rien: No such file or directory
       0.251 ( 0.012 ms): cat/11106 openat(dfd: CWD, filename: /tmp/rien) = -1 ENOENT No such file or directory
                                         __libc_open64 (inlined)
                                         main (/usr/bin/cat)
                                         __libc_start_main (/usr/lib64/libc-2.26.so)
                                         _start (/usr/bin/cat)
  #

Look for inotify* syscalls that fail, system wide, for 2 seconds, with backtraces:

  # perf trace -a --max-stack=16 --failure -e inotify* sleep 2
   819.165 ( 0.058 ms): gmain/1724 inotify_add_watch(fd: 8<anon_inode:inotify>, pathname: /home/acme/~, mask: 16789454) = -1 ENOENT No such file or directory
                                       __GI_inotify_add_watch (inlined)
                                       _ik_watch (/usr/lib64/libgio-2.0.so.0.5400.3)
                                       _ip_start_watching (/usr/lib64/libgio-2.0.so.0.5400.3)
                                       im_scan_missing (/usr/lib64/libgio-2.0.so.0.5400.3)
                                       g_timeout_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3)
                                       g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3)
                                       g_main_context_iterate.isra.23 (/usr/lib64/libglib-2.0.so.0.5400.3)
                                       g_main_context_iteration (/usr/lib64/libglib-2.0.so.0.5400.3)
                                       glib_worker_main (/usr/lib64/libglib-2.0.so.0.5400.3)
                                       g_thread_proxy (/usr/lib64/libglib-2.0.so.0.5400.3)
                                       start_thread (/usr/lib64/libpthread-2.26.so)
                                       __GI___clone (inlined)
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-8f7d3mngaxvi7tlzloz3n7cs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-02 07:57:37 -03:00
Kim Phillips
b74d12d598 perf tools: Add a "dso_size" sort order
Add DSO size to perf report/top sort output list.

This includes adding a map__size fn to map.h, which is
approximately equal to the DSO data file_size:

  DSO				file size	map (end-start)	file / (end-start)
  libwebkit2gtk-4.0.so.37.24.9	43260072	41295872	95%
  libglib-2.0.so.0.5400.1		 1125680	 1118208	99%
  libc-2.26.so			 1960656 	 1925120	101%
  libdbus-1.so.3.14.13		  309456 	  303104	102%

Sample output:

  $ ./perf report -s dso_size,dso
  Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
  Overhead  DSO size  Shared Object
    90.62%   unknown  [unknown]
     2.87%   1118208  libglib-2.0.so.0.5400.1
     1.92%    303104  libdbus-1.so.3.14.13
     1.42%   1925120  libc-2.26.so
     0.77%  41295872  libwebkit2gtk-4.0.so.37.24.9
     0.61%    335872  libgobject-2.0.so.0.5400.1
     0.41%   1052672  libgdk-3.so.0.2200.25
     0.36%    106496  libpthread-2.26.so
     0.29%    221184  dbus-daemon
     0.17%    159744  ld-2.26.so
     0.13%     49152  libwayland-client.so.0.3.0
     0.12%   1642496  libgio-2.0.so.0.5400.1
     0.09%   7327744  libgtk-3.so.0.2200.25
     0.09%  12324864  libmozjs-52.so.0.0.0
     0.05%   4796416  perf
     0.04%    843776  libgjs.so.0.0.0
     0.03%   1409024  libmutter-clutter-1.so

Committer testing:

To sort by DSO size, use:

  # perf report -F dso_size,dso,overhead -s dso_size
  <SNIP>
     3465216  libdns-export.so.174.0.1   0.00%
     3522560  libgc.so.1.0.3             0.00%
     3538944  libbfd-2.29-13.fc27.so     0.59%
     3670016  libunistring.so.2.1.0      0.00%
     3723264  libguile-2.0.so.22.8.1     0.00%
     3776512  libgio-2.0.so.0.5400.3     0.00%
     3891200  libc-2.26.so               0.96%
     3944448  libmozjs-17.0.so           0.00%
     4218880  libperl.so.5.26.1          0.18%
     4452352  libpython2.7.so.1.0        0.02%
     4472832  perf                       0.02%
     4603904  git                        0.01%
     4751360  libcrypto.so.1.1.0g        0.00%
     5005312  libslang.so.2.3.1          0.00%
     7315456  libgtk-3.so.0.2200.26      0.09%
     8818688  i965_dri.so                2.46%
     8818688  i965_dri.so (deleted)      1.26%
    12414976  libmozjs-52.so.0.0.0       0.03%
    23642112  cc1                        2.02%
    27889664  [kernel.kallsyms]         25.41%
    80834560  libxul.so (deleted)       15.68%
    98078720  chrome                    32.03%
  1056964608  [kernel.kallsyms]          1.59%
  #

Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-02 07:57:37 -03:00
Arnaldo Carvalho de Melo
91340c5184 perf report: Introduce --ignore-vmlinux command line option
We've had this in 'perf top' for quite a while, useful if one wishes
to force using /proc/kcore to do annotation using the patched kernel
instead of the ELF image it started from, aka vmlinux.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ircpvox4wzsv7gasrpb28fw9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:42 -03:00
Arnaldo Carvalho de Melo
be316409e9 perf annotate: Introduce --ignore-vmlinux command line option
This is already present in 'perf top', albeit undocumented (will fix),
and is useful to use /proc/kcore instead of vmlinux and then get what is
really in place, not what the kernel starts with, before alternatives,
ftrace .text patching, etc, see the differences:

  # perf annotate --stdio2 _raw_spin_lock_irqsave
  _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
  Event: anon group { cycles, instructions }

    0.00   3.17      → callq  __fentry__
    0.00   7.94        push   %rbx
    7.69  36.51      → callq  __page_file_index
                       mov    %rax,%rbx
    7.69   3.17      → callq  *ffffffff82225cd0
                       xor    %eax,%eax
                       mov    $0x1,%edx
   80.77  49.21        lock   cmpxchg %edx,(%rdi)
                       test   %eax,%eax
                     ↓ jne    2b
    3.85   0.00        mov    %rbx,%rax
                       pop    %rbx
                     ← retq
                 2b:   mov    %eax,%esi
                     → callq  queued_spin_lock_slowpath
                       mov    %rbx,%rax
                       pop    %rbx
                     ← retq
  [root@jouet ~]# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  _raw_spin_lock_irqsave() /proc/kcore
  Event: anon group { cycles, instructions }

    0.00   3.17        nop
    0.00   7.94        push   %rbx
    0.00  23.81        pushfq
    7.69  12.70        pop    %rax
                       nop
                       mov    %rax,%rbx
    7.69   3.17        cli
                       nop
                       xor    %eax,%eax
                       mov    $0x1,%edx
   80.77  49.21        lock   cmpxchg %edx,(%rdi)
                       test   %eax,%eax
                     ↓ jne    2b
    3.85   0.00        mov    %rbx,%rax
                       pop    %rbx
                     ← retq
                 2b:   mov    %eax,%esi
                     → callq  *ffffffff820e96b0
                       mov    %rbx,%rax
                       pop    %rbx
                     ← retq
  #

Diff of the output of those commands:

  # perf annotate --stdio2 _raw_spin_lock_irqsave > /tmp/vmlinux
  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave > /tmp/kcore
  # diff -y /tmp/vmlinux /tmp/kcore
  _raw_spin_lock_irqsave() vmlinux             | _raw_spin_lock_irqsave() /proc/kcore
  Event: anon group { cycles, instructions }     Event: anon group { cycles, instructions }

   0.00  3.17  → callq __fentry__              |  0.00  3.17     nop
   0.00  7.94    push  %rbx                       0.00  7.94     push  %rbx
   7.69 36.51  → callq __page_file_index       |  0.00 23.81     pushfq
                                               >  7.69 12.70     pop   %rax
                                               >                 nop
                 mov   %rax,%rbx                                 mov   %rax,%rbx
   7.69  3.17  → callq *ffffffff82225cd0       |  7.69  3.17     cli
                                               >                 nop
                 xor   %eax,%eax                                 xor   %eax,%eax
                 mov   $0x1,%edx                                 mov   $0x1,%edx
  80.77 49.21    lock  cmpxchg %edx,(%rdi)       80.77 49.21     lock  cmpxchg %edx,(%rdi)
                 test  %eax,%eax                                 test  %eax,%eax
               ↓ jne   2b                                      ↓ jne   2b
   3.85  0.00    mov   %rbx,%rax                  3.85  0.00     mov   %rbx,%rax
                 pop   %rbx                                      pop   %rbx
               ← retq                                          ← retq
            2b:  mov   %eax,%esi                            2b:  mov   %eax,%esi
               → callq queued_spin_lock_slowpath|              → callq *ffffffff820e96b0
                 mov   %rbx,%rax                                 mov   %rbx,%rax
                 pop   %rbx                                      pop   %rbx
               ← retq                                          ← retq
  #

This should be further streamlined by doing both annotations and
allowing the TUI to toggle initial/current, and show the patched
instructions in a slightly different color.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-wz8d269hxkcwaczr0r4rhyjg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:42 -03:00
Arnaldo Carvalho de Melo
befd2a38a6 perf annotate: Introduce the --stdio2 output mode
This uses the TUI augmented formatting routines, modulo interactivity.

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  _raw_spin_lock_irqsave() /proc/kcore
  Event: cycles:ppp

  Percent

              Disassembly of section load0:

              ffffffff9a8734b0 <load0>:
                nop
                push   %rbx
   50.00        pushfq
                pop    %rax
                nop
                mov    %rax,%rbx
                cli
                nop
                xor    %eax,%eax
                mov    $0x1,%edx
   50.00        lock   cmpxchg %edx,(%rdi)
                test   %eax,%eax
              ↓ jne    2b
                mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  queued_spin_lock_slowpath
                mov    %rbx,%rax
                pop    %rbx
              ← retq

Tested-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:26 -03:00
Arnaldo Carvalho de Melo
a8403912d0 perf top: Document --ignore-vmlinux
We've had this since 2013, document it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Willy Tarreau <w@1wt.eu>
Fixes: fc2be6968e ("perf symbols: Add new option --ignore-vmlinux for perf top")
Link: https://lkml.kernel.org/n/tip-0jwfueooddwfsw9r603belxi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-19 13:51:52 -03:00
Jiri Olsa
8fab7843a1 perf c2c record: Record physical addresses in samples
We are going to display NUMA node information in following patches. For
this we need to have physical address data in the sample.

Adding --phys-data as a default option for perf c2c record.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180309101442.9224-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-16 13:52:57 -03:00
Stephane Eranian
2427b432e6 perf tools: Update quipper information
This patch updates the links to the Quipper library.  It is now
available from GitHub and has been updated.

Reported-by: Lakshman Annadorai <lakshmana@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1520495985-2147-1-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08 11:30:54 -03:00
Agustin Vega-Frias
c199c11dce perf pmu: Auto-merge PMU events created by prefix or glob match
Auto-merge for these events was disabled when auto-merging of non-alias
events was disabled in commit 63ce844 (perf stat: Only auto-merge events
that are PMU aliases).

Non-merging of legacy events is preserved:

    $ perf stat -ag -e cache-misses,cache-misses sleep 1

     Performance counter stats for 'system wide':

                86,323      cache-misses
                86,323      cache-misses

           1.002623307 seconds time elapsed

But prefix or glob matching auto-merges the events created:

    $ perf stat -a -e l3cache/read-miss/ sleep 1

     Performance counter stats for 'system wide':

                   328      l3cache/read-miss/

           1.002627008 seconds time elapsed

    $ perf stat -a -e l3cache_0_[01]/read-miss/ sleep 1

     Performance counter stats for 'system wide':

                   172      l3cache/read-miss/

           1.002627008 seconds time elapsed

As with events created with aliases, auto-merging can be suppressed with
the --no-merge option:

    $ perf stat -a -e l3cache/read-miss/ --no-merge sleep 1

     Performance counter stats for 'system wide':

                    67      l3cache/read-miss/
                    67      l3cache/read-miss/
                    63      l3cache/read-miss/
                    60      l3cache/read-miss/

           1.002622192 seconds time elapsed

Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Change-Id: I0a47eed54c05e1982ca964d743b37f50f60c508c
Link: http://lkml.kernel.org/r/1520345084-42646-4-git-send-email-agustinv@codeaurora.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08 10:05:49 -03:00
Agustin Vega-Frias
b2b9d3a3f0 perf pmu: Support wildcards on pmu name in dynamic pmu events
Starting on v4.12 event parsing code for dynamic pmu events already
supports prefix-based matching of multiple pmus when creating dynamic
events. E.g., in a system with the following dynamic pmus:

    mypmu_0
    mypmu_1
    mypmu_2
    mypmu_4

passing mypmu/<config>/ as an event spec will result in the creation of
the event in all of the pmus. This change expands this matching through
the use of fnmatch so glob-like expressions can be used to create events
in multiple pmus. E.g., in the system described above if a user only
wants to create the event in mypmu_0 and mypmu_1, mypmu_[01]/<config>/
can be passed.

Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Timur Tabi <timur@codeaurora.org>
Change-Id: Icb25653fc5d5239c20f3bffdfdf4ab4c9c9bb20b
Link: http://lkml.kernel.org/r/1520454947-16977-1-git-send-email-agustinv@codeaurora.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08 10:05:25 -03:00
Takashi Iwai
ea66536ab2 perf tools: Correct title markers for asciidoctor
I've tested to process the perf man pages with asciidoctor that is
picker than asciidoc, and it revealed minor syntax errors in some
documents.  Namely, the title markers aren't aligned with the previous
line, hence asciidoctor didn't recognize as titles.

This patch corrects these markers to be processed properly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180307105441.28512-1-tiwai@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-07 10:26:32 -03:00
Arnaldo Carvalho de Melo
9ea42ba441 perf trace: Support setting cgroups as targets
One can set a cgroup as a default cgroup to be used by all events or
set cgroups with the 'perf stat' and 'perf record' behaviour, i.e.
'-G A' will be the cgroup for events defined so far in the command line.

Here in my main machine, with a kvm instance running a rhel6 guinea pig
I have:

  # ls -la /sys/fs/cgroup/perf_event/ | grep drw
  drwxr-xr-x. 14 root root 360 Mar  6 12:04 ..
  drwxr-xr-x.  3 root root   0 Mar  6 15:05 machine.slice
  #

So I can go ahead and use that cgroup hierarchy, say lets see what
syscalls are being emitted by threads in that 'machine.slice' hierarchy
that are taking more than 100ms:

  # perf trace --duration 100 -G machine.slice
     0.188 (249.850 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   250.274 (249.743 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   500.224 (249.755 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   750.097 (249.934 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1000.244 (249.780 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1250.197 (249.796 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1500.124 (249.859 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
  1750.076 (172.900 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
   902.570 (1021.116 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1
  1923.825 (305.133 ms): qemu-system-x8/23667 ppoll(ufds: 0x558151e03180, nfds: 74, tsp: 0x7ffc00cd0900, sigsetsize: 8) = 1
  2000.172 (229.002 ms): CPU 0/KVM/23744 ioctl(fd: 16<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
^C  #

If we look inside that cgroup hierarchy we get:

  # ls -la /sys/fs/cgroup/perf_event/machine.slice/ | grep drw
  drwxr-xr-x. 3 root root 0 Mar  6 15:05 .
  drwxr-xr-x. 2 root root 0 Mar  6 16:16 machine-qemu\x2d2\x2drhel6.sandy.scope
  #

There is just one, but lets say there were more and we would want to see
5 seconds worth of syscall summary for the threads in that cgroup:

  # perf trace --summary -G machine.slice/machine-qemu\\x2d2\\x2drhel6.sandy.scope/ -a sleep 5

   Summary of events:

     qemu-system-x86 (23667), 143858 events, 24.2%

     syscall            calls    total       min       avg       max      stddev
                                 (msec)    (msec)    (msec)    (msec)        (%)
     --------------- -------- --------- --------- --------- ---------     ------
     ppoll              28492  4348.631     0.000     0.153    11.616      1.05%
     futex              19661   140.801     0.001     0.007     2.993      3.20%
     read               18440    68.084     0.001     0.004     1.653      4.33%
     ioctl               5387    24.768     0.002     0.005     0.134      1.62%

     CPU 0/KVM (23744), 449455 events, 75.8%

     syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
     --------------- -------- --------- --------- --------- ---------     ------
     ioctl             148364  3401.812     0.000     0.023    11.801      1.15%
     futex              36131   404.127     0.001     0.011     7.377      2.63%
     writev             29452   339.688     0.003     0.012     1.740      1.36%
     write              11315    45.992     0.001     0.004     0.105      1.10%

  #

See the documentation about how to set more than one cgroup for
different events in the same command line.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-t126jh4occqvu0xdqlcjygex@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-07 10:22:26 -03:00
Arnaldo Carvalho de Melo
b09c2364a4 perf record: Throttle user defined frequencies to the maximum allowed
# perf record -F 200000 sleep 1
  warning: Maximum frequency rate (15,000 Hz) exceeded, throttling from 200,000 Hz to 15,000 Hz.
           The limit can be raised via /proc/sys/kernel/perf_event_max_sample_rate.
           The kernel will lower it when perf's interrupts take too long.
	   Use --strict-freq to disable this throttling, refusing to record.
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.019 MB perf.data (15 samples) ]
  # perf evlist -v
  cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1

For those wanting that it fails if the desired frequency can't be used:

  # perf record --strict-freq -F 200000 sleep 1
  error: Maximum frequency rate (15,000 Hz) exceeded.
         Please use -F freq option with a lower value or consider
         tweaking /proc/sys/kernel/perf_event_max_sample_rate.
  #

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oyebruc44nlja499nqkr1nzn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-05 09:58:44 -03:00
Arnaldo Carvalho de Melo
7831bf2365 perf top: Allow asking for the maximum allowed sample rate
Add the handy '-F max' shortcut, just introduced to 'perf record', to
reading and using the kernel.perf_event_max_sample_rate value as the
user supplied sampling frequency:

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hz04f296zccknnb5at06a6q0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-05 09:58:44 -03:00
Arnaldo Carvalho de Melo
67230479b2 perf record: Allow asking for the maximum allowed sample rate
Add the handy '-F max' shortcut to reading and using the
kernel.perf_event_max_sample_rate value as the user supplied
sampling frequency:

  # perf record -F max sleep 1
  info: Using a maximum frequency rate of 15,000 Hz
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ]
  # sysctl kernel.perf_event_max_sample_rate
  kernel.perf_event_max_sample_rate = 15000
  # perf evlist -v
  cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1

  # perf record -F 10 sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ]
  # perf evlist -v
  cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
  #

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hme0m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-05 09:58:43 -03:00
weiping zhang
25f72f9ed8 perf cgroup: Simplify arguments when tracking multiple events
When using -G with one cgroup and -e with multiple events, only the
first event gets the correct cgroup setting, all events from the second
onwards will track system-wide events.

If the user wants to track multiple events for a specific cgroup, the
user must give parameters like the following:

  $ perf stat -e e1 -e e2 -e e3 -G test,test,test

This patch simplify this case, just type one cgroup:

  $ perf stat -e e1 -e e2 -e e3 -G test

  $ mkdir -p /sys/fs/cgroup/perf_event/empty_cgroup
  $ perf stat -e cycles -e cache-misses -a -I 1000 -G empty_cgroup

Before:

     1.001007226   <not counted>      cycles	   empty_cgroup
     1.001007226           7,506      cache-misses

After:

     1.000834097   <not counted>      cycles	   empty_cgroup
     1.000834097   <not counted>      cache-misses empty_cgroup

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180129154805.GA6284@localhost.didichuxing.com
[ Improved the doc text a bit, providing an example for cgroup + system wide counting ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-22 10:02:27 -03:00
Sangwon Hong
de71128688 perf kallsyms: Fix the usage on the man page
First, all man pages highlight only perf and subcommands except 'perf
kallsyms', which includes the full usage. Fix it for commands to
monopolize underlines.

Second, options can be ommited when executing 'perf kallsyms', so add
square brackets between <option>.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1518377864-20353-1-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-21 09:23:36 -03:00
Andi Kleen
fc2f52379b perf report: Fix description for --mem-mode
The "mem-loads" event only works when PEBS is enabled, so add the "/p"
("precise") suffix to the examples.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20180209163909.9240-1-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-v0gcd4u9tktrvjjsp6y7ouv4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 14:55:46 -03:00
Sangwon Hong
7e99b19722 perf mem: Document a missing option
Add the missing --force option on the man page.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1518381517-30766-2-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 14:55:42 -03:00
Sangwon Hong
577980a000 perf kmem: Document a missing option & an argument
First, 'perf kmem' has a '--force' option, but didn't document it on the
man page. So add it.

Second, the '--time' option has to get a value, but isn't documented on
the man page. Describe it.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1518381517-30766-1-git-send-email-qpakzk@gmail.com
[ Add blank like after --force block, as requested by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 14:55:42 -03:00
Jaecheol Shin
ac2c306838 perf annotate: Add missing arguments in Man page
Some options must require an argument. But input, stdio-color, cpu have
no them.  So I added it.

Signed-off-by: Jaecheol Shin <jcgod413@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/20180207095205.62715-1-jcgod413@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 14:55:41 -03:00
yuzhoujian
f1f8ad52f8 perf stat: Add support to print counts after a period of time
Introduce a new option to print counts after N milliseconds and update
'perf stat' documentation accordingly.

Show below is the output of the new option for perf stat.

  $ perf stat --time 2000 -e cycles -a
  Performance counter stats for 'system wide':

        157,260,423      cycles

        2.003060766 seconds time elapsed

We can print the count deltas after N milliseconds with this new
introduced option. This option is not supported with "-I" option.

In addition, according to Kangliang's patch(19afd10410), the
monitoring overhead for system-wide core event could be very high if the
interval-print parameter was below 100ms, and the limitation value is
10ms.

So the same warning will be displayed when the time is set between 10ms
to 100ms, and the minimal time is limited to 10ms. Users can make a
decision according to their spcific cases.

Committer notes:

This actually stops the workload after the specified time, then prints
the counts.

So I renamed the option to --timeout and updated the documentation to
state that it will not just print the counts after the specified time,
but will really stop the 'perf stat' session and print the counts.

The rename from 'time' to 'timeout' also fixes the build in systems
where 'time' is used by glibc and can't be used as a name of a variable,
such as centos:5 and centos:6.

Changes since v3:
- none.

Changes since v2:
- modify the time check in __run_perf_stat func to keep some consistency
  with the workload case.
- add the warning when the time is set between 10ms to 100ms.
- add the pr_err when the time is set below 10ms.

Changes since v1:
- none.

Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 10:18:06 -03:00
yuzhoujian
db06a269ec perf stat: Add support to print counts for fixed times
Introduce a new option to print counts for fixed number of times and
update 'perf stat' documentation accordingly.

Show below is the output of the new option for perf stat.

  $ perf stat -I 1000 --interval-count 2 -e cycles -a
  #           time             counts unit events
           1.002827089         93,884,870      cycles
           2.004231506         56,573,446      cycles

We can just print the counts for several times with this newly
introduced option. The usage of it is a little like 'vmstat', and it
should be used together with "-I" option.

  $ vmstat -n 1 2
  procs ---------memory-------------- --swap- ----io-- -system-- ------cpu---
   r  b swpd   free   buff   cache    si   so  bi   bo  in   cs us sy id wa st
   0  0    0 78270544 547484 51732076  0   0   0   20    1    1  1  0 99  0 0
   0  0    0 78270512 547484 51732080  0   0   0   16  477 1555  0  0 100 0 0

Changes since v3:
- merge interval_count check and times check to one line.
- fix the wrong indent in stat.h
- use stat_config.times instead of 'times' in cmd_stat function.

Changes since v2:
- none.

Changes since v1:
- change the name of the new option "times-print" to "interval-count".
- keep the new option interval specifically.

Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 10:09:24 -03:00
Jiri Olsa
ad52b8cb48 perf report: Add support to display group output for non group events
Add support to display group output for if non grouped events are
detected and user forces --group option. Now for non-group events
recorded like:

  $ perf record -e 'cycles,instructions' ls

you can still get group output by using --group option
in report:

  $ perf report --group --stdio
  ...
  #         Overhead  Command  Shared Object     Symbol
  # ................  .......  ................  ......................
  #
      17.67%   0.00%  ls       libc-2.25.so      [.] _IO_do_write@@GLIB
      15.59%  25.94%  ls       ls                [.] calculate_columns
      15.41%  31.35%  ls       libc-2.25.so      [.] __strcoll_l
  ...

Committer note:

We should improve on this by making sure that the first line states that
this is not a group, but since the user doesn't have to force group view
when really using grouped events (e.g. '{cycles,instructions}'), the
user better know what is being done...

Requested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180209092734.GB20449@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 10:09:24 -03:00
Jiri Olsa
3233b37a71 perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
Adding --show-round-event to display PERF_RECORD_FINISHED_ROUND events
like:

  # perf script --show-round-events 2>/dev/null
               yes  8591 [002] 124177.397597:         18         cpu/mem-stores/P: ff...
               yes  8591 [002] 124177.397615:          1 cpu/mem-loads,ldlat=30/P: ff...
  PERF_RECORD_FINISHED_ROUND
              perf 10380 [001] 124177.397622:          6 cpu/mem-loads,ldlat=30/P: ff...
  PERF_RECORD_FINISHED_ROUND
           swapper     0 [000] 124177.400518:         88         cpu/mem-stores/P: ff...
           swapper     0 [000] 124177.400521:         88         cpu/mem-stores/P: ff...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 10:09:23 -03:00
Sangwon Hong
ba7e851642 perf data: Document missing --force option
Add the --force option to the man page.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1517831315-31490-1-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15 09:57:33 -03:00
Arnaldo Carvalho de Melo
591421e151 perf trace: Add --print-sample
To help with debugging, like the interrupted out of order issue that
will be dealt with in the next patch in this series, changing the code
to deal with:

raw_syscalls:sys_enter 411967179.269 Timer 9609/9626 [2]
raw_syscalls:sys_enter 411967179.213 file:// Content 9609/9609 [3]
   328.038 (18446744073709.496 ms): Timer/9626 futex(uaddr: 0x7fc0d4027044, op: WAIT|PRIV, utime: 0x7fc0b0ffdb50     ) ...
 raw_syscalls:sys_exit 411967179.225 file:// Content 9609/9609 [3]
   327.982 ( 0.012 ms): file:// Conten/9609 futex(uaddr: 0x7fc0d4027040, op: WAKE|PRIV, val: 1                    ) = 1

That long duration is the bug.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-fljqiibjn7wet24jd1ed7abc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-25 06:37:29 -03:00