Commit graph

493 commits

Author SHA1 Message Date
Jin Yao
8b8ef2d74d perf report: Enable finding kernel inline functions
Currently perf supports a mode to query inline stack. It works well for
finding user space inline functions but it doesn't work for kernel ones,
due to some unnecessary check.

This patch removes these unnecessary checks. Now kernel inline functions
can be reported.

For example:

  perf report --inline -g func --stdio

  |--46.19%--do_huge_pmd_anonymous_page
  |          do_huge_pmd_anonymous_page (inline)
  |          __do_huge_pmd_anonymous_page (inline)
  |          __SetPageUptodate (inline)
  |          __set_bit (inline)

  The result is compared with the output of addr2line. They match.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500409892-15904-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-07-18 23:14:27 -03:00
Jin Yao
7e63a13a26 perf annotate: Implement visual marker for macro fusion
For marking fused instructions clearly this patch adds a line before the
first instruction of pair and joins it with the arrow of the jump to its
target.

For example, when "je" is selected in annotate view, the line before
cmpl is displayed and joins the arrow of "je".

       │   ┌──cmpl   $0x0,argp_program_version_hook
 81.93 │   ├──je     20
       │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
       │   │↓ jne    29
       │   │↓ jmp    43
 11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

That means the cmpl+je is a fused instruction pair and they should be
considered together.

Changelog:

v3: Use Arnaldo's fix to improve the arrow origin rendering.  To get the
    evsel->evlist->env->cpuid, save the evsel in annotate_browser.

v2: new function "ins__is_fused" to check if the instructions are fused.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1499403995-19857-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-07-18 23:13:49 -03:00
Jin Yao
69fb09f6cc perf annotate: Check for fused instructions
Macro fusion merges two instructions to a single micro-op. Intel core
platform performs this hardware optimization under limited
circumstances.

For example, CMP + JCC can be "fused" and executed /retired together.
While with sampling this can result in the sample sometimes being on the
JCC and sometimes on the CMP.  So for the fused instruction pair, they
could be considered together.

On Nehalem, fused instruction pairs:

  cmp/test + jcc.

On other new CPU:

  cmp/test/add/sub/and/inc/dec + jcc.

This patch adds an x86-specific function which checks if 2 instructions
are in a "fused" pair. For non-x86 arch, the function is just NULL.

Changelog:

v4: Move the CPU model checking to symbol__disassemble and save the CPU
    family/model in arch structure.

    It avoids checking every time when jump arrow printed.

v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
    just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).

v2: Remove the original weak function. Arnaldo points out that doing it
    as a weak function that will be overridden by the host arch doesn't
    work. So now it's implemented as an arch-specific function.

Committer fix:

Do not access evsel->evlist->env->cpuid, ->env can be null, introduce
perf_evsel__env_cpuid(), just like perf_evsel__env_arch(), also used in
this function call.

The original patch was segfaulting 'perf top' + annotation.

But this essentially disables this fused instructions augmentation in
'perf top', the right thing is to get the cpuid from the running kernel,
left for a later patch tho.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-07-18 23:11:25 -03:00
Jin Yao
80f62589fa perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
When the jump instruction is displayed at the row 0 in annotate view,
the arrow is broken. An example:

 16.86 │   ┌──je     82
  0.01 │      movsd  (%rsp),%xmm0
       │      movsd  0x8(%rsp),%xmm4
       │      movsd  0x8(%rsp),%xmm1
       │      movsd  (%rsp),%xmm3
       │      divsd  %xmm4,%xmm0
       │      divsd  %xmm3,%xmm1
       │      movsd  (%rsp),%xmm2
       │      addsd  %xmm1,%xmm0
       │      addsd  %xmm2,%xmm0
       │      movsd  %xmm0,(%rsp)
       │82:   sub    $0x1,%ebx
 83.03 │    ↑ jne    38
       │      add    $0x10,%rsp
       │      xor    %eax,%eax
       │      pop    %rbx
       │    ← retq

The patch increments the row number before checking with 0.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Fixes: 944e1abed9 ("perf ui browser: Add method to draw up/down arrow line")
Link: http://lkml.kernel.org/r/1496901704-30275-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-07-10 16:36:40 -03:00
Jin Yao
dcaa394807 perf annotate: Return arch from symbol__disassemble() and save it in browser
In annotate browser, we will add support to check fused instructions.
While this is x86-specific feature so we need the annotate browser to
know what the arch it runs on.

symbol__disassemble() has figured out the arch. This patch just lets the
arch return from symbol__disassemble and save the arch in annotate
browser.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1497840958-4759-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-19 15:27:09 -03:00
Jin Yao
ec27ae1892 perf annotate browser: Display titles in left frame
The annotate browser is divided into 2 frames. Left frame contains 3
columns (some platforms only have one column).

For example:

                   │26  int compute_flag()
                   │27  {
 22.80  1.20       │      sub    $0x8,%rsp
                   │25          int i;
                   │
                   │27          i = rand() % 2;
 22.78  1.20     1 │    → callq  rand@plt

While it's hard for user to understand what the data is.

This patch adds the titles "Percent", "IPC" and "Cycle" on columns.

Percent  IPC Cycle │
                   │25  __attribute__((noinline))
                   │26  int compute_flag()
                   │27  {
 22.80  1.20       │      sub    $0x8,%rsp
                   │25          int i;
                   │
                   │27          i = rand() % 2;
 22.78  1.20     1 │    → callq  rand@plt

The titles are displayed at row 0 of annotate browser if row 0 doesn't
have values of percent, ipc and cycle.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1493909895-9668-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-19 15:14:57 -03:00
Jin Yao
c564f0db92 perf report: Remove unnecessary check in annotate_browser_write()
In annotate_browser_write(),

        if (dl->offset != -1 && percent_max != 0.0) {
                if (percent_max != 0.0) {
			...
                }
                ...
        }

The second check of (percent_max != 0.0) is not necessary, remove it.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1493909895-9668-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-19 15:14:57 -03:00
Namhyung Kim
7111ffff60 perf tools: Put caller above callee in --children mode
The __hpp__sort_acc() sorts entries using callchain depth in order to
put callers above in children mode.  But it assumed the callchain order
was callee-first.  Now default (for children) is caller-first so the
order of entries is reverted.

For example, consider following case:

  $ perf report --no-children
  ..l
  # Overhead  Command  Shared Object        Symbol
  # ........  .......  ...................  ..........................
  #
      99.44%  a.out    a.out                [.] main
              |
              ---main
                 __libc_start_main
                 _start

Then children mode should show 'start' above '__libc_start_main' since
it's the caller (parent) of the __libc_start_main.  But it's reversed:

  # Children      Self  Command  Shared Object    Symbol
  # ........  ........  .......  ...............  .....................
  #
      99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
      99.61%     0.00%  a.out    a.out            [.] _start
      99.54%    99.44%  a.out    a.out            [.] main

This patch fixes it.

  # Children      Self  Command  Shared Object    Symbol
  # ........  ........  .......  ...............  .....................
  #
      99.61%     0.00%  a.out    a.out            [.] _start
      99.61%     0.00%  a.out    libc-2.25.so     [.] __libc_start_main
      99.54%    99.44%  a.out    a.out            [.] main

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170524062129.32529-8-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24 08:41:49 +02:00
Arnaldo Carvalho de Melo
5068b52f73 perf ui gtk: Move gtk .so name to the only place where it is used
No need to pollute util.h with this.

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-kec0chbdtgrd71o3oi2kz2zt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-26 15:31:57 -03:00
Arnaldo Carvalho de Melo
e7ff8920e6 perf tools: Use just forward declarations for struct thread where possible
Removing various instances of unnecessary includes, reducing the maze of
header dependencies.

Link: http://lkml.kernel.org/n/tip-hwu6eyuok9pc57alookyzmsf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-24 13:43:35 -03:00
Arnaldo Carvalho de Melo
58db1d6e7d perf tools: Move units conversion/formatting routines to separate object
Out of util.h, to disentangle it a bit more.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vpksyj3w5fk9t8s6mxmkajyr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-20 13:22:44 -03:00
Arnaldo Carvalho de Melo
9607ad3a63 perf tools: Add signal.h to places using its definitions
And remove it from util.h, disentangling it a bit more.

Link: http://lkml.kernel.org/n/tip-2zg9s5nx90yde64j3g4z2uhk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-20 13:22:43 -03:00
Arnaldo Carvalho de Melo
76b31a29dd perf tools: Remove include dirent.h from util.h
The files using the dirent.h routines should instead include it,
reducing the includes hell that lead to longer build times.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-42g2f4z6nfg7mdb2ae97n7tj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:55 -03:00
Arnaldo Carvalho de Melo
9a3993d408 perf tools: Move path related functions to util/path.h
Disentangling util.h header mess a bit more.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-aj6je8ly377i4upedmjzdsq6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:53 -03:00
Arnaldo Carvalho de Melo
b0742e90f5 perf tools: Don't include terminal handling headers in util.h
Continuing the disentanglement, mostly the TUI needs CTRL(c), that is
in sys/ttydefaults.h and term.c needs the termios headers.

And term.h needs to be added to a few places too.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-il19zna7qj9ytavdbwlipc7t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:53 -03:00
Arnaldo Carvalho de Melo
a43783aeec perf tools: Include errno.h where needed
Removing it from util.h, part of an effort to disentangle the includes
hell, that makes changes to util.h or something included by it to cause
a complete rebuild of the tools.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ztrjy52q1rqcchuy3rubfgt2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:51 -03:00
Arnaldo Carvalho de Melo
a067558e2f perf tools: Move extra string util functions to util/string2.h
Moving them from util.h, where they don't belong. Since libc already
have string.h, name it slightly differently, as string2.h.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-eh3vz5sqxsrdd8lodoro4jrw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:51 -03:00
Arnaldo Carvalho de Melo
632a5cabea perf tools: Move srcline definitions to separate header
Out of util.h into a new file, srcline.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ludnlm4djqcdjziekzr4s3u9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:50 -03:00
Arnaldo Carvalho de Melo
3d689ed609 perf tools: Move sane ctype stuff from util.h to sane_ctype.h
More stuff that came from git, out of the hodge-podge that is util.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-e3lana4gctz3ub4hn4y29hkw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:48 -03:00
Arnaldo Carvalho de Melo
fd20e8111c perf tools: Including missing inttypes.h header
Needed to use the PRI[xu](32,64) formatting macros.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:46 -03:00
Arnaldo Carvalho de Melo
877a7a1105 perf tools: Add include <linux/kernel.h> where ARRAY_SIZE() is used
To pave the way for further cleanups where linux/kernel.h may stop being
included in some header.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qqxan6tfsl6qx3l0v3nwgjvk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19 13:01:44 -03:00
Taeung Song
e21600fd41 perf ui browser: Refactor the code to parse color configs with ltrim()
When parsing {fore, back} ground color configs, use ltrim() instead of
just while loop and isspace().

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-4-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-11 08:45:10 -03:00
Milian Wolff
5dfa210e40 perf report: Enable sorting by srcline as key
Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.

Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ work loads.

The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:

  ~~~~~~~~~~~~~~~~
  $ perf report --stdio --inline -g address
  # Children      Self  Command       Shared Object        Symbol
  # ........  ........  ............  ...................  .........................................
  #
      99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
            |
            |--64.55%--main complex:655
            |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
            |          /usr/include/c++/6.3.1/complex:664 (inline)
            |          |
            |          |--60.31%--hypot +20
            |          |          |
            |          |          |--8.52%--__hypot_finite +273
            |          |          |
            |          |          |--7.32%--__hypot_finite +411
...
             --35.34%--_start +4194346
                       __libc_start_main +241
                       |
                       |--6.65%--main random.tcc:3326
                       |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |
                       |--2.70%--main random.tcc:3326
                       |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
                       |
                       |--1.69%--main random.tcc:3326
                       |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
  ...
  ~~~~~~~~~~~~~~~~

With this patch and `-g srcline` we instead get the following output:

  ~~~~~~~~~~~~~~~~
  $ perf report --stdio --inline -g srcline
  # Children      Self  Command       Shared Object        Symbol
  # ........  ........  ............  ...................  .........................................
  #
      99.89%    35.34%  cpp-inlining  cpp-inlining         [.] main
            |
            |--64.55%--main complex:655
            |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
            |          /usr/include/c++/6.3.1/complex:664 (inline)
            |          |
            |          |--64.02%--hypot
            |          |          |
            |          |           --59.81%--__hypot_finite
            |          |
            |           --0.53%--cabs
            |
             --35.34%--_start
                       __libc_start_main
                       |
                       |--12.48%--main random.tcc:3326
                       |          /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
                       |          /usr/include/c++/6.3.1/bits/random.h:185 (inline)
  ...
  ~~~~~~~~~~~~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-27 12:13:28 -03:00
Jin Yao
0d3eb0b778 perf report: Show inline stack for browser mode
If the address belongs to an inlined function, the source information
back to the first non-inlined function will be printed.

For example:

1. Show inlined function name
   perf report -g function --inline

-    0.69%     0.00%  inline   ld-2.23.so           [.] dl_main
   - dl_main
        0.56% _dl_relocate_object
         _dl_relocate_object (inline)
         elf_dynamic_do_Rela (inline)

2. Show the file/line information
   perf report -g address --inline

-    0.69%     0.00%  inline   ld-2.23.so           [.] _dl_start
     _dl_start rtld.c:307
      /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
   + _dl_sysdep_start dl-sysdep.c:250

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-27 12:12:59 -03:00
Jin Yao
0db64dd060 perf report: Show inline stack for stdio mode
If the address belongs to an inlined function, the source information
back to the first non-inlined function will be printed.

For example:

1. Show inlined function name
   perf report --stdio -g function --inline

     0.69%     0.00%  inline   ld-2.23.so           [.] dl_main
            |
            ---dl_main
               |
                --0.56%--_dl_relocate_object
                          _dl_relocate_object (inline)
                          elf_dynamic_do_Rela (inline)

2. Show the file/line information
   perf report --stdio -g address --inline

     0.69%     0.00%  inline   ld-2.23.so           [.] _dl_start_user
            |
            ---_dl_start_user .:0
               _dl_start rtld.c:307
               /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
               _dl_sysdep_start dl-sysdep.c:250
               |
                --0.56%--dl_main rtld.c:2076

Committer tests:

  # perf record --call-graph dwarf ~/bin/perf stat usleep 1

 Performance counter stats for 'usleep 1':

          0.443020      task-clock (msec)         #    0.449 CPUs utilized
                 1      context-switches          #    0.002 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                52      page-faults               #    0.117 M/sec
         1,049,423      cycles                    #    2.369 GHz
           801,456      instructions              #    0.76  insn per cycle
           155,609      branches                  #  351.246 M/sec
             7,026      branch-misses             #    4.52% of all branches

       0.000987570 seconds time elapsed

  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.553 MB perf.data (66 samples) ]
  # perf report --stdio --inline fs__get_mountpoint
  <SNIP>
     1.73%     0.00%  perf     perf           [.] fs__get_mountpoint
            |
            ---fs__get_mountpoint
               fs__get_mountpoint (inline)
               fs__check_mounts (inline)
               __statfs
               entry_SYSCALL_64
               sys_statfs
               SYSC_statfs
               user_statfs
               user_path_at_empty
               filename_lookup
               path_lookupat
               link_path_walk
               inode_permission
               __inode_permission
               kernfs_iop_permission
               kernfs_refresh_inode
               security_inode_notifysecctx
               selinux_inode_notifysecctx
               selinux_inode_setsecurity
               security_context_to_sid
               security_context_to_sid_core
               string_to_context_struct
               symcmp

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-27 12:02:22 -03:00
Changbin Du
3ef5b4023c perf hists browser: Fix typo in function switch_data_file
Should clear buf 'abs_path', not 'options'.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 341487ab56 ("perf hists browser: Add option for runtime switching perf data file")
Link: http://lkml.kernel.org/r/20170313114652.9207-1-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-13 11:58:57 -03:00
Namhyung Kim
bb963e1650 perf utils: Check verbose flag properly
It now can have negative value to suppress the message entirely.  So it
needs to check it being positive.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170217081742.17417-3-namhyung@kernel.org
[ Adjust fuzz on tools/perf/util/pmu.c, add > 0 checks in many other places ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-20 11:35:54 -03:00
Ingo Molnar
0fc6d1ca14 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-06 11:04:39 +01:00
Namhyung Kim
a1c9f97f0b perf diff: Fix -o/--order option behavior (again)
Commit 21e6d84286 ("perf diff: Use perf_hpp__register_sort_field
interface") changed list_add() to perf_hpp__register_sort_field().

This resulted in a behavior change since the field was added to the tail
instead of the head.  So the -o option is mostly ignored due to its
order in the list.

This patch fixes it by adding perf_hpp__prepend_sort_field().

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 21e6d84286 ("perf diff: Use perf_hpp__register_sort_field interface")
Link: http://lkml.kernel.org/r/20170118051457.30946-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-02 11:39:09 -03:00
Namhyung Kim
8381cdd0e3 perf diff: Fix segfault on 'perf diff -o N' option
The -o/--order option is to select column number to sort a diff result.

It does the job by adding a hpp field at the beginning of the sort list.
But it should not be added to the output field list as it has no
callbacks required by a output field.

During the setup_sorting(), the perf_hpp__setup_output_field() appends
the given sort keys to the output field if it's not there already.

Originally it was checked by fmt->list being non-empty.  But commit
3f931f2c42 ("perf hists: Make hpp setup function generic") changed it
to check the ->equal callback.

Anyways, we don't need to add the pseudo hpp field to the output field
list since it won't be used for output.  So just skip fields if they
have no ->color or ->entry callbacks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 3f931f2c42 ("perf hists: Make hpp setup function generic")
Link: http://lkml.kernel.org/r/20170118051457.30946-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-02 11:39:04 -03:00
Jiri Olsa
0e3fa7a7ac perf hists browser: Add e/c hotkeys to expand/collapse callchain for current entry
Currently we allow only to expand or collapse all entries in the browser
with 'E' or 'C' keys. Allow user to expand or collapse only current
entry in the browser with e or c key.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1484904032-11040-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-20 13:37:26 -03:00
Jiri Olsa
b33f922651 perf hists browser: Put hist_entry folding logic into single function
It will be used in following patch to expand or collapse only the
current browser entry.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1484904032-11040-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-20 10:04:45 -03:00
Soramichi AKIYAMA
d25ed5d9fa perf tools: Move two variables usied in libperf from perf.c
The use_browser and perf_version_string variables are both declared in
perf.c but they are also referenced by other functions of libperf.a.

Therefore a user linking an own main() with libperf.a must declare those
two variables in their files even if the files never use the browser or
the version information.

This patch fixes this issue by moving use_browser and
perf_version_string out of perf.c to some other files.

Signed-off-by: Soramichi Akiyama <akiyama@m.soramichi.jp>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170117002237.c1aec0ce3b4d675dca018deb@m.soramichi.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-17 11:36:45 -03:00
Ravi Bangoria
e216874cc1 perf annotate: Fix jump target outside of function address range
If jump target is outside of function range, perf is not handling it
correctly. Especially when target address is lesser than function start
address, target offset will be negative. But, target address declared to
be unsigned, converts negative number into 2's complement. See below
example. Here target of 'jumpq' instruction at 34cf8 is 34ac0 which is
lesser than function start address(34cf0).

        34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0

Objdump output:

  0000000000034cf0 <__sigaction>:
  __GI___sigaction():
    34cf0: lea    -0x20(%rdi),%eax
    34cf3: cmp    -bashx1,%eax
    34cf6: jbe    34d00 <__sigaction+0x10>
    34cf8: jmpq   34ac0 <__GI___libc_sigaction>
    34cfd: nopl   (%rax)
    34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
    34d07: movl   -bashx16,%fs:(%rax)
    34d0e: mov    -bashxffffffff,%eax
    34d13: retq

perf annotate before applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
           lea    -0x20(%rdi),%eax
           cmp    -bashx1,%eax
        v  jbe    10
        v  jmpq   fffffffffffffdd0
           nop
    10:    mov    _DYNAMIC+0x2e8,%rax
           movl   -bashx16,%fs:(%rax)
           mov    -bashxffffffff,%eax
           retq

perf annotate after applying patch:

  __GI___sigaction  /usr/lib64/libc-2.22.so
           lea    -0x20(%rdi),%eax
           cmp    -bashx1,%eax
        v  jbe    10
        ^  jmpq   34ac0 <__GI___libc_sigaction>
           nop
    10:    mov    _DYNAMIC+0x2e8,%rax
           movl   -bashx16,%fs:(%rax)
           mov    -bashxffffffff,%eax
           retq

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15 16:25:46 -03:00
Arnaldo Carvalho de Melo
5252b1aeab perf annotate: Show invalid jump offset in error message
To help in debugging when the wrong offset is being used, like in:

       │13d98: ↓ jne    13dd1 <lzma_lzma_preset@@XZ_5.0+0x28e1>

That is the full line from objdump, and it seems what should be used is
13dd1, not 28e1.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-4nc0marsgst1ft6inmvqber7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-25 15:56:34 -03:00
Arnaldo Carvalho de Melo
9484b86e9c perf ui helpline: Provide a printf variant
To print some values, like in the annotation code with invalid jump
offsets.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1vk0g5twas2ioswn1mmvnvwq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-25 15:49:16 -03:00
Arnaldo Carvalho de Melo
75b49202d8 perf annotate: Remove duplicate 'name' field from disasm_line
The disasm_line::name field is always equal to ins::name, being used
just to locate the instruction's ins_ops from the per-arch instructions
table.

Eliminate this duplication, nuking that field and instead make
ins__find() return an ins_ops, store it in disasm_line::ins.ops, and
keep just in disasm_line::ins.name what was in disasm_line::name, this
way we end up not keeping a reference to entries in the per-arch
instructions table.

This in turn will help supporting multiple ways to manage the per-arch
instructions table, allowing resorting that array, for instance, when
the entries will move after references to its addresses were made. The
same problem is avoided when one grows the array with realloc.

So architectures simply keeping a constant array will work as well as
architectures building the table using regular expressions or other
logic that involves resorting the table.

Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vr899azvabnw9gtuepuqfd9t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-25 10:24:16 -03:00
Ingo Molnar
47414424c5 perf/core improvements and fixes:
New tool:
 
 - 'perf sched timehist' provides an analysis of scheduling events.
 
   Example usage:
       perf sched record -- sleep 1
       perf sched timehist
 
   By default it shows the individual schedule events, including the wait
   time (time between sched-out and next sched-in events for the task), the
   task scheduling delay (time between wakeup and actually running) and run
   time for the task:
 
         time    cpu  task name         wait time  sch delay  run time
                      [tid/pid]            (msec)     (msec)    (msec)
     -------- ------  ----------------  ---------  ---------  --------
     1.874569 [0011]  gcc[31949]            0.014      0.000     1.148
     1.874591 [0010]  gcc[31951]            0.000      0.000     0.024
     1.874603 [0010]  migration/10[59]      3.350      0.004     0.011
     1.874604 [0011]  <idle>                1.148      0.000     0.035
     1.874723 [0005]  <idle>                0.016      0.000     1.383
     1.874746 [0005]  gcc[31949]            0.153      0.078     0.022
   ...
 
   Times are in msec.usec. (David Ahern, Namhyung Kim)
 
 Improvements:
 
 - Make 'perf c2c report' support -f/--force, to allow skipping the
   ownership check for root users, for instance, just like the other
   tools (Jiri Olsa)
 
 - Allow sorting cachelines by total number of HITMs, in addition to
   local and remote numbers (Jiri Olsa)
 
 Fixes:
 
 - Make sure errors aren't suppressed by the TUI reset at the end of
   a 'perf c2c report' session (Jiri Olsa)
 
 Infrastructure:
 
 - Initial work on having the annotate code better support multiple
   architectures, including the ability to cross-annotate, i.e. to
   annotate perf.data files collected on an ARM system on a x86_64
   workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips)
 
 - Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt)
 
 - Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYNbr5AAoJENZQFvNTUqpAnq0P/1SkKcxUdjXHt59P9s1GH1W2
 VDDGdRMVG8IkhzNpVX7ojQ48rC/04e/QooFaASoMV9ySUI1V5aDi1JjcpSSqvEw7
 I4DobaJLwebqUJUP2LteoNAuX0UVq6jWUXFDCzeN9yAfoQ9qTNgejLtOACrQd32n
 l4FxyFvfrdhmy4I95Aa+1VaBGEOwzXmkr0h7DcGenYoKsO6lPJ/WtBhVtqvcq26G
 PtYhD2UZMmDhLfPy6kZffIfNtkJExeSqVkdoHYtt9cpvVO6JZdjfHVsvHc6TxW4f
 GXnHEC65Q7Gu2xRLPdaNYDXD9C7LZcOITnIwKt9GfCx2RV6nhVT2H7qnZM0xMP1l
 +362wIx9KJ628l/Q7SWQTjnL2a2yG4sCqNluSQizokYlUXvKOHfDzwT3TRy9QzVz
 H+mCL4f7eb8rZINRswVi7hi/KeQnLpUgNbJe9XCLdsCdA/lJeJ4kUcU52Nnx/Kp5
 nX7A+6KFthijJuAS0dFLsyi+t8Ln7TeeoDJ6n1REVwp7zNUBj+yQtOPNFKsPnaAq
 VFDpSkBxMHOC8vW2Dz1x7zkINjLsoOsc1Z3E5slc/ZAKfKeKyukCd0YDZitvIwuf
 67daqhoUtw4Gu9M5hKGx2jGy5osMlY9zzSBe/nENZGzcoLPBrHhCuV/w3IOKzLjY
 9EoFDSM2l34ihMGZliSa
 =gL8a
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-20161123' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New tool:

- 'perf sched timehist' provides an analysis of scheduling events.

  Example usage:
      perf sched record -- sleep 1
      perf sched timehist

  By default it shows the individual schedule events, including the wait
  time (time between sched-out and next sched-in events for the task), the
  task scheduling delay (time between wakeup and actually running) and run
  time for the task:

        time    cpu  task name         wait time  sch delay  run time
                     [tid/pid]            (msec)     (msec)    (msec)
    -------- ------  ----------------  ---------  ---------  --------
    1.874569 [0011]  gcc[31949]            0.014      0.000     1.148
    1.874591 [0010]  gcc[31951]            0.000      0.000     0.024
    1.874603 [0010]  migration/10[59]      3.350      0.004     0.011
    1.874604 [0011]  <idle>                1.148      0.000     0.035
    1.874723 [0005]  <idle>                0.016      0.000     1.383
    1.874746 [0005]  gcc[31949]            0.153      0.078     0.022
  ...

  Times are in msec.usec. (David Ahern, Namhyung Kim)

Improvements:

- Make 'perf c2c report' support -f/--force, to allow skipping the
  ownership check for root users, for instance, just like the other
  tools (Jiri Olsa)

- Allow sorting cachelines by total number of HITMs, in addition to
  local and remote numbers (Jiri Olsa)

Fixes:

- Make sure errors aren't suppressed by the TUI reset at the end of
  a 'perf c2c report' session (Jiri Olsa)

Infrastructure changes:

- Initial work on having the annotate code better support multiple
  architectures, including the ability to cross-annotate, i.e. to
  annotate perf.data files collected on an ARM system on a x86_64
  workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips)

- Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt)

- Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24 05:09:31 +01:00
Ingo Molnar
69e6cdd0cf Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-24 05:09:08 +01:00
Arnaldo Carvalho de Melo
786c1b5184 perf annotate: Start supporting cross arch annotation
Introduce a 'struct arch', where arch specific stuff will live, starting
with objdump's choice of comment delimitation character, that is '#' in
x86 while a ';' in arm.

This has some bits and pieces from a patch submitted by Ravi.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-f337tzjjcl8vtapgvjxmhrbx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-17 17:12:50 -03:00
Jin Yao
fef51ecd10 perf report: Show branch info in callchain entry for browser mode
If the branch is 100% predicted then the "predicted" is hidden.
Similarly, if there is no branch tsx abort, the "abort" is hidden.
There is only cycles shown (cycle is supported on skylake platform,
older platform would be 0).

If no iterations, the "iterations" is hidden.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1477876794-30749-6-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-14 13:34:08 -03:00
Jin Yao
8577ae6b04 perf report: Show branch info in callchain entry for stdio mode
If the branch is 100% predicted then the "predicted" is hidden.
Similarly, if there is no branch tsx abort, the "abort" is hidden.
There is only cycles shown (cycle is supported on skylake platform,
older platform would be 0).

If no iterations, the "iterations" is hidden.

For example:

|--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18)
|          main div.c:44 (predicted:50.6%, cycles:1)
|          |
|           --22.69%--main div.c:42 (cycles:2, iterations:17)
|                     compute_flag div.c:28 (cycles:2)
|                     |
|                      --10.52%--compute_flag div.c:27 (cycles:1)
|                                rand rand.c:28 (cycles:1)
|                                rand rand.c:28 (cycles:1)
|                                __random random.c:298 (cycles:1)
|                                __random random.c:297 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:1)
|                                __random random.c:295 (cycles:6)

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1477876794-30749-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-14 13:33:47 -03:00
Namhyung Kim
b9bf911e99 perf hists browser: Fix column indentation on --hierarchy
When horizontall scrolling is used in hierarchy mode, the the right most
column has unnecessary indentation.  Actually it's needed only if some
of left (overhead) columns were shown.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161108130833.9263-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-09 11:45:58 -03:00
Namhyung Kim
131d51eb1d perf hists browser: Show folded sign properly on --hierarchy
When horizontal scrolling is used in hierarchy mode, the folded signed
disappears at the right most column.

Committer note:

To test it, run 'perf top --hierarchy, see the '+' symbol at the first
column, then press the right arrow key, the '+' symbol will disappear,
this patch fixes that.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161108130833.9263-3-namhyung@kernel.org
[ Move 'width -= 2' invariant to right after the if/else ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-09 11:30:47 -03:00
Namhyung Kim
3d9f468392 perf hists browser: Fix indentation of folded sign on --hierarchy
It should indent 2 spaces for folded sign and a whitespace.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161108130833.9263-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-09 11:20:56 -03:00
Namhyung Kim
9cba984454 perf hist browser: Fix hierarchy column counts
The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.

But it calculate the number of columns incorrectly when hierarchy mode
is enabled so that keep pressing RIGHT key can make the output
disappeared.

In the hierarchy mode, all sort keys are collapsed into a single column,
so it needs to be applied when calculating column numbers.

Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161024162110.17918-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-09 11:19:28 -03:00
Namhyung Kim
8a06b0be65 perf hist browser: Fix hierarchy column counts
The perf report/top on TUI supports horizontal scrolling using LEFT and
RIGHT keys.

But it calculate the number of columns incorrectly when hierarchy mode
is enabled so that keep pressing RIGHT key can make the output
disappeared.

In the hierarchy mode, all sort keys are collapsed into a single column,
so it needs to be applied when calculating column numbers.

Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161024162110.17918-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-25 09:52:49 -03:00
Alexis Berlemont
21e8c81095 perf hists browser: Dynamically change verbosity level
Here is a small patch which tries to fulfill a point in the perf todo
list:

* Make pressing 'V' multiple times to go on cycling thru various
  verbosity levels in 'perf top', so that info that is present in
  'perf top -v' can be obtained without having to restart the tool
  (acme).

After a small grep in the code, the max verbosity level seems 3; so,
we cycle at 4; I did not dare define a MAX_VERBOSE_LEVEL constant.

Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161012214823.14324-2-alexis.berlemont@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-24 11:07:42 -03:00
Jiri Olsa
5a1a99cd2e perf c2c report: Add main TUI browser
Add the main cachelines TUI browser. It allows to navigate through
cachelines and display their details and callchains (implemented in the
following patches).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-pk632k4h1uwc5t0lqc7k61zg@git.kernel.org
Link: http://lkml.kernel.org/r/20161021001706.GB23970@krava
[ Handle file with no entries, fixing segfault reported by Kim Phillips ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21 10:30:03 -03:00
Namhyung Kim
c611152373 perf top: Fix refreshing hierarchy entries on TUI
Markus reported that 'perf top --hierarchy' cannot scroll down after
refresh.  This was because the number of entries are not updated when
hierarchy is enabled.

Unlike normal report view, hierarchy mode needs to keep its own entry
count since it can have non-leaf entries which can expand/collapse.

Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: f5b763feeb ("perf hists browser: Count number of hierarchy entries")
Link: http://lkml.kernel.org/r/20161007050412.3000-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-13 11:10:14 -03:00