linux-stable/tools/perf
Milian Wolff 21ac9d547f perf report: Cache srclines for callchain nodes
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files, for example:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsed

Processed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. When `--inline` is passed, we use the inliner cache instead and
thus do not run this code path that often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level,
i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.
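
The alternative described above could look roughly like the following sketch: the cache holds only the bare srcline, and the symbol and address are appended at output time. The struct, the function names and the output layout are all invented for illustration and do not reflect what `get_srcline` prints today.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: what a cached entry would expose to the output code. */
struct toy_frame {
	const char *srcline;	/* cached, unformatted; may be NULL */
	const char *sym;	/* symbol name; may be NULL */
	uint64_t addr;
};

/* Append formatted text to a NUL-terminated buf; false on truncation. */
static bool toy_append(char *buf, size_t size, const char *fmt, ...)
{
	size_t len = strlen(buf);
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + len, size - len, fmt, ap);
	va_end(ap);
	return n >= 0 && (size_t)n < size - len;
}

/*
 * Format one frame on demand. The same cached srcline serves every
 * combination of show_sym and show_addr, so no per-option cache is needed.
 * Returns the length written, or -1 if buf is too small.
 */
int toy_frame__snprintf(const struct toy_frame *f, bool show_sym,
			bool show_addr, char *buf, size_t size)
{
	assert(f != NULL && buf != NULL);
	if (size == 0)
		return -1;
	buf[0] = '\0';

	if (!toy_append(buf, size, "%s", f->srcline ? f->srcline : "??:0"))
		return -1;
	if (show_sym && f->sym && !toy_append(buf, size, " %s", f->sym))
		return -1;
	if (show_addr && !toy_append(buf, size, " [0x%" PRIx64 "]", f->addr))
		return -1;
	return (int)strlen(buf);
}
```

With this split, the formatting flags never reach the lookup, so a single cached string per address would be enough.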

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:46 -03:00
..
arch perf annotate: Remove arch::cpuid_parse callback 2017-10-23 11:20:54 -03:00
bench perf tools: Use __maybe_unused consistently 2017-06-19 15:27:06 -03:00
Documentation perf list: Fix group description in the man page 2017-10-23 11:20:54 -03:00
jvmti perf jit: fix typo: "incalid" -> "invalid" 2017-06-27 11:55:06 -03:00
pmu-events perf vendor events: Add Goldmont Plus V1 event file 2017-10-23 16:30:54 -03:00
python
scripts perf script python: Add support for sqlite3 to call-graph-from-sql.py 2017-08-15 17:03:38 -03:00
tests perf tests attr: Make hw events optional 2017-10-23 11:20:54 -03:00
trace perf trace beauty madvise: Generate 'behavior' string table from kernel headers 2017-09-21 13:12:59 -03:00
ui perf report: Remove code to handle inline frames from browsers 2017-10-24 09:59:55 -03:00
util perf report: Cache srclines for callchain nodes 2017-10-25 10:50:46 -03:00
.gitignore perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git 2017-03-13 10:59:36 -03:00
Build perf trace: Only build tools/perf/trace/beauty/ when building 'perf trace' 2017-07-18 23:13:52 -03:00
builtin-annotate.c perf annotate browser: Support --show-nr-samples option 2017-08-18 11:15:09 -03:00
builtin-bench.c perf tools: Remove unused 'prefix' from builtin functions 2017-03-27 11:58:09 -03:00
builtin-buildid-cache.c perf buildid-cache: Cache debuginfo 2017-07-18 23:14:11 -03:00
builtin-buildid-list.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-c2c.c perf tools: Fix leaking rec_argv in error cases 2017-09-18 09:40:21 -03:00
builtin-config.c perf config: Write a config file just once 2017-09-13 09:49:15 -03:00
builtin-data.c perf data: Add doc when no conversion support compiled 2017-07-28 16:30:45 -03:00
builtin-diff.c perf config: Do not die when parsing u64 or int config values 2017-06-27 11:44:58 -03:00
builtin-evlist.c perf tools: Remove unused 'prefix' from builtin functions 2017-03-27 11:58:09 -03:00
builtin-ftrace.c tools include: Adopt strstarts() from the kernel 2017-07-20 15:46:10 -03:00
builtin-help.c tools include: Adopt strstarts() from the kernel 2017-07-20 15:46:10 -03:00
builtin-inject.c perf tools: Add feature header record to pipe-mode 2017-07-18 23:14:36 -03:00
builtin-kallsyms.c perf tools: Including missing inttypes.h header 2017-04-19 13:01:46 -03:00
builtin-kmem.c perf kmem: Perform some cleanup if '--time' is given an invalid value 2017-10-23 16:30:53 -03:00
builtin-kvm.c perf top: Implement multithreading for perf_event__synthesize_threads 2017-10-03 09:27:46 -03:00
builtin-list.c perf list: Add metric groups to perf list 2017-09-13 09:49:13 -03:00
builtin-lock.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-mem.c perf tools: Fix leaking rec_argv in error cases 2017-09-18 09:40:21 -03:00
builtin-probe.c perf buildid-cache: Support binary objects from other namespaces 2017-07-18 23:14:11 -03:00
builtin-record.c perf mmap: Adopt push method from builtin-record.c 2017-10-23 11:20:54 -03:00
builtin-report.c perf report: Group stat values on global event id 2017-08-28 16:44:44 -03:00
builtin-sched.c perf sched timehist: Add pid and tid options 2017-09-13 09:49:12 -03:00
builtin-script.c perf script: Fix error handling path 2017-10-23 16:30:53 -03:00
builtin-stat.c perf stat: Fall weak group back even for EBADF 2017-09-13 09:49:16 -03:00
builtin-timechart.c perf tools: Fix leaking rec_argv in error cases 2017-09-18 09:40:21 -03:00
builtin-top.c perf top: Add option to set the number of thread for event synthesize 2017-10-03 09:27:54 -03:00
builtin-trace.c perf tools: Introduce binary__fprintf() 2017-10-23 16:30:52 -03:00
builtin-version.c perf tools: Remove string.h, unistd.h and sys/stat.h from util.h 2017-04-24 13:43:33 -03:00
builtin.h perf tools: Remove stale prototypes from builtin.h 2017-04-24 13:43:33 -03:00
check-headers.sh perf tools: Do not check ABI headers in a detached tarball build 2017-10-23 16:30:50 -03:00
command-list.txt perf tools: Missing c2c command in command-list 2017-03-13 10:59:31 -03:00
CREDITS
design.txt
Makefile
Makefile.config perf tools: Robustify detection of clang binary 2017-08-28 16:44:46 -03:00
Makefile.perf perf trace beauty madvise: Generate 'behavior' string table from kernel headers 2017-09-21 13:12:59 -03:00
MANIFEST perf tools: Get all of tools/{arch,include}/ in the MANIFEST 2017-09-25 10:39:43 -03:00
perf-archive.sh
perf-completion.sh
perf-read-vdso.c
perf-sys.h perf tools: Use default CPUINFO_PROC where it fits 2017-08-17 16:58:21 -03:00
perf-with-kcore.sh
perf.c perf tools: Support running perf binaries with a dash in their name 2017-09-12 12:48:54 -03:00
perf.h perf record: Support direct --user-regs arguments 2017-09-13 09:49:14 -03:00