linux-stable/tools/perf
Milian Wolff 1982ad48fc perf report: Fix off-by-one for non-activation frames
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~~~~~~~~~~~~~
  #include <thread>
  #include <chrono>

  using namespace std;

  int main()
  {
    this_thread::sleep_for(chrono::milliseconds(1000));
    this_thread::sleep_for(chrono::milliseconds(100));
    this_thread::sleep_for(chrono::milliseconds(10));

    return 0;
  }
~~~~~~~~~~~~~~~

Now compile and record it:

~~~~~~~~~~~~~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
    --event sched:sched_stat_sleep \
    --event sched:sched_process_exit \
    --event sched:sched_switch --call-graph=dwarf \
    --output perf.data.raw \
    ./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~~~~~~~~~~~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~~~~~~~~~~~~~
  Overhead  Source:Line
  ........  ...........

   100.00%  core.c:0
            |
            ---__schedule core.c:0
               schedule
               do_nanosleep hrtimer.c:0
               hrtimer_nanosleep
               sys_nanosleep
               entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
               __nanosleep_nocancel .:0
               std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > thread:323
               |
               |--90.08%--main test.cpp:9
               |          __libc_start_main
               |          _start
               |
               |--9.01%--main test.cpp:10
               |          __libc_start_main
               |          _start
               |
                --0.91%--main test.cpp:13
                          __libc_start_main
                          _start
~~~~~~~~~~~~~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~~~~~~~~~~~~~
  Overhead  Source:Line
  ........  ...........

   100.00%  core.c:0
            |
            ---__schedule core.c:0
               schedule
               do_nanosleep hrtimer.c:0
               hrtimer_nanosleep
               sys_nanosleep
               entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
               __nanosleep_nocancel .:0
               std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > thread:323
               |
               |--90.08%--main test.cpp:8
               |          __libc_start_main
               |          _start
               |
               |--9.01%--main test.cpp:9
               |          __libc_start_main
               |          _start
               |
                --0.91%--main test.cpp:10
                          __libc_start_main
                          _start
~~~~~~~~~~~~~~~

Similarly it works for signal frames:

~~~~~~~~~~~~~~~
  __noinline void bar(void)
  {
    volatile long cnt = 0;

    for (cnt = 0; cnt < 100000000; cnt++);
  }

  __noinline void foo(void)
  {
    bar();
  }

  void sig_handler(int sig)
  {
    foo();
  }

  int main(void)
  {
    signal(SIGUSR1, sig_handler);
    raise(SIGUSR1);

    foo();
    return 0;
  }
~~~~~~~~~~~~~~~~

Before, the report wrongly points to `signal.c:29` after raise():

~~~~~~~~~~~~~~~~
  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
            |
            ---bar signal.c:11
               |
               |--50.49%--main signal.c:29
               |          __libc_start_main
               |          _start
               |
                --49.51%--0x33a8f
                          raise .:0
                          main signal.c:29
                          __libc_start_main
                          _start
~~~~~~~~~~~~~~~~

With this patch in, the issue is fixed and we instead get:

~~~~~~~~~~~~~~~~
   100.00%  signal   signal            [.] bar
            |
            ---bar signal.c:11
               |
               |--50.49%--main signal.c:29
               |          __libc_start_main
               |          _start
               |
                --49.51%--0x33a8f
                          raise .:0
                          main signal.c:27
                          __libc_start_main
                          _start
~~~~~~~~~~~~~~~~

Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc(). For libunwind, we replace the functionality via
unw_is_signal_frame() for any but the very first frame.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yao Jin <yao.jin@linux.intel.com>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170524062129.32529-4-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24 08:41:48 +02:00
..
arch perf annotate: Fix AArch64 comment char 2017-05-04 10:03:00 -03:00
bench perf tools: Move extra string util functions to util/string2.h 2017-04-19 13:01:51 -03:00
Documentation perf tools: Fix spelling mistakes 2017-05-04 09:59:53 -03:00
jvmti perf kvmti: Remove unused Makefile file 2016-11-14 12:42:56 -03:00
pmu-events perf vendor events intel: Add missing space in json descriptions 2017-03-30 13:35:50 -07:00
python
scripts
tests perf tests kmod-path: Don't fail if compressed modules aren't supported 2017-05-04 10:05:55 -03:00
trace perf tools: Add signal.h to places using its definitions 2017-04-20 13:22:43 -03:00
ui perf ui gtk: Move gtk .so name to the only place where it is used 2017-04-26 15:31:57 -03:00
util perf report: Fix off-by-one for non-activation frames 2017-05-24 08:41:48 +02:00
.gitignore perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git 2017-03-13 10:59:36 -03:00
Build perf trace: Beautify statx syscall 'flag' and 'mask' arguments 2017-03-31 14:42:31 -03:00
builtin-annotate.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-bench.c perf tools: Remove unused 'prefix' from builtin functions 2017-03-27 11:58:09 -03:00
builtin-buildid-cache.c perf symbols: Accept symbols starting at address 0 2017-05-02 18:23:04 -03:00
builtin-buildid-list.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-c2c.c perf tools: Move event prototypes from util.h to event.h 2017-04-25 15:30:47 -03:00
builtin-config.c perf config: Refactor a duplicated code for obtaining config file name 2017-05-02 18:23:12 -03:00
builtin-data.c perf tools: Remove unused 'prefix' from builtin functions 2017-03-27 11:58:09 -03:00
builtin-diff.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-evlist.c perf tools: Remove unused 'prefix' from builtin functions 2017-03-27 11:58:09 -03:00
builtin-ftrace.c perf tools: Remove poll.h and wait.h from util.h 2017-04-24 13:43:34 -03:00
builtin-help.c perf tools: Remove string.h, unistd.h and sys/stat.h from util.h 2017-04-24 13:43:33 -03:00
builtin-inject.c perf tools: Use just forward declarations for struct thread where possible 2017-04-24 13:43:35 -03:00
builtin-kallsyms.c perf tools: Including missing inttypes.h header 2017-04-19 13:01:46 -03:00
builtin-kmem.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-kvm.c perf tools: Remove poll.h and wait.h from util.h 2017-04-24 13:43:34 -03:00
builtin-list.c perf tools: Remove unused 'prefix' from builtin functions 2017-03-27 11:58:09 -03:00
builtin-lock.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-mem.c perf tools: Use just forward declarations for struct thread where possible 2017-04-24 13:43:35 -03:00
builtin-probe.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2017-05-02 19:09:35 -07:00
builtin-record.c perf tools: Remove poll.h and wait.h from util.h 2017-04-24 13:43:34 -03:00
builtin-report.c perf tools: Remove string.h, unistd.h and sys/stat.h from util.h 2017-04-24 13:43:33 -03:00
builtin-sched.c perf tools: Include errno.h where needed 2017-04-19 13:01:51 -03:00
builtin-script.c perf tools: Remove string.h, unistd.h and sys/stat.h from util.h 2017-04-24 13:43:33 -03:00
builtin-stat.c perf tools: Remove poll.h and wait.h from util.h 2017-04-24 13:43:34 -03:00
builtin-timechart.c perf tools: Use just forward declarations for struct thread where possible 2017-04-24 13:43:35 -03:00
builtin-top.c perf tools: Move event prototypes from util.h to event.h 2017-04-25 15:30:47 -03:00
builtin-trace.c perf tools: Move event prototypes from util.h to event.h 2017-04-25 15:30:47 -03:00
builtin-version.c perf tools: Remove string.h, unistd.h and sys/stat.h from util.h 2017-04-24 13:43:33 -03:00
builtin.h perf tools: Remove stale prototypes from builtin.h 2017-04-24 13:43:33 -03:00
check-headers.sh tools include uapi: Grab copies of stat.h and fcntl.h 2017-03-31 11:26:03 -03:00
command-list.txt perf tools: Missing c2c command in command-list 2017-03-13 10:59:31 -03:00
CREDITS
design.txt
Makefile
Makefile.config perf tools: Disable JVMTI if no ELF support available 2017-04-13 11:47:43 -03:00
Makefile.perf perf build: Add special fixdep cleaning rule 2017-02-17 16:04:38 -03:00
MANIFEST tools include: Introduce linux/bug.h, from the kernel sources 2017-04-19 13:01:42 -03:00
perf-archive.sh
perf-completion.sh
perf-read-vdso.c
perf-sys.h perf powerpc: Fix build-test failure 2016-09-08 13:44:07 -03:00
perf-with-kcore.sh
perf.c perf tools: Move event prototypes from util.h to event.h 2017-04-25 15:30:47 -03:00
perf.h perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info 2017-03-14 11:38:23 -03:00