linux-stable/tools/perf
Namhyung Kim 57482dc5ac perf sort: Fix the 'weight' sort key behavior
[ Upstream commit 784e8adda4 ]

Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses.  And perf tool has 'weight'
and 'local_weight' sort keys to display the info.

But it's somewhat confusing what it shows exactly.  In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.

For example:

  $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M

  $ perf report --stdio -n -s +local_weight
  ...
  #
  # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
  # ........  .......  .......  ................  .........................  ............
  #
      21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
      12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
      11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
      10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
       7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
       6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
       6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
  ...

So let's look at the 'lockref_get_not_zero' symbols.  The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016.  But it's not the case:

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  ......
  #
       1.36%        4  dd       [kernel.vmlinux]  36            144
       0.47%        4  dd       [kernel.vmlinux]  37            148
       0.42%        4  dd       [kernel.vmlinux]  32            128
       0.40%        4  dd       [kernel.vmlinux]  34            136
       0.35%        4  dd       [kernel.vmlinux]  36            144
       0.34%        4  dd       [kernel.vmlinux]  35            140
       0.30%        4  dd       [kernel.vmlinux]  36            144
       0.30%        4  dd       [kernel.vmlinux]  34            136
       0.30%        4  dd       [kernel.vmlinux]  32            128
       0.30%        4  dd       [kernel.vmlinux]  32            128
  ...

With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight').  I don't think this is
what we want.

I found this because of the way it aggregates the 'weight' value.  Since
it's not a period, we should not add them in the he->stat.  Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.

After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.

Let's keep the weight and display it differently.  For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.

With this change, I can see the expected numbers.

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  .....
  #
      21.23%      313  dd       [kernel.vmlinux]  32            10016
      12.43%      183  dd       [kernel.vmlinux]  35            6405
      11.97%      159  dd       [kernel.vmlinux]  36            5724
       7.63%      113  dd       [kernel.vmlinux]  33            3729
       6.37%       92  dd       [kernel.vmlinux]  34            3128
       4.17%       59  dd       [kernel.vmlinux]  37            2183
       0.08%        1  dd       [kernel.vmlinux]  269           269
       0.08%        1  dd       [kernel.vmlinux]  38            38

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08 09:04:40 +01:00
..
arch perf callchain: Fix compilation on powerpc with gcc11+ 2021-10-31 12:51:41 -03:00
bench perf bench futex: Fix memory leak of perf_cpu_map__new() 2021-11-25 09:48:33 +01:00
dlfilters perf tests: Add dlfilter test 2021-08-11 09:35:44 -03:00
Documentation perf doc: Fix typos all over the place 2021-09-27 09:32:28 -03:00
examples/bpf perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
include perf build: Move perf_dlfilters.h in the source tree 2021-08-11 09:35:24 -03:00
jvmti perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
pmu-events perf jevents: Free the sys_event_tables list after processing entries 2021-10-05 14:48:10 -03:00
python
scripts perf scripts python: Fix passing arguments to stackcollapse report 2021-09-10 11:45:19 -03:00
tests perf tests: Remove bash construct from record+zstd_comp_decomp.sh 2021-11-25 09:48:33 +01:00
trace perf beauty: Cover more flags in the move_mount syscall argument beautifier 2021-09-10 18:15:22 -03:00
ui perf annotate: Fix fused instr logic for assembly functions 2021-09-18 17:43:05 -03:00
util perf sort: Fix the 'weight' sort key behavior 2021-12-08 09:04:40 +01:00
.gitignore perf tools: Ignore Documentation dependency file 2021-09-11 15:36:16 -03:00
Build perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
builtin-annotate.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-bench.c perf bench: Add benchmark for evlist open/close operations 2021-08-10 11:32:37 -03:00
builtin-buildid-cache.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-buildid-list.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-c2c.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
builtin-config.c
builtin-daemon.c Merge remote-tracking branch 'torvalds/master' into perf/core 2021-03-29 10:39:10 -03:00
builtin-data.c perf data: Correct -h output 2021-08-31 15:12:00 -03:00
builtin-diff.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-evlist.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-ftrace.c perf ftrace: Fix access to pid in array when setting a pid filter 2021-04-23 15:58:10 -03:00
builtin-help.c
builtin-inject.c perf inject: Fix output from a file to a pipe 2021-08-02 10:14:34 -03:00
builtin-kallsyms.c
builtin-kmem.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-kvm.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-list.c
builtin-lock.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-mem.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-probe.c perf probe: Do not show @plt function by default 2021-07-07 10:28:10 -03:00
builtin-record.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
builtin-report.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-sched.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-script.c perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support 2021-10-31 12:51:41 -03:00
builtin-stat.c perf iostat: Use system-wide mode if the target cpu_list is unspecified 2021-09-27 09:39:30 -03:00
builtin-timechart.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-top.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-trace.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-version.c
builtin.h perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
check-headers.sh perf report: Add tools/arch/x86/include/asm/amd-ibs.h 2021-09-10 18:13:20 -03:00
command-list.txt perf stat: Enable iostat mode for x86 platforms 2021-04-20 08:40:20 -03:00
CREDITS
design.txt
Makefile perf tools: Add a build-test variant to use in builds from a tarball 2021-04-20 08:43:58 -03:00
Makefile.config perf build: Add missing -lstdc++ when linking with libopencsd 2021-10-05 14:48:10 -03:00
Makefile.perf perf build: Suppress 'rm dlfilter' build message 2021-10-31 12:51:41 -03:00
MANIFEST scripts/bpf: Abstract eBPF API target parameter 2021-03-04 18:39:45 -08:00
perf-archive.sh perf archive: Fix filtering of empty build-ids 2021-03-06 16:54:31 -03:00
perf-completion.sh
perf-iostat.sh perf stat: Enable iostat mode for x86 platforms 2021-04-20 08:40:20 -03:00
perf-read-vdso.c
perf-sys.h
perf-with-kcore.sh
perf.c perf debug: Move debug initialization earlier 2021-05-27 13:24:22 -03:00
perf.h