Commit graph

1150 commits

Author SHA1 Message Date
Changbin Du
a1ef3aaf6a perf docs: Fix format of unordered lists
Fix the format of unordered lists so the can wrap properly.

Signed-off-by: Changbin Du <changbin.du@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230718085242.3090797-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-16 08:37:49 -03:00
Adrian Hunter
82b0a10390 perf dlfilter: Add al_cleanup()
Add perf_dlfilter_fns.al_cleanup() to do addr_location__exit() on data
passed via perf_dlfilter_fns.resolve_address().

Add dlfilter-test-api-v2 to the "dlfilter C API" test to test it.

Update documentation, clarifying that data returned by APIs should not
be dereferenced after filter_event() and filter_event_early() return.

Fixes: 0dd5041c9a ("perf addr_location: Add init/exit/copy functions")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230731091857.10681-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-15 16:41:49 -03:00
Ian Rogers
3d6dfae889 perf parse-events: Remove BPF event support
New features like the BPF --filter support in perf record have made the
BPF event functionality somewhat redundant. As shown by commit
fcb027c1a4f6 ("perf tools: Revert enable indices setting syntax for BPF
map") and commit 14e4b9f428 ("perf trace: Raw augmented syscalls fix
libbpf 1.0+ compatibility") the BPF event support hasn't been well
maintained and it adds considerable complexity in areas like event
parsing, not least as '/' is a separator for event modifiers as well as
in paths.

This patch removes support in the event parser for BPF events and then
the associated functions are removed. This leads to the removal of whole
source files like bpf-loader.c.  Removing support means that augmented
syscalls in perf trace is broken, this will be fixed in a later commit
adding support using BPF skeletons.

The removal of BPF events causes an unused label warning from flex
generated code, so update build to ignore it:

  ```
  util/parse-events-flex.c:2704:1: error: label ‘find_rule’ defined but not used [-Werror=unused-label]
  2704 | find_rule: /* we branch to this label when backing up */
  ```

Committer notes:

Extracted from a larger patch that was also removing the support for
linking with libllvm and libclang, that were an alternative to using an
external clang execution to compile the .c event source code into BPF
bytecode.

Testing it:

  # perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
  event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c'
                        \___ Bad event or PMU

  Unabled to find PMU or event on a PMU of 'home'

  Initial error:
  event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c'
                        \___ Cannot find PMU `home'. Missing kernel support?
  Run 'perf list' for a list of valid events

   Usage: perf trace [<options>] [<command>]
      or: perf trace [<options>] -- <command> [<options>]
      or: perf trace record [<options>] [<command>]
      or: perf trace record [<options>] -- <command> [<options>]

      -e, --event <event>   event/syscall selector. use 'perf list' to list available events
  #

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230810184853.2860737-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-15 16:41:48 -03:00
Xiu Jianfeng
1e37201405 perf doc: Fix typo in perf.data-file-format.txt
The 'it' should be 'is' here, fix it.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230727105001.261420-1-xiujianfeng@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-07-28 19:01:16 -03:00
Arnaldo Carvalho de Melo
2df2707164 perf bench uprobe: Add benchmark to test uprobe overhead
This just adds the initial "workload", a call to libc's usleep(1000us)
function:

  $ perf stat --null perf bench uprobe all
  # Running uprobe/baseline benchmark...
  # Executed 1000 usleep(1000) calls
       Total time: 1053533 usecs

   1053.533 usecs/op

   Performance counter stats for 'perf bench uprobe all':

         1.061042896 seconds time elapsed

         0.001079000 seconds user
         0.006499000 seconds sys

  $

More entries will be added using a BPF skel to add various uprobes to
the usleep() function.

Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andre Fredette <anfredet@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Dave Tucker <datucker@redhat.com>
Cc: Derek Barbosa <debarbos@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20230719204910.539044-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-07-20 11:31:19 -03:00
Namhyung Kim
f6027053f8 perf lock contention: Add --output option
To avoid formatting failures for example in CSV output due to debug
messages, add --output option to put the result in a file.
Unfortunately the short -o option was taken by the --owner already.

  $ sudo ./perf lock con -ab --output lock-out.txt -v sleep 1
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: cannot get elf header.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols

  $ head lock-out.txt
   contended   total wait     max wait     avg wait         type   caller

           3     76.79 us     26.89 us     25.60 us     rwlock:R   ep_poll_callback+0x2d
  			0xffffffff9a23f4b5  _raw_read_lock_irqsave+0x45
  			0xffffffff99bbd4dd  ep_poll_callback+0x2d
  			0xffffffff999029f3  __wake_up_common+0x73
  			0xffffffff99902b82  __wake_up_common_lock+0x82
  			0xffffffff99fa5b1c  sock_def_readable+0x3c
  			0xffffffff9a11521d  unix_stream_sendmsg+0x18d
  			0xffffffff99f9fc9c  sock_sendmsg+0x5c

Suggested-by: Ian Rogers <irogers@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-07-01 10:48:48 -07:00
Namhyung Kim
69c5c9930d perf lock contention: Add -x option for CSV style output
Sometimes we want to process the output by external programs.  Let's add
the -x option to specify the field separator like perf stat.

  $ sudo ./perf lock con -ab -x, sleep 1
  # output: contended, total wait, max wait, avg wait, type, caller
  19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
  15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
  4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
  1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
  8, 67608, 27404, 8451, spinlock, __queue_work+0x174
  3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
  3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
  2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
  1, 24619, 24619, 24619, spinlock, rcu_core+0xd4

The first line is a comment that shows the output format.  Each line is
separated by the given string ("," in this case).  The time is printed
in nsec without the unit so that it can be parsed easily.

The characters can be used in the output like (":", "+" and ".") are not
allowed for the -x option.

  $ ./perf lock con -x:
  Cannot use the separator that is already used

   Usage: perf lock contention [<options>]

      -x, --field-separator <separator>
                            print result in CSV format with custom separator

The stacktraces are printed in the same line separated by ":".  The
header is updated to show the stacktrace.  Also the debug output is
added at the end as a comment.

  $ sudo ./perf lock con -abv -x, -F wait_total sleep 1
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: cannot get elf header.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
  # output: total wait, type, caller, stacktrace
  37134, spinlock, rcu_core+0xd4, 0xffffffff9d0401e4 _raw_spin_lock_irqsave+0x44: 0xffffffff9c738114 rcu_core+0xd4: ...
  21213, spinlock, raw_spin_rq_lock_nested+0x1b, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9c6d9cfb raw_spin_rq_lock_nested+0x1b: ...
  20506, rwlock:W, ep_done_scan+0x2d, 0xffffffff9c9bc4dd ep_done_scan+0x2d: 0xffffffff9c9bd5f1 do_epoll_wait+0x6d1: ...
  18044, rwlock:R, ep_poll_callback+0x2d, 0xffffffff9d040555 _raw_read_lock_irqsave+0x45: 0xffffffff9c9bc81d ep_poll_callback+0x2d: ...
  17890, rwlock:W, do_epoll_wait+0x47b, 0xffffffff9c9bd39b do_epoll_wait+0x47b: 0xffffffff9c9be9ef __x64_sys_epoll_wait+0x6d1: ...
  12114, spinlock, futex_wait_queue+0x60, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9d037cae __schedule+0xbe: ...
  # debug: total=7, bad=0, bad_task=0, bad_stack=0, bad_time=0, bad_data=0

Also note that some field (like lock symbols) can be empty.

  $ sudo ./perf lock con -abl -x, -E 10 sleep 1
  # output: contended, total wait, max wait, avg wait, address, symbol, type
  6, 275025, 61764, 45837, ffff9dcc9f7d60d0, , spinlock
  18, 87716, 11196, 4873, ffff9dc540059000, , spinlock
  2, 6472, 5499, 3236, ffff9dcc7f730e00, rq_lock, spinlock
  3, 4429, 2341, 1476, ffff9dcc7f7b0e00, rq_lock, spinlock
  3, 3974, 1635, 1324, ffff9dcc7f7f0e00, rq_lock, spinlock
  4, 3290, 1326, 822, ffff9dc5f4e2cde0, , rwlock
  3, 2894, 1023, 964, ffffffff9e0d7700, rcu_state, spinlock
  1, 2567, 2567, 2567, ffff9dcc7f6b0e00, rq_lock, spinlock
  4, 1259, 596, 314, ffff9dc69c2adde0, , rwlock
  1, 934, 934, 934, ffff9dcc7f670e00, rq_lock, spinlock

Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-07-01 10:48:35 -07:00
Fangrui Song
78987bb02a perf: Replace deprecated -target with --target= for Clang
-target has been deprecated since Clang 3.4 in 2013. Use the preferred
--target=bpf form instead. This matches how we use --target= in
scripts/Makefile.clang.

Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: 274b6f0c87
Link: https://lore.kernel.org/r/20230624002708.1907962-1-maskray@google.com
[ resolved a conflict with GEN_VMLINUX_H changes ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-27 12:13:22 -07:00
Ian Rogers
e657096777 perf stat: Document --metric-no-threshold and threshold colors
Document the threshold behavior for -M/--metrics.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20230519063719.1029596-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-05 16:04:14 -03:00
K Prateek Nayak
aab667ca88 perf stat: Add "--per-cache" aggregation option and document it
This patch adds support for "--per-cache" option for aggregation at a
particular cache level and documents the same.

Following is the output of 'perf stat' with aggregation at L3 for the
event "ls_dmnd_fills_from_sys.ext_cache_remote" on a dual socket 3rd
Generation EPYC Processor (2 x 64C/128T - 16 LLCs) when running
hackbench pinned to 4 LLCs:

  $ sudo perf stat --per-cache=L3 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- \
    taskset -c 0-15,64-79,128-143,192-207 \
    perf bench sched messaging -p -t -l 100000 -g 8

  ...

   Performance counter stats for 'system wide':

  S0-D0-L3-ID0             16          9,500,803      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID8             16          6,338,099      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID16            16            355,005      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID24            16             22,067      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID32            16             16,321      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID40            16             11,619      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID48            16              4,238      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID56            16             31,158      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID64            16         28,242,452      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID72            16         22,906,973      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID80            16             72,898      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID88            16             56,907      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID96            16             20,456      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID104           16             40,913      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID112           16             78,113      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID120           16             37,897      ls_dmnd_fills_from_sys.ext_cache_remote

Also support 'perf stat record' and 'perf stat report' with the ability
to specify a different cache level to aggregate data at when running
'perf stat report'.

  $ sudo perf stat record --per-cache=L2 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- \
    taskset -c 0-15,64-79,128-143,192-207 \
    perf bench sched messaging -p -t -l 100000 -g 8

  ...

   Performance counter stats for 'system wide':

  S0-D0-L2-ID0              2          1,442,061      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID1              2          1,548,994      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID2              2          1,553,557      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID3              2          1,420,122      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID4              2          1,465,461      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID5              2          1,455,153      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID6              2          1,595,237      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID7              2          1,499,321      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID8              2          1,919,025      ls_dmnd_fills_from_sys.ext_cache_remote
  ...
  S1-D1-L2-ID127            2             21,295      ls_dmnd_fills_from_sys.ext_cache_remote

  $ sudo perf stat report --per-cache=L3

   Performance counter stats for 'perf stat record --per-cache=L2 -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
                                  taskset -c 0-15,64-79,128-143,192-207 \
                                  perf bench sched messaging -p -t -l 100000 -g 8':

  S0-D0-L3-ID0             16         11,979,906      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID8             16         14,257,202      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID16            16            377,484      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID24            16             27,224      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID32            16             26,816      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID40            16             14,461      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID48            16             10,499      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID56            16             53,817      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID64            16         27,361,987      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID72            16         37,299,024      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID80            16             84,125      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID88            16             64,561      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID96            16             13,403      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID104           16             20,138      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID112           16             93,220      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID120           16             35,465      ls_dmnd_fills_from_sys.ext_cache_remote

On the above system, the domain covered by S0-D0-L3-ID0 contains
S0-D0-L2-ID0 to S0-D0-L2-ID7, the corresponding count for L3-ID0 is
equal to the sum of counts for L2-ID0 to L2-ID7.

Add documentation for the newly introduced "--per-cache" option.

Suggested-by: Gautham Shenoy <gautham.shenoy@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wen Pu <puwen@hygon.cn>
Link: https://lore.kernel.org/r/20230517172745.5833-5-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-23 16:10:13 -03:00
Ben Hutchings
61b3d2107d perf doc: Add support for KBUILD_BUILD_TIMESTAMP
When building man pages from a Git checkout, we consistently set the
man page date based on when the input was last changed.  Otherwise, it
defaults to the build time, which is not reproducible.

Allow the date to be set through the KBUILD_BUILD_TIMESTAMP variable,
as for timestamps in the kernel itself.

Signed-off-by: Ben Hutchings <benh@debian.org>
Acked-by: Ian Rogers<irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Link: https://lore.kernel.org/r/ZF/1F1P+b9qZ/vVH@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-15 17:49:01 -03:00
Ben Hutchings
21a165133c perf doc: Define man page date when using asciidoctor
When building perf documentation with asciidoc, we use "git log" to
find the last commit date of each doc source and pass that to asciidoc
to use as the man page date.

When using asciidoctor, however, the current date is always used
instead.  Defining perf_date like we do for asciidoc also doesn't
work because we're not using DocBook as an intermediate format.
The asciidoctor man page backend looks for the variable "docdate",
so set that instead.

Signed-off-by: Ben Hutchings <benh@debian.org>
Acked-by: Ian Rogers<irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Link: https://lore.kernel.org/r/ZF/1BOahN/i6xbBx@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-15 17:48:46 -03:00
Changbin Du
af9eb56bfe perf script: Add new output field 'dsoff' to print dso offset
This adds a new 'dsoff' field to print dso offset for resolved symbols,
and the offset is appended to dso name.

Default output:

  $ perf script
       ls 2695501 3011030.487017:     500000 cycles:      152cc73ef4b5 get_common_indices.constprop.0+0x155 (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
       ls 2695501 3011030.487018:     500000 cycles:  ffffffff99045b3e [unknown] ([unknown])
       ls 2695501 3011030.487018:     500000 cycles:  ffffffff9968e107 [unknown] ([unknown])
       ls 2695501 3011030.487018:     500000 cycles:  ffffffffc1f54afb [unknown] ([unknown])
       ls 2695501 3011030.487018:     500000 cycles:  ffffffff9968382f [unknown] ([unknown])
       ls 2695501 3011030.487019:     500000 cycles:  ffffffff99e00094 [unknown] ([unknown])
       ls 2695501 3011030.487019:     500000 cycles:      152cc718a8d0 __errno_location@plt+0x0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1)

Display 'dsoff' field:

  $ perf script -F +dsoff
       ls 2695501 3011030.487017:     500000 cycles:      152cc73ef4b5 get_common_indices.constprop.0+0x155 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
       ls 2695501 3011030.487018:     500000 cycles:  ffffffff99045b3e [unknown] ([unknown])
       ls 2695501 3011030.487018:     500000 cycles:  ffffffff9968e107 [unknown] ([unknown])
       ls 2695501 3011030.487018:     500000 cycles:  ffffffffc1f54afb [unknown] ([unknown])
       ls 2695501 3011030.487018:     500000 cycles:  ffffffff9968382f [unknown] ([unknown])
       ls 2695501 3011030.487019:     500000 cycles:  ffffffff99e00094 [unknown] ([unknown])
       ls 2695501 3011030.487019:     500000 cycles:      152cc718a8d0 __errno_location@plt+0x0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
       ls 2695501 3011030.487019:     500000 cycles:  ffffffff992a6db0 [unknown] ([unknown])

Signed-off-by: Changbin Du <changbin.du@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hui Wang <hw.huiwang@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230418031825.1262579-4-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-12 15:21:49 -03:00
Namhyung Kim
2d8d016527 perf lock contention: Update default map size to 16384
The BPF hash map will align the map size to a power of 2.  So 10k would
be 16k anyway.  Let's have the actual size to avoid confusions.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06 21:52:27 -03:00
Namhyung Kim
84b9192030 perf lock contention: Use -M for --map-nr-entries
Users often want to change the map size, let's add a short option (-M)
for that.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06 21:52:23 -03:00
Adrian Hunter
5ef506130c perf top: Add --branch-history option
Add --branch-history option, to act the same as that option does for
perf report.

Example:

  $ cat tcallf.c
  volatile a = 10000, b = 100000, c;

  __attribute__((noinline)) f2()
  {
          c = a / b;
  }

  __attribute__((noinline)) f1()
  {
          f2();
          f2();
  }
  main()
  {
          while (1)
                  f1();
  }
  $ gcc -w -g -o tcallf tcallf.c
  $ ./tcallf &
  [1] 29409
  $ perf top -e cycles:u  -t $(pidof tcallf) --stdio --no-children --branch-history
     PerfTop:    3819 irqs/sec  kernel: 0.0%  exact:  0.0% lost: 0/0 drop: 0/0 [4000Hz cycles:u],  (target_tid: 29409)
  --------------------------------------------------------------------------------------------------------------------

      49.01%  tcallf.c:5   [.] f2    tcallf
              |
              |--24.91%--f2 tcallf.c:4
              |          |
              |          |--17.14%--f1 tcallf.c:11 (cycles:1)
              |          |          f1 tcallf.c:11
              |          |          f2 tcallf.c:6 (cycles:3)
              |          |          f2 tcallf.c:4
              |          |          f1 tcallf.c:10 (cycles:2)
              |          |          f1 tcallf.c:9
              |          |          main tcallf.c:16 (cycles:1)
              |          |          main tcallf.c:16
              |          |          main tcallf.c:16 (cycles:1)
              |          |          main tcallf.c:16
              |          |          f1 tcallf.c:12 (cycles:1)
              |          |          f1 tcallf.c:12
              |          |          f2 tcallf.c:6 (cycles:3)
              |          |          f2 tcallf.c:4
              |          |          f1 tcallf.c:11 (cycles:1 iter:1 avg_cycles:12)
              |          |          f1 tcallf.c:11
              |          |          f2 tcallf.c:6 (cycles:3 iter:1 avg_cycles:12)
              |          |          f2 tcallf.c:4
              |          |          f1 tcallf.c:10 (cycles:2 iter:1 avg_cycles:12)
              |          |
              |           --7.78%--f1 tcallf.c:10 (cycles:2)
              |                     f1 tcallf.c:9
              |                     main tcallf.c:16 (cycles:1)
              |                     main tcallf.c:16
              |                     main tcallf.c:16 (cycles:1)
              |                     main tcallf.c:16
              |                     f1 tcallf.c:12 (cycles:1)
              |                     f1 tcallf.c:12
              |                     f2 tcallf.c:6 (cycles:3)
              |                     f2 tcallf.c:4
              |                     f1 tcallf.c:11 (cycles:1)
              |                     f1 tcallf.c:11
              |                     f2 tcallf.c:6 (cycles:3)
              |                     f2 tcallf.c:4
              |                     f1 tcallf.c:10 (cycles:2 iter:1 avg_cycles:12)
              |                     f1 tcallf.c:9
              |                     main tcallf.c:16 (cycles:1 iter:1 avg_cycles:12)
              |                     main tcallf.c:16
              |                     main tcallf.c:16 (cycles:1 iter:1 avg_cycles:12)
  ...

  $ pkill tcallf
  [1]+  Terminated              ./tcallf

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230330131833.12864-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
Ian Rogers
57594454ce perf symbol: Add command line support for addr2line path
Allow addr2line to be set either on the command line or via the
perfconfig file. This doesn't currently work with llvm-addr2line as
the addr2line code emits two things:
1) the address to decode,
2) a bogus ',' value.
The expectation is the bogus value will generate:
??
??:0
that terminates the addr2line reading. However, the output from
llvm-addr2line is a single line with just the input ',' locking up the
addr2line reading that is expecting a second line.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
Ian Rogers
0b02b47e71 perf annotate: Allow objdump to be set in perfconfig
Allow the setting of the objdump command in the perfconfig. Update man
page for this new option.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
German Gomez
ea15483e7c perf report: Add 'simd' sort field
Add 'simd' sort field to visualize SIMD ops in 'perf report'.

Rows are labeled with the SIMD ISA, and the type of predicate (if any):

  - [p] partial predicate
  - [e] empty predicate (no elements in the vector being used)

Example with Arm SPE and SVE (Scalable Vector Extension):

  #include <arm_sve.h>

  double src[1025], dst[1025];

  int main(void) {
    svfloat64_t vc = svdup_f64(1);
    for(;;)
      for(int i = 0; i < 1025; i += svcntd())
      {
        svbool_t pg = svwhilelt_b64(i, 1025);
        svfloat64_t vsrc = svld1(pg, &src[i]);
        svfloat64_t vdst = svadd_x(pg, vsrc, vc);
        svst1(pg, &dst[i], vdst);
      }
    return 0;
  }

  ... compiled using "gcc-11 -march=armv8-a+sve -O3"

Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:

  $ perf record -e arm_spe_0// -- ./a.out
  $ perf report --itrace=i1i -s overhead,pid,simd,sym

  Overhead      Pid:Command   Simd     Symbol
  ........  ................  .......  ......................

    53.76%    10758:program            [.] main
    46.14%    10758:program   [.] SVE  [.] main
     0.09%    10758:program   [p] SVE  [.] main

The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.

Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20 19:28:21 -03:00
Leo Yan
96d541699e perf kvm: Update documentation to reflect new changes
Update documentation for new sorting and option '--stdio'.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-15 16:53:43 -03:00
Namhyung Kim
c46bf3bd00 perf record: Update documentation for BPF filters
Add more description and examples.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-15 11:08:36 -03:00
Namhyung Kim
d180aa56b5 perf record: Add BPF event filter support
Use --filter option to set BPF filter for generic events other than the
tracepoints or Intel PT.  The BPF program will check the sample data and
filter according to the expression.

For example, the below is the typical perf record for frequency mode.
The sample period started from 1 and increased gradually.

  $ sudo ./perf record -e cycles true
  $ sudo ./perf script
       perf-exec 2272336 546683.916875:          1 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916892:          1 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916899:          3 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916905:         17 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916911:        100 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916917:        589 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916924:       3470 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
       perf-exec 2272336 546683.916930:      20465 cycles:  ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
            true 2272336 546683.916940:     119873 cycles:  ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
            true 2272336 546683.917003:     461349 cycles:  ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
            true 2272336 546683.917237:     635778 cycles:  ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])

When you add a BPF filter to get samples having periods greater than 1000,
the output would look like below:

  $ sudo ./perf record -e cycles --filter 'period > 1000' true
  $ sudo ./perf script
       perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
       perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
       perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
       perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
       perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
            true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
            true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

Committer notes:

Add stubs for perf_bpf_filter__prepare() and perf_bpf_filter__destroy()
to tools/perf/util/python.c to keep it building.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-15 11:08:34 -03:00
Ian Rogers
20cb10eadb perf doc: Refresh topdown documentation
perf stat now supports --topdown for any platform with the TopdownL1
metric group including Intel before Icelake. Tweak the documentation
to reflect this.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-43-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-19 08:10:15 -03:00
Steinar H. Gunderson
7e55b95651 perf intel-pt: Synthesize cycle events
There is no good reason why we cannot synthesize "cycle" events from
Intel PT just as we can synthesize "instruction" events, in particular
when CYC packets are available. This enables using PT to getting much
more accurate cycle profiles than regular sampling (record -e cycles)
when the work last for very short periods (<10 ms).  Thus, add support
for this, based off of the existing IPC calculation framework. The new
option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle
and instruction events can be synthesized together, and are by default.

The only real caveat is that CYC packets are only emitted whenever some
other packet is, which in practice is when a branch instruction is
encountered (and not even all branches). Thus, even at no subsampling
(e.g. --itrace=y0ns), it is impossible to get more accuracy than a
single basic block, and all cycles spent executing that block will get
attributed to the branch instruction that ends the packet.  Thus, one
cannot know whether the cycles came from e.g. a specific load, a
mispredicted branch, or something else. When subsampling (which is the
default), the cycle events will get smeared out even more, but will
still be generally useful to attribute cycle counts to functions.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-17 11:02:44 -03:00
Feng Tang
1470a108a6 perf c2c: Add report option to show false sharing in adjacent cachelines
Many platforms have feature of adjacent cachelines prefetch, when it is
enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if
one is fetched to cache, the other one could likely be fetched too,
which sort of extends the cacheline size to double, thus the false
sharing could happens in adjacent cachelines.

0Day has captured performance changed related with this [1], and some
commercial software explicitly makes its hot global variables 128 bytes
aligned (2 cache lines) to avoid this kind of extended false sharing.

So add an option "--double-cl" for 'perf c2c report' to show false
sharing in double cache line granularity, which acts just like the
cacheline size is doubled. There is no change to c2c record. The
hardware events of shared cacheline are still per cacheline, and this
option just changes the granularity of how events are grouped and
displayed.

In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on old kernel):

  ----------------------------------------------------------------------
     26       31        2        0        0        0  0xffff888103ec6000
  ----------------------------------------------------------------------
   35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b   1153   66    971   3748   74  [k] get_mem_cgroup_from_mm
    6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4    570    0   1531    879   75  [k] mem_cgroup_charge
   25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472    949   70    593   3359   74  [k] get_mem_cgroup_from_mm
   19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686   1352    0   1073   1022   74  [k] mem_cgroup_charge
    9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6   1401    0    863    768   74  [k] mem_cgroup_charge
    3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106    618    0    804     11    9  [k] uncharge_batch

The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are
listed together to give users a hint of extended false sharing.

[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Committer notes:

Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx

Removed -a, leaving just as --double-cl, as this probably is not used so
frequently and perhaps will be even auto-detected if we manage to record
the MSR where this is configured.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-16 09:33:45 -03:00
Namhyung Kim
3477f079fe perf lock contention: Add -o/--lock-owner option
When there're many lock contentions in the system, people sometimes want
to know who caused the contention, IOW who's the owner of the locks.

The -o/--lock-owner option tries to follow the lock owners for the
contended mutexes and rwsems from BPF, and then attributes the
contention time to the owner instead of the waiter.  It's a best effort
approach to get the owner info at the time of the contention and doesn't
guarantee to have the precise tracking of owners if it's changing over
time.

Currently it only handles mutex and rwsem that have owner field in their
struct and it basically points to a task_struct that owns the lock at
the moment.

Technically its type is atomic_long_t and it comes with some LSB bits
used for other meanings.  So it needs to clear them when casting it to a
pointer to task_struct.

Also the atomic_long_t is a typedef of the atomic 32 or 64 bit types
depending on arch which is a wrapper struct for the counter value.  I'm
not aware of proper ways to access those kernel atomic types from BPF so
I just read the internal counter value directly.  Please let me know if
there's a better way.

When -o/--lock-owner option is used, it goes to the task aggregation
mode like -t/--threads option does.  However it cannot get the owner for
other lock types like spinlock and sometimes even for mutex.

  $ sudo ./perf lock con -abo -- ./perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

       Total time: 4.766 [sec]

         4.766540 usecs/op
           209795 ops/sec
   contended   total wait     max wait     avg wait          pid   owner

         403    565.32 us     26.81 us      1.40 us           -1   Unknown
           4     27.99 us      8.57 us      7.00 us      1583145   sched-pipe
           1      8.25 us      8.25 us      8.25 us      1583144   sched-pipe
           1      2.03 us      2.03 us      2.03 us         5068   chrome

As you can see, the owner is unknown for the most cases.  But if we
filter only for the mutex locks, it'd more likely get the onwers.

  $ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

       Total time: 4.910 [sec]

         4.910435 usecs/op
           203647 ops/sec
   contended   total wait     max wait     avg wait          pid   owner

           2     15.50 us      8.29 us      7.75 us      1582852   sched-pipe
           7      7.20 us      2.47 us      1.03 us           -1   Unknown
           1      6.74 us      6.74 us      6.74 us      1582851   sched-pipe

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-08 10:33:32 -03:00
Kan Liang
4e846311a9 perf script: Fix missing Retire Latency fields option documentation
The 'perf script' documentation is missing the fields option for Retire
Latency. Add it.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230206162100.3329395-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-06 14:57:50 -03:00
Kan Liang
d7d213e04c perf report: Support Retire Latency
The Retire Latency field is added in the var3_w of the
PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports pipeline stall of
this instruction compared to the previous instruction in cycles.  That's
quite useful to display the information with perf mem report.

The p_stage_cyc for Power is also from the var3_w. Union the p_stage_cyc
and retire_lat to share the code.

Implement X86 specific codes to display the X86 specific header.

Add a new sort key retire_lat for the Retire Latency.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20230104201349.1451191-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03 17:24:02 -03:00
Namhyung Kim
7b204399ae perf lock contention: Add -S/--callstack-filter option
The -S/--callstack-filter is to limit display entries having the given
string in the callstack (not only in the caller in the output).

The following example shows lock contention results if the callstack
has 'net' substring somewhere.  Note that the caller '__dev_queue_xmit'
does not match to it, but it has 'inet6_csk_xmit' in the callstack.

This applies even if you don't use -v option to show the full callstack.

  $ sudo ./perf lock con -abv -S net sleep 1
  ...
   contended   total wait     max wait     avg wait         type   caller

           5     70.20 us     16.13 us     14.04 us     spinlock   __dev_queue_xmit+0xb6d
                          0xffffffffa5dd1c60  _raw_spin_lock+0x30
                          0xffffffffa5b8f6ed  __dev_queue_xmit+0xb6d
                          0xffffffffa5cd8267  ip6_finish_output2+0x2c7
                          0xffffffffa5cdac14  ip6_finish_output+0x1d4
                          0xffffffffa5cdb477  ip6_xmit+0x457
                          0xffffffffa5d1fd17  inet6_csk_xmit+0xd7
                          0xffffffffa5c5f4aa  __tcp_transmit_skb+0x54a
                          0xffffffffa5c6467d  tcp_keepalive_timer+0x2fd

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230126000936.3017683-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-02 16:32:19 -03:00
Namhyung Kim
3fd7a168bf perf script: Add 'cgroup' field for output
There's no field for the cgroup, let's add one.  To do that, users need to
specify --all-cgroup option for perf record to capture the cgroup info.

  $ perf record --all-cgroups -- true

  $ perf script -F comm,pid,cgroup
            true 337112  /user.slice/user-657345.slice/user@657345.service/...
            true 337112  /user.slice/user-657345.slice/user@657345.service/...
            true 337112  /user.slice/user-657345.slice/user@657345.service/...
            true 337112  /user.slice/user-657345.slice/user@657345.service/...

If it's recorded without the --all-cgroups, it'd complain.

  $ perf script -F comm,pid,cgroup
  Samples for 'cycles:u' event do not have CGROUP attribute set. Cannot print 'cgroup' field.
  Hint: run 'perf record --all-cgroups ...'

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230126213610.3381147-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-02 16:32:19 -03:00
Ross Zwisler
1df49ef9ee perf tools docs: Use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

A few spots in the perf docs still refer to this older debugfs path, so
let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: linux-trace-kernel@vger.kernel.org
Link: http://lore.kernel.org/lkml/20230130181915.1113313-5-zwisler@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-02 16:32:19 -03:00
Namhyung Kim
aeb802f872 perf intel-pt: Do not try to queue auxtrace data on pipe
When it processes AUXTRACE_INFO, it calls to auxtrace_queue_data() to
collect AUXTRACE data first.  That won't work with pipe since it needs
lseek() to read the scattered aux data.

  $ perf record -o- -e intel_pt// true | perf report -i- --itrace=i100
  # To display the perf.data header info, please use --header/--header-only options.
  #
  0x4118 [0xa0]: failed to process type: 70
  Error:
  failed to process sample

For the pipe mode, it can handle the aux data as it gets.  But there's
no guarantee it can get the aux data in time.  So the following warning
will be shown at the beginning:

  WARNING: Intel PT with pipe mode is not recommended.
           The output cannot relied upon.  In particular,
           time stamps and the order of events may be incorrect.

Fixes: dbd134322e ("perf intel-pt: Add support for decoding AUX area samples")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230131023350.1903992-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01 21:30:05 -03:00
James Clark
86569c0ab1 perf mem/c2c: Document that SPE is used for mem and c2c on ARM
Setup is non-trivial so also link to the full SPE docs.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.or
Link: https://lore.kernel.org/r/20230124145929.557891-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-27 15:00:34 -03:00
Diederik de Haas
fc5d836c67 perf: Various spelling fixes
Fix various spelling errors as reported by Debian's lintian tool.

"amount of times" -> "number of times"
ocurrence -> occurrence
upto -> up to

Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230122122034.48020-1-didi.debian@cknow.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-23 10:00:47 -03:00
Ian Rogers
4cbd5334ff perf tools: Fix foolproof typo
In the context of LBR stitching documentation.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230119201036.156441-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-22 18:10:53 -03:00
Adrian Hunter
1b69346e7a perf test: Add Symbols test
Add a test to check function symbols do not overlap and are not zero
length.

The main motivation for the test is to make it easier to review changes
to PLT symbol synthesis i.e. changes to dso__synthesize_plt_symbols().

By default the test uses the perf executable as a test DSO, but a
specific DSO can be specified via a new perf test option "--dso".

The test is useful in the following ways:

 - Any DSO can be tested, even ones that do not run on the current
 architecture. For example, using cross-compiled DSOs to see how
 well perf handles different architectures.

 - With verbose > 1 (e.g. -vv), all the symbols are printed, which
 makes it easier to see issues.

 - perf removes duplicate symbols and expands zero-length symbols
 to reach the next symbol, however that is done before adding
 synthesized symbols, so the test is checking those also.

Example:

  $ perf test -v Symbols
   74: Symbols                                                         :
  --- start ---
  test child forked, pid 154918
  Testing /home/user/bin/perf
  Overlapping symbols:
   7d000-7f3a0 g _init
   7d030-7d040 g __printf_chk@plt
  test child finished with -1
  ---- end ----
  Symbols: FAILED!

Note the test fails because perf expands the _init symbol over the PLT
because there are no PLT symbols at that point, but then
dso__synthesize_plt_symbols() creates them.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230120123456.12449-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-22 18:09:56 -03:00
qinyu
3524f89eda perf docs: Fix a typo in 'perf probe' man page: l20th -> 120th
Fix a minor typo in 'perf probe' doc.

Fixes: 631c9def80 ("perf probe: Support --line option to show probable source-code lines")
Signed-off-by: qinyu <qinyu32@huawei.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230116012143.432435-1-qinyu32@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-19 09:49:59 -03:00
Ahelenia Ziemiańska
f24fb53984 perf tools: Don't include signature in version strings
This explodes the build if HEAD is signed, since the generated version
is gpg: Signature made Mon 26 Dec 2022 20:34:48 CET, then a few more
lines, then the SHA.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/7c9637711271f50ec2341fb8a7c29585335dab04.1672174189.git.nabijaczleweli@nabijaczleweli.xyz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-02 12:34:06 -03:00
Namhyung Kim
511e19b9e2 perf lock contention: Add -L/--lock-filter option
The -L/--lock-filter option is to filter only given locks.  The locks
can be specified by address or name (if exists).

  $ sudo ./perf lock record -a  sleep 1

  $ sudo ./perf lock con -l
   contended  total wait  max wait  avg wait           address  symbol

          57     1.11 ms  42.83 us  19.54 us  ffff9f4140059000
          15   280.88 us  23.51 us  18.73 us  ffffffff9d007a40  jiffies_lock
           1    20.49 us  20.49 us  20.49 us  ffffffff9d0d50c0  rcu_state
           1     9.02 us   9.02 us   9.02 us  ffff9f41759e9ba0

  $ sudo ./perf lock con -L jiffies_lock,rcu_state
   contended  total wait  max wait  avg wait      type  caller

          15   280.88 us  23.51 us  18.73 us  spinlock  tick_sched_do_timer+0x93
           1    20.49 us  20.49 us  20.49 us  spinlock  __softirqentry_text_start+0xeb

  $ sudo ./perf lock con -L ffff9f4140059000
   contended  total wait  max wait  avg wait      type  caller

          38   779.40 us  42.83 us  20.51 us  spinlock  worker_thread+0x50
          11   216.30 us  39.87 us  19.66 us  spinlock  queue_work_on+0x39
           8   118.13 us  20.51 us  14.77 us  spinlock  kthread+0xe5

Committer testing:

  # uname -a
  Linux quaco 6.0.12-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 8 17:15:53 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  # perf lock record
  ^C[ perf record: Woken up 1 times to write data ]
  # perf lock con -L jiffies_lock,rcu_state
   contended   total wait     max wait     avg wait         type   caller

  # perf lock con
   contended   total wait     max wait     avg wait         type   caller

           1      9.06 us      9.06 us      9.06 us     spinlock   call_timer_fn+0x24
  # perf lock con -L call
  ignore unknown symbol: call
   contended   total wait     max wait     avg wait         type   caller

           1      9.06 us      9.06 us      9.06 us     spinlock   call_timer_fn+0x24
  #

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-21 14:52:39 -03:00
Namhyung Kim
b4a7eff93c perf lock contention: Add -Y/--type-filter option
The -Y/--type-filter option is to filter the result for specific lock
types only.  It can accept comma-separated values.  Note that it would
accept type names like one in the output.  spinlock, mutex, rwsem:R and
so on.

For RW-variant lock types, it converts the name to the both variants.
In other words, "rwsem" is same as "rwsem:R,rwsem:W".  Also note that
"mutex" has two different encoding - one for sleeping wait, another for
optimistic spinning.  Add "mutex-spin" entry for the lock_type_table so
that we can add it for "mutex" under the table.

  $ sudo ./perf lock record -a -- ./perf bench sched messaging

  $ sudo ./perf lock con -E 5 -Y spinlock
   contended  total wait   max wait  avg wait      type  caller

         802     1.26 ms   11.73 us   1.58 us  spinlock  __wake_up_common_lock+0x62
          13   787.16 us  105.44 us  60.55 us  spinlock  remove_wait_queue+0x14
          12   612.96 us   78.70 us  51.08 us  spinlock  prepare_to_wait+0x27
         114   340.68 us   12.61 us   2.99 us  spinlock  try_to_wake_up+0x1f5
          83   226.38 us    9.15 us   2.73 us  spinlock  folio_lruvec_lock_irqsave+0x5e

Committer notes:

Make get_type_flag() return UINT_MAX for error instad of -1UL, as that
function returns 'unsigned int' and we store the value on a 'unsigned
int' 'flags' variable which makes clang unhappy:

  35    98.23 fedora:37                     : FAIL clang version 15.0.6 (Fedora 15.0.6-1.fc37)
    builtin-lock.c:2012:14: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
                            if (flags != -1UL) {
                                ~~~~~ ^  ~~~~
    builtin-lock.c:2021:14: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
                            if (flags != -1UL) {
                                ~~~~~ ^  ~~~~
    builtin-lock.c:2037:14: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
                            if (flags != -1UL) {
                                ~~~~~ ^  ~~~~
    3 errors generated.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-21 14:51:04 -03:00
Ian Rogers
5f8f95673f perf evlist: Remove group option.
The group option predates grouping events using curly braces added in
commit 89efb02950 ("perf tools: Add support to parse event group
syntax").

The --group option was retained for legacy support (in August
2012) but keeping it adds complexity.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shaomin Deng <dengshaomin@cdjrlc.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Timothy Hayes <timothy.hayes@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213232651.1269909-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14 15:28:18 -03:00
Namhyung Kim
688d2e8de2 perf lock contention: Add -l/--lock-addr option
The -l/--lock-addr option is to implement per-lock-instance contention
stat using LOCK_AGGR_ADDR.  It displays lock address and optionally
symbol name if exists.

  $ sudo ./perf lock con -abl sleep 1
   contended   total wait     max wait     avg wait            address   symbol

           1     36.28 us     36.28 us     36.28 us   ffff92615d6448b8
           9     10.91 us      1.84 us      1.21 us   ffffffffbaed50c0   rcu_state
           1     10.49 us     10.49 us     10.49 us   ffff9262ac4f0c80
           8      4.68 us      1.67 us       585 ns   ffffffffbae07a40   jiffies_lock
           3      3.03 us      1.45 us      1.01 us   ffff9262277861e0
           1       924 ns       924 ns       924 ns   ffff926095ba9d20
           1       436 ns       436 ns       436 ns   ffff9260bfda4f60

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14 11:24:31 -03:00
Anshuman Khandual
955f6def55 perf record: Add remaining branch filters: "no_cycles", "no_flags" & "hw_index"
This adds all remaining branch filters i.e "no_cycles", "no_flags" and
"hw_index". While here, also updates the documentation.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20221205064443.533587-1-anshuman.khandual@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14 11:16:12 -03:00
Ian Rogers
6ed249441a perf list: Add JSON output option
Output events and metrics in a JSON format by overriding the print
callbacks. Currently other command line options aren't supported and
metrics are repeated once per metric group.

Committer testing:

  $ perf list cache

  List of pre-defined events (to be used in -e or -M):

    L1-dcache-load-misses                              [Hardware cache event]
    L1-dcache-loads                                    [Hardware cache event]
    L1-dcache-prefetches                               [Hardware cache event]
    L1-icache-load-misses                              [Hardware cache event]
    L1-icache-loads                                    [Hardware cache event]
    branch-load-misses                                 [Hardware cache event]
    branch-loads                                       [Hardware cache event]
    dTLB-load-misses                                   [Hardware cache event]
    dTLB-loads                                         [Hardware cache event]
    iTLB-load-misses                                   [Hardware cache event]
    iTLB-loads                                         [Hardware cache event]
  $ perf list --json cache
  [
  {
          "Unit": "cache",
          "EventName": "L1-dcache-load-misses",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "L1-dcache-loads",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "L1-dcache-prefetches",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "L1-icache-load-misses",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "L1-icache-loads",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "branch-load-misses",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "branch-loads",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "dTLB-load-misses",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "dTLB-loads",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "iTLB-load-misses",
          "EventType": "Hardware cache event"
  },
  {
          "Unit": "cache",
          "EventName": "iTLB-loads",
          "EventType": "Hardware cache event"
  }
  ]
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xin Gao <gaoxin@cdjrlc.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/20221114210723.2749751-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-23 10:29:59 -03:00
Ian Rogers
ca0fe62413 perf list: Generalize limiting to a PMU name
Deprecate the --cputype option and add a --unit option where '--unit
cpu_atom' behaves like '--cputype atom'. The --unit option can be used
with arbitrary PMUs, for example:

```
$ perf list --unit msr pmu

List of pre-defined events (to be used in -e or -M):

  msr/aperf/                                         [Kernel PMU event]
  msr/cpu_thermal_margin/                            [Kernel PMU event]
  msr/mperf/                                         [Kernel PMU event]
  msr/pperf/                                         [Kernel PMU event]
  msr/smi/                                           [Kernel PMU event]
  msr/tsc/                                           [Kernel PMU event]
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xin Gao <gaoxin@cdjrlc.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/20221114210723.2749751-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-15 10:25:48 -03:00
James Clark
a527c2c1e2 perf tools: Make quiet mode consistent between tools
Use the global quiet variable everywhere so that all tools hide warnings
in quiet mode and update the documentation to reflect this.

'perf probe' claimed that errors are not printed in quiet mode but I
don't see this so remove it from the docs.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27 16:37:26 -03:00
Adrian Hunter
ad7ad6b5dd perf scripts python: intel-pt-events.py: Add ability interleave output
Intel PT timestamps are not provided for every branch, let alone every
instruction, so there can be many samples with the same timestamp. With
per-cpu contexts, decoding is done for each CPU in turn, which can make it
difficult to see what is happening on different CPUs at the same time.
Currently the interleaving from perf script --itrace=i0ns is quite coarse
grained. There are often long stretches executing on one CPU and nothing on
another.

Some people are interested in seeing what happened on multiple CPUs before
a crash to debug races etc.

To improve perf script interleaving for parallel execution, the
intel-pt-events.py script has been enhanced to enable interleaving the
output with the same timestamp from different CPUs. It is understood that
interleaving is not perfect or causal.

Add parameter --interleave [<n>] to interleave sample output for the same
timestamp so that no more than n samples for a CPU are displayed in a row.
'n' defaults to 4. Note this only affects the order of output, and only
when the timestamp is the same.

Example:

  $ perf script intel-pt-events.py --insn-trace --interleave 3
  ...
  bash  2267/2267  [004]  9323.692625625  563caa3c86f0  jz 0x563caa3c89c7        run_pending_traps+0x30 (/usr/bin/bash)   IPC: 1.52 (38/25)
  bash  2267/2267  [004]  9323.692625625  563caa3c89c7  movq  0x118(%rsp), %rax  run_pending_traps+0x307 (/usr/bin/bash)
  bash  2267/2267  [004]  9323.692625625  563caa3c89cf  subq  %fs:0x28, %rax     run_pending_traps+0x30f (/usr/bin/bash)
  bash  2270/2270  [007]  9323.692625625  55dc58cabf02  jz 0x55dc58cabf48        unquoted_glob_pattern_p+0x102 (/usr/bin/bash)   IPC: 1.56 (25/16)
  bash  2270/2270  [007]  9323.692625625  55dc58cabf04  cmp $0x5d, %al           unquoted_glob_pattern_p+0x104 (/usr/bin/bash)
  bash  2270/2270  [007]  9323.692625625  55dc58cabf06  jnz 0x55dc58cabf10       unquoted_glob_pattern_p+0x106 (/usr/bin/bash)
  bash  2264/2264  [001]  9323.692625625  7fd556a4376c  jbe 0x7fd556a43ac8       round_and_return+0x3fc (/usr/lib/x86_64-linux-gnu/libc.so.6)   IPC: 4.30 (43/10)
  bash  2264/2264  [001]  9323.692625625  7fd556a43772  and $0x8, %edx           round_and_return+0x402 (/usr/lib/x86_64-linux-gnu/libc.so.6)
  bash  2264/2264  [001]  9323.692625625  7fd556a43775  jnz 0x7fd556a43ac8       round_and_return+0x405 (/usr/lib/x86_64-linux-gnu/libc.so.6)
  bash  2267/2267  [004]  9323.692625625  563caa3c89d8  jnz 0x563caa3c8b11       run_pending_traps+0x318 (/usr/bin/bash)
  bash  2267/2267  [004]  9323.692625625  563caa3c89de  add $0x128, %rsp         run_pending_traps+0x31e (/usr/bin/bash)
  bash  2267/2267  [004]  9323.692625625  563caa3c89e5  popq  %rbx               run_pending_traps+0x325 (/usr/bin/bash)
  ...

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221020152509.5298-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27 16:37:26 -03:00
Adrian Hunter
231e61bc2e perf docs: Fix man page build wrt perf-arm-coresight.txt
perf build assumes documentation files starting with "perf-" are man
pages but perf-arm-coresight.txt is not a man page:

  asciidoc: ERROR: perf-arm-coresight.txt: line 2: malformed manpage title
  asciidoc: ERROR: perf-arm-coresight.txt: line 3: name section expected
  asciidoc: FAILED: perf-arm-coresight.txt: line 3: section title expected
  make[3]: *** [Makefile:266: perf-arm-coresight.xml] Error 1
  make[3]: *** Waiting for unfinished jobs....
  make[2]: *** [Makefile.perf:895: man] Error 2

Fix by renaming it.

Fixes: dc2e0fb00b ("perf test coresight: Add relevant documentation about ARM64 CoreSight testing")
Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: coresight@lists.linaro.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/a176a3e1-6ddc-bb63-e41c-15cda8c2d5d2@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25 17:40:48 -03:00
Ravi Bangoria
f7b58cbdb3 perf mem/c2c: Add load store event mappings for AMD
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record'
with mem load/ store events. IBS tagged load/store sample provides most
of the information needed for these tools. Wire in the "ibs_op//" event
as mem-ldst event for AMD.

There are some limitations though: Only load/store micro-ops provide
mem/c2c information. Whereas, IBS does not have a way to choose a
particular type of micro-op to tag. This results in many non-LS
micro-ops being tagged which appear as N/A in the perf report. IBS,
being an uncore pmu from kernel point of view[1], does not support per
process monitoring. Thus, perf mem/c2c on AMD are currently supported in
per-cpu mode only.

Example:

  $ sudo perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit           10  HitM
  Remote Cache (2 hops) hit            6  N/A
  Uncached hit                         4  N/A
  $

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06 16:30:06 -03:00
Ravi Bangoria
4173cc055d perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
Currently perf sets PERF_SAMPLE_WEIGHT flag only for mem load events.
Set it for combined load-store event as well which will enable recording
of load latency by default on arch that does not support independent
mem load event.

Also document missing -W in perf-record man page.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06 16:29:32 -03:00