linux-stable/tools/perf
Namhyung Kim 601366678c perf data: Allow to use stdio functions for pipe mode
When perf data is in a pipe, it reads each event separately using
read(2) syscall.  This is a huge performance bottleneck when
processing large data like in perf inject.  Also perf inject needs to
use write(2) syscall for the output.

So convert it to use buffer I/O functions in stdio library for pipe
data.  This makes inject-build-id bench time drops from 20ms to 8ms.

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.074 msec (+- 0.013 msec)
    Average time per event: 0.792 usec (+- 0.001 usec)
    Average memory usage: 8328 KB (+- 0 KB)
    Average build-id-all injection took: 5.490 msec (+- 0.008 msec)
    Average time per event: 0.538 usec (+- 0.001 usec)
    Average memory usage: 7563 KB (+- 0 KB)

This patch enables it just for perf inject when used with pipe (it's a
default behavior).  Maybe we could do it for perf record and/or report
later..

Committer testing:

Before:

  $ perf stat -r 5 perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.605 msec (+- 0.064 msec)
    Average time per event: 1.334 usec (+- 0.006 usec)
    Average memory usage: 12220 KB (+- 7 KB)
    Average build-id-all injection took: 11.458 msec (+- 0.058 msec)
    Average time per event: 1.123 usec (+- 0.006 usec)
    Average memory usage: 11546 KB (+- 8 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.673 msec (+- 0.057 msec)
    Average time per event: 1.341 usec (+- 0.006 usec)
    Average memory usage: 12508 KB (+- 8 KB)
    Average build-id-all injection took: 11.437 msec (+- 0.046 msec)
    Average time per event: 1.121 usec (+- 0.004 usec)
    Average memory usage: 11812 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.641 msec (+- 0.069 msec)
    Average time per event: 1.337 usec (+- 0.007 usec)
    Average memory usage: 12302 KB (+- 8 KB)
    Average build-id-all injection took: 10.820 msec (+- 0.106 msec)
    Average time per event: 1.061 usec (+- 0.010 usec)
    Average memory usage: 11616 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.379 msec (+- 0.074 msec)
    Average time per event: 1.312 usec (+- 0.007 usec)
    Average memory usage: 12334 KB (+- 8 KB)
    Average build-id-all injection took: 11.288 msec (+- 0.071 msec)
    Average time per event: 1.107 usec (+- 0.007 usec)
    Average memory usage: 11657 KB (+- 8 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.534 msec (+- 0.058 msec)
    Average time per event: 1.327 usec (+- 0.006 usec)
    Average memory usage: 12264 KB (+- 8 KB)
    Average build-id-all injection took: 11.557 msec (+- 0.076 msec)
    Average time per event: 1.133 usec (+- 0.007 usec)
    Average memory usage: 11593 KB (+- 8 KB)

   Performance counter stats for 'perf bench internals inject-build-id' (5 runs):

            4,060.05 msec task-clock:u              #    1.566 CPUs utilized            ( +-  0.65% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             101,888      page-faults:u             #    0.025 M/sec                    ( +-  0.12% )
       3,745,833,163      cycles:u                  #    0.923 GHz                      ( +-  0.10% )  (83.22%)
         194,346,613      stalled-cycles-frontend:u #    5.19% frontend cycles idle     ( +-  0.57% )  (83.30%)
         708,495,034      stalled-cycles-backend:u  #   18.91% backend cycles idle      ( +-  0.48% )  (83.48%)
       5,629,328,628      instructions:u            #    1.50  insn per cycle
                                                    #    0.13  stalled cycles per insn  ( +-  0.21% )  (83.57%)
       1,236,697,927      branches:u                #  304.602 M/sec                    ( +-  0.16% )  (83.44%)
          17,564,877      branch-misses:u           #    1.42% of all branches          ( +-  0.23% )  (82.99%)

              2.5934 +- 0.0128 seconds time elapsed  ( +-  0.49% )

  $

After:

  $ perf stat -r 5 perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.560 msec (+- 0.125 msec)
    Average time per event: 0.839 usec (+- 0.012 usec)
    Average memory usage: 12520 KB (+- 8 KB)
    Average build-id-all injection took: 5.789 msec (+- 0.054 msec)
    Average time per event: 0.568 usec (+- 0.005 usec)
    Average memory usage: 11919 KB (+- 9 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.639 msec (+- 0.111 msec)
    Average time per event: 0.847 usec (+- 0.011 usec)
    Average memory usage: 12732 KB (+- 8 KB)
    Average build-id-all injection took: 5.647 msec (+- 0.069 msec)
    Average time per event: 0.554 usec (+- 0.007 usec)
    Average memory usage: 12093 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.551 msec (+- 0.096 msec)
    Average time per event: 0.838 usec (+- 0.009 usec)
    Average memory usage: 12739 KB (+- 8 KB)
    Average build-id-all injection took: 5.617 msec (+- 0.061 msec)
    Average time per event: 0.551 usec (+- 0.006 usec)
    Average memory usage: 12105 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.403 msec (+- 0.097 msec)
    Average time per event: 0.824 usec (+- 0.010 usec)
    Average memory usage: 12770 KB (+- 8 KB)
    Average build-id-all injection took: 5.611 msec (+- 0.085 msec)
    Average time per event: 0.550 usec (+- 0.008 usec)
    Average memory usage: 12134 KB (+- 8 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.518 msec (+- 0.102 msec)
    Average time per event: 0.835 usec (+- 0.010 usec)
    Average memory usage: 12518 KB (+- 10 KB)
    Average build-id-all injection took: 5.503 msec (+- 0.073 msec)
    Average time per event: 0.540 usec (+- 0.007 usec)
    Average memory usage: 11882 KB (+- 8 KB)

   Performance counter stats for 'perf bench internals inject-build-id' (5 runs):

            2,394.88 msec task-clock:u              #    1.577 CPUs utilized            ( +-  0.83% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             103,181      page-faults:u             #    0.043 M/sec                    ( +-  0.11% )
       3,548,172,030      cycles:u                  #    1.482 GHz                      ( +-  0.30% )  (83.26%)
          81,537,700      stalled-cycles-frontend:u #    2.30% frontend cycles idle     ( +-  1.54% )  (83.24%)
         876,631,544      stalled-cycles-backend:u  #   24.71% backend cycles idle      ( +-  1.14% )  (83.45%)
       5,960,361,707      instructions:u            #    1.68  insn per cycle
                                                    #    0.15  stalled cycles per insn  ( +-  0.27% )  (83.26%)
       1,269,413,491      branches:u                #  530.054 M/sec                    ( +-  0.10% )  (83.48%)
          11,372,453      branch-misses:u           #    0.90% of all branches          ( +-  0.52% )  (83.31%)

             1.51874 +- 0.00642 seconds time elapsed  ( +-  0.42% )

  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201030054742.87740-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16 13:37:28 -03:00
..
arch perf mem: Support ARM SPE events 2020-11-11 12:27:37 -03:00
bench perf tools changes for v5.10: 1st batch 2020-10-17 11:47:46 -07:00
Documentation perf auxtrace: Add itrace option '-M' for memory events 2020-11-11 12:24:51 -03:00
examples/bpf
include/bpf
jvmti perf jvmti: Remove redundant jitdump line table entries 2020-05-29 16:51:38 -03:00
pmu-events perf jevents: Add test for arch std events 2020-11-04 09:42:41 -03:00
python
scripts perf script: Add min, max to futex-contention output, in addition to avg 2020-09-23 12:58:53 -03:00
tests perf jevents: Add test for arch std events 2020-11-04 09:42:41 -03:00
trace perf trace beauty: Allow header files in a different path 2020-11-04 09:42:41 -03:00
ui perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse() 2020-11-03 09:11:45 -03:00
util perf data: Allow to use stdio functions for pipe mode 2020-11-16 13:37:28 -03:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
Build
builtin-annotate.c perf evsel: Rename perf_evsel__resort*() to evsel__resort*() 2020-05-28 10:03:24 -03:00
builtin-bench.c perf bench: Add build-id injection benchmark 2020-10-13 10:59:42 -03:00
builtin-buildid-cache.c perf tools: Pass build_id object to build_id__sprintf() 2020-10-14 08:46:22 -03:00
builtin-buildid-list.c
builtin-c2c.c perf c2c: Support AUX trace 2020-11-11 12:27:06 -03:00
builtin-config.c
builtin-data.c perf data: Add support to store time of day in CTF data conversion 2020-08-06 09:43:37 -03:00
builtin-diff.c perf diff: Support hot streams comparison 2020-10-14 13:34:48 -03:00
builtin-evlist.c perf evsel: Rename perf_evsel__fprintf() to evsel__fprintf() 2020-05-28 10:03:24 -03:00
builtin-ftrace.c perf: ftrace: Add filter support for option -F/--funcs 2020-09-04 16:11:16 -03:00
builtin-help.c
builtin-inject.c perf data: Allow to use stdio functions for pipe mode 2020-11-16 13:37:28 -03:00
builtin-kallsyms.c
builtin-kmem.c perf kmem: Pass additional arguments to 'perf record' 2020-07-10 09:37:55 -03:00
builtin-kvm.c perf evlist: Fix the class prefix for 'struct evlist' 'add' evsel methods 2020-06-22 16:28:09 -03:00
builtin-list.c perf list: Remove dead code in argument check 2020-09-09 11:12:10 -03:00
builtin-lock.c perf lock: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*() 2020-05-05 16:35:31 -03:00
builtin-mem.c perf mem: Support AUX trace 2020-11-11 12:25:45 -03:00
builtin-probe.c perf probe: Do not show the skipped events 2020-05-28 10:03:24 -03:00
builtin-record.c perf tools: Consolidate close_control_option()'s into one function 2020-09-04 16:11:16 -03:00
builtin-report.c perf report: Disable ordered_events for raw dump 2020-09-01 12:20:25 -03:00
builtin-sched.c perf sched: Show start of latency as well 2020-10-13 11:01:42 -03:00
builtin-script.c perf script: Display negative tid in non-sample events 2020-09-17 16:06:22 -03:00
builtin-stat.c perf stat: Add --quiet option 2020-11-04 09:42:41 -03:00
builtin-timechart.c perf tools: Replace zero-length array with flexible-array 2020-05-28 10:03:27 -03:00
builtin-top.c perf top: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set 2020-08-21 10:22:23 -03:00
builtin-trace.c perf trace: Fix segfault when trying to trace events by cgroup 2020-11-03 08:31:03 -03:00
builtin-version.c perf version: Add a feature for libpfm4 2020-11-04 09:42:40 -03:00
builtin.h
check-headers.sh perf tools: Separate the checking of headers only used to build beautification tables 2020-09-29 08:56:38 -03:00
command-list.txt
CREDITS
design.txt perf tools: Support CAP_PERFMON capability 2020-04-16 12:19:08 -03:00
Makefile
Makefile.config perf tools: Remove LTO compiler options when building perl support 2020-11-03 08:24:54 -03:00
Makefile.perf perf trace: Use the autogenerated mmap 'prot' string/id table 2020-10-01 11:35:01 -03:00
MANIFEST
perf-archive.sh
perf-completion.sh
perf-read-vdso.c
perf-sys.h perf tests: Call test_attr__open() directly 2020-09-10 11:55:37 -03:00
perf-with-kcore.sh
perf.c
perf.h