Commit Graph

8739 Commits

Author SHA1 Message Date
Linus Torvalds 7ab89417ed perf tools changes for v6.7
Build
 -----
 * Compile BPF programs by default if clang (>= 12.0.1) is available to
   enable more features like kernel lock contention, off-cpu profiling,
   kwork, sample filtering and so on.  It can be disabled by passing
   BUILD_BPF_SKEL=0 to make.
 
 * Produce better error messages for bison on debug build (make DEBUG=1)
   by defining YYDEBUG symbol internally.
 
 perf record
 -----------
 * Track sideband events (like FORK/MMAP) from all CPUs even if perf record
   targets a subset of CPUs only (using -C option).  Otherwise it may lose
   some information happened on a CPU out of the target list.
 
 * Fix checking raw sched_switch tracepoint argument using system BTF.
   This affects off-cpu profiling which attaches a BPF program to the raw
   tracepoint.
 
 perf lock contention
 --------------------
 * Add --lock-cgroup option to see contention by cgroups.  This should be
   used with BPF only (using -b option).
 
     $ sudo perf lock con -ab --lock-cgroup -- sleep 1
      contended   total wait     max wait     avg wait   cgroup
 
            835     14.06 ms     41.19 us     16.83 us   /system.slice/led.service
             25    122.38 us     13.77 us      4.89 us   /
             44     23.73 us      3.87 us       539 ns   /user.slice/user-657345.slice/session-c4.scope
              1       491 ns       491 ns       491 ns   /system.slice/connectd.service
 
 * Add -G/--cgroup-filter option to see contention only for given cgroups.
   This can be useful when you identified a cgroup in the above command and
   want to investigate more on it.  It also works with other output options
   like -t/--threads and -l/--lock-addr.
 
     $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1
      contended   total wait     max wait     avg wait         type   caller
 
              8     77.11 us     17.98 us      9.64 us     spinlock   futex_wake+0xc8
              2     24.56 us     14.66 us     12.28 us     spinlock   tick_do_update_jiffies64+0x25
              1      4.97 us      4.97 us      4.97 us     spinlock   futex_q_lock+0x2a
 
 * Use per-cpu array for better spinlock tracking.  This is to improve
   performance of the BPF program and to avoid nested contention on a lock
   in the BPF hash map.
 
 * Update callstack check for PowerPC.  To find a representative caller of a
   lock, it needs to look up the call stacks.  It ends the lookup when it sees
   0 in the call stack buffer.  However, PowerPC call stacks can have 0 values
   in the beginning so skip them when it expects valid call stacks after.
 
 perf kwork
 ----------
 * Support 'sched' class (for -k option) so that it can see task scheduling
   event (using sched_switch tracepoint) as well as irq and workqueue items.
 
 * Add perf kwork top subcommand to show more accurate cpu utilization with
   sched class above.  It works both with a recorded data (using perf kwork
   record command) and BPF (using -b option).  Unlike perf top command, it
   does not support interactive mode (yet).
 
     $ sudo perf kwork top -b -k sched
     Starting trace, Hit <Ctrl+C> to stop and report
     ^C
     Total  : 160702.425 ms, 8 cpus
     %Cpu(s):  36.00% id,   0.00% hi,   0.00% si
     %Cpu0   [||||||||||||||||||              61.66%]
     %Cpu1   [||||||||||||||||||              61.27%]
     %Cpu2   [|||||||||||||||||||             66.40%]
     %Cpu3   [||||||||||||||||||              61.28%]
     %Cpu4   [||||||||||||||||||              61.82%]
     %Cpu5   [|||||||||||||||||||||||         77.41%]
     %Cpu6   [||||||||||||||||||              61.73%]
     %Cpu7   [||||||||||||||||||              63.25%]
 
           PID     SPID    %CPU           RUNTIME  COMMMAND
       -------------------------------------------------------------
             0        0   38.72       8089.463 ms  [swapper/1]
             0        0   38.71       8084.547 ms  [swapper/3]
             0        0   38.33       8007.532 ms  [swapper/0]
             0        0   38.26       7992.985 ms  [swapper/6]
             0        0   38.17       7971.865 ms  [swapper/4]
             0        0   36.74       7447.765 ms  [swapper/7]
             0        0   33.59       6486.942 ms  [swapper/2]
             0        0   22.58       3771.268 ms  [swapper/5]
          9545     9351    2.48        447.136 ms  sched-messaging
          9574     9351    2.09        418.583 ms  sched-messaging
          9724     9351    2.05        372.407 ms  sched-messaging
          9531     9351    2.01        368.804 ms  sched-messaging
          9512     9351    2.00        362.250 ms  sched-messaging
          9514     9351    1.95        357.767 ms  sched-messaging
          9538     9351    1.86        384.476 ms  sched-messaging
          9712     9351    1.84        386.490 ms  sched-messaging
          9723     9351    1.83        380.021 ms  sched-messaging
          9722     9351    1.82        382.738 ms  sched-messaging
          9517     9351    1.81        354.794 ms  sched-messaging
          9559     9351    1.79        344.305 ms  sched-messaging
          9725     9351    1.77        365.315 ms  sched-messaging
     <SNIP>
 
 * Add hard/soft-irq statistics to perf kwork top.  This will show the
   total CPU utilization with IRQ stats like below:
 
     $ sudo perf kwork top -b -k sched,irq,softirq
     Starting trace, Hit <Ctrl+C> to stop and report
     ^C
     Total  :  12554.889 ms, 8 cpus
     %Cpu(s):  96.23% id,   0.10% hi,   0.19% si      <---- here
     %Cpu0   [|                                4.60%]
     %Cpu1   [|                                4.59%]
     %Cpu2   [                                 2.73%]
     %Cpu3   [|                                3.81%]
     <SNIP>
 
 perf bench
 ----------
 * Add -G/--cgroups option to perf bench sched pipe.  The pipe bench is
   good to measure context switch overhead.  With this option, it puts
   the reader and writer tasks in separate cgroups to enforce context
   switch between two different cgroups.
 
   Also it needs to set CPU affinity of the tasks in a CPU to accurately
   measure the impact of cgroup context switches.
 
     $ sudo perf stat -e context-switches,cgroup-switches -- \
     > taskset -c 0 perf bench sched pipe -l 100000
     # Running 'sched/pipe' benchmark:
     # Executed 100000 pipe operations between two processes
 
          Total time: 0.307 [sec]
 
            3.078180 usecs/op
              324867 ops/sec
 
      Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000':
 
                200,026      context-switches
                     63      cgroup-switches
 
            0.321637922 seconds time elapsed
 
   You can see small number of cgroup-switches because both write and read
   tasks are in the same cgroup.
 
     $ sudo mkdir /sys/fs/cgroup/{AAA,BBB}
 
     $ sudo perf stat -e context-switches,cgroup-switches -- \
     > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB
     # Running 'sched/pipe' benchmark:
     # Executed 100000 pipe operations between two processes
 
          Total time: 0.351 [sec]
 
            3.512990 usecs/op
              284657 ops/sec
 
      Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB':
 
                200,020      context-switches
                200,019      cgroup-switches
 
            0.365034567 seconds time elapsed
 
   Now context-switches and cgroup-switches are almost same.  And you can
   see the pipe operation took little more.
 
 * Kill child processes when perf bench sched messaging exited abnormally.
   Otherwise it'd leave the child doing unnecessary work.
 
 perf test
 ---------
 * Fix various shellcheck issues on the tests written in shell script.
 
 * Skip tests when condition is not satisfied:
   - object code reading test for non-text section addresses.
   - CoreSight test if cs_etm// event is not available.
   - lock contention test if not enough CPUs.
 
 Event parsing
 -------------
 * Make PMU alias name loading lazy to reduce the startup time in the
   event parsing code for perf record, stat and others in the general
   case.
 
 * Lazily compute PMU default config.  In the same sense, delay PMU
   initialization until it's really needed to reduce the startup cost.
 
 * Fix event term values that are raw events.  The event specification
   can have several terms including event name.  But sometimes it clashes
   with raw event encoding which starts with 'r' and has hex-digits.
 
   For example, an event named 'read' should be processed as a normal
   event but it was mis-treated as a raw encoding and caused a failure.
 
     $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
     event syntax error: '..nning/event=read/'
                                       \___ parser error
     Run 'perf list' for a list of valid events
 
      Usage: perf stat [<options>] [<command>]
 
         -e, --event <event> event selector. use 'perf list' to list available events
 
 Event metrics
 -------------
 * Add "Compat" regex to match event with multiple identifiers.
 
 * Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne.
 
 Misc
 ----
 * Assorted memory leak fixes and footprint reduction.
 
 * Add "bpf_skeletons" to perf version --build-options so that users can
   check whether their perf tools have BPF support easily.
 
 * Fix unaligned access in Intel-PT packet decoder found by undefined-behavior
   sanitizer.
 
 * Avoid frequency mode for the dummy event.  Surprisingly it'd impact
   kernel timer tick handler performance by force iterating all PMU events.
 
 * Update bash shell completion for events and metrics.
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZUMg7wAKCRCMstVUGiXM
 g8FvAQC9KED6H8rlH7UTvxE6fM947EJbldwGrNA1zGx++Ucd3gD/ewA2A6SUcIh6
 Tua/XovmYOQbuDYOwlRHe+sdDag0sgg=
 =GrCE
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim:
 "Build:

   - Compile BPF programs by default if clang (>= 12.0.1) is available
     to enable more features like kernel lock contention, off-cpu
     profiling, kwork, sample filtering and so on.

     This can be disabled by passing BUILD_BPF_SKEL=0 to make.

   - Produce better error messages for bison on debug build (make
     DEBUG=1) by defining YYDEBUG symbol internally.

  perf record:

   - Track sideband events (like FORK/MMAP) from all CPUs even if perf
     record targets a subset of CPUs only (using -C option). Otherwise
     it may lose some information happened on a CPU out of the target
     list.

   - Fix checking raw sched_switch tracepoint argument using system BTF.
     This affects off-cpu profiling which attaches a BPF program to the
     raw tracepoint.

  perf lock contention:

   - Add --lock-cgroup option to see contention by cgroups. This should
     be used with BPF only (using -b option).

       $ sudo perf lock con -ab --lock-cgroup -- sleep 1
        contended   total wait     max wait     avg wait   cgroup

              835     14.06 ms     41.19 us     16.83 us   /system.slice/led.service
               25    122.38 us     13.77 us      4.89 us   /
               44     23.73 us      3.87 us       539 ns   /user.slice/user-657345.slice/session-c4.scope
                1       491 ns       491 ns       491 ns   /system.slice/connectd.service

   - Add -G/--cgroup-filter option to see contention only for given
     cgroups.

     This can be useful when you identified a cgroup in the above
     command and want to investigate more on it. It also works with
     other output options like -t/--threads and -l/--lock-addr.

       $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1
        contended   total wait     max wait     avg wait         type   caller

                8     77.11 us     17.98 us      9.64 us     spinlock   futex_wake+0xc8
                2     24.56 us     14.66 us     12.28 us     spinlock   tick_do_update_jiffies64+0x25
                1      4.97 us      4.97 us      4.97 us     spinlock   futex_q_lock+0x2a

   - Use per-cpu array for better spinlock tracking. This is to improve
     performance of the BPF program and to avoid nested contention on a
     lock in the BPF hash map.

   - Update callstack check for PowerPC. To find a representative caller
     of a lock, it needs to look up the call stacks. It ends the lookup
     when it sees 0 in the call stack buffer. However, PowerPC call
     stacks can have 0 values in the beginning so skip them when it
     expects valid call stacks after.

  perf kwork:

   - Support 'sched' class (for -k option) so that it can see task
     scheduling event (using sched_switch tracepoint) as well as irq and
     workqueue items.

   - Add perf kwork top subcommand to show more accurate cpu utilization
     with sched class above. It works both with a recorded data (using
     perf kwork record command) and BPF (using -b option). Unlike perf
     top command, it does not support interactive mode (yet).

       $ sudo perf kwork top -b -k sched
       Starting trace, Hit <Ctrl+C> to stop and report
       ^C
       Total  : 160702.425 ms, 8 cpus
       %Cpu(s):  36.00% id,   0.00% hi,   0.00% si
       %Cpu0   [||||||||||||||||||              61.66%]
       %Cpu1   [||||||||||||||||||              61.27%]
       %Cpu2   [|||||||||||||||||||             66.40%]
       %Cpu3   [||||||||||||||||||              61.28%]
       %Cpu4   [||||||||||||||||||              61.82%]
       %Cpu5   [|||||||||||||||||||||||         77.41%]
       %Cpu6   [||||||||||||||||||              61.73%]
       %Cpu7   [||||||||||||||||||              63.25%]

             PID     SPID    %CPU           RUNTIME  COMMMAND
         -------------------------------------------------------------
               0        0   38.72       8089.463 ms  [swapper/1]
               0        0   38.71       8084.547 ms  [swapper/3]
               0        0   38.33       8007.532 ms  [swapper/0]
               0        0   38.26       7992.985 ms  [swapper/6]
               0        0   38.17       7971.865 ms  [swapper/4]
               0        0   36.74       7447.765 ms  [swapper/7]
               0        0   33.59       6486.942 ms  [swapper/2]
               0        0   22.58       3771.268 ms  [swapper/5]
            9545     9351    2.48        447.136 ms  sched-messaging
            9574     9351    2.09        418.583 ms  sched-messaging
            9724     9351    2.05        372.407 ms  sched-messaging
            9531     9351    2.01        368.804 ms  sched-messaging
            9512     9351    2.00        362.250 ms  sched-messaging
            9514     9351    1.95        357.767 ms  sched-messaging
            9538     9351    1.86        384.476 ms  sched-messaging
            9712     9351    1.84        386.490 ms  sched-messaging
            9723     9351    1.83        380.021 ms  sched-messaging
            9722     9351    1.82        382.738 ms  sched-messaging
            9517     9351    1.81        354.794 ms  sched-messaging
            9559     9351    1.79        344.305 ms  sched-messaging
            9725     9351    1.77        365.315 ms  sched-messaging
       <SNIP>

   - Add hard/soft-irq statistics to perf kwork top. This will show the
     total CPU utilization with IRQ stats like below:

       $ sudo perf kwork top -b -k sched,irq,softirq
       Starting trace, Hit <Ctrl+C> to stop and report
       ^C
       Total  :  12554.889 ms, 8 cpus
       %Cpu(s):  96.23% id,   0.10% hi,   0.19% si      <---- here
       %Cpu0   [|                                4.60%]
       %Cpu1   [|                                4.59%]
       %Cpu2   [                                 2.73%]
       %Cpu3   [|                                3.81%]
       <SNIP>

  perf bench:

   - Add -G/--cgroups option to perf bench sched pipe. The pipe bench is
     good to measure context switch overhead. With this option, it puts
     the reader and writer tasks in separate cgroups to enforce context
     switch between two different cgroups.

     Also it needs to set CPU affinity of the tasks in a CPU to
     accurately measure the impact of cgroup context switches.

       $ sudo perf stat -e context-switches,cgroup-switches -- \
       > taskset -c 0 perf bench sched pipe -l 100000
       # Running 'sched/pipe' benchmark:
       # Executed 100000 pipe operations between two processes

            Total time: 0.307 [sec]

              3.078180 usecs/op
                324867 ops/sec

        Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000':

                  200,026      context-switches
                       63      cgroup-switches

              0.321637922 seconds time elapsed

     You can see small number of cgroup-switches because both write and
     read tasks are in the same cgroup.

       $ sudo mkdir /sys/fs/cgroup/{AAA,BBB}

       $ sudo perf stat -e context-switches,cgroup-switches -- \
       > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB
       # Running 'sched/pipe' benchmark:
       # Executed 100000 pipe operations between two processes

            Total time: 0.351 [sec]

              3.512990 usecs/op
                284657 ops/sec

        Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB':

                  200,020      context-switches
                  200,019      cgroup-switches

              0.365034567 seconds time elapsed

     Now context-switches and cgroup-switches are almost same. And you
     can see the pipe operation took little more.

   - Kill child processes when perf bench sched messaging exited
     abnormally. Otherwise it'd leave the child doing unnecessary work.

  perf test:

   - Fix various shellcheck issues on the tests written in shell script.

   - Skip tests when condition is not satisfied:
      - object code reading test for non-text section addresses.
      - CoreSight test if cs_etm// event is not available.
      - lock contention test if not enough CPUs.

  Event parsing:

   - Make PMU alias name loading lazy to reduce the startup time in the
     event parsing code for perf record, stat and others in the general
     case.

   - Lazily compute PMU default config. In the same sense, delay PMU
     initialization until it's really needed to reduce the startup cost.

   - Fix event term values that are raw events. The event specification
     can have several terms including event name. But sometimes it
     clashes with raw event encoding which starts with 'r' and has
     hex-digits.

     For example, an event named 'read' should be processed as a normal
     event but it was mis-treated as a raw encoding and caused a
     failure.

       $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
       event syntax error: '..nning/event=read/'
                                         \___ parser error
       Run 'perf list' for a list of valid events

        Usage: perf stat [<options>] [<command>]

           -e, --event <event> event selector. use 'perf list' to list available events

  Event metrics:

   - Add "Compat" regex to match event with multiple identifiers.

   - Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne.

  Misc:

   - Assorted memory leak fixes and footprint reduction.

   - Add "bpf_skeletons" to perf version --build-options so that users
     can check whether their perf tools have BPF support easily.

   - Fix unaligned access in Intel-PT packet decoder found by
     undefined-behavior sanitizer.

   - Avoid frequency mode for the dummy event. Surprisingly it'd impact
     kernel timer tick handler performance by force iterating all PMU
     events.

   - Update bash shell completion for events and metrics"

* tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits)
  perf vendor events intel: Update tsx_cycles_per_elision metrics
  perf vendor events intel: Update bonnell version number to v5
  perf vendor events intel: Update westmereex events to v4
  perf vendor events intel: Update meteorlake events to v1.06
  perf vendor events intel: Update knightslanding events to v16
  perf vendor events intel: Add typo fix for ivybridge FP
  perf vendor events intel: Update a spelling in haswell/haswellx
  perf vendor events intel: Update emeraldrapids to v1.01
  perf vendor events intel: Update alderlake/alderlake events to v1.23
  perf build: Disable BPF skeletons if clang version is < 12.0.1
  perf callchain: Fix spelling mistake "statisitcs" -> "statistics"
  perf report: Fix spelling mistake "heirachy" -> "hierarchy"
  perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
  perf tests: test_arm_coresight: Simplify source iteration
  perf vendor events intel: Add tigerlake two metrics
  perf vendor events intel: Add broadwellde two metrics
  perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric
  perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
  perf callchain: Minor layout changes to callchain_list
  perf callchain: Make brtype_stat in callchain_list optional
  ...
2023-11-03 08:17:38 -10:00
Paolo Bonzini 45b890f768 KVM/arm64 updates for 6.7
- Generalized infrastructure for 'writable' ID registers, effectively
    allowing userspace to opt-out of certain vCPU features for its guest
 
  - Optimization for vSGI injection, opportunistically compressing MPIDR
    to vCPU mapping into a table
 
  - Improvements to KVM's PMU emulation, allowing userspace to select
    the number of PMCs available to a VM
 
  - Guest support for memory operation instructions (FEAT_MOPS)
 
  - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
    bugs and getting rid of useless code
 
  - Changes to the way the SMCCC filter is constructed, avoiding wasted
    memory allocations when not in use
 
  - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
    the overhead of errata mitigations
 
  - Miscellaneous kernel and selftest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZUFJRgAKCRCivnWIJHzd
 FtgYAP9cMsc1Mhlw3jNQnTc6q0cbTulD/SoEDPUat1dXMqjs+gEAnskwQTrTX834
 fgGQeCAyp7Gmar+KeP64H0xm8kPSpAw=
 =R4M7
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.7

 - Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest

 - Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table

 - Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM

 - Guest support for memory operation instructions (FEAT_MOPS)

 - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code

 - Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use

 - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations

 - Miscellaneous kernel and selftest fixes
2023-10-31 16:37:07 -04:00
Namhyung Kim fed3a1be64 perf tools fixes for v6.6: 2nd batch
- Fix regression in reading scale and unit files from sysfs for PMU
   events, so that we can use that info to pretty print instead of
   printing raw numbers:
 
   # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2
 
    Performance counter stats for 'system wide':
 
               1.64 Joules power/energy-ram/
               0.20 Joules power/energy-gpu/
 
        2.001228914 seconds time elapsed
   #
   # grep -m1 "model name" /proc/cpuinfo
   model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
   #
 
 - The small llvm.cpp file used to check if the llvm devel files are present was
   incorrectly deleted when removing the BPF event in 'perf trace', put it back
   as it is also used by tools/bpf/bpftool, that uses llvm routines to do
   disassembly of BPF object files.
 
 - Fix use of addr_location__exit() in dlfilter__object_code(), making sure that
   it is only used to pair a previous addr_location__init() call.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZTKh5AAKCRCyPKLppCJ+
 J/g/AP0f6SNyHJz21JzDTzyjXAeSdMzKwic0LXv+kATQy31HJAD+Kf7UKQieUeZB
 fxvp60aKyFN8IVIgpYiAjZMS3k9XPAY=
 =N7Gv
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' into perf-tools-next

To get the latest fixes in the perf tools including perf stat output,
dlfilter and LLVM feature detection.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-30 13:46:27 -07:00
Colin Ian King ee40490dd7 perf callchain: Fix spelling mistake "statisitcs" -> "statistics"
There are a couple of spelling mistakes in perror messages. Fix them.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lore.kernel.org/r/20231027084633.1167530-1-colin.i.king@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-27 19:26:13 -07:00
Arnaldo Carvalho de Melo 93c65d6143 perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
The changes in ("perf evsel: Rename evsel__increase_rlimit to
rlimit__increase_nofile") ended up breaking the python binding that now
references the rlimit__increase_nofile function, add the util/rlimit.o
to the tools/perf/util/python-ext-sources to cure that.

This was detected by the 'perf test python' regression test:

  $ perf test python
   14: 'import perf' in python        : FAILED!

  $ perf test -v python
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
   14: 'import perf' in python                                         :
  --- start ---
  test child forked, pid 2912462
  python usage test: "echo "import sys ; sys.path.insert(0, '/tmp/build/perf-tools-next/python'); import perf" | '/usr/bin/python3' "
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf-tools-next/python/perf.cpython-311-x86_64-linux-gnu.so: undefined symbol: rlimit__increase_nofile
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  $

Fixes: e093a222d7 ("perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZTrCS5Z3PZAmfPdV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-27 19:23:52 -07:00
Ian Rogers 56e144fe98 perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
Fix leak where mem_info__put wouldn't release the maps/map as used by
perf mem. Add exit functions and use elsewhere that the maps and map
are released.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:39:58 -07:00
Ian Rogers dec07fe5d4 perf callchain: Minor layout changes to callchain_list
Avoid 6 byte hole for padding. Place more frequently used fields
first in an attempt to use just 1 cacheline in the common case.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

After:
```
struct callchain_list {
        struct list_head           list;                 /*     0    16 */
        u64                        ip;                   /*    16     8 */
        struct map_symbol          ms;                   /*    24    24 */
        const char  *              srcline;              /*    48     8 */
        u64                        branch_count;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        from_count;           /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        u64                        predicted_count;      /*   104     8 */
        u64                        abort_count;          /*   112     8 */
        struct {
                _Bool              unfolded;             /*   120     1 */
                _Bool              has_children;         /*   121     1 */
        };                                               /*   120     2 */

        /* size: 128, cachelines: 2, members: 13 */
        /* padding: 6 */
};
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:39:32 -07:00
Ian Rogers 6ba29fbb0b perf callchain: Make brtype_stat in callchain_list optional
struct callchain_list is 352bytes in size, 232 of which are
brtype_stat. brtype_stat is only used for certain callchain_list
items so make it optional, allocating when necessary. So that
printing doesn't need to deal with an optional brtype_stat, pass
an empty/zero version.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat    brtype_stat;          /*    96   232 */
        /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
        const char  *              srcline;              /*   328     8 */
        struct list_head           list;                 /*   336    16 */

        /* size: 352, cachelines: 6, members: 13 */
        /* sum members: 346, holes: 1, sum holes: 6 */
        /* last cacheline: 32 bytes */
};
```

After:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:39:08 -07:00
Ian Rogers d47d876d72 perf callchain: Make display use of branch_type_stat const
Display code doesn't modify the branch_type_stat so switch uses to
const. This is done to aid refactoring struct callchain_list where
current the branch_type_stat is embedded even if not used.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:38:50 -07:00
Ian Rogers 67a3ebf1c3 perf offcpu: Add missed btf_free
Caught by address/leak sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:38:33 -07:00
Ian Rogers 7b2e444b76 perf threads: Remove unused dead thread list
Commit 40826c45eb ("perf thread: Remove notion of dead threads")
removed dead threads but the list head wasn't removed. Remove it here.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:38:09 -07:00
Ian Rogers c1149037f6 perf hist: Add missing puts to hist__account_cycles
Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.

Fixes: 57849998e2 ("perf report: Add processing for cycle histograms")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:37:48 -07:00
Ian Rogers 78c32f4cb1 libperf rc_check: Add RC_CHK_EQUAL
Comparing pointers with reference count checking is tricky to avoid a
SEGV. Add a convenience macro to simplify and use.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:37:22 -07:00
Ian Rogers ab8ce15078 perf machine: Avoid out of bounds LBR memory read
Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.

Fixes: e2b23483eb ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:36:20 -07:00
Ian Rogers 7a8f349e9d perf rwsem: Add debug mode that uses a mutex
Mutex error check will capture trying to take the lock recursively and
other problems that rwlock won't. At the expense of concurrency, adda
debug mode that uses a mutex in place of a rwsem.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: liuwenyu <liuwenyu7@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 13:35:35 -07:00
Namhyung Kim b5711042a1 perf lock contention: Use per-cpu array map for spinlocks
Currently lock contention timestamp is maintained in a hash map keyed by
pid.  That means it needs to get and release a map element (which is
proctected by spinlock!) on each contention begin and end pair.  This
can impact on performance if there are a lot of contention (usually from
spinlocks).

It used to go with task local storage but it had an issue on memory
allocation in some critical paths.  Although it's addressed in recent
kernels IIUC, the tool should support old kernels too.  So it cannot
simply switch to the task local storage at least for now.

As spinlocks create lots of contention and they disabled preemption
during the spinning, it can use per-cpu array to keep the timestamp to
avoid overhead in hashmap update and delete.

In contention_begin, it's easy to check the lock types since it can see
the flags.  But contention_end cannot see it.  So let's try to per-cpu
array first (unconditionally) if it has an active element (lock != 0).
Then it should be used and per-task tstamp map should not be used until
the per-cpu array element is cleared which means nested spinlock
contention (if any) was finished and it nows see (the outer) lock.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
2023-10-25 10:02:55 -07:00
Namhyung Kim 6a070573f2 perf lock contention: Check race in tstamp elem creation
When pelem is NULL, it'd create a new entry with zero data.  But it
might be preempted by IRQ/NMI just before calling bpf_map_update_elem()
then there's a chance to call it twice for the same pid.  So it'd be
better to use BPF_NOEXIST flag and check the return value to prevent
the race.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
2023-10-25 10:02:47 -07:00
Namhyung Kim d99317f214 perf lock contention: Clear lock addr after use
It checks the current lock to calculated the delta of contention time.
The address is saved in the tstamp map which is allocated at begining of
contention and released at end of contention.

But it's possible for bpf_map_delete_elem() to fail.  In that case, the
element in the tstamp map kept for the current lock and it makes the
next contention for the same lock tracked incorrectly.  Specificially
the next contention begin will see the existing element for the task and
it'd just return.  Then the next contention end will see the element and
calculate the time using the timestamp for the previous begin.

This can result in a large value for two small contentions happened from
time to time.  Let's clear the lock address so that it can be updated
next time even if the bpf_map_delete_elem() failed.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
2023-10-25 10:02:34 -07:00
Yang Jihong e093a222d7 perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile
evsel__increase_rlimit() helper does nothing with evsel, and description
of the functionality is inaccurate, rename it and move to util/rlimit.c.

By the way, fix a checkppatch warning about misplaced license tag:

  WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead
  #160: FILE: tools/perf/util/rlimit.h:3:
  /* SPDX-License-Identifier: LGPL-2.1 */

No functional change.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-25 10:02:11 -07:00
Yang Jihong c4a852635e perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir()
If using parallel threads to collect data, perf record needs at least 6 fds
per CPU. (one for sys_perf_event_open, four for pipe msg and ack of the
pipe, see record__thread_data_open_pipes(), and one for open perf.data.XXX)
For an environment with more than 100 cores, if perf record uses both
`-a` and `--threads` options, it is easy to exceed the upper limit of the
file descriptor number, when we run out of them try to increase the limits.

Before:
  $ ulimit -n
  1024
  $ lscpu | grep 'On-line CPU(s)'
  On-line CPU(s) list:                0-159
  $ perf record --threads -a sleep 1
  Failed to create data directory: Too many open files

After:
  $ ulimit -n
  1024
  $ lscpu | grep 'On-line CPU(s)'
  On-line CPU(s) list:                0-159
  $ perf record --threads -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-19 23:38:27 -07:00
Thomas Richter 5069211e2f perf trace: Use the right bpf_probe_read(_str) variant for reading user data
Perf test case 111 Check open filename arg using perf trace + vfs_getname
fails on s390. This is caused by a failing function
bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c.

The root cause is the lookup by address. Function bpf_probe_read()
is used. This function works only for architectures
with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.

On s390 is not possible to determine from the address to which
address space the address belongs to (user or kernel space).

Replace bpf_probe_read() by bpf_probe_read_kernel()
and bpf_probe_read_str() by bpf_probe_read_user_str() to
explicity specify the address space the address refers to.

Output before:
 # ./perf trace -eopen,openat -- touch /tmp/111
 libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
 libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
 reg type unsupported for arg#0 function sys_enter#75
 0: R1=ctx(off=0,imm=0) R10=fp0
 ; int sys_enter(struct syscall_enter_args *args)
 0: (bf) r6 = r1           ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
 ; return bpf_get_current_pid_tgid();
 1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
 2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
 3: (bf) r2 = r10              ; R2_w=fp0 R10=fp0
 ;
 .....
 lines deleted here
 .....
 23: (bf) r3 = r6              ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
 24: (85) call bpf_probe_read#4
 unknown func bpf_probe_read#4
 processed 23 insns (limit 1000000) max_states_per_insn 0 \
	 total_states 2 peak_states 2 mark_read 2
 -- END PROG LOAD LOG --
 libbpf: prog 'sys_enter': failed to load: -22
 libbpf: failed to load object 'augmented_raw_syscalls_bpf'
 libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
 ....

Output after:
 # ./perf test -Fv 111
 111: Check open filename arg using perf trace + vfs_getname          :
 --- start ---
     1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
	"/tmp/temporary_file.SWH85", \
	flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
 ---- end ----
 Check open filename arg using perf trace + vfs_getname: Ok
 #

Test with the sleep command shows:
Output before:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
         { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
 #

Output after:
 # ./perf trace -e *sleep sleep 1.234567890
     0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
         { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
 #

Fixes: 14e4b9f428 ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Cc: svens@linux.ibm.com
Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-19 22:41:46 -07:00
Oliver Upton e2bdd172e6 perf build: Generate arm64's sysreg-defs.h and add to include path
Start generating sysreg-defs.h in anticipation of updating sysreg.h to a
version that needs the generated output.

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18 23:36:25 +00:00
Namhyung Kim 1f36b190ad perf tools: Do not ignore the default vmlinux.h
The recent change made it possible to generate vmlinux.h from BTF and
to ignore the file.  But we also have a minimal vmlinux.h that will be
used by default.  It should not be ignored by GIT.

Fixes: b7a2d774c9 ("perf build: Add ability to build with a generated vmlinux.h")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310110451.rvdUZJEY-lkp@intel.com/
Cc: oe-kbuild-all@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-18 15:35:20 -07:00
Ian Rogers 0197da7aff perf pmu: Lazily compute default config
The default config is computed during creation of the PMU and may do
things like scanning sysfs, when the PMU may just be used as part of
scanning. Change default_config to perf_event_attr_init_default, a
callback that is used when a default config needs initializing. This
avoids holding onto the memory for a perf_event_attr and copying.

On a tigerlake laptop running the pmu-scan benchmark:

Before:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 28.780 usec (+- 0.503 usec)
  Average PMU scanning took: 283.480 usec (+- 18.471 usec)
Number of openat syscalls: 30,227

After:
Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 27.880 usec (+- 0.169 usec)
  Average PMU scanning took: 245.260 usec (+- 15.758 usec)
Number of openat syscalls: 28,914

Over 3 runs it is a nearly 12% reduction in execution time and a 4.3%
of openat calls.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers 63883cb063 perf pmu: Const-ify perf_pmu__config_terms
Add const to related APIs, this is so they can be used to default
initialize a perf_event_attr from a const pmu.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers 3a42f4c796 perf pmu: Const-ify file APIs
File APIs don't alter the struct pmu so allow const ones to be passed.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Ian Rogers aa61360155 perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init
Assign default_config as part of the init. perf_pmu__get_default_config
was doing more than just getting the default config and so this is
intended to better align with the code.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20231012175645.1849503-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Adrian Hunter 661ce78105 perf intel-pt: Prefer get_unaligned_le64 to memcpy_le64
Use get_unaligned_le64() instead of memcpy_le64(..., 8) because it produces
simpler code.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:50 -07:00
Adrian Hunter 3b4fa67fc6 perf intel-pt: Use get_unaligned_le16() etc
Avoid unaligned access by using get_unaligned_le16(), get_unaligned_le32()
and get_unaligned_le64().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-5-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:49 -07:00
Adrian Hunter f058fa5b07 perf intel-pt: Use existing definitions of le16_to_cpu() etc
Use definitions from tools/include/linux/kernel.h

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-4-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:49 -07:00
Adrian Hunter 1d2dbce9bb perf intel-pt: Simplify intel_pt_get_vmcs()
Simplify and remove unnecessary constant expressions.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20231005190451.175568-3-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-17 12:40:49 -07:00
Besar Wicaksono a16afcc58a perf cs-etm: Fix incorrect or missing decoder for raw trace
The decoder creation for raw trace uses metadata from the first CPU.
On per-cpu mode, this metadata is incorrectly used for every decoder.
On per-process/per-thread traces, the first CPU is CPU0. If CPU0 trace
is not enabled, its metadata will be marked unused and the decoder is
not created. Perf report dump skips the decoding part because the
decoder is missing.

To fix this, use metadata of the CPU associated with sample object.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: suzuki.poulose@arm.com
Cc: mike.leach@linaro.org
Cc: jonathanh@nvidia.com
Cc: rwiley@nvidia.com
Cc: treding@nvidia.com
Cc: vsethi@nvidia.com
Cc: ywan@nvidia.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: linux-tegra@vger.kernel.org
Link: https://lore.kernel.org/r/20231010234803.5419-1-bwicaksono@nvidia.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:57 -07:00
Ian Rogers b84b3f4792 perf bpf_counter: Fix a few memory leaks
Memory leaks were detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-20-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:57 -07:00
Ian Rogers 1052545017 perf header: Fix various error path memory leaks
Memory leaks were detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-19-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:57 -07:00
Ian Rogers 97fe038374 perf trace-event-info: Avoid passing NULL value to closedir
If opendir failed then closedir was passed NULL which is
erroneous. Caught by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-18-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:57 -07:00
Ian Rogers 7875c72c8b perf parse-events: Fix unlikely memory leak when cloning terms
Add missing free on an error path as detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-16-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:57 -07:00
Ian Rogers 63d471979e perf svghelper: Avoid memory leak
On success path the sib_core and sib_thr values weren't being
freed. Detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-14-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:56 -07:00
Ian Rogers 52a5ad12f2 perf dlfilter: Be defensive against potential NULL dereference
In the unlikely case of having a symbol without a mapping, avoid a
NULL dereference that clang-tidy warns about.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:56 -07:00
Ian Rogers 85f73c377b perf mem-events: Avoid uninitialized read
pmu should be initialized to NULL before perf_pmus__scan loop. Fix and
shrink the scope of pmu at the same time. Issue detected by clang-tidy.

Fixes: 5752c20f37 ("perf mem: Scan all PMUs instead of just core ones")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:56 -07:00
Ian Rogers b3aa09ee78 perf jitdump: Avoid memory leak
jit_repipe_unwinding_info is called in a loop by jit_process_dump,
avoid leaking unwinding_data by free-ing before overwriting. Error
detected by clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:56 -07:00
Ian Rogers e237213670 perf env: Remove unnecessary NULL tests
clang-tidy was warning:
```
util/env.c:334:23: warning: Access to field 'nr_pmu_mappings' results in a dereference of a null pointer (loaded from variable 'env') [clang-analyzer-core.NullDereference]
        env->nr_pmu_mappings = pmu_num;
```

As functions are called potentially when !env was true. This condition
could never be true as it would produce a segv, so remove the
unnecessary NULL tests and silence clang-tidy.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:56 -07:00
Ian Rogers b20576fd7f perf parse-events: Fix for term values that are raw events
Raw events can be strings like 'r0xead' but the 0x is optional so they
can also be 'read'. On IcelakeX uncore_imc_free_running has an event
called 'read' which may be programmed as:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
```
However, the PE_RAW type isn't allowed on the right of a term, even
though in this case we just want to interpret it as a string. This
leads to the following error on IcelakeX:
```
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
event syntax error: '..nning/event=read/'
                                  \___ parser error
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event> event selector. use 'perf list' to list available events
```
Fix this by allowing raw types on the right of terms and treat them as
strings, just as is already done for PE_LEGACY_CACHE. Make this
consistent by just entirely removing name_or_legacy and always using
name_or_raw that covers all three cases.

Fixes: 6fd1e51915 ("perf parse-events: Support PMUs for legacy cache events")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230928004431.1926969-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:55 -07:00
Arnaldo Carvalho de Melo 29a2fd7c72 perf symbols: Add 'intel_idle_ibrs' to the list of idle symbols
This is a longstanding to do list entry: we need a way to see that a
sample took place while in idle state, as the current way to do it is
to infer that by the name of the functions that in such state have
more samples, IOW: a hack.

Maybe we can do flip a bit in samples that take place inside the
enter/exit idle section in do_idle()?

But till then, add one more :-\

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/ZR66Qgbcltt+zG7F@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:55 -07:00
Ian Rogers 03ff4c6b3e perf parse-events: Avoid erange from hex numbers
We specify that a "num_hex" comprises 1 or more digits, however, that
allows strtoull to fail with ERANGE. Limit the number of hex digits to
being between 1 and 16.

Before:
```
$ perf stat -e 'cpu/rE7574c47490475745/' true
perf: util/parse-events.c:215: fix_raw: Assertion `errno == 0' failed.
Aborted (core dumped)
```

After:
```
$ perf stat -e 'cpu/rE7574c47490475745/' true
event syntax error: 'cpu/rE7574c47490475745/'
                         \___ Bad event or PMU

Unable to find PMU or event on a PMU of 'cpu'

Initial error:
event syntax error: 'cpu/rE7574c47490475745/'
                         \___ unknown term 'rE7574c47490475745' for pmu 'cpu'

valid terms: event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,config3,name,period,percore,metric-id
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
```

Issue found through fuzz testing.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230907210533.3712979-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-12 10:01:55 -07:00
Arnaldo Carvalho de Melo 87cd3d4819 perf tools fixes for v6.6: 1st batch
Build:
 
  - Update header files in the tools/**/include directory to sync with
    the kernel sources as usual.
 
  - Remove unused bpf-prologue files.  While it's not strictly a fix,
    but the functionality was removed in this cycle so better to get
    rid of the code together.
 
  - Other minor build fixes.
 
 Misc:
 
  - Fix uninitialized memory access in PMU parsing code
 
  - Fix segfaults on software event
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZRIFKAAKCRCMstVUGiXM
 g/pXAP9HLB2s+beBTK5iQU4/NfqmAVSl303QCoR9xLByo38vfAEAlLiRIh061pTi
 PRlXVuY9bUQPyCSYsiBHv/fmLqdQdwU=
 =ti6G
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' into perf-tools-next

To pick up the 'perf bench sched-seccomp-notify' changes to allow us to
continue build testing perf-tools-next with the set of distro
containers, where some older ones don't have a recent enough seccomp.h
UAPI header that contains defines needed by this new 'perf bench'
workload.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-10-10 17:36:36 -03:00
Athira Rajeev 6be5d82862 tools/perf: Add "is_kmod" to struct dso to check if it is kernel module
Update "struct dso" to include new member "is_kmod".
This new field will determine if the file is a kernel
module or not.

To resolve the address from a sample, perf looks at the
DSO maps. In case of address from a kernel module, there
were some address found to be not resolved. This was
observed while running perf test for "Object code reading".
Though the ip falls beteen the start address of the loaded
module (perf map->start ) and end address ( perf map->end),
it was unresolved.

This was happening because in some cases for kernel
modules, address from sample points to stub instructions.
To identify if the DSO is a kernel module, the new field
"is_kmod" is added to "struct dso".

Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-04 22:28:07 -07:00
Athira Rajeev 26a5262d30 tools/perf: Add text_end to "struct dso" to save .text section size
Update "struct dso" to include new member "text_end".
This new field will represent the offset for end of text
section for a dso. For elf, this value is derived as:
sh_size (Size of section in byes) + sh_offset (Section file
offst) of the elf header for text.

For bfd, this value is derived as:
1. For PE file,
section->size + ( section->vma - dso->text_offset)
2. Other cases:
section->filepos (file position) + section->size (size of
section)

To resolve the address from a sample, perf looks at the
DSO maps. In case of address from a kernel module, there
were some address found to be not resolved. This was
observed while running perf test for "Object code reading".
Though the ip falls beteen the start address of the loaded
module (perf map->start ) and end address ( perf map->end),
it was unresolved.

Example:

    Reading object code for memory address: 0xc008000007f0142c
    File is: /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
    On file address is: 0x1114cc
    Objdump command is: objdump -z -d --start-address=0x11142c --stop-address=0x1114ac /lib/modules/6.5.0-rc3+/kernel/fs/xfs/xfs.ko
    objdump read too few bytes: 128
    test child finished with -1

Here, module is loaded at:
    # cat /proc/modules | grep xfs
    xfs 2228224 3 - Live 0xc008000007d00000

From objdump for xfs module, text section is:
    text 0010f7bc  0000000000000000 0000000000000000 000000a0 2**4

Here the offset for 0xc008000007f0142c ie  0x112074 falls out
.text section which is up to 0x10f7bc.

In this case for module, the address 0xc008000007e11fd4 is pointing
to stub instructions. This address range represents the module stubs
which is allocated on module load and hence is not part of DSO offset.

To identify such  address, which falls out of text
section and within module end, added the new field "text_end" to
"struct dso".

Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230928075213.84392-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-04 22:28:07 -07:00
Kuan-Wei Chiu be7a4caa7c perf hisi-ptt: Fix memory leak in lseek failure handling
In the previous code, there was a memory leak issue where the previously
allocated memory was not freed upon a failed lseek operation. This patch
addresses the problem by releasing the old memory before returning -errno
in case of a lseek failure. This ensures that memory is properly managed
and avoids potential memory leaks.

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: yangyicong@hisilicon.com
Cc: jonathan.cameron@huawei.com
Link: https://lore.kernel.org/r/20230930072719.1267784-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-04 22:28:07 -07:00
Adrian Hunter f2d87895cb perf intel-pt: Fix async branch flags
Ensure PERF_IP_FLAG_ASYNC is set always for asynchronous branches (i.e.
interrupts etc).

Fixes: 90e457f7be ("perf tools: Add Intel PT support")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230928072953.19369-1-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-29 23:59:08 -07:00
Adrian Hunter 7a48b58eb5 perf dlfilter: Fix use of addr_location__exit() in dlfilter__object_code()
Stop calling addr_location__exit() when addr_location__init() was not
called.

Fixes: 0dd5041c9a ("perf addr_location: Add init/exit/copy functions")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230928071605.17624-1-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-29 23:55:05 -07:00