linux-stable/tools/perf/Documentation/perf-kwork.txt

215 lines
5.8 KiB
Text
Raw Normal View History

perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) The 'perf kwork' tool is used to trace time properties of kernel work (such as irq, softirq, and workqueue), including runtime, latency, and timehist, using the infrastructure in the perf tools to allow tracing extra targets. This is the first commit to reuse the 'perf record' framework code to implement a simple record function, kwork is not supported currently. Test cases: # perf usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: <SNIP> iostat Show I/O performance metrics kallsyms Searches running kernel for symbols kmem Tool to trace/measure kernel memory properties kvm Tool to trace/measure kvm guest os kwork Tool to trace/measure kernel work properties (latencies) list List all symbolic event types lock Analyze lock events mem Profile memory accesses record Run a command and record its profile into perf.data <SNIP> See 'perf help COMMAND' for more information on a specific command. # perf kwork Usage: perf kwork [<options>] {record} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile -v, --verbose be more verbose (show symbol address, etc) # perf kwork record -- sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.787 MB perf.data ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:17 +00:00
perf-kowrk(1)
=============
NAME
----
perf-kwork - Tool to trace/measure kernel work properties (latencies)
SYNOPSIS
--------
[verse]
perf kwork top: Introduce new top utility Some common tools for collecting statistics on CPU usage, such as top, obtain statistics from timer interrupt sampling, and then periodically read statistics from /proc/stat. This method has some deviations: 1. In the tick interrupt, the time between the last tick and the current tick is counted in the current task. However, the task may be running only part of the time. 2. For each task, the top tool periodically reads the /proc/{PID}/status information. For tasks with a short life cycle, it may be missed. In conclusion, the top tool cannot accurately collect statistics on the CPU usage and running time of tasks. The statistical method based on sched_switch tracepoint can accurately calculate the CPU usage of all tasks. This method is applicable to scenarios where performance comparison data is of high precision. Example usage: # perf kwork Usage: perf kwork [<options>] {record|report|latency|timehist|top} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, sched, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1 groups == 40 processes run Total time: 14.074 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ] # perf kwork top Total : 115708.178 ms, 8 cpus %Cpu(s): 9.78% id %Cpu0 [||||||||||||||||||||||||||| 90.55%] %Cpu1 [||||||||||||||||||||||||||| 90.51%] %Cpu2 [|||||||||||||||||||||||||| 88.57%] %Cpu3 [||||||||||||||||||||||||||| 91.18%] %Cpu4 [||||||||||||||||||||||||||| 91.09%] %Cpu5 [||||||||||||||||||||||||||| 90.88%] %Cpu6 [|||||||||||||||||||||||||| 88.64%] %Cpu7 [||||||||||||||||||||||||||| 90.28%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4113 22.23 3221.547 ms sched-messaging 4105 21.61 3131.495 ms sched-messaging 4119 21.53 3120.937 ms sched-messaging 4103 21.39 3101.614 ms sched-messaging 4106 21.37 3095.209 ms sched-messaging 4104 21.25 3077.269 ms sched-messaging 4115 21.21 3073.188 ms sched-messaging 4109 21.18 3069.022 ms sched-messaging 4111 20.78 3010.033 ms sched-messaging 4114 20.74 3007.073 ms sched-messaging 4108 20.73 3002.137 ms sched-messaging 4107 20.47 2967.292 ms sched-messaging 4117 20.39 2955.335 ms sched-messaging 4112 20.34 2947.080 ms sched-messaging 4118 20.32 2942.519 ms sched-messaging 4121 20.23 2929.865 ms sched-messaging 4110 20.22 2930.078 ms sched-messaging 4122 20.15 2919.542 ms sched-messaging 4120 19.77 2866.032 ms sched-messaging 4116 19.72 2857.660 ms sched-messaging 4127 16.19 2346.334 ms sched-messaging 4142 15.86 2297.600 ms sched-messaging 4141 15.62 2262.646 ms sched-messaging 4136 15.41 2231.408 ms sched-messaging 4130 15.38 2227.008 ms sched-messaging 4129 15.31 2217.692 ms sched-messaging 4126 15.21 2201.711 ms sched-messaging 4139 15.19 2200.722 ms sched-messaging 4137 15.10 2188.633 ms sched-messaging 4134 15.06 2182.082 ms sched-messaging 4132 15.02 2177.530 ms sched-messaging 4131 14.73 2131.973 ms sched-messaging 4125 14.68 2125.439 ms sched-messaging 4128 14.66 2122.255 ms sched-messaging 4123 14.65 2122.113 ms sched-messaging 4135 14.56 2107.144 ms sched-messaging 4133 14.51 2103.549 ms sched-messaging 4124 14.27 2066.671 ms sched-messaging 4140 14.17 2052.251 ms sched-messaging 4138 13.81 2000.361 ms sched-messaging 0 11.42 1652.009 ms swapper/2 0 11.35 1641.694 ms swapper/6 0 9.71 1405.108 ms swapper/7 0 9.48 1372.338 ms swapper/1 0 9.44 1366.013 ms swapper/0 0 9.11 1318.382 ms swapper/5 0 8.90 1287.582 ms swapper/4 0 8.81 1274.356 ms swapper/3 4100 2.61 379.328 ms perf 4101 1.16 169.487 ms perf-exec 151 0.65 94.741 ms systemd-resolve 249 0.36 53.030 ms sd-resolve 153 0.14 21.405 ms systemd-timesyn 1 0.10 16.200 ms systemd 16 0.09 15.785 ms rcu_preempt 4102 0.06 9.727 ms perf 4095 0.03 5.464 ms kworker/7:1 98 0.02 3.231 ms jbd2/sda-8 353 0.02 4.115 ms sshd 75 0.02 3.889 ms kworker/2:1 73 0.01 1.552 ms kworker/5:1 64 0.01 1.591 ms kworker/4:1 74 0.01 1.952 ms kworker/3:1 61 0.01 2.608 ms kcompactd0 397 0.01 1.602 ms kworker/1:1 69 0.01 1.817 ms kworker/1:1H 10 0.01 2.553 ms kworker/u16:0 2909 0.01 2.684 ms kworker/0:2 1211 0.00 0.426 ms kworker/7:0 97 0.00 0.153 ms kworker/7:1H 51 0.00 0.100 ms ksoftirqd/7 120 0.00 0.856 ms systemd-journal 76 0.00 1.414 ms kworker/6:1 46 0.00 0.246 ms ksoftirqd/6 45 0.00 0.164 ms migration/6 41 0.00 0.098 ms ksoftirqd/5 40 0.00 0.207 ms migration/5 86 0.00 1.339 ms kworker/4:1H 36 0.00 0.252 ms ksoftirqd/4 35 0.00 0.090 ms migration/4 31 0.00 0.156 ms ksoftirqd/3 30 0.00 0.073 ms migration/3 26 0.00 0.180 ms ksoftirqd/2 25 0.00 0.085 ms migration/2 21 0.00 0.106 ms ksoftirqd/1 20 0.00 0.118 ms migration/1 302 0.00 1.440 ms systemd-logind 17 0.00 0.132 ms migration/0 15 0.00 0.255 ms ksoftirqd/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-10-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:10 +00:00
'perf kwork' {record|report|latency|timehist|top}
perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) The 'perf kwork' tool is used to trace time properties of kernel work (such as irq, softirq, and workqueue), including runtime, latency, and timehist, using the infrastructure in the perf tools to allow tracing extra targets. This is the first commit to reuse the 'perf record' framework code to implement a simple record function, kwork is not supported currently. Test cases: # perf usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: <SNIP> iostat Show I/O performance metrics kallsyms Searches running kernel for symbols kmem Tool to trace/measure kernel memory properties kvm Tool to trace/measure kvm guest os kwork Tool to trace/measure kernel work properties (latencies) list List all symbolic event types lock Analyze lock events mem Profile memory accesses record Run a command and record its profile into perf.data <SNIP> See 'perf help COMMAND' for more information on a specific command. # perf kwork Usage: perf kwork [<options>] {record} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile -v, --verbose be more verbose (show symbol address, etc) # perf kwork record -- sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.787 MB perf.data ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:17 +00:00
DESCRIPTION
-----------
There are several variants of 'perf kwork':
'perf kwork record <command>' to record the kernel work
of an arbitrary workload.
perf kwork: Implement 'report' subcommand Implements framework of 'perf kwork report', which is used to report time properties such as run time and frequency: Test cases: # perf kwork Usage: perf kwork [<options>] {record|report} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork report -h Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -S Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- Total count : 0 Total runtime (msec) : 0.000 (0.000% load average) Total time span (msec) : 0.000 -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -C 0,100 Requested CPU 100 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork report -s runtime1 Error: Unknown --sort key: `runtime1' Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork report --time 00FFF, Invalid time span Since there are no report supported events, the output is empty. Briefly describe the data structure: 1. "class" indicates event type. For example, irq and softiq correspond to different types. 2. "cluster" refers to a specific event corresponding to a type. For example, RCU and TIMER in softirq correspond to different clusters, which contains three types of events: raise, entry, and exit. 3. "atom" includes time of each sample and sample of the previous phase. (For example, exit corresponds to entry, which is used for timehist.) Committer notes: - Add {} for multiline if blocks. - report_print_work() should either return that ret variable that accounts how many bytes were printed or stop accounting and be void. Do the former for now to avoid this: builtin-kwork.c:534:6: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] int ret = 0; ^ 1 error generated. When building with: ⬢[acme@toolbox perf]$ clang --version clang version 13.0.0 (https://github.com/llvm/llvm-project e8991caea8690ec2d17b0b7e1c29bf0da6609076) Also: - if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { + if (dst_type < KWORK_TRACE_MAX) { Several versions of clang and at least this gcc: 3 51.40 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0) builtin-kwork.c:411:16: error: comparison of unsigned enum expression >= 0 is always true [-Werror,-Wtautological-compare] if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { As the first entry in a enum is zero. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-7-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:22 +00:00
'perf kwork report' to report the per kwork runtime.
perf kwork: Implement perf kwork latency Implements framework of perf kwork latency, which is used to report time properties such as delay time and frequency. Test cases: # perf kwork lat -h Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -C 199 Requested CPU 199 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork lat -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork lat -s avg1 Error: Unknown --sort key: `avg1' Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat --time FFFF, Invalid time span # perf kwork lat Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- INFO: 36.570% skipped events (31537 including 0 raise, 31537 entry, 0 exit) Since there are no latency-enabled events, the output is empty. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-11-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:26 +00:00
'perf kwork latency' to report the per kwork latencies.
perf kwork: Implement perf kwork timehist Implements framework of perf kwork timehist, to provide an analysis of kernel work events. Test cases: # perf kwork tim Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 91576.070147 91576.070206 [0000] (s)SCHED:7 0.060 0.057 91576.073147 91576.073202 [0000] (s)SCHED:7 0.054 0.060 <SNIP> # perf kwork tim --max-stack 2 -g Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 irq_exit_rcu <- sysvec_call_function_single 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 irq_exit_rcu <- sysvec_call_function_single 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 irq_exit_rcu <- sysvec_call_function_single 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt <SNIP> Committer testing: # perf kwork -k workqueue timehist | head -40 Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 26520.211825 26520.211832 [0019] (w)free_work 0.007 0.004 26520.212929 26520.212934 [0020] (w)free_work 0.005 0.004 26520.213226 26520.213228 [0014] (w)kfree_rcu_work 0.002 0.004 26520.214057 26520.214061 [0021] (w)free_work 0.004 0.004 26520.221239 26520.221241 [0007] (w)kfree_rcu_work 0.002 0.009 26520.223232 26520.223238 [0013] (w)psi_avgs_work 0.005 0.006 26520.230057 26520.230060 [0020] (w)free_work 0.003 0.003 26520.270428 26520.270434 [0015] (w)free_work 0.006 0.004 26520.270546 26520.270550 [0014] (w)free_work 0.004 0.003 26520.281626 26520.281629 [0015] (w)free_work 0.003 0.002 26520.287225 26520.287230 [0012] (w)psi_avgs_work 0.005 0.008 26520.287231 26520.287235 [0001] (w)psi_avgs_work 0.004 0.011 26520.287236 26520.287239 [0001] (w)psi_avgs_work 0.003 0.012 26520.329488 26520.329492 [0024] (w)free_work 0.004 0.004 26520.330600 26520.330605 [0007] (w)free_work 0.005 0.004 26520.334218 26520.334218 [0007] (w)kfree_rcu_monitor 0.001 0.002 26520.335220 26520.335221 [0005] (w)kfree_rcu_monitor 0.001 0.004 26520.343980 26520.343985 [0007] (w)free_work 0.005 0.002 26520.345093 26520.345097 [0006] (w)free_work 0.004 0.003 26520.351233 26520.351238 [0027] (w)psi_avgs_work 0.005 0.008 26520.353228 26520.353229 [0007] (w)kfree_rcu_work 0.001 0.002 26520.353229 26520.353231 [0005] (w)kfree_rcu_work 0.001 0.006 26520.382381 26520.382383 [0006] (w)free_work 0.003 0.002 26520.386547 26520.386548 [0006] (w)free_work 0.002 0.001 26520.391243 26520.391245 [0015] (w)console_callback 0.002 0.016 26520.415369 26520.415621 [0027] (w)btrfs_work_helper 0.252 26520.415351 26520.416174 [0002] (w)btrfs_work_helper 0.823 0.037 26520.415343 26520.416304 [0031] (w)btrfs_work_helper 0.961 26520.415335 26520.417078 [0001] (w)btrfs_work_helper 1.743 26520.415250 26520.417564 [0002] (w)wb_workfn 2.314 26520.424777 26520.424787 [0002] (w)btrfs_work_helper 0.010 26520.424788 26520.424798 [0002] (w)btrfs_work_helper 0.010 26520.424790 26520.424805 [0001] (w)btrfs_work_helper 0.016 0.016 26520.424801 26520.424807 [0002] (w)btrfs_work_helper 0.006 26520.424809 26520.424831 [0002] (w)btrfs_work_helper 0.022 0.030 26520.424824 26520.424835 [0027] (w)btrfs_work_helper 0.011 26520.424809 26520.424867 [0001] (w)btrfs_work_helper 0.059 0.032 # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:29 +00:00
'perf kwork timehist' provides an analysis of kernel work events.
perf kwork top: Introduce new top utility Some common tools for collecting statistics on CPU usage, such as top, obtain statistics from timer interrupt sampling, and then periodically read statistics from /proc/stat. This method has some deviations: 1. In the tick interrupt, the time between the last tick and the current tick is counted in the current task. However, the task may be running only part of the time. 2. For each task, the top tool periodically reads the /proc/{PID}/status information. For tasks with a short life cycle, it may be missed. In conclusion, the top tool cannot accurately collect statistics on the CPU usage and running time of tasks. The statistical method based on sched_switch tracepoint can accurately calculate the CPU usage of all tasks. This method is applicable to scenarios where performance comparison data is of high precision. Example usage: # perf kwork Usage: perf kwork [<options>] {record|report|latency|timehist|top} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, sched, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1 groups == 40 processes run Total time: 14.074 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ] # perf kwork top Total : 115708.178 ms, 8 cpus %Cpu(s): 9.78% id %Cpu0 [||||||||||||||||||||||||||| 90.55%] %Cpu1 [||||||||||||||||||||||||||| 90.51%] %Cpu2 [|||||||||||||||||||||||||| 88.57%] %Cpu3 [||||||||||||||||||||||||||| 91.18%] %Cpu4 [||||||||||||||||||||||||||| 91.09%] %Cpu5 [||||||||||||||||||||||||||| 90.88%] %Cpu6 [|||||||||||||||||||||||||| 88.64%] %Cpu7 [||||||||||||||||||||||||||| 90.28%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4113 22.23 3221.547 ms sched-messaging 4105 21.61 3131.495 ms sched-messaging 4119 21.53 3120.937 ms sched-messaging 4103 21.39 3101.614 ms sched-messaging 4106 21.37 3095.209 ms sched-messaging 4104 21.25 3077.269 ms sched-messaging 4115 21.21 3073.188 ms sched-messaging 4109 21.18 3069.022 ms sched-messaging 4111 20.78 3010.033 ms sched-messaging 4114 20.74 3007.073 ms sched-messaging 4108 20.73 3002.137 ms sched-messaging 4107 20.47 2967.292 ms sched-messaging 4117 20.39 2955.335 ms sched-messaging 4112 20.34 2947.080 ms sched-messaging 4118 20.32 2942.519 ms sched-messaging 4121 20.23 2929.865 ms sched-messaging 4110 20.22 2930.078 ms sched-messaging 4122 20.15 2919.542 ms sched-messaging 4120 19.77 2866.032 ms sched-messaging 4116 19.72 2857.660 ms sched-messaging 4127 16.19 2346.334 ms sched-messaging 4142 15.86 2297.600 ms sched-messaging 4141 15.62 2262.646 ms sched-messaging 4136 15.41 2231.408 ms sched-messaging 4130 15.38 2227.008 ms sched-messaging 4129 15.31 2217.692 ms sched-messaging 4126 15.21 2201.711 ms sched-messaging 4139 15.19 2200.722 ms sched-messaging 4137 15.10 2188.633 ms sched-messaging 4134 15.06 2182.082 ms sched-messaging 4132 15.02 2177.530 ms sched-messaging 4131 14.73 2131.973 ms sched-messaging 4125 14.68 2125.439 ms sched-messaging 4128 14.66 2122.255 ms sched-messaging 4123 14.65 2122.113 ms sched-messaging 4135 14.56 2107.144 ms sched-messaging 4133 14.51 2103.549 ms sched-messaging 4124 14.27 2066.671 ms sched-messaging 4140 14.17 2052.251 ms sched-messaging 4138 13.81 2000.361 ms sched-messaging 0 11.42 1652.009 ms swapper/2 0 11.35 1641.694 ms swapper/6 0 9.71 1405.108 ms swapper/7 0 9.48 1372.338 ms swapper/1 0 9.44 1366.013 ms swapper/0 0 9.11 1318.382 ms swapper/5 0 8.90 1287.582 ms swapper/4 0 8.81 1274.356 ms swapper/3 4100 2.61 379.328 ms perf 4101 1.16 169.487 ms perf-exec 151 0.65 94.741 ms systemd-resolve 249 0.36 53.030 ms sd-resolve 153 0.14 21.405 ms systemd-timesyn 1 0.10 16.200 ms systemd 16 0.09 15.785 ms rcu_preempt 4102 0.06 9.727 ms perf 4095 0.03 5.464 ms kworker/7:1 98 0.02 3.231 ms jbd2/sda-8 353 0.02 4.115 ms sshd 75 0.02 3.889 ms kworker/2:1 73 0.01 1.552 ms kworker/5:1 64 0.01 1.591 ms kworker/4:1 74 0.01 1.952 ms kworker/3:1 61 0.01 2.608 ms kcompactd0 397 0.01 1.602 ms kworker/1:1 69 0.01 1.817 ms kworker/1:1H 10 0.01 2.553 ms kworker/u16:0 2909 0.01 2.684 ms kworker/0:2 1211 0.00 0.426 ms kworker/7:0 97 0.00 0.153 ms kworker/7:1H 51 0.00 0.100 ms ksoftirqd/7 120 0.00 0.856 ms systemd-journal 76 0.00 1.414 ms kworker/6:1 46 0.00 0.246 ms ksoftirqd/6 45 0.00 0.164 ms migration/6 41 0.00 0.098 ms ksoftirqd/5 40 0.00 0.207 ms migration/5 86 0.00 1.339 ms kworker/4:1H 36 0.00 0.252 ms ksoftirqd/4 35 0.00 0.090 ms migration/4 31 0.00 0.156 ms ksoftirqd/3 30 0.00 0.073 ms migration/3 26 0.00 0.180 ms ksoftirqd/2 25 0.00 0.085 ms migration/2 21 0.00 0.106 ms ksoftirqd/1 20 0.00 0.118 ms migration/1 302 0.00 1.440 ms systemd-logind 17 0.00 0.132 ms migration/0 15 0.00 0.255 ms ksoftirqd/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-10-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:10 +00:00
'perf kwork top' to report the task cpu usage.
perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) The 'perf kwork' tool is used to trace time properties of kernel work (such as irq, softirq, and workqueue), including runtime, latency, and timehist, using the infrastructure in the perf tools to allow tracing extra targets. This is the first commit to reuse the 'perf record' framework code to implement a simple record function, kwork is not supported currently. Test cases: # perf usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: <SNIP> iostat Show I/O performance metrics kallsyms Searches running kernel for symbols kmem Tool to trace/measure kernel memory properties kvm Tool to trace/measure kvm guest os kwork Tool to trace/measure kernel work properties (latencies) list List all symbolic event types lock Analyze lock events mem Profile memory accesses record Run a command and record its profile into perf.data <SNIP> See 'perf help COMMAND' for more information on a specific command. # perf kwork Usage: perf kwork [<options>] {record} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile -v, --verbose be more verbose (show symbol address, etc) # perf kwork record -- sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.787 MB perf.data ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:17 +00:00
Example usage:
perf kwork record -- sleep 1
perf kwork: Implement 'report' subcommand Implements framework of 'perf kwork report', which is used to report time properties such as run time and frequency: Test cases: # perf kwork Usage: perf kwork [<options>] {record|report} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork report -h Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -S Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- Total count : 0 Total runtime (msec) : 0.000 (0.000% load average) Total time span (msec) : 0.000 -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -C 0,100 Requested CPU 100 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork report -s runtime1 Error: Unknown --sort key: `runtime1' Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork report --time 00FFF, Invalid time span Since there are no report supported events, the output is empty. Briefly describe the data structure: 1. "class" indicates event type. For example, irq and softiq correspond to different types. 2. "cluster" refers to a specific event corresponding to a type. For example, RCU and TIMER in softirq correspond to different clusters, which contains three types of events: raise, entry, and exit. 3. "atom" includes time of each sample and sample of the previous phase. (For example, exit corresponds to entry, which is used for timehist.) Committer notes: - Add {} for multiline if blocks. - report_print_work() should either return that ret variable that accounts how many bytes were printed or stop accounting and be void. Do the former for now to avoid this: builtin-kwork.c:534:6: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] int ret = 0; ^ 1 error generated. When building with: ⬢[acme@toolbox perf]$ clang --version clang version 13.0.0 (https://github.com/llvm/llvm-project e8991caea8690ec2d17b0b7e1c29bf0da6609076) Also: - if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { + if (dst_type < KWORK_TRACE_MAX) { Several versions of clang and at least this gcc: 3 51.40 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0) builtin-kwork.c:411:16: error: comparison of unsigned enum expression >= 0 is always true [-Werror,-Wtautological-compare] if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { As the first entry in a enum is zero. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-7-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:22 +00:00
perf kwork report
perf kwork: Implement BPF trace 'perf record' generates perf.data, which generates extra interrupts for hard disk, amount of data to be collected increases with time. Using eBPF trace can process the data in kernel, which solves the preceding two problems. Add -b/--use-bpf option for latency and report to support tracing kwork events using eBPF: 1. Create bpf prog and attach to tracepoints, 2. Start tracing after command is entered, 3. After user hit "ctrl+c", stop tracing and report, 4. Support CPU and name filtering. This commit implements the framework code and does not add specific event support. Test cases: # perf kwork rep -h Usage: perf kwork report [<options>] -b, --use-bpf Use BPF to measure kwork runtime -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork lat -h Usage: perf kwork latency [<options>] -b, --use-bpf Use BPF to measure kwork latency -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -b Unsupported bpf trace class irq # perf kwork rep -b Unsupported bpf trace class irq Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-15-yangjihong1@huawei.com [ Simplify work_findnew() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:30 +00:00
perf kwork report -b
perf kwork: Implement perf kwork latency Implements framework of perf kwork latency, which is used to report time properties such as delay time and frequency. Test cases: # perf kwork lat -h Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -C 199 Requested CPU 199 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork lat -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork lat -s avg1 Error: Unknown --sort key: `avg1' Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat --time FFFF, Invalid time span # perf kwork lat Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- INFO: 36.570% skipped events (31537 including 0 raise, 31537 entry, 0 exit) Since there are no latency-enabled events, the output is empty. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-11-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:26 +00:00
perf kwork latency
perf kwork: Implement BPF trace 'perf record' generates perf.data, which generates extra interrupts for hard disk, amount of data to be collected increases with time. Using eBPF trace can process the data in kernel, which solves the preceding two problems. Add -b/--use-bpf option for latency and report to support tracing kwork events using eBPF: 1. Create bpf prog and attach to tracepoints, 2. Start tracing after command is entered, 3. After user hit "ctrl+c", stop tracing and report, 4. Support CPU and name filtering. This commit implements the framework code and does not add specific event support. Test cases: # perf kwork rep -h Usage: perf kwork report [<options>] -b, --use-bpf Use BPF to measure kwork runtime -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork lat -h Usage: perf kwork latency [<options>] -b, --use-bpf Use BPF to measure kwork latency -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -b Unsupported bpf trace class irq # perf kwork rep -b Unsupported bpf trace class irq Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-15-yangjihong1@huawei.com [ Simplify work_findnew() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:30 +00:00
perf kwork latency -b
perf kwork: Implement perf kwork timehist Implements framework of perf kwork timehist, to provide an analysis of kernel work events. Test cases: # perf kwork tim Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 91576.070147 91576.070206 [0000] (s)SCHED:7 0.060 0.057 91576.073147 91576.073202 [0000] (s)SCHED:7 0.054 0.060 <SNIP> # perf kwork tim --max-stack 2 -g Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 irq_exit_rcu <- sysvec_call_function_single 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 irq_exit_rcu <- sysvec_call_function_single 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 irq_exit_rcu <- sysvec_call_function_single 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt <SNIP> Committer testing: # perf kwork -k workqueue timehist | head -40 Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 26520.211825 26520.211832 [0019] (w)free_work 0.007 0.004 26520.212929 26520.212934 [0020] (w)free_work 0.005 0.004 26520.213226 26520.213228 [0014] (w)kfree_rcu_work 0.002 0.004 26520.214057 26520.214061 [0021] (w)free_work 0.004 0.004 26520.221239 26520.221241 [0007] (w)kfree_rcu_work 0.002 0.009 26520.223232 26520.223238 [0013] (w)psi_avgs_work 0.005 0.006 26520.230057 26520.230060 [0020] (w)free_work 0.003 0.003 26520.270428 26520.270434 [0015] (w)free_work 0.006 0.004 26520.270546 26520.270550 [0014] (w)free_work 0.004 0.003 26520.281626 26520.281629 [0015] (w)free_work 0.003 0.002 26520.287225 26520.287230 [0012] (w)psi_avgs_work 0.005 0.008 26520.287231 26520.287235 [0001] (w)psi_avgs_work 0.004 0.011 26520.287236 26520.287239 [0001] (w)psi_avgs_work 0.003 0.012 26520.329488 26520.329492 [0024] (w)free_work 0.004 0.004 26520.330600 26520.330605 [0007] (w)free_work 0.005 0.004 26520.334218 26520.334218 [0007] (w)kfree_rcu_monitor 0.001 0.002 26520.335220 26520.335221 [0005] (w)kfree_rcu_monitor 0.001 0.004 26520.343980 26520.343985 [0007] (w)free_work 0.005 0.002 26520.345093 26520.345097 [0006] (w)free_work 0.004 0.003 26520.351233 26520.351238 [0027] (w)psi_avgs_work 0.005 0.008 26520.353228 26520.353229 [0007] (w)kfree_rcu_work 0.001 0.002 26520.353229 26520.353231 [0005] (w)kfree_rcu_work 0.001 0.006 26520.382381 26520.382383 [0006] (w)free_work 0.003 0.002 26520.386547 26520.386548 [0006] (w)free_work 0.002 0.001 26520.391243 26520.391245 [0015] (w)console_callback 0.002 0.016 26520.415369 26520.415621 [0027] (w)btrfs_work_helper 0.252 26520.415351 26520.416174 [0002] (w)btrfs_work_helper 0.823 0.037 26520.415343 26520.416304 [0031] (w)btrfs_work_helper 0.961 26520.415335 26520.417078 [0001] (w)btrfs_work_helper 1.743 26520.415250 26520.417564 [0002] (w)wb_workfn 2.314 26520.424777 26520.424787 [0002] (w)btrfs_work_helper 0.010 26520.424788 26520.424798 [0002] (w)btrfs_work_helper 0.010 26520.424790 26520.424805 [0001] (w)btrfs_work_helper 0.016 0.016 26520.424801 26520.424807 [0002] (w)btrfs_work_helper 0.006 26520.424809 26520.424831 [0002] (w)btrfs_work_helper 0.022 0.030 26520.424824 26520.424835 [0027] (w)btrfs_work_helper 0.011 26520.424809 26520.424867 [0001] (w)btrfs_work_helper 0.059 0.032 # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:29 +00:00
perf kwork timehist
perf kwork top: Introduce new top utility Some common tools for collecting statistics on CPU usage, such as top, obtain statistics from timer interrupt sampling, and then periodically read statistics from /proc/stat. This method has some deviations: 1. In the tick interrupt, the time between the last tick and the current tick is counted in the current task. However, the task may be running only part of the time. 2. For each task, the top tool periodically reads the /proc/{PID}/status information. For tasks with a short life cycle, it may be missed. In conclusion, the top tool cannot accurately collect statistics on the CPU usage and running time of tasks. The statistical method based on sched_switch tracepoint can accurately calculate the CPU usage of all tasks. This method is applicable to scenarios where performance comparison data is of high precision. Example usage: # perf kwork Usage: perf kwork [<options>] {record|report|latency|timehist|top} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, sched, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1 groups == 40 processes run Total time: 14.074 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ] # perf kwork top Total : 115708.178 ms, 8 cpus %Cpu(s): 9.78% id %Cpu0 [||||||||||||||||||||||||||| 90.55%] %Cpu1 [||||||||||||||||||||||||||| 90.51%] %Cpu2 [|||||||||||||||||||||||||| 88.57%] %Cpu3 [||||||||||||||||||||||||||| 91.18%] %Cpu4 [||||||||||||||||||||||||||| 91.09%] %Cpu5 [||||||||||||||||||||||||||| 90.88%] %Cpu6 [|||||||||||||||||||||||||| 88.64%] %Cpu7 [||||||||||||||||||||||||||| 90.28%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4113 22.23 3221.547 ms sched-messaging 4105 21.61 3131.495 ms sched-messaging 4119 21.53 3120.937 ms sched-messaging 4103 21.39 3101.614 ms sched-messaging 4106 21.37 3095.209 ms sched-messaging 4104 21.25 3077.269 ms sched-messaging 4115 21.21 3073.188 ms sched-messaging 4109 21.18 3069.022 ms sched-messaging 4111 20.78 3010.033 ms sched-messaging 4114 20.74 3007.073 ms sched-messaging 4108 20.73 3002.137 ms sched-messaging 4107 20.47 2967.292 ms sched-messaging 4117 20.39 2955.335 ms sched-messaging 4112 20.34 2947.080 ms sched-messaging 4118 20.32 2942.519 ms sched-messaging 4121 20.23 2929.865 ms sched-messaging 4110 20.22 2930.078 ms sched-messaging 4122 20.15 2919.542 ms sched-messaging 4120 19.77 2866.032 ms sched-messaging 4116 19.72 2857.660 ms sched-messaging 4127 16.19 2346.334 ms sched-messaging 4142 15.86 2297.600 ms sched-messaging 4141 15.62 2262.646 ms sched-messaging 4136 15.41 2231.408 ms sched-messaging 4130 15.38 2227.008 ms sched-messaging 4129 15.31 2217.692 ms sched-messaging 4126 15.21 2201.711 ms sched-messaging 4139 15.19 2200.722 ms sched-messaging 4137 15.10 2188.633 ms sched-messaging 4134 15.06 2182.082 ms sched-messaging 4132 15.02 2177.530 ms sched-messaging 4131 14.73 2131.973 ms sched-messaging 4125 14.68 2125.439 ms sched-messaging 4128 14.66 2122.255 ms sched-messaging 4123 14.65 2122.113 ms sched-messaging 4135 14.56 2107.144 ms sched-messaging 4133 14.51 2103.549 ms sched-messaging 4124 14.27 2066.671 ms sched-messaging 4140 14.17 2052.251 ms sched-messaging 4138 13.81 2000.361 ms sched-messaging 0 11.42 1652.009 ms swapper/2 0 11.35 1641.694 ms swapper/6 0 9.71 1405.108 ms swapper/7 0 9.48 1372.338 ms swapper/1 0 9.44 1366.013 ms swapper/0 0 9.11 1318.382 ms swapper/5 0 8.90 1287.582 ms swapper/4 0 8.81 1274.356 ms swapper/3 4100 2.61 379.328 ms perf 4101 1.16 169.487 ms perf-exec 151 0.65 94.741 ms systemd-resolve 249 0.36 53.030 ms sd-resolve 153 0.14 21.405 ms systemd-timesyn 1 0.10 16.200 ms systemd 16 0.09 15.785 ms rcu_preempt 4102 0.06 9.727 ms perf 4095 0.03 5.464 ms kworker/7:1 98 0.02 3.231 ms jbd2/sda-8 353 0.02 4.115 ms sshd 75 0.02 3.889 ms kworker/2:1 73 0.01 1.552 ms kworker/5:1 64 0.01 1.591 ms kworker/4:1 74 0.01 1.952 ms kworker/3:1 61 0.01 2.608 ms kcompactd0 397 0.01 1.602 ms kworker/1:1 69 0.01 1.817 ms kworker/1:1H 10 0.01 2.553 ms kworker/u16:0 2909 0.01 2.684 ms kworker/0:2 1211 0.00 0.426 ms kworker/7:0 97 0.00 0.153 ms kworker/7:1H 51 0.00 0.100 ms ksoftirqd/7 120 0.00 0.856 ms systemd-journal 76 0.00 1.414 ms kworker/6:1 46 0.00 0.246 ms ksoftirqd/6 45 0.00 0.164 ms migration/6 41 0.00 0.098 ms ksoftirqd/5 40 0.00 0.207 ms migration/5 86 0.00 1.339 ms kworker/4:1H 36 0.00 0.252 ms ksoftirqd/4 35 0.00 0.090 ms migration/4 31 0.00 0.156 ms ksoftirqd/3 30 0.00 0.073 ms migration/3 26 0.00 0.180 ms ksoftirqd/2 25 0.00 0.085 ms migration/2 21 0.00 0.106 ms ksoftirqd/1 20 0.00 0.118 ms migration/1 302 0.00 1.440 ms systemd-logind 17 0.00 0.132 ms migration/0 15 0.00 0.255 ms ksoftirqd/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-10-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:10 +00:00
perf kwork top
perf kwork top: Implements BPF-based cpu usage statistics Use BPF to collect statistics on the CPU usage based on perf BPF skeletons. Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -b, --use-bpf Use BPF to measure task cpu usage -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # # perf kwork -k sched top -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> # perf kwork -k sched top -b -n perf Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 151563.332 ms, 8 cpus %Cpu(s): 26.49% id, 0.00% hi, 0.00% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [ 0.00%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.00%] %Cpu7 [ 0.00%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 9754 9754 0.01 2.303 ms perf # # perf kwork -k sched top -b -C 2,3,4 Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 48016.721 ms, 3 cpus %Cpu(s): 27.82% id, 0.00% hi, 0.00% si %Cpu2 [|||||||||||||||||||||| 74.68%] %Cpu3 [||||||||||||||||||||| 71.06%] %Cpu4 [||||||||||||||||||||| 70.91%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 29.08 4734.998 ms [swapper/4] 0 0 28.93 4710.029 ms [swapper/3] 0 0 25.31 3912.363 ms [swapper/2] 10248 10158 1.62 264.931 ms sched-messaging 10253 10158 1.62 265.136 ms sched-messaging 10158 10158 1.60 263.013 ms bash 10360 10158 1.49 243.639 ms sched-messaging 10413 10158 1.48 238.604 ms sched-messaging 10531 10158 1.47 234.067 ms sched-messaging 10400 10158 1.47 240.631 ms sched-messaging 10355 10158 1.47 230.586 ms sched-messaging 10377 10158 1.43 234.835 ms sched-messaging 10526 10158 1.42 232.045 ms sched-messaging 10298 10158 1.41 222.396 ms sched-messaging 10410 10158 1.38 221.853 ms sched-messaging 10364 10158 1.38 226.042 ms sched-messaging 10480 10158 1.36 213.633 ms sched-messaging 10370 10158 1.36 223.620 ms sched-messaging 10553 10158 1.34 217.169 ms sched-messaging 10291 10158 1.34 211.516 ms sched-messaging 10251 10158 1.34 218.813 ms sched-messaging 10522 10158 1.33 218.498 ms sched-messaging 10288 10158 1.33 216.787 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-15-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:15 +00:00
perf kwork top -b
perf kwork: Implement perf kwork timehist Implements framework of perf kwork timehist, to provide an analysis of kernel work events. Test cases: # perf kwork tim Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 91576.070147 91576.070206 [0000] (s)SCHED:7 0.060 0.057 91576.073147 91576.073202 [0000] (s)SCHED:7 0.054 0.060 <SNIP> # perf kwork tim --max-stack 2 -g Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 irq_exit_rcu <- sysvec_call_function_single 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 irq_exit_rcu <- sysvec_call_function_single 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 irq_exit_rcu <- sysvec_call_function_single 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt <SNIP> Committer testing: # perf kwork -k workqueue timehist | head -40 Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 26520.211825 26520.211832 [0019] (w)free_work 0.007 0.004 26520.212929 26520.212934 [0020] (w)free_work 0.005 0.004 26520.213226 26520.213228 [0014] (w)kfree_rcu_work 0.002 0.004 26520.214057 26520.214061 [0021] (w)free_work 0.004 0.004 26520.221239 26520.221241 [0007] (w)kfree_rcu_work 0.002 0.009 26520.223232 26520.223238 [0013] (w)psi_avgs_work 0.005 0.006 26520.230057 26520.230060 [0020] (w)free_work 0.003 0.003 26520.270428 26520.270434 [0015] (w)free_work 0.006 0.004 26520.270546 26520.270550 [0014] (w)free_work 0.004 0.003 26520.281626 26520.281629 [0015] (w)free_work 0.003 0.002 26520.287225 26520.287230 [0012] (w)psi_avgs_work 0.005 0.008 26520.287231 26520.287235 [0001] (w)psi_avgs_work 0.004 0.011 26520.287236 26520.287239 [0001] (w)psi_avgs_work 0.003 0.012 26520.329488 26520.329492 [0024] (w)free_work 0.004 0.004 26520.330600 26520.330605 [0007] (w)free_work 0.005 0.004 26520.334218 26520.334218 [0007] (w)kfree_rcu_monitor 0.001 0.002 26520.335220 26520.335221 [0005] (w)kfree_rcu_monitor 0.001 0.004 26520.343980 26520.343985 [0007] (w)free_work 0.005 0.002 26520.345093 26520.345097 [0006] (w)free_work 0.004 0.003 26520.351233 26520.351238 [0027] (w)psi_avgs_work 0.005 0.008 26520.353228 26520.353229 [0007] (w)kfree_rcu_work 0.001 0.002 26520.353229 26520.353231 [0005] (w)kfree_rcu_work 0.001 0.006 26520.382381 26520.382383 [0006] (w)free_work 0.003 0.002 26520.386547 26520.386548 [0006] (w)free_work 0.002 0.001 26520.391243 26520.391245 [0015] (w)console_callback 0.002 0.016 26520.415369 26520.415621 [0027] (w)btrfs_work_helper 0.252 26520.415351 26520.416174 [0002] (w)btrfs_work_helper 0.823 0.037 26520.415343 26520.416304 [0031] (w)btrfs_work_helper 0.961 26520.415335 26520.417078 [0001] (w)btrfs_work_helper 1.743 26520.415250 26520.417564 [0002] (w)wb_workfn 2.314 26520.424777 26520.424787 [0002] (w)btrfs_work_helper 0.010 26520.424788 26520.424798 [0002] (w)btrfs_work_helper 0.010 26520.424790 26520.424805 [0001] (w)btrfs_work_helper 0.016 0.016 26520.424801 26520.424807 [0002] (w)btrfs_work_helper 0.006 26520.424809 26520.424831 [0002] (w)btrfs_work_helper 0.022 0.030 26520.424824 26520.424835 [0027] (w)btrfs_work_helper 0.011 26520.424809 26520.424867 [0001] (w)btrfs_work_helper 0.059 0.032 # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:29 +00:00
By default it shows the individual work events such as irq, workqeueu,
including the run time and delay (time between raise and actually entry):
Runtime start Runtime end Cpu Kwork name Runtime Delaytime
(TYPE)NAME:NUM (msec) (msec)
----------------- ----------------- ------ ------------------------- ---------- ----------
1811186.976062 1811186.976327 [0000] (s)RCU:9 0.266 0.114
1811186.978452 1811186.978547 [0000] (s)SCHED:7 0.095 0.171
1811186.980327 1811186.980490 [0000] (s)SCHED:7 0.162 0.083
1811186.981221 1811186.981271 [0000] (s)SCHED:7 0.050 0.077
1811186.984267 1811186.984318 [0000] (s)SCHED:7 0.051 0.075
1811186.987252 1811186.987315 [0000] (s)SCHED:7 0.063 0.081
1811186.987785 1811186.987843 [0006] (s)RCU:9 0.058 0.645
1811186.988319 1811186.988383 [0000] (s)SCHED:7 0.064 0.143
1811186.989404 1811186.989607 [0002] (s)TIMER:1 0.203 0.111
1811186.989660 1811186.989732 [0002] (s)SCHED:7 0.072 0.310
1811186.991295 1811186.991407 [0002] eth0:10 0.112
1811186.991639 1811186.991734 [0002] (s)NET_RX:3 0.095 0.277
1811186.989860 1811186.991826 [0002] (w)vmstat_shepherd 1.966 0.345
...
Times are in msec.usec.
perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) The 'perf kwork' tool is used to trace time properties of kernel work (such as irq, softirq, and workqueue), including runtime, latency, and timehist, using the infrastructure in the perf tools to allow tracing extra targets. This is the first commit to reuse the 'perf record' framework code to implement a simple record function, kwork is not supported currently. Test cases: # perf usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: <SNIP> iostat Show I/O performance metrics kallsyms Searches running kernel for symbols kmem Tool to trace/measure kernel memory properties kvm Tool to trace/measure kvm guest os kwork Tool to trace/measure kernel work properties (latencies) list List all symbolic event types lock Analyze lock events mem Profile memory accesses record Run a command and record its profile into perf.data <SNIP> See 'perf help COMMAND' for more information on a specific command. # perf kwork Usage: perf kwork [<options>] {record} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile -v, --verbose be more verbose (show symbol address, etc) # perf kwork record -- sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.787 MB perf.data ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:17 +00:00
OPTIONS
-------
-D::
--dump-raw-trace=::
Display verbose dump of the sched data.
-f::
--force::
Don't complain, do it.
-k::
--kwork::
List of kwork to profile (irq, softirq, workqueue, sched, etc)
perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) The 'perf kwork' tool is used to trace time properties of kernel work (such as irq, softirq, and workqueue), including runtime, latency, and timehist, using the infrastructure in the perf tools to allow tracing extra targets. This is the first commit to reuse the 'perf record' framework code to implement a simple record function, kwork is not supported currently. Test cases: # perf usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: <SNIP> iostat Show I/O performance metrics kallsyms Searches running kernel for symbols kmem Tool to trace/measure kernel memory properties kvm Tool to trace/measure kvm guest os kwork Tool to trace/measure kernel work properties (latencies) list List all symbolic event types lock Analyze lock events mem Profile memory accesses record Run a command and record its profile into perf.data <SNIP> See 'perf help COMMAND' for more information on a specific command. # perf kwork Usage: perf kwork [<options>] {record} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile -v, --verbose be more verbose (show symbol address, etc) # perf kwork record -- sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.787 MB perf.data ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:17 +00:00
-v::
--verbose::
Be more verbose. (show symbol address, etc)
perf kwork: Implement 'report' subcommand Implements framework of 'perf kwork report', which is used to report time properties such as run time and frequency: Test cases: # perf kwork Usage: perf kwork [<options>] {record|report} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork report -h Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -S Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- Total count : 0 Total runtime (msec) : 0.000 (0.000% load average) Total time span (msec) : 0.000 -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -C 0,100 Requested CPU 100 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork report -s runtime1 Error: Unknown --sort key: `runtime1' Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork report --time 00FFF, Invalid time span Since there are no report supported events, the output is empty. Briefly describe the data structure: 1. "class" indicates event type. For example, irq and softiq correspond to different types. 2. "cluster" refers to a specific event corresponding to a type. For example, RCU and TIMER in softirq correspond to different clusters, which contains three types of events: raise, entry, and exit. 3. "atom" includes time of each sample and sample of the previous phase. (For example, exit corresponds to entry, which is used for timehist.) Committer notes: - Add {} for multiline if blocks. - report_print_work() should either return that ret variable that accounts how many bytes were printed or stop accounting and be void. Do the former for now to avoid this: builtin-kwork.c:534:6: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] int ret = 0; ^ 1 error generated. When building with: ⬢[acme@toolbox perf]$ clang --version clang version 13.0.0 (https://github.com/llvm/llvm-project e8991caea8690ec2d17b0b7e1c29bf0da6609076) Also: - if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { + if (dst_type < KWORK_TRACE_MAX) { Several versions of clang and at least this gcc: 3 51.40 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0) builtin-kwork.c:411:16: error: comparison of unsigned enum expression >= 0 is always true [-Werror,-Wtautological-compare] if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { As the first entry in a enum is zero. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-7-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:22 +00:00
OPTIONS for 'perf kwork report'
----------------------------
perf kwork: Implement BPF trace 'perf record' generates perf.data, which generates extra interrupts for hard disk, amount of data to be collected increases with time. Using eBPF trace can process the data in kernel, which solves the preceding two problems. Add -b/--use-bpf option for latency and report to support tracing kwork events using eBPF: 1. Create bpf prog and attach to tracepoints, 2. Start tracing after command is entered, 3. After user hit "ctrl+c", stop tracing and report, 4. Support CPU and name filtering. This commit implements the framework code and does not add specific event support. Test cases: # perf kwork rep -h Usage: perf kwork report [<options>] -b, --use-bpf Use BPF to measure kwork runtime -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork lat -h Usage: perf kwork latency [<options>] -b, --use-bpf Use BPF to measure kwork latency -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -b Unsupported bpf trace class irq # perf kwork rep -b Unsupported bpf trace class irq Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-15-yangjihong1@huawei.com [ Simplify work_findnew() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:30 +00:00
-b::
--use-bpf::
Use BPF to measure kwork runtime
perf kwork: Implement 'report' subcommand Implements framework of 'perf kwork report', which is used to report time properties such as run time and frequency: Test cases: # perf kwork Usage: perf kwork [<options>] {record|report} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile (irq, softirq, workqueue, etc) -v, --verbose be more verbose (show symbol address, etc) # perf kwork report -h Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -S Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- Total count : 0 Total runtime (msec) : 0.000 (0.000% load average) Total time span (msec) : 0.000 -------------------------------------------------------------------------------------------------------------------------------- # perf kwork report -C 0,100 Requested CPU 100 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork report -s runtime1 Error: Unknown --sort key: `runtime1' Usage: perf kwork report [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork report -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork report --time 00FFF, Invalid time span Since there are no report supported events, the output is empty. Briefly describe the data structure: 1. "class" indicates event type. For example, irq and softiq correspond to different types. 2. "cluster" refers to a specific event corresponding to a type. For example, RCU and TIMER in softirq correspond to different clusters, which contains three types of events: raise, entry, and exit. 3. "atom" includes time of each sample and sample of the previous phase. (For example, exit corresponds to entry, which is used for timehist.) Committer notes: - Add {} for multiline if blocks. - report_print_work() should either return that ret variable that accounts how many bytes were printed or stop accounting and be void. Do the former for now to avoid this: builtin-kwork.c:534:6: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] int ret = 0; ^ 1 error generated. When building with: ⬢[acme@toolbox perf]$ clang --version clang version 13.0.0 (https://github.com/llvm/llvm-project e8991caea8690ec2d17b0b7e1c29bf0da6609076) Also: - if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { + if (dst_type < KWORK_TRACE_MAX) { Several versions of clang and at least this gcc: 3 51.40 alpine:3.9 : FAIL gcc version 8.3.0 (Alpine 8.3.0) builtin-kwork.c:411:16: error: comparison of unsigned enum expression >= 0 is always true [-Werror,-Wtautological-compare] if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) { As the first entry in a enum is zero. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-7-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:22 +00:00
-C::
--cpu::
Only show events for the given CPU(s) (comma separated list).
-i::
--input::
Input file name. (default: perf.data unless stdin is a fifo)
-n::
--name::
Only show events for the given name.
-s::
--sort::
Sort by key(s): runtime, max, count
-S::
--with-summary::
Show summary with statistics
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e, time string is 'x.y,') then analysis goes
to end of file.
perf kwork: Implement perf kwork latency Implements framework of perf kwork latency, which is used to report time properties such as delay time and frequency. Test cases: # perf kwork lat -h Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -C 199 Requested CPU 199 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork lat -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork lat -s avg1 Error: Unknown --sort key: `avg1' Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat --time FFFF, Invalid time span # perf kwork lat Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- INFO: 36.570% skipped events (31537 including 0 raise, 31537 entry, 0 exit) Since there are no latency-enabled events, the output is empty. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-11-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:26 +00:00
OPTIONS for 'perf kwork latency'
----------------------------
perf kwork: Implement BPF trace 'perf record' generates perf.data, which generates extra interrupts for hard disk, amount of data to be collected increases with time. Using eBPF trace can process the data in kernel, which solves the preceding two problems. Add -b/--use-bpf option for latency and report to support tracing kwork events using eBPF: 1. Create bpf prog and attach to tracepoints, 2. Start tracing after command is entered, 3. After user hit "ctrl+c", stop tracing and report, 4. Support CPU and name filtering. This commit implements the framework code and does not add specific event support. Test cases: # perf kwork rep -h Usage: perf kwork report [<options>] -b, --use-bpf Use BPF to measure kwork runtime -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): runtime, max, count -S, --with-summary Show summary with statistics --time <str> Time span for analysis (start,stop) # perf kwork lat -h Usage: perf kwork latency [<options>] -b, --use-bpf Use BPF to measure kwork latency -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -b Unsupported bpf trace class irq # perf kwork rep -b Unsupported bpf trace class irq Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-15-yangjihong1@huawei.com [ Simplify work_findnew() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:30 +00:00
-b::
--use-bpf::
Use BPF to measure kwork latency
perf kwork: Implement perf kwork latency Implements framework of perf kwork latency, which is used to report time properties such as delay time and frequency. Test cases: # perf kwork lat -h Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat -C 199 Requested CPU 199 too large. Consider raising MAX_NR_CPUS Invalid cpu bitmap # perf kwork lat -i perf_no_exist.data failed to open perf_no_exist.data: No such file or directory # perf kwork lat -s avg1 Error: Unknown --sort key: `avg1' Usage: perf kwork latency [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): avg, max, count --time <str> Time span for analysis (start,stop) # perf kwork lat --time FFFF, Invalid time span # perf kwork lat Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end | -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- INFO: 36.570% skipped events (31537 including 0 raise, 31537 entry, 0 exit) Since there are no latency-enabled events, the output is empty. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-11-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:26 +00:00
-C::
--cpu::
Only show events for the given CPU(s) (comma separated list).
-i::
--input::
Input file name. (default: perf.data unless stdin is a fifo)
-n::
--name::
Only show events for the given name.
-s::
--sort::
Sort by key(s): avg, max, count
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e, time string is 'x.y,') then analysis goes
to end of file.
perf kwork: Implement perf kwork timehist Implements framework of perf kwork timehist, to provide an analysis of kernel work events. Test cases: # perf kwork tim Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 91576.070147 91576.070206 [0000] (s)SCHED:7 0.060 0.057 91576.073147 91576.073202 [0000] (s)SCHED:7 0.054 0.060 <SNIP> # perf kwork tim --max-stack 2 -g Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 91576.060290 91576.060344 [0000] (s)RCU:9 0.055 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.061470 91576.061547 [0000] (s)SCHED:7 0.077 0.073 irq_exit_rcu <- sysvec_call_function_single 91576.062604 91576.062697 [0001] (s)RCU:9 0.094 0.409 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.064443 91576.064517 [0002] (s)RCU:9 0.074 0.114 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.065144 91576.065211 [0000] (s)SCHED:7 0.067 0.058 irq_exit_rcu <- sysvec_call_function_single 91576.066564 91576.066609 [0003] (s)RCU:9 0.045 0.110 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.068495 91576.068559 [0000] (s)SCHED:7 0.064 0.059 irq_exit_rcu <- sysvec_call_function_single 91576.068900 91576.068996 [0004] (s)RCU:9 0.096 0.726 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069364 91576.069420 [0002] (s)RCU:9 0.056 0.082 irq_exit_rcu <- sysvec_apic_timer_interrupt 91576.069649 91576.069701 [0004] (s)RCU:9 0.052 0.111 irq_exit_rcu <- sysvec_apic_timer_interrupt <SNIP> Committer testing: # perf kwork -k workqueue timehist | head -40 Runtime start Runtime end Cpu Kwork name Runtime Delaytime (TYPE)NAME:NUM (msec) (msec) ----------------- ----------------- ------ ------------------------------ ---------- ---------- 26520.211825 26520.211832 [0019] (w)free_work 0.007 0.004 26520.212929 26520.212934 [0020] (w)free_work 0.005 0.004 26520.213226 26520.213228 [0014] (w)kfree_rcu_work 0.002 0.004 26520.214057 26520.214061 [0021] (w)free_work 0.004 0.004 26520.221239 26520.221241 [0007] (w)kfree_rcu_work 0.002 0.009 26520.223232 26520.223238 [0013] (w)psi_avgs_work 0.005 0.006 26520.230057 26520.230060 [0020] (w)free_work 0.003 0.003 26520.270428 26520.270434 [0015] (w)free_work 0.006 0.004 26520.270546 26520.270550 [0014] (w)free_work 0.004 0.003 26520.281626 26520.281629 [0015] (w)free_work 0.003 0.002 26520.287225 26520.287230 [0012] (w)psi_avgs_work 0.005 0.008 26520.287231 26520.287235 [0001] (w)psi_avgs_work 0.004 0.011 26520.287236 26520.287239 [0001] (w)psi_avgs_work 0.003 0.012 26520.329488 26520.329492 [0024] (w)free_work 0.004 0.004 26520.330600 26520.330605 [0007] (w)free_work 0.005 0.004 26520.334218 26520.334218 [0007] (w)kfree_rcu_monitor 0.001 0.002 26520.335220 26520.335221 [0005] (w)kfree_rcu_monitor 0.001 0.004 26520.343980 26520.343985 [0007] (w)free_work 0.005 0.002 26520.345093 26520.345097 [0006] (w)free_work 0.004 0.003 26520.351233 26520.351238 [0027] (w)psi_avgs_work 0.005 0.008 26520.353228 26520.353229 [0007] (w)kfree_rcu_work 0.001 0.002 26520.353229 26520.353231 [0005] (w)kfree_rcu_work 0.001 0.006 26520.382381 26520.382383 [0006] (w)free_work 0.003 0.002 26520.386547 26520.386548 [0006] (w)free_work 0.002 0.001 26520.391243 26520.391245 [0015] (w)console_callback 0.002 0.016 26520.415369 26520.415621 [0027] (w)btrfs_work_helper 0.252 26520.415351 26520.416174 [0002] (w)btrfs_work_helper 0.823 0.037 26520.415343 26520.416304 [0031] (w)btrfs_work_helper 0.961 26520.415335 26520.417078 [0001] (w)btrfs_work_helper 1.743 26520.415250 26520.417564 [0002] (w)wb_workfn 2.314 26520.424777 26520.424787 [0002] (w)btrfs_work_helper 0.010 26520.424788 26520.424798 [0002] (w)btrfs_work_helper 0.010 26520.424790 26520.424805 [0001] (w)btrfs_work_helper 0.016 0.016 26520.424801 26520.424807 [0002] (w)btrfs_work_helper 0.006 26520.424809 26520.424831 [0002] (w)btrfs_work_helper 0.022 0.030 26520.424824 26520.424835 [0027] (w)btrfs_work_helper 0.011 26520.424809 26520.424867 [0001] (w)btrfs_work_helper 0.059 0.032 # Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:29 +00:00
OPTIONS for 'perf kwork timehist'
---------------------------------
-C::
--cpu::
Only show events for the given CPU(s) (comma separated list).
-g::
--call-graph::
Display call chains if present (default off).
-i::
--input::
Input file name. (default: perf.data unless stdin is a fifo)
-k::
--vmlinux=<file>::
Vmlinux pathname
-n::
--name::
Only show events for the given name.
--kallsyms=<file>::
Kallsyms pathname
--max-stack::
Maximum number of functions to display in backtrace, default 5.
--symfs=<directory>::
Look for files with symbols relative to this directory.
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e, time string is 'x.y,') then analysis goes
to end of file.
perf kwork top: Add -C/--cpu -i/--input -n/--name -s/--sort --time options Provide the following options for perf kwork top: 1. -C, --cpu <cpu> list of cpus to profile 2. -i, --input <file> input file name 3. -n, --name <name> event name to profile 4. -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid 5. --time <str> Time span for analysis (start,stop) Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # perf kwork top -C 2,4,5 Total : 51226.940 ms, 3 cpus %Cpu(s): 92.59% id, 0.00% hi, 0.09% si %Cpu2 [| 4.61%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 95.17 16250.874 ms swapper/2 0 82.62 14108.577 ms swapper/5 4342 21.70 3708.358 ms perf 16 0.13 22.296 ms rcu_preempt 75 0.02 4.261 ms kworker/2:1 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 1.143 ms sshd 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 # perf kwork top -i perf.data Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 99.98 17072.173 ms swapper/1 0 99.93 17064.229 ms swapper/3 0 99.62 17011.013 ms swapper/0 0 99.47 16985.180 ms swapper/6 0 95.17 16250.874 ms swapper/2 0 88.51 15111.684 ms swapper/7 0 82.62 14108.577 ms swapper/5 4342 33.00 5644.045 ms perf 4344 0.43 74.351 ms perf 16 0.13 22.296 ms rcu_preempt 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf 4095 0.02 4.605 ms kworker/7:1 75 0.02 4.261 ms kworker/2:1 120 0.01 1.909 ms systemd-journal 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 667 0.01 2.542 ms kworker/u16:2 4340 0.00 1.052 ms kworker/7:2 97 0.00 0.489 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.753 ms kworker/6:1 45 0.00 0.572 ms migration/6 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 2.600 ms sshd 74 0.00 0.205 ms kworker/3:1 33 0.00 1.576 ms kworker/3:0H 30 0.00 0.996 ms migration/3 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 397 0.00 0.057 ms kworker/1:1 20 0.00 1.005 ms migration/1 2909 0.00 1.053 ms kworker/0:2 17 0.00 0.720 ms migration/0 15 0.00 0.039 ms ksoftirqd/0 # perf kwork top -n perf Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [| 4.44%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.49%] %Cpu7 [||| 11.38%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4342 15.74 2695.516 ms perf 4344 0.43 74.351 ms perf 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf # perf kwork top -s tid Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.62 17011.013 ms swapper/0 0 99.98 17072.173 ms swapper/1 0 95.17 16250.874 ms swapper/2 0 99.93 17064.229 ms swapper/3 0 99.98 17073.515 ms swapper/4 0 82.62 14108.577 ms swapper/5 0 99.47 16985.180 ms swapper/6 0 88.51 15111.684 ms swapper/7 15 0.00 0.039 ms ksoftirqd/0 16 0.13 22.296 ms rcu_preempt 17 0.00 0.720 ms migration/0 20 0.00 1.005 ms migration/1 25 0.00 0.662 ms migration/2 26 0.00 1.665 ms ksoftirqd/2 30 0.00 0.996 ms migration/3 33 0.00 1.576 ms kworker/3:0H 35 0.00 0.556 ms migration/4 40 0.00 0.718 ms migration/5 41 0.00 0.041 ms ksoftirqd/5 45 0.00 0.572 ms migration/6 50 0.00 0.646 ms migration/7 51 0.00 0.209 ms ksoftirqd/7 61 0.01 3.404 ms kcompactd0 64 0.00 0.115 ms kworker/4:1 73 0.00 0.596 ms kworker/5:1 74 0.00 0.205 ms kworker/3:1 75 0.02 4.261 ms kworker/2:1 76 0.00 0.753 ms kworker/6:1 87 0.00 0.145 ms kworker/5:1H 97 0.00 0.489 ms kworker/7:1H 98 0.01 2.540 ms jbd2/sda-8 120 0.01 1.909 ms systemd-journal 353 0.00 2.600 ms sshd 397 0.00 0.057 ms kworker/1:1 667 0.01 2.542 ms kworker/u16:2 2909 0.00 1.053 ms kworker/0:2 4095 0.02 4.605 ms kworker/7:1 4340 0.00 1.052 ms kworker/7:2 4341 0.02 4.882 ms perf 4342 33.00 5644.045 ms perf 4343 0.05 8.769 ms perf 4344 0.43 74.351 ms perf 4345 0.05 10.093 ms perf # perf kwork top --time 128800, Total : 53495.122 ms, 8 cpus %Cpu(s): 94.71% id, 0.09% hi, 0.09% si %Cpu0 [ 0.07%] %Cpu1 [ 0.04%] %Cpu2 [|| 8.49%] %Cpu3 [ 0.09%] %Cpu4 [ 0.02%] %Cpu5 [ 0.06%] %Cpu6 [ 0.12%] %Cpu7 [|||||| 21.24%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.96 3981.363 ms swapper/4 0 99.94 3978.955 ms swapper/1 0 99.91 9329.375 ms swapper/5 0 99.87 4906.829 ms swapper/3 0 99.86 9028.064 ms swapper/6 0 98.67 3928.161 ms swapper/0 0 91.17 8388.432 ms swapper/2 0 78.65 7125.602 ms swapper/7 4342 29.42 2675.198 ms perf 16 0.18 16.817 ms rcu_preempt 4345 0.09 8.183 ms perf 4344 0.04 4.290 ms perf 4343 0.03 2.844 ms perf 353 0.03 2.600 ms sshd 4095 0.02 2.702 ms kworker/7:1 120 0.02 1.909 ms systemd-journal 98 0.02 2.540 ms jbd2/sda-8 61 0.02 1.886 ms kcompactd0 667 0.02 1.011 ms kworker/u16:2 75 0.02 2.693 ms kworker/2:1 4341 0.01 1.838 ms perf 30 0.01 0.788 ms migration/3 26 0.01 1.665 ms ksoftirqd/2 20 0.01 0.752 ms migration/1 2909 0.01 0.604 ms kworker/0:2 4340 0.00 0.635 ms kworker/7:2 97 0.00 0.214 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.602 ms kworker/6:1 45 0.00 0.366 ms migration/6 87 0.00 0.145 ms kworker/5:1H 40 0.00 0.446 ms migration/5 35 0.00 0.318 ms migration/4 74 0.00 0.205 ms kworker/3:1 33 0.00 0.080 ms kworker/3:0H 25 0.00 0.448 ms migration/2 397 0.00 0.057 ms kworker/1:1 17 0.00 0.365 ms migration/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:14 +00:00
OPTIONS for 'perf kwork top'
---------------------------------
perf kwork top: Implements BPF-based cpu usage statistics Use BPF to collect statistics on the CPU usage based on perf BPF skeletons. Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -b, --use-bpf Use BPF to measure task cpu usage -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # # perf kwork -k sched top -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> # perf kwork -k sched top -b -n perf Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 151563.332 ms, 8 cpus %Cpu(s): 26.49% id, 0.00% hi, 0.00% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [ 0.00%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.00%] %Cpu7 [ 0.00%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 9754 9754 0.01 2.303 ms perf # # perf kwork -k sched top -b -C 2,3,4 Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 48016.721 ms, 3 cpus %Cpu(s): 27.82% id, 0.00% hi, 0.00% si %Cpu2 [|||||||||||||||||||||| 74.68%] %Cpu3 [||||||||||||||||||||| 71.06%] %Cpu4 [||||||||||||||||||||| 70.91%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 29.08 4734.998 ms [swapper/4] 0 0 28.93 4710.029 ms [swapper/3] 0 0 25.31 3912.363 ms [swapper/2] 10248 10158 1.62 264.931 ms sched-messaging 10253 10158 1.62 265.136 ms sched-messaging 10158 10158 1.60 263.013 ms bash 10360 10158 1.49 243.639 ms sched-messaging 10413 10158 1.48 238.604 ms sched-messaging 10531 10158 1.47 234.067 ms sched-messaging 10400 10158 1.47 240.631 ms sched-messaging 10355 10158 1.47 230.586 ms sched-messaging 10377 10158 1.43 234.835 ms sched-messaging 10526 10158 1.42 232.045 ms sched-messaging 10298 10158 1.41 222.396 ms sched-messaging 10410 10158 1.38 221.853 ms sched-messaging 10364 10158 1.38 226.042 ms sched-messaging 10480 10158 1.36 213.633 ms sched-messaging 10370 10158 1.36 223.620 ms sched-messaging 10553 10158 1.34 217.169 ms sched-messaging 10291 10158 1.34 211.516 ms sched-messaging 10251 10158 1.34 218.813 ms sched-messaging 10522 10158 1.33 218.498 ms sched-messaging 10288 10158 1.33 216.787 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-15-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:15 +00:00
-b::
--use-bpf::
Use BPF to measure task cpu usage.
perf kwork top: Add -C/--cpu -i/--input -n/--name -s/--sort --time options Provide the following options for perf kwork top: 1. -C, --cpu <cpu> list of cpus to profile 2. -i, --input <file> input file name 3. -n, --name <name> event name to profile 4. -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid 5. --time <str> Time span for analysis (start,stop) Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # perf kwork top -C 2,4,5 Total : 51226.940 ms, 3 cpus %Cpu(s): 92.59% id, 0.00% hi, 0.09% si %Cpu2 [| 4.61%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 95.17 16250.874 ms swapper/2 0 82.62 14108.577 ms swapper/5 4342 21.70 3708.358 ms perf 16 0.13 22.296 ms rcu_preempt 75 0.02 4.261 ms kworker/2:1 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 1.143 ms sshd 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 # perf kwork top -i perf.data Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 99.98 17072.173 ms swapper/1 0 99.93 17064.229 ms swapper/3 0 99.62 17011.013 ms swapper/0 0 99.47 16985.180 ms swapper/6 0 95.17 16250.874 ms swapper/2 0 88.51 15111.684 ms swapper/7 0 82.62 14108.577 ms swapper/5 4342 33.00 5644.045 ms perf 4344 0.43 74.351 ms perf 16 0.13 22.296 ms rcu_preempt 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf 4095 0.02 4.605 ms kworker/7:1 75 0.02 4.261 ms kworker/2:1 120 0.01 1.909 ms systemd-journal 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 667 0.01 2.542 ms kworker/u16:2 4340 0.00 1.052 ms kworker/7:2 97 0.00 0.489 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.753 ms kworker/6:1 45 0.00 0.572 ms migration/6 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 2.600 ms sshd 74 0.00 0.205 ms kworker/3:1 33 0.00 1.576 ms kworker/3:0H 30 0.00 0.996 ms migration/3 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 397 0.00 0.057 ms kworker/1:1 20 0.00 1.005 ms migration/1 2909 0.00 1.053 ms kworker/0:2 17 0.00 0.720 ms migration/0 15 0.00 0.039 ms ksoftirqd/0 # perf kwork top -n perf Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [| 4.44%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.49%] %Cpu7 [||| 11.38%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4342 15.74 2695.516 ms perf 4344 0.43 74.351 ms perf 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf # perf kwork top -s tid Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.62 17011.013 ms swapper/0 0 99.98 17072.173 ms swapper/1 0 95.17 16250.874 ms swapper/2 0 99.93 17064.229 ms swapper/3 0 99.98 17073.515 ms swapper/4 0 82.62 14108.577 ms swapper/5 0 99.47 16985.180 ms swapper/6 0 88.51 15111.684 ms swapper/7 15 0.00 0.039 ms ksoftirqd/0 16 0.13 22.296 ms rcu_preempt 17 0.00 0.720 ms migration/0 20 0.00 1.005 ms migration/1 25 0.00 0.662 ms migration/2 26 0.00 1.665 ms ksoftirqd/2 30 0.00 0.996 ms migration/3 33 0.00 1.576 ms kworker/3:0H 35 0.00 0.556 ms migration/4 40 0.00 0.718 ms migration/5 41 0.00 0.041 ms ksoftirqd/5 45 0.00 0.572 ms migration/6 50 0.00 0.646 ms migration/7 51 0.00 0.209 ms ksoftirqd/7 61 0.01 3.404 ms kcompactd0 64 0.00 0.115 ms kworker/4:1 73 0.00 0.596 ms kworker/5:1 74 0.00 0.205 ms kworker/3:1 75 0.02 4.261 ms kworker/2:1 76 0.00 0.753 ms kworker/6:1 87 0.00 0.145 ms kworker/5:1H 97 0.00 0.489 ms kworker/7:1H 98 0.01 2.540 ms jbd2/sda-8 120 0.01 1.909 ms systemd-journal 353 0.00 2.600 ms sshd 397 0.00 0.057 ms kworker/1:1 667 0.01 2.542 ms kworker/u16:2 2909 0.00 1.053 ms kworker/0:2 4095 0.02 4.605 ms kworker/7:1 4340 0.00 1.052 ms kworker/7:2 4341 0.02 4.882 ms perf 4342 33.00 5644.045 ms perf 4343 0.05 8.769 ms perf 4344 0.43 74.351 ms perf 4345 0.05 10.093 ms perf # perf kwork top --time 128800, Total : 53495.122 ms, 8 cpus %Cpu(s): 94.71% id, 0.09% hi, 0.09% si %Cpu0 [ 0.07%] %Cpu1 [ 0.04%] %Cpu2 [|| 8.49%] %Cpu3 [ 0.09%] %Cpu4 [ 0.02%] %Cpu5 [ 0.06%] %Cpu6 [ 0.12%] %Cpu7 [|||||| 21.24%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.96 3981.363 ms swapper/4 0 99.94 3978.955 ms swapper/1 0 99.91 9329.375 ms swapper/5 0 99.87 4906.829 ms swapper/3 0 99.86 9028.064 ms swapper/6 0 98.67 3928.161 ms swapper/0 0 91.17 8388.432 ms swapper/2 0 78.65 7125.602 ms swapper/7 4342 29.42 2675.198 ms perf 16 0.18 16.817 ms rcu_preempt 4345 0.09 8.183 ms perf 4344 0.04 4.290 ms perf 4343 0.03 2.844 ms perf 353 0.03 2.600 ms sshd 4095 0.02 2.702 ms kworker/7:1 120 0.02 1.909 ms systemd-journal 98 0.02 2.540 ms jbd2/sda-8 61 0.02 1.886 ms kcompactd0 667 0.02 1.011 ms kworker/u16:2 75 0.02 2.693 ms kworker/2:1 4341 0.01 1.838 ms perf 30 0.01 0.788 ms migration/3 26 0.01 1.665 ms ksoftirqd/2 20 0.01 0.752 ms migration/1 2909 0.01 0.604 ms kworker/0:2 4340 0.00 0.635 ms kworker/7:2 97 0.00 0.214 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.602 ms kworker/6:1 45 0.00 0.366 ms migration/6 87 0.00 0.145 ms kworker/5:1H 40 0.00 0.446 ms migration/5 35 0.00 0.318 ms migration/4 74 0.00 0.205 ms kworker/3:1 33 0.00 0.080 ms kworker/3:0H 25 0.00 0.448 ms migration/2 397 0.00 0.057 ms kworker/1:1 17 0.00 0.365 ms migration/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-12 08:49:14 +00:00
-C::
--cpu::
Only show events for the given CPU(s) (comma separated list).
-i::
--input::
Input file name. (default: perf.data unless stdin is a fifo)
-n::
--name::
Only show events for the given name.
-s::
--sort::
Sort by key(s): rate, runtime, tid
--time::
Only analyze samples within given time window: <start>,<stop>. Times
have the format seconds.microseconds. If start is not given (i.e., time
string is ',x.y') then analysis starts at the beginning of the file. If
stop time is not given (i.e, time string is 'x.y,') then analysis goes
to end of file.
perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) The 'perf kwork' tool is used to trace time properties of kernel work (such as irq, softirq, and workqueue), including runtime, latency, and timehist, using the infrastructure in the perf tools to allow tracing extra targets. This is the first commit to reuse the 'perf record' framework code to implement a simple record function, kwork is not supported currently. Test cases: # perf usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: <SNIP> iostat Show I/O performance metrics kallsyms Searches running kernel for symbols kmem Tool to trace/measure kernel memory properties kvm Tool to trace/measure kvm guest os kwork Tool to trace/measure kernel work properties (latencies) list List all symbolic event types lock Analyze lock events mem Profile memory accesses record Run a command and record its profile into perf.data <SNIP> See 'perf help COMMAND' for more information on a specific command. # perf kwork Usage: perf kwork [<options>] {record} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -k, --kwork <kwork> list of kwork to profile -v, --verbose be more verbose (show symbol address, etc) # perf kwork record -- sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.787 MB perf.data ] Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220709015033.38326-2-yangjihong1@huawei.com [ Add {} for multiline if blocks ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-09 01:50:17 +00:00
SEE ALSO
--------
linkperf:perf-record[1]