mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2024-09-26 20:38:12 +00:00
7b58b82b86
New features:

perf ftrace:

- Add -n/--use-nsec option to the 'latency' subcommand.

  Default: usecs:

    $ sudo perf ftrace latency -T dput -a sleep 1
    #   DURATION     |      COUNT | GRAPH                          |
         0 - 1    us |    2098375 | #############################  |
         1 - 2    us |         61 |                                |
         2 - 4    us |         33 |                                |
         4 - 8    us |         13 |                                |
         8 - 16   us |        124 |                                |
        16 - 32   us |        123 |                                |
        32 - 64   us |          1 |                                |
        64 - 128  us |          0 |                                |
       128 - 256  us |          1 |                                |
       256 - 512  us |          0 |                                |

  Better granularity with nsec:

    $ sudo perf ftrace latency -T dput -a -n sleep 1
    #   DURATION     |      COUNT | GRAPH                          |
         0 - 1    us |          0 |                                |
         1 - 2    ns |          0 |                                |
         2 - 4    ns |          0 |                                |
         4 - 8    ns |          0 |                                |
         8 - 16   ns |          0 |                                |
        16 - 32   ns |          0 |                                |
        32 - 64   ns |          0 |                                |
        64 - 128  ns |    1163434 | ##############                 |
       128 - 256  ns |     914102 | #############                  |
       256 - 512  ns |        884 |                                |
       512 - 1024 ns |        613 |                                |
         1 - 2    us |         31 |                                |
         2 - 4    us |         17 |                                |
         4 - 8    us |          7 |                                |
         8 - 16   us |        123 |                                |
        16 - 32   us |         83 |                                |

perf lock:

- Add -c/--combine-locks option to merge lock instances in the same class into a single entry.

    # perf lock report -c
    Name                 acquired  contended  avg wait(ns)  total wait(ns)  max wait(ns)  min wait(ns)

    rcu_read_lock          251225          0             0               0             0             0
    hrtimer_bases.lock      39450          0             0               0             0             0
    &sb->s_type->i_l...     10301          1           662             662           662           662
    ptlock_ptr(page)        10173          2           701            1402           760           642
    &(ei->i_block_re...      8732          0             0               0             0             0
    &xa->xa_lock             8088          0             0               0             0             0
    &base->lock              6705          0             0               0             0             0
    &p->pi_lock              5549          0             0               0             0             0
    &dentry->d_lockr...      5010          4          1274            5097          1844           789
    &ep->lock                3958          0             0               0             0             0

- Add -F/--field option to customize the list of fields to output:

    $ perf lock report -F contended,wait_max -k avg_wait
    Name                 contended  max wait(ns)  avg wait(ns)

    slock-AF_INET6               1         23543         23543
    &lruvec->lru_lock            5         18317         11254
    slock-AF_INET6               1         10379         10379
    rcu_node_1                   1          2104          2104
    &dentry->d_lockr...          1          1844          1844
    &dentry->d_lockr...          1          1672          1672
    &newf->file_lock            15          2279          1025
    &dentry->d_lockr...          1           792           792

- Add --synth=no option for record, as there is no need to symbolize: lock names come from the tracepoints.

perf record:

- Threaded recording, opt-in, via the new --threads command line option.

- Improve AMD IBS (Instruction-Based Sampling) error handling messages.

perf script:

- Add 'brstackinsnlen' field (use it with -F) for branch stacks.

- Output branch sample type in 'perf script'.

perf report:

- Add "addr_from" and "addr_to" sort dimensions.

- Print branch stack entry type in 'perf report --dump-raw-trace'.

- Fix symbolization for chrooted workloads.

Hardware tracing:

Intel PT:

- Add CFE (Control Flow Event) and EVD (Event Data) packets support.

- Add MODE.Exec IFLAG bit support.

  Explanation about these features from the "Intel® 64 and IA-32 architectures software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:

    https://cdrdv2.intel.com/v1/dl/getContent/671200

  At page 3951:

  <quote>
  32.2.4

  Event Trace is a capability that exposes details about the asynchronous events, when they are generated, and when their corresponding software event handler completes execution. These include:

  o Interrupts, including NMI and SMI, including the interrupt vector when defined.

  o Faults, exceptions including the fault vector.

    - Page faults additionally include the page fault address, when in context.

  o Event handler returns, including IRET and RSM.

  o VM exits and VM entries.¹

    - VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields.

  o INIT and SIPI events.

  o TSX aborts, including the abort status returned for the RTM instructions.

  o Shutdown.

  Additionally, it provides indication of the status of the Interrupt Flag (IF), to indicate when interrupts are masked.
  </quote>

ARM CoreSight:

- Use advertised caps/min_interval as default sample_period on ARM SPE.

- Update deduction of TRCCONFIGR register for branch broadcast on ARM's CoreSight ETM.

Vendor Events (JSON):

Intel:

- Update events and metrics for: Alderlake, Broadwell, Broadwell DE, BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont, GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX, Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP, Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX, Tigerlake, TremontX, Westmere EP-SP, Westmere EX.

ARM:

- Add support for HiSilicon CPA PMU aliasing.

perf stat:

- Fix forked applications enablement of counters.

- The 'slots' should only be printed in a different order than the one specified on the command line when 'topdown' events are present, fix it.

Miscellaneous:

- Sync msr-index, cpufeatures header files with the kernel sources.

- Stop using some deprecated libbpf APIs in 'perf trace'.

- Fix some spelling mistakes.

- Refactor the maps pointers usage to pave the way for using refcount debugging.

- Only offer the --tui option on perf top, report and annotate when perf was built with libslang.

- Don't mention --to-ctf in 'perf data --help' when not linking with the required library, libbabeltrace.

- Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci.

- Enhance the matching of sub-commands abbreviations:

    'perf c2c rec' -> 'perf c2c record'
    'perf c2c recport' -> error

- Set build-id using build-id header on new mmap records.

- Fix generation of 'perf --version' string.

perf test:

- Add test for the arm_spe event.

- Add test to check unwinding using frame-pointer (fp) mode on arm64.

- Make metric testing more robust in 'perf test'.

- Add error message for unsupported branch stack cases.

libperf:

- Add API for allocating new thread map array.

- Fix typo in perf_evlist__open() failure error messages in libperf tests.

perf c2c:

- Replace bitmap_weight() with bitmap_empty() where appropriate.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----

iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYj8viwAKCRCyPKLppCJ+
J8K3AQDpN45P4/TWJxVWhZlvYzJtWDSboXHZJfmBiEd4Xu2zbwD7BFW02f1ATHPr
dGBFXxRQQufBIqfE+OQXG59Awp1m8wE=
=1l8S
-----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo.

* tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits)
  perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
  perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
  perf tools: Enhance the matching of sub-commands abbreviations
  libperf tests: Fix typo in perf_evlist__open() failure error messages
  tools arm64: Import cputype.h
  perf lock: Add -F/--field option to control output
  perf lock: Extend struct lock_key to have print function
  perf lock: Add --synth=no option for record
  tools headers cpufeatures: Sync with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  perf stat: Fix forked applications enablement of counters
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf evsel: Make evsel__env() always return a valid env
  perf build-id: Fix spelling mistake "Cant" -> "Can't"
  perf header: Fix spelling mistake "could't" -> "couldn't"
  perf script: Add 'brstackinsnlen' for branch stacks
  perf parse-events: Move slots only with topdown
  perf ftrace latency: Update documentation
  perf ftrace latency: Add -n/--use-nsec option
  perf tools: Fix version kernel tag
  ... |
||
---|---|---|
.. | ||
arm-spe-decoder | ||
bpf_skel | ||
c++ | ||
cs-etm-decoder | ||
include | ||
intel-pt-decoder | ||
libunwind | ||
scripting-engines | ||
affinity.c | ||
affinity.h | ||
amd-sample-raw.c | ||
annotate.c | ||
annotate.h | ||
archinsn.h | ||
arm-spe.c | ||
arm-spe.h | ||
arm64-frame-pointer-unwind-support.c | ||
arm64-frame-pointer-unwind-support.h | ||
auxtrace.c | ||
auxtrace.h | ||
block-info.c | ||
block-info.h | ||
block-range.c | ||
block-range.h | ||
bpf-event.c | ||
bpf-event.h | ||
bpf-loader.c | ||
bpf-loader.h | ||
bpf-prologue.c | ||
bpf-prologue.h | ||
bpf-utils.c | ||
bpf-utils.h | ||
bpf_counter.c | ||
bpf_counter.h | ||
bpf_counter_cgroup.c | ||
bpf_ftrace.c | ||
bpf_map.c | ||
bpf_map.h | ||
branch.c | ||
branch.h | ||
Build | ||
build-id.c | ||
build-id.h | ||
cache.h | ||
cacheline.c | ||
cacheline.h | ||
call-path.c | ||
call-path.h | ||
callchain.c | ||
callchain.h | ||
cap.c | ||
cap.h | ||
cgroup.c | ||
cgroup.h | ||
clockid.c | ||
clockid.h | ||
cloexec.c | ||
cloexec.h | ||
color.c | ||
color.h | ||
color_config.c | ||
comm.c | ||
comm.h | ||
compress.h | ||
config.c | ||
config.h | ||
copyfile.c | ||
copyfile.h | ||
counts.c | ||
counts.h | ||
cpu-set-sched.h | ||
cpumap.c | ||
cpumap.h | ||
cputopo.c | ||
cputopo.h | ||
cs-etm.c | ||
cs-etm.h | ||
data-convert-bt.c | ||
data-convert-json.c | ||
data-convert.h | ||
data.c | ||
data.h | ||
db-export.c | ||
db-export.h | ||
debug.c | ||
debug.h | ||
demangle-java.c | ||
demangle-java.h | ||
demangle-ocaml.c | ||
demangle-ocaml.h | ||
demangle-rust.c | ||
demangle-rust.h | ||
dlfilter.c | ||
dlfilter.h | ||
dso.c | ||
dso.h | ||
dsos.c | ||
dsos.h | ||
dump-insn.c | ||
dump-insn.h | ||
dwarf-aux.c | ||
dwarf-aux.h | ||
dwarf-regs.c | ||
env.c | ||
env.h | ||
event.c | ||
event.h | ||
events_stats.h | ||
evlist-hybrid.c | ||
evlist-hybrid.h | ||
evlist.c | ||
evlist.h | ||
evsel.c | ||
evsel.h | ||
evsel_config.h | ||
evsel_fprintf.c | ||
evsel_fprintf.h | ||
evswitch.c | ||
evswitch.h | ||
expr.c | ||
expr.h | ||
expr.l | ||
expr.y | ||
find-map.c | ||
fncache.c | ||
fncache.h | ||
ftrace.h | ||
genelf.c | ||
genelf.h | ||
genelf_debug.c | ||
generate-cmdlist.sh | ||
get_current_dir_name.c | ||
get_current_dir_name.h | ||
hashmap.c | ||
hashmap.h | ||
header.c | ||
header.h | ||
help-unknown-cmd.c | ||
help-unknown-cmd.h | ||
hist.c | ||
hist.h | ||
intel-bts.c | ||
intel-bts.h | ||
intel-pt.c | ||
intel-pt.h | ||
intlist.c | ||
intlist.h | ||
iostat.c | ||
iostat.h | ||
jit.h | ||
jitdump.c | ||
jitdump.h | ||
kvm-stat.h | ||
levenshtein.c | ||
levenshtein.h | ||
llvm-utils.c | ||
llvm-utils.h | ||
lzma.c | ||
machine.c | ||
machine.h | ||
map.c | ||
map.h | ||
map_symbol.h | ||
maps.c | ||
maps.h | ||
mem-events.c | ||
mem-events.h | ||
mem2node.c | ||
mem2node.h | ||
memswap.c | ||
memswap.h | ||
metricgroup.c | ||
metricgroup.h | ||
mmap.c | ||
mmap.h | ||
namespaces.c | ||
namespaces.h | ||
ordered-events.c | ||
ordered-events.h | ||
parse-branch-options.c | ||
parse-branch-options.h | ||
parse-events-hybrid.c | ||
parse-events-hybrid.h | ||
parse-events.c | ||
parse-events.h | ||
parse-events.l | ||
parse-events.y | ||
parse-regs-options.c | ||
parse-regs-options.h | ||
parse-sublevel-options.c | ||
parse-sublevel-options.h | ||
path.c | ||
path.h | ||
perf-hooks-list.h | ||
perf-hooks.c | ||
perf-hooks.h | ||
PERF-VERSION-GEN | ||
perf_api_probe.c | ||
perf_api_probe.h | ||
perf_event_attr_fprintf.c | ||
perf_regs.c | ||
perf_regs.h | ||
pfm.c | ||
pfm.h | ||
pmu-hybrid.c | ||
pmu-hybrid.h | ||
pmu.c | ||
pmu.h | ||
pmu.l | ||
pmu.y | ||
print_binary.c | ||
print_binary.h | ||
probe-event.c | ||
probe-event.h | ||
probe-file.c | ||
probe-file.h | ||
probe-finder.c | ||
probe-finder.h | ||
pstack.c | ||
pstack.h | ||
python-ext-sources | ||
python.c | ||
rb_resort.h | ||
rblist.c | ||
rblist.h | ||
record.c | ||
record.h | ||
rlimit.c | ||
rlimit.h | ||
rwsem.c | ||
rwsem.h | ||
s390-cpumcf-kernel.h | ||
s390-cpumsf-kernel.h | ||
s390-cpumsf.c | ||
s390-cpumsf.h | ||
s390-sample-raw.c | ||
sample-raw.c | ||
sample-raw.h | ||
session.c | ||
session.h | ||
setns.c | ||
setup.py | ||
sideband_evlist.c | ||
smt.c | ||
smt.h | ||
sort.c | ||
sort.h | ||
spark.c | ||
spark.h | ||
srccode.c | ||
srccode.h | ||
srcline.c | ||
srcline.h | ||
stat-display.c | ||
stat-shadow.c | ||
stat.c | ||
stat.h | ||
strbuf.c | ||
strbuf.h | ||
stream.c | ||
stream.h | ||
strfilter.c | ||
strfilter.h | ||
string.c | ||
string2.h | ||
strlist.c | ||
strlist.h | ||
svghelper.c | ||
svghelper.h | ||
symbol-elf.c | ||
symbol-minimal.c | ||
symbol.c | ||
symbol.h | ||
symbol_conf.h | ||
symbol_fprintf.c | ||
symsrc.h | ||
synthetic-events.c | ||
synthetic-events.h | ||
syscalltbl.c | ||
syscalltbl.h | ||
target.c | ||
target.h | ||
term.c | ||
term.h | ||
thread-stack.c | ||
thread-stack.h | ||
thread.c | ||
thread.h | ||
thread_map.c | ||
thread_map.h | ||
time-utils.c | ||
time-utils.h | ||
tool.h | ||
top.c | ||
top.h | ||
topdown.c | ||
topdown.h | ||
trace-event-info.c | ||
trace-event-parse.c | ||
trace-event-read.c | ||
trace-event-scripting.c | ||
trace-event.c | ||
trace-event.h | ||
trigger.h | ||
tsc.c | ||
tsc.h | ||
units.c | ||
units.h | ||
unwind-libdw.c | ||
unwind-libdw.h | ||
unwind-libunwind-local.c | ||
unwind-libunwind.c | ||
unwind.h | ||
usage.c | ||
util.c | ||
util.h | ||
values.c | ||
values.h | ||
vdso.c | ||
vdso.h | ||
zlib.c | ||
zstd.c |
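
The 'perf ftrace latency' histograms quoted in the commit message above use power-of-two duration buckets, and -n/--use-nsec only changes the unit the durations are bucketed in. The following is a minimal Python sketch of that bucketing idea, not perf's actual C implementation. The function names (bucket_index, bucket_label, histogram) are hypothetical, and the bucket boundaries are an assumption read off the labels in the sample output: bucket 0 covers [0, 1) and bucket k covers [2^(k-1), 2^k). Unlike perf's output, the sketch keeps "ns" labels above 1024 ns instead of switching back to "us".

```python
from collections import Counter


def bucket_index(duration: int) -> int:
    """Return the power-of-two bucket for a non-negative integer duration.

    Assumed layout: bucket 0 holds [0, 1); bucket k >= 1 holds [2**(k-1), 2**k).
    For integers this is exactly the number of bits needed to represent the value.
    """
    if duration < 0:
        raise ValueError("duration must be non-negative")
    return duration.bit_length()


def bucket_label(index: int, unit: str) -> str:
    """Format a bucket as 'low - high unit', e.g. '64 - 128 ns'."""
    low = 0 if index == 0 else 1 << (index - 1)
    high = 1 << index
    return f"{low} - {high} {unit}"


def histogram(durations_ns, use_nsec: bool = False) -> dict:
    """Count durations (given in nanoseconds) per bucket.

    With use_nsec=False the durations are first truncated to whole
    microseconds, so everything below 1000 ns lands in the '0 - 1 us' bucket.
    """
    unit = "ns" if use_nsec else "us"
    scale = 1 if use_nsec else 1000
    counts = Counter(bucket_index(d // scale) for d in durations_ns)
    return {bucket_label(i, unit): counts[i] for i in sorted(counts)}


if __name__ == "__main__":
    samples_ns = [100, 200, 1500]  # made-up durations, for illustration only
    print(histogram(samples_ns))                 # microsecond buckets
    print(histogram(samples_ns, use_nsec=True))  # nanosecond buckets
```

With microsecond buckets the 100 ns and 200 ns samples are indistinguishable, while nanosecond buckets separate them. This is the same effect visible in the two sample tables, where almost all dput calls move from the "0 - 1 us" row to the "64 - 128 ns" and "128 - 256 ns" rows.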