linux-stable/tools/perf/pmu-events
commit 08ed77e414 ("perf vendor events amd: Add recommended events")
Author: Kim Phillips <kim.phillips@amd.com>
Date:   2020-09-04 16:32:22 -03:00

Add support for events listed in Section 2.1.15.2 "Performance
Measurement" of "PPR for AMD Family 17h Model 31h B0 - 55803
Rev 0.54 - Sep 12, 2019".

perf now supports these new events (-e):

  all_dc_accesses
  all_tlbs_flushed
  l1_dtlb_misses
  l2_cache_accesses_from_dc_misses
  l2_cache_accesses_from_ic_misses
  l2_cache_hits_from_dc_misses
  l2_cache_hits_from_ic_misses
  l2_cache_misses_from_dc_misses
  l2_cache_misses_from_ic_miss
  l2_dtlb_misses
  l2_itlb_misses
  sse_avx_stalls
  uops_dispatched
  uops_retired
  l3_accesses
  l3_misses

and these metrics (-M):

  branch_misprediction_ratio
  all_l2_cache_accesses
  all_l2_cache_hits
  all_l2_cache_misses
  ic_fetch_miss_ratio
  l2_cache_accesses_from_l2_hwpf
  l2_cache_hits_from_l2_hwpf
  l2_cache_misses_from_l2_hwpf
  l3_read_miss_latency
  l1_itlb_misses
  all_remote_links_outbound
  nps1_die_to_dram

The nps1_die_to_dram metric may need perf stat's --metric-no-group
switch if the number of available data fabric counters is less
than the number of events it uses (8).
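
For example (a sketch; whether --metric-no-group is actually needed
depends on how many data fabric counters the machine exposes):

  # perf stat -a -M nps1_die_to_dram --metric-no-group sleep 2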

Committer testing:

On an AMD Ryzen 3900X system:

Before:

  # perf list all_dc_accesses   all_tlbs_flushed   l1_dtlb_misses   l2_cache_accesses_from_dc_misses   l2_cache_accesses_from_ic_misses   l2_cache_hits_from_dc_misses   l2_cache_hits_from_ic_misses   l2_cache_misses_from_dc_misses   l2_cache_misses_from_ic_miss   l2_dtlb_misses   l2_itlb_misses   sse_avx_stalls   uops_dispatched   uops_retired   l3_accesses   l3_misses | grep -v "^Metric Groups:$" | grep -v "^$"
  #

After:

  # perf list all_dc_accesses   all_tlbs_flushed   l1_dtlb_misses   l2_cache_accesses_from_dc_misses   l2_cache_accesses_from_ic_misses   l2_cache_hits_from_dc_misses   l2_cache_hits_from_ic_misses   l2_cache_misses_from_dc_misses   l2_cache_misses_from_ic_miss   l2_dtlb_misses   l2_itlb_misses   sse_avx_stalls   uops_dispatched   uops_retired   l3_accesses   l3_misses | grep -v "^Metric Groups:$" | grep -v "^$" | grep -v "^recommended:$"
  all_dc_accesses
       [All L1 Data Cache Accesses]
  all_tlbs_flushed
       [All TLBs Flushed]
  l1_dtlb_misses
       [L1 DTLB Misses]
  l2_cache_accesses_from_dc_misses
       [L2 Cache Accesses from L1 Data Cache Misses (including prefetch)]
  l2_cache_accesses_from_ic_misses
       [L2 Cache Accesses from L1 Instruction Cache Misses (including
        prefetch)]
  l2_cache_hits_from_dc_misses
       [L2 Cache Hits from L1 Data Cache Misses]
  l2_cache_hits_from_ic_misses
       [L2 Cache Hits from L1 Instruction Cache Misses]
  l2_cache_misses_from_dc_misses
       [L2 Cache Misses from L1 Data Cache Misses]
  l2_cache_misses_from_ic_miss
       [L2 Cache Misses from L1 Instruction Cache Misses]
  l2_dtlb_misses
       [L2 DTLB Misses & Data page walks]
  l2_itlb_misses
       [L2 ITLB Misses & Instruction page walks]
  sse_avx_stalls
       [Mixed SSE/AVX Stalls]
  uops_dispatched
       [Micro-ops Dispatched]
  uops_retired
       [Micro-ops Retired]
  l3_accesses
       [L3 Accesses. Unit: amd_l3]
  l3_misses
       [L3 Misses (includes Chg2X). Unit: amd_l3]
  #

  # perf stat -a -e all_dc_accesses,all_tlbs_flushed,l1_dtlb_misses,l2_cache_accesses_from_dc_misses,l2_cache_accesses_from_ic_misses,l2_cache_hits_from_dc_misses,l2_cache_hits_from_ic_misses,l2_cache_misses_from_dc_misses,l2_cache_misses_from_ic_miss,l2_dtlb_misses,l2_itlb_misses,sse_avx_stalls,uops_dispatched,uops_retired,l3_accesses,l3_misses sleep 2

   Performance counter stats for 'system wide':

       433,439,949      all_dc_accesses                                               (35.66%)
               443      all_tlbs_flushed                                              (35.66%)
         2,985,885      l1_dtlb_misses                                                (35.66%)
        18,318,019      l2_cache_accesses_from_dc_misses                                     (35.68%)
        50,114,810      l2_cache_accesses_from_ic_misses                                     (35.72%)
        12,423,978      l2_cache_hits_from_dc_misses                                     (35.74%)
        40,703,103      l2_cache_hits_from_ic_misses                                     (35.74%)
         6,698,673      l2_cache_misses_from_dc_misses                                     (35.74%)
        12,090,892      l2_cache_misses_from_ic_miss                                     (35.74%)
           614,267      l2_dtlb_misses                                                (35.74%)
           216,036      l2_itlb_misses                                                (35.74%)
            11,977      sse_avx_stalls                                                (35.74%)
       999,276,223      uops_dispatched                                               (35.73%)
     1,075,311,620      uops_retired                                                  (35.69%)
         1,420,763      l3_accesses
           540,164      l3_misses

       2.002344121 seconds time elapsed

  # perf stat -a -e all_dc_accesses,all_tlbs_flushed,l1_dtlb_misses,l2_cache_accesses_from_dc_misses,l2_cache_accesses_from_ic_misses sleep 2

   Performance counter stats for 'system wide':

       175,943,104      all_dc_accesses
               310      all_tlbs_flushed
         2,280,359      l1_dtlb_misses
        11,700,151      l2_cache_accesses_from_dc_misses
        25,414,963      l2_cache_accesses_from_ic_misses

       2.001957818 seconds time elapsed

  #

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Link: http://lore.kernel.org/lkml/20200901220944.277505-3-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

README
======

The contents of this directory allow users to specify PMU events of their
CPUs by their symbolic names rather than raw event codes (see example
below).

The main program in this directory is 'jevents', which is built and
executed _BEFORE_ the perf binary itself is built.

The 'jevents' program tries to locate and process JSON files in the directory
tree tools/perf/pmu-events/arch/foo.

	- Regular files with '.json' extension in the name are assumed to be
	  JSON files, each of which describes a set of PMU events.

	- The CSV file that maps a specific CPU to its set of PMU events is to
	  be named 'mapfile.csv' (see below for mapfile format).

	- Directories are traversed, but all other files are ignored.

	- To reduce JSON event duplication per architecture, platform JSONs may
	  use the "ArchStdEvent" keyword to reference "architecture standard
	  events" defined in the architecture standard JSONs, as sketched
	  below. Architecture standard JSONs must be located in the
	  architecture root folder. Matching is based on the "EventName" field.
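
For example (a hedged sketch, modeled on the arm64 JSONs; the file
placement and the event shown are illustrative), an architecture standard
JSON in the architecture root folder may define:

	[
	  {
	    "EventCode": "0x40",
	    "EventName": "L1D_CACHE_RD",
	    "BriefDescription": "L1D cache access, read"
	  }
	]

and a platform JSON may then pull that event in by name:

	[
	  { "ArchStdEvent": "L1D_CACHE_RD" }
	]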

The PMU events supported by a CPU model are expected to be grouped into
topics such as Pipelining, Cache, Memory, Floating-point etc. All events
for a topic should be placed in a separate JSON file - where the file
name identifies the topic. E.g. "Floating-point.json".
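
An entry in such a topic file looks roughly like this (a sketch; the
field names follow the x86 JSONs, while the event and its values are
illustrative):

	[
	  {
	    "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
	    "EventCode": "0x08",
	    "UMask": "0xe",
	    "BriefDescription": "Load misses in all DTLB levels that cause completed page walks",
	    "SampleAfterValue": "100003"
	  }
	]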

All the topic JSON files for a CPU model/family should be in a separate
sub-directory. Thus for the Silvermont X86 CPU:

	$ ls tools/perf/pmu-events/arch/x86/silvermont
	cache.json     memory.json    virtual-memory.json
	frontend.json  pipeline.json

The JSON files folder for a CPU model/family may be placed in the root
arch folder, or in a vendor sub-folder under the arch folder for
instances where the arch and vendor are not the same.

Using the JSON files and the mapfile, 'jevents' generates the C source file,
'pmu-events.c', which encodes the two sets of tables:

	- Set of 'PMU events tables' for all known CPUs in the architecture,
	  (one table like the following, per JSON file; table name 'pme_power8'
	  is derived from JSON file name, 'power8.json').

		struct pmu_event pme_power8[] = {

			...

			{
				.name = "pm_1plus_ppc_cmpl",
				.event = "event=0x100f2",
				.desc = "1 or more ppc insts finished,",
			},

			...
		};

	- A 'mapping table' that maps each CPU of the architecture, to its
	  'PMU events table'

		struct pmu_events_map pmu_events_map[] = {
		{
			.cpuid = "004b0000",
			.version = "1",
			.type = "core",
			.table = pme_power8
		},
			...

		};

After the 'pmu-events.c' is generated, it is compiled and the resulting
'pmu-events.o' is added to 'libperf.a' which is then used to build perf.

NOTES:
	1. Several CPUs can support the same set of events and hence use a
	   common JSON file. Thus several entries in pmu_events_map[] could
	   map to a single 'PMU events table'.

	2. 'pmu-events.h' has an extern declaration for the mapping table,
	   which the generated 'pmu-events.c' defines; see the sketch after
	   these notes.

	3. _All_ known CPU tables for an architecture are included in the perf
	   binary.
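
For reference, the extern declaration in 'pmu-events.h' has this shape (a
sketch of the relevant line, not a verbatim copy of the header):

	extern struct pmu_events_map pmu_events_map[];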

At run time, perf determines the actual CPU it is running on, finds the
matching events table and builds aliases for those events. This allows
users to specify events by their name:

	$ perf stat -e pm_1plus_ppc_cmpl sleep 1

where 'pm_1plus_ppc_cmpl' is a Power8 PMU event.

However, some errors in processing may cause the alias build to fail.

Mapfile format
===============

The mapfile enables multiple CPU models to share a single set of PMU
events. It is required even if the mapping is 1:1.

The mapfile.csv format is expected to be:

	Header line
	CPUID,Version,Dir/path/name,Type

where:

	Comma:
		is the required field delimiter (i.e., fields cannot
		have commas within them).

	Comments:
		Lines in which the first character is either '\n' or '#'
		are ignored.

	Header line:
		The header line is the first line in the file, which is
		always _IGNORED_. It can be empty.

	CPUID:
		CPUID is an arch-specific character string that can be used
		to identify a CPU (and associate it with the set of PMU
		events it supports). Multiple CPUIDs can point to the same
		Dir/path/name.

		Example:
			CPUID == 'GenuineIntel-6-2E' (on x86).
			CPUID == '004b0100' (PVR value on PowerPC).
	Version:
		is the version of the mapfile.

	Dir/path/name:
		is the pathname to the directory containing the CPU's JSON
		files, relative to the directory containing the mapfile.csv

	Type:
		indicates whether the events are "core" or "uncore" events.


	E.g.:

	$ grep silvermont tools/perf/pmu-events/arch/x86/mapfile.csv
	GenuineIntel-6-37,v13,silvermont,core
	GenuineIntel-6-4D,v13,silvermont,core
	GenuineIntel-6-4C,v13,silvermont,core

	i.e., the three CPU models use the JSON files (i.e., PMU events) listed
	in the directory 'tools/perf/pmu-events/arch/x86/silvermont'.