perf vendor events intel: Update cascadelakex to 1.19

Updates were released in:
e4f83534f2
Adds the events IDQ.DSB_CYCLES_OK, IDQ.DSB_CYCLES_ANY,
ICACHE_TAG.STALLS, DECODE.LCP, LSD.CYCLES_OK. Descriptions are also
updated.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This commit is contained in:
Ian Rogers 2023-06-23 08:10:10 -07:00 committed by Namhyung Kim
parent dfc83cc877
commit 0c9e39421c
5 changed files with 54 additions and 12 deletions

View File

@ -7,6 +7,14 @@
"SampleAfterValue": "100003",
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
"EventCode": "0x87",
"EventName": "DECODE.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
{
"BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches",
"EventCode": "0xAB",
@ -245,27 +253,34 @@
"UMask": "0x2"
},
{
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"EventCode": "0x83",
"EventName": "ICACHE_64B.IFTAG_STALL",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops",
"BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
"EventCode": "0x83",
"EventName": "ICACHE_TAG.STALLS",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]",
"CounterMask": "4",
"EventCode": "0x79",
"EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS",
"PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
"PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_OK]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS",
"PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
"PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_ANY]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
@ -296,6 +311,24 @@
"SampleAfterValue": "2000003",
"UMask": "0x8"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.DSB_CYCLES_ANY",
"PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
"BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
"CounterMask": "4",
"EventCode": "0x79",
"EventName": "IDQ.DSB_CYCLES_OK",
"PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
"BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path",
"EventCode": "0x79",

View File

@ -361,10 +361,10 @@
"UMask": "0x1"
},
{
"BriefDescription": "Stalls caused by changing prefix length of the instruction.",
"BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]",
"EventCode": "0x87",
"EventName": "ILD_STALL.LCP",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
"PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
@ -488,11 +488,11 @@
"UMask": "0x1"
},
{
"BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder.",
"BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_OK]",
"CounterMask": "4",
"EventCode": "0xA8",
"EventName": "LSD.CYCLES_4_UOPS",
"PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector).",
"PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
@ -505,6 +505,15 @@
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
{
"BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]",
"CounterMask": "4",
"EventCode": "0xA8",
"EventName": "LSD.CYCLES_OK",
"PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOPS]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
{
"BriefDescription": "Number of Uops delivered by the LSD.",
"EventCode": "0xA8",

View File

@ -6606,7 +6606,7 @@
"EventCode": "0x52",
"EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS",
"PerPkg": "1",
"PublicDescription": "ad and bl messages were actually slotted into the same flit in paralle",
"PublicDescription": "ad and bl messages were actually slotted into the same flit in parallel",
"UMask": "0x8",
"Unit": "M3UPI"
},

View File

@ -2735,7 +2735,7 @@
"EventCode": "0x81",
"EventName": "UNC_M_WPQ_OCCUPANCY",
"PerPkg": "1",
"PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts?",
"PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts.",
"Unit": "iMC"
},
{

View File

@ -5,7 +5,7 @@ GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core
GenuineIntel-6-(3D|47),v28,broadwell,core
GenuineIntel-6-56,v10,broadwellde,core
GenuineIntel-6-4F,v21,broadwellx,core
GenuineIntel-6-55-[56789ABCDEF],v1.18,cascadelakex,core
GenuineIntel-6-55-[56789ABCDEF],v1.19,cascadelakex,core
GenuineIntel-6-9[6C],v1.04,elkhartlake,core
GenuineIntel-6-5[CF],v13,goldmont,core
GenuineIntel-6-7A,v1.01,goldmontplus,core

1 Family-model Version Filename EventType
5 GenuineIntel-6-(3D|47) v28 broadwell core
6 GenuineIntel-6-56 v10 broadwellde core
7 GenuineIntel-6-4F v21 broadwellx core
8 GenuineIntel-6-55-[56789ABCDEF] v1.18 v1.19 cascadelakex core
9 GenuineIntel-6-9[6C] v1.04 elkhartlake core
10 GenuineIntel-6-5[CF] v13 goldmont core
11 GenuineIntel-6-7A v1.01 goldmontplus core