Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Main perf kernel side changes:

   - uprobes updates/fixes.  (Oleg Nesterov)

   - Add PERF_RECORD_SWITCH to indicate context switches and use it in
     tooling.  (Adrian Hunter)
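
      On the tooling side this is selected with a new 'perf record'
      option (a sketch, assuming the --switch-events option added
      together with this record type):

        $ perf record --switch-events -a -- sleep 1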

   - Support BPF programs attached to uprobes and first steps for BPF
     tooling support.  (Wang Nan)

   - x86 generic MSR-to-perf PMU driver.  (Andy Lutomirski)
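
      A sketch of using it, based on the event names the driver exposes
      under /sys/bus/event_source/devices/msr/events/ (tsc, aperf,
      mperf, pperf, smi):

        $ perf stat -e msr/tsc/,msr/smi/ -- sleep 1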

   - x86 Intel PT, LBR and BTS updates.  (Alexander Shishkin)

   - x86 Intel Skylake support.  (Andi Kleen)

   - x86 Intel Knights Landing (KNL) RAPL support.  (Dasaratharaman
     Chandramouli)

   - x86 Intel Broadwell-DE uncore support.  (Kan Liang)

   - x86 hw breakpoints robustization (Andy Lutomirski)

  Main perf tooling side changes:

   - Support Intel PT in several tools, enabling the use of the
     processor trace feature introduced in Intel Broadwell processors:
     (Adrian Hunter)

       # dmesg | grep Performance
        [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
       # perf record -e intel_pt//u -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.216 MB perf.data ]
       # perf script # then navigate in the tool output to some area, like this one:
       184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
       185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
       186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
       187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
       188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
       189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
       190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
       191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
       192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
       193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
       194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
       195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
       196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
       197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
       198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
       199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
       200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) =>  7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
       201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)

   - Add support for using several Intel PT features (CYC, MTC packets);
     the relevant documentation was updated in:
         tools/perf/Documentation/intel-pt.txt
     briefly describing those packets, their purpose, how to configure
     them via the event config terms, and pointing to relevant external
     documentation for further reading.  (Adrian Hunter)
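
      Those packets are requested via config terms on the intel_pt
      event, e.g. (a sketch; the mtc_period/cyc_thresh values are
      illustrative, the valid ones depend on the CPU's intel_pt
      capabilities):

        # perf record -e intel_pt/mtc,mtc_period=3,cyc,cyc_thresh=1/u -- ls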

   - Introduce support for probing at an absolute address, for user and
     kernel 'perf probe's, useful when one has the symbol maps on a
     developer machine but not on an embedded system.  (Wang Nan)
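
      A sketch of the syntax (the addresses below are made up, only to
      show the form):

        # perf probe 0xffffffff811e6615      # kernel probe
        # perf probe -x ./app 0x4005c4       # user space probe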

   - Add Intel BTS support, with a call-graph script to show it and PT
     in use in a GUI using 'perf script' python scripting with
     postgresql and Qt.  (Adrian Hunter)
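
      Recording BTS data uses the new intel_bts PMU, e.g. (a minimal
      sketch):

        # perf record -e intel_bts//u -- ls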

   - Allow selecting the type of callchains per event, including
     disabling callchains in all but one entry in an event list, to save
     space, and also to ask for the callchains collected in one event to
     be used in other events.  (Kan Liang)
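
      A sketch of the syntax, assuming the per-event 'call-graph'
      config term (with values like fp, dwarf, lbr or no):

        $ perf record -e 'cycles/call-graph=dwarf/,instructions/call-graph=no/' -- sleep 1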

   - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
     de Melo)
        * Translate a bunch more file/pathnames from pointers to strings.
       * Convert numbers to strings for the 'keyctl' syscall 'option'
         arg.
       * Add missing 'clockid' entries.

   - Introduce 'srcfile' sort key: (Andi Kleen)

       # perf record -F 10000 usleep 1
       # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
       <SNIP>
       # Overhead  Source File
          26.49%  copy_page_64.S
           5.49%  signal.c
           0.51%  msr.h
       #

     It can be combined with other fields, for instance, experiment with
     '-s srcfile,symbol'.

     There are some oddities in some distros and with some specific
     DSOs, being investigated, so your mileage may vary.

   - Support per-event 'freq' term: (Namhyung Kim)

       $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
       $ perf evlist -F
       cpu/instructions,freq=1234/: sample_freq=1234
       cycles: sample_period=1000
       $

   - Deref sys_enter pointer args with contents from probe:vfs_getname,
     showing pathnames instead of pointers in many syscalls in 'perf
     trace'.  (Arnaldo Carvalho de Melo)
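
      The probe used for that is the usual vfs_getname one, e.g. (a
      sketch; the line number and variable layout vary with the kernel
      version):

        # perf probe 'vfs_getname=getname_flags:72 pathname=result->name:string'
        # perf trace -e open touch /tmp/foo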

   - Stop collecting /proc/kallsyms in perf.data files, saving about
     4.5MB on a typical x86-64 system, and use the symbol resolution
     routines used in all the other tools (report, top, etc) now that we
     can ask libtraceevent to use perf's symbol resolution code.
     (Arnaldo Carvalho de Melo)

   - Allow filtering out of perf's PID via 'perf record --exclude-perf'.
     (Wang Nan)
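
      For instance (a sketch; the filter applies to tracepoint events):

        # perf record --exclude-perf -e sched:sched_switch -a -- sleep 1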

   - 'perf trace' now supports syscall groups, like strace, i.e.:

       $ trace -e file touch file

      Will expand 'file' into multiple file-related syscalls.  More work
      is needed to add extra syscall groups, and also
     to complement what was added for the 'file' group, included as a
     proof of concept.  (Arnaldo Carvalho de Melo)

   - Add lock_pi stresser to 'perf bench futex', to test the kernel code
     related to FUTEX_(UN)LOCK_PI.  (Davidlohr Bueso)
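
      To run just the new stresser (a sketch):

        $ perf bench futex lock-pi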

   - Let user have timestamps with per-thread recording in 'perf record'
     (Adrian Hunter)
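
      E.g. (a sketch, combining --per-thread with the existing
      -T/--timestamp option):

        $ perf record --per-thread -T -- sleep 1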

   - ... and tons of other changes, see the shortlog and the Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
  perf evlist: Add backpointer for perf_env to evlist
  perf tools: Rename perf_session_env to perf_env
  perf tools: Do not change lib/api/fs/debugfs directly
  perf tools: Add tracing_path and remove unneeded functions
  perf buildid: Introduce sysfs/filename__sprintf_build_id
  perf evsel: Add a backpointer to the evlist a evsel is in
  perf trace: Add header with copyright and background info
  perf scripts python: Add new compaction-times script
  perf stat: Get correct cpu id for print_aggr
  tools lib traceeveent: Allow for negative numbers in print format
  perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
  tracing/uprobes: Do not print '0x (null)' when offset is 0
  perf probe: Support probing at absolute address
  perf probe: Fix error reported when offset without function
  perf probe: Fix list result when address is zero
  perf probe: Fix list result when symbol can't be found
  tools build: Allow duplicate objects in the object list
  perf tools: Remove export.h from MANIFEST
  perf probe: Prevent segfault when reading probe point with absolute address
  perf tools: Update Intel PT documentation
  ...
Author: Linus Torvalds
Date:   2015-08-31 19:49:05 -07:00
Commit: 41d859a83c

206 changed files with 19180 additions and 1577 deletions


@ -73,6 +73,12 @@
#define MSR_LBR_CORE_FROM 0x00000040
#define MSR_LBR_CORE_TO 0x00000060
#define MSR_LBR_INFO_0 0x00000dc0 /* ... 0xddf for _31 */
#define LBR_INFO_MISPRED BIT_ULL(63)
#define LBR_INFO_IN_TX BIT_ULL(62)
#define LBR_INFO_ABORT BIT_ULL(61)
#define LBR_INFO_CYCLES 0xffff
#define MSR_IA32_PEBS_ENABLE 0x000003f1
#define MSR_IA32_DS_AREA 0x00000600
#define MSR_IA32_PERF_CAPABILITIES 0x00000345
@ -80,13 +86,21 @@
#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
#define RTIT_CTL_CYCLEACC BIT(1)
#define RTIT_CTL_OS BIT(2)
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_MTC_EN BIT(9)
#define RTIT_CTL_TSC_EN BIT(10)
#define RTIT_CTL_DISRETC BIT(11)
#define RTIT_CTL_BRANCH_EN BIT(13)
#define RTIT_CTL_MTC_RANGE_OFFSET 14
#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
#define RTIT_CTL_CYC_THRESH_OFFSET 19
#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
#define RTIT_CTL_PSB_FREQ_OFFSET 24
#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
#define MSR_IA32_RTIT_STATUS 0x00000571
#define RTIT_STATUS_CONTEXTEN BIT(1)
#define RTIT_STATUS_TRIGGEREN BIT(2)


@ -159,6 +159,13 @@ struct x86_pmu_capability {
*/
#define INTEL_PMC_IDX_FIXED_BTS (INTEL_PMC_IDX_FIXED + 16)
#define GLOBAL_STATUS_COND_CHG BIT_ULL(63)
#define GLOBAL_STATUS_BUFFER_OVF BIT_ULL(62)
#define GLOBAL_STATUS_UNC_OVF BIT_ULL(61)
#define GLOBAL_STATUS_ASIF BIT_ULL(60)
#define GLOBAL_STATUS_COUNTERS_FROZEN BIT_ULL(59)
#define GLOBAL_STATUS_LBRS_FROZEN BIT_ULL(58)
/*
* IBS cpuid feature detection
*/


@ -51,6 +51,7 @@ extern int unsynchronized_tsc(void);
extern int check_tsc_unstable(void);
extern int check_tsc_disabled(void);
extern unsigned long native_calibrate_tsc(void);
extern unsigned long long native_sched_clock_from_tsc(u64 tsc);
extern int tsc_clocksource_reliable;


@ -46,6 +46,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \
perf_event_intel_uncore_snb.o \
perf_event_intel_uncore_snbep.o \
perf_event_intel_uncore_nhmex.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_msr.o
obj-$(CONFIG_CPU_SUP_AMD) += perf_event_msr.o
endif


@ -25,32 +25,11 @@
*/
#define TOPA_PMI_MARGIN 512
/*
* Table of Physical Addresses bits
*/
enum topa_sz {
TOPA_4K = 0,
TOPA_8K,
TOPA_16K,
TOPA_32K,
TOPA_64K,
TOPA_128K,
TOPA_256K,
TOPA_512K,
TOPA_1MB,
TOPA_2MB,
TOPA_4MB,
TOPA_8MB,
TOPA_16MB,
TOPA_32MB,
TOPA_64MB,
TOPA_128MB,
TOPA_SZ_END,
};
#define TOPA_SHIFT 12
static inline unsigned int sizes(enum topa_sz tsz)
static inline unsigned int sizes(unsigned int tsz)
{
return 1 << (tsz + 12);
return 1 << (tsz + TOPA_SHIFT);
};
struct topa_entry {
@ -66,20 +45,26 @@ struct topa_entry {
u64 rsvd4 : 16;
};
#define TOPA_SHIFT 12
#define PT_CPUID_LEAVES 2
#define PT_CPUID_LEAVES 2
#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
PT_CAP_psb_cyc,
PT_CAP_mtc,
PT_CAP_topa_output,
PT_CAP_topa_multiple_entries,
PT_CAP_single_range_output,
PT_CAP_payloads_lip,
PT_CAP_mtc_periods,
PT_CAP_cycle_thresholds,
PT_CAP_psb_periods,
};
struct pt_pmu {
struct pmu pmu;
u32 caps[4 * PT_CPUID_LEAVES];
u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
};
/**


@ -1551,7 +1551,7 @@ static void __init filter_events(struct attribute **attrs)
}
/* Merge two pointer arrays */
static __init struct attribute **merge_attr(struct attribute **a, struct attribute **b)
__init struct attribute **merge_attr(struct attribute **a, struct attribute **b)
{
struct attribute **new;
int j, i;


@ -165,7 +165,7 @@ struct intel_excl_cntrs {
unsigned core_id; /* per-core: core id */
};
#define MAX_LBR_ENTRIES 16
#define MAX_LBR_ENTRIES 32
enum {
X86_PERF_KFREE_SHARED = 0,
@ -594,6 +594,7 @@ struct x86_pmu {
struct event_constraint *pebs_constraints;
void (*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
unsigned long free_running_flags;
/*
* Intel LBR
@ -624,6 +625,7 @@ struct x86_pmu {
struct x86_perf_task_context {
u64 lbr_from[MAX_LBR_ENTRIES];
u64 lbr_to[MAX_LBR_ENTRIES];
u64 lbr_info[MAX_LBR_ENTRIES];
int lbr_callstack_users;
int lbr_stack_state;
};
@ -793,6 +795,8 @@ static inline void set_linear_ip(struct pt_regs *regs, unsigned long ip)
ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
ssize_t intel_event_sysfs_show(char *page, u64 config);
struct attribute **merge_attr(struct attribute **a, struct attribute **b);
#ifdef CONFIG_CPU_SUP_AMD
int amd_pmu_init(void);
@ -808,20 +812,6 @@ static inline int amd_pmu_init(void)
#ifdef CONFIG_CPU_SUP_INTEL
static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
{
/* user explicitly requested branch sampling */
if (has_branch_stack(event))
return true;
/* implicit branch sampling to correct PEBS skid */
if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
x86_pmu.intel_cap.pebs_format < 2)
return true;
return false;
}
static inline bool intel_pmu_has_bts(struct perf_event *event)
{
if (event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
@ -873,6 +863,8 @@ extern struct event_constraint intel_ivb_pebs_event_constraints[];
extern struct event_constraint intel_hsw_pebs_event_constraints[];
extern struct event_constraint intel_skl_pebs_event_constraints[];
struct event_constraint *intel_pebs_constraints(struct perf_event *event);
void intel_pmu_pebs_enable(struct perf_event *event);
@ -911,6 +903,8 @@ void intel_pmu_lbr_init_snb(void);
void intel_pmu_lbr_init_hsw(void);
void intel_pmu_lbr_init_skl(void);
int intel_pmu_setup_lbr_filter(struct perf_event *event);
void intel_pt_interrupt(void);
@ -934,6 +928,7 @@ static inline int is_ht_workaround_enabled(void)
{
return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
}
#else /* CONFIG_CPU_SUP_INTEL */
static inline void reserve_ds_buffers(void)


@ -177,6 +177,14 @@ static struct event_constraint intel_slm_event_constraints[] __read_mostly =
EVENT_CONSTRAINT_END
};
struct event_constraint intel_skl_event_constraints[] = {
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
INTEL_UEVENT_CONSTRAINT(0x1c0, 0x2), /* INST_RETIRED.PREC_DIST */
EVENT_CONSTRAINT_END
};
static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3f807f8fffull, RSP_0),
@ -193,6 +201,13 @@ static struct extra_reg intel_snbep_extra_regs[] __read_mostly = {
EVENT_EXTRA_END
};
static struct extra_reg intel_skl_extra_regs[] __read_mostly = {
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3fffff8fffull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3fffff8fffull, RSP_1),
INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
EVENT_EXTRA_END
};
EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3");
EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2");
@ -244,6 +259,200 @@ static u64 intel_pmu_event_map(int hw_event)
return intel_perfmon_event_map[hw_event];
}
/*
* Notes on the events:
* - data reads do not include code reads (comparable to earlier tables)
* - data counts include speculative execution (except L1 write, dtlb, bpu)
* - remote node access includes remote memory, remote cache, remote mmio.
* - prefetches are not included in the counts.
* - icache miss does not include decoded icache
*/
#define SKL_DEMAND_DATA_RD BIT_ULL(0)
#define SKL_DEMAND_RFO BIT_ULL(1)
#define SKL_ANY_RESPONSE BIT_ULL(16)
#define SKL_SUPPLIER_NONE BIT_ULL(17)
#define SKL_L3_MISS_LOCAL_DRAM BIT_ULL(26)
#define SKL_L3_MISS_REMOTE_HOP0_DRAM BIT_ULL(27)
#define SKL_L3_MISS_REMOTE_HOP1_DRAM BIT_ULL(28)
#define SKL_L3_MISS_REMOTE_HOP2P_DRAM BIT_ULL(29)
#define SKL_L3_MISS (SKL_L3_MISS_LOCAL_DRAM| \
SKL_L3_MISS_REMOTE_HOP0_DRAM| \
SKL_L3_MISS_REMOTE_HOP1_DRAM| \
SKL_L3_MISS_REMOTE_HOP2P_DRAM)
#define SKL_SPL_HIT BIT_ULL(30)
#define SKL_SNOOP_NONE BIT_ULL(31)
#define SKL_SNOOP_NOT_NEEDED BIT_ULL(32)
#define SKL_SNOOP_MISS BIT_ULL(33)
#define SKL_SNOOP_HIT_NO_FWD BIT_ULL(34)
#define SKL_SNOOP_HIT_WITH_FWD BIT_ULL(35)
#define SKL_SNOOP_HITM BIT_ULL(36)
#define SKL_SNOOP_NON_DRAM BIT_ULL(37)
#define SKL_ANY_SNOOP (SKL_SPL_HIT|SKL_SNOOP_NONE| \
SKL_SNOOP_NOT_NEEDED|SKL_SNOOP_MISS| \
SKL_SNOOP_HIT_NO_FWD|SKL_SNOOP_HIT_WITH_FWD| \
SKL_SNOOP_HITM|SKL_SNOOP_NON_DRAM)
#define SKL_DEMAND_READ SKL_DEMAND_DATA_RD
#define SKL_SNOOP_DRAM (SKL_SNOOP_NONE| \
SKL_SNOOP_NOT_NEEDED|SKL_SNOOP_MISS| \
SKL_SNOOP_HIT_NO_FWD|SKL_SNOOP_HIT_WITH_FWD| \
SKL_SNOOP_HITM|SKL_SPL_HIT)
#define SKL_DEMAND_WRITE SKL_DEMAND_RFO
#define SKL_LLC_ACCESS SKL_ANY_RESPONSE
#define SKL_L3_MISS_REMOTE (SKL_L3_MISS_REMOTE_HOP0_DRAM| \
SKL_L3_MISS_REMOTE_HOP1_DRAM| \
SKL_L3_MISS_REMOTE_HOP2P_DRAM)
static __initconst const u64 skl_hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
[ C(L1D ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_INST_RETIRED.ALL_LOADS */
[ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_INST_RETIRED.ALL_STORES */
[ C(RESULT_MISS) ] = 0x0,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
[ C(L1I ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x283, /* ICACHE_64B.MISS */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
[ C(LL ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE */
[ C(RESULT_MISS) ] = 0x1b7, /* OFFCORE_RESPONSE */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE */
[ C(RESULT_MISS) ] = 0x1b7, /* OFFCORE_RESPONSE */
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
[ C(DTLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_INST_RETIRED.ALL_LOADS */
[ C(RESULT_MISS) ] = 0x608, /* DTLB_LOAD_MISSES.WALK_COMPLETED */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_INST_RETIRED.ALL_STORES */
[ C(RESULT_MISS) ] = 0x649, /* DTLB_STORE_MISSES.WALK_COMPLETED */
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
[ C(ITLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x2085, /* ITLB_MISSES.STLB_HIT */
[ C(RESULT_MISS) ] = 0xe85, /* ITLB_MISSES.WALK_COMPLETED */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
},
[ C(BPU ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0xc4, /* BR_INST_RETIRED.ALL_BRANCHES */
[ C(RESULT_MISS) ] = 0xc5, /* BR_MISP_RETIRED.ALL_BRANCHES */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = -1,
[ C(RESULT_MISS) ] = -1,
},
},
[ C(NODE) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE */
[ C(RESULT_MISS) ] = 0x1b7, /* OFFCORE_RESPONSE */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE */
[ C(RESULT_MISS) ] = 0x1b7, /* OFFCORE_RESPONSE */
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
};
static __initconst const u64 skl_hw_cache_extra_regs
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
[ C(LL ) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = SKL_DEMAND_READ|
SKL_LLC_ACCESS|SKL_ANY_SNOOP,
[ C(RESULT_MISS) ] = SKL_DEMAND_READ|
SKL_L3_MISS|SKL_ANY_SNOOP|
SKL_SUPPLIER_NONE,
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = SKL_DEMAND_WRITE|
SKL_LLC_ACCESS|SKL_ANY_SNOOP,
[ C(RESULT_MISS) ] = SKL_DEMAND_WRITE|
SKL_L3_MISS|SKL_ANY_SNOOP|
SKL_SUPPLIER_NONE,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
[ C(NODE) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = SKL_DEMAND_READ|
SKL_L3_MISS_LOCAL_DRAM|SKL_SNOOP_DRAM,
[ C(RESULT_MISS) ] = SKL_DEMAND_READ|
SKL_L3_MISS_REMOTE|SKL_SNOOP_DRAM,
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = SKL_DEMAND_WRITE|
SKL_L3_MISS_LOCAL_DRAM|SKL_SNOOP_DRAM,
[ C(RESULT_MISS) ] = SKL_DEMAND_WRITE|
SKL_L3_MISS_REMOTE|SKL_SNOOP_DRAM,
},
[ C(OP_PREFETCH) ] = {
[ C(RESULT_ACCESS) ] = 0x0,
[ C(RESULT_MISS) ] = 0x0,
},
},
};
#define SNB_DMND_DATA_RD (1ULL << 0)
#define SNB_DMND_RFO (1ULL << 1)
#define SNB_DMND_IFETCH (1ULL << 2)
@ -1114,7 +1323,7 @@ static struct extra_reg intel_slm_extra_regs[] __read_mostly =
{
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffffull, RSP_0),
INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffffull, RSP_1),
INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x368005ffffull, RSP_1),
EVENT_EXTRA_END
};
@ -1594,6 +1803,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
loops = 0;
again:
intel_pmu_lbr_read();
intel_pmu_ack_status(status);
if (++loops > 100) {
static bool warned = false;
@ -1608,16 +1818,16 @@ again:
inc_irq_stat(apic_perf_irqs);
intel_pmu_lbr_read();
/*
* CondChgd bit 63 doesn't mean any overflow status. Ignore
* and clear the bit.
* Ignore a range of extra bits in status that do not indicate
* overflow by themselves.
*/
if (__test_and_clear_bit(63, (unsigned long *)&status)) {
if (!status)
goto done;
}
status &= ~(GLOBAL_STATUS_COND_CHG |
GLOBAL_STATUS_ASIF |
GLOBAL_STATUS_LBRS_FROZEN);
if (!status)
goto done;
/*
* PEBS overflow sets bit 62 in the global status register
@ -1699,18 +1909,22 @@ intel_bts_constraints(struct perf_event *event)
return NULL;
}
static int intel_alt_er(int idx)
static int intel_alt_er(int idx, u64 config)
{
int alt_idx;
if (!(x86_pmu.flags & PMU_FL_HAS_RSP_1))
return idx;
if (idx == EXTRA_REG_RSP_0)
return EXTRA_REG_RSP_1;
alt_idx = EXTRA_REG_RSP_1;
if (idx == EXTRA_REG_RSP_1)
return EXTRA_REG_RSP_0;
alt_idx = EXTRA_REG_RSP_0;
return idx;
if (config & ~x86_pmu.extra_regs[alt_idx].valid_mask)
return idx;
return alt_idx;
}
static void intel_fixup_er(struct perf_event *event, int idx)
@ -1799,7 +2013,7 @@ again:
*/
c = NULL;
} else {
idx = intel_alt_er(idx);
idx = intel_alt_er(idx, reg->config);
if (idx != reg->idx) {
raw_spin_unlock_irqrestore(&era->lock, flags);
goto again;
@ -2253,6 +2467,15 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
}
}
static unsigned long intel_pmu_free_running_flags(struct perf_event *event)
{
unsigned long flags = x86_pmu.free_running_flags;
if (event->attr.use_clockid)
flags &= ~PERF_SAMPLE_TIME;
return flags;
}
static int intel_pmu_hw_config(struct perf_event *event)
{
int ret = x86_pmu_hw_config(event);
@ -2263,7 +2486,8 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip) {
if (!event->attr.freq) {
event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
if (!(event->attr.sample_type & ~PEBS_FREERUNNING_FLAGS))
if (!(event->attr.sample_type &
~intel_pmu_free_running_flags(event)))
event->hw.flags |= PERF_X86_EVENT_FREERUNNING;
}
if (x86_pmu.pebs_aliases)
@ -2694,6 +2918,8 @@ static __initconst const struct x86_pmu core_pmu = {
.event_map = intel_pmu_event_map,
.max_events = ARRAY_SIZE(intel_perfmon_event_map),
.apic = 1,
.free_running_flags = PEBS_FREERUNNING_FLAGS,
/*
* Intel PMCs cannot be accessed sanely above 32-bit width,
* so we install an artificial 1<<31 period regardless of
@ -2732,6 +2958,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.event_map = intel_pmu_event_map,
.max_events = ARRAY_SIZE(intel_perfmon_event_map),
.apic = 1,
.free_running_flags = PEBS_FREERUNNING_FLAGS,
/*
* Intel PMCs cannot be accessed sanely above 32 bit width,
* so we install an artificial 1<<31 period regardless of
@ -3269,6 +3496,29 @@ __init int intel_pmu_init(void)
pr_cont("Broadwell events, ");
break;
case 78: /* 14nm Skylake Mobile */
case 94: /* 14nm Skylake Desktop */
x86_pmu.late_ack = true;
memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
intel_pmu_lbr_init_skl();
x86_pmu.event_constraints = intel_skl_event_constraints;
x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints;
x86_pmu.extra_regs = intel_skl_extra_regs;
x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
/* all extra regs are per-cpu when HT is on */
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.cpu_events = hsw_events_attrs;
WARN_ON(!x86_pmu.format_attrs);
x86_pmu.cpu_events = hsw_events_attrs;
pr_cont("Skylake events, ");
break;
default:
switch (x86_pmu.version) {
case 1:
@ -3338,7 +3588,7 @@ __init int intel_pmu_init(void)
*/
if (x86_pmu.extra_regs) {
for (er = x86_pmu.extra_regs; er->msr; er++) {
er->extra_msr_access = check_msr(er->msr, 0x1ffUL);
er->extra_msr_access = check_msr(er->msr, 0x11UL);
/* Disable LBR select mapping */
if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
x86_pmu.lbr_sel_map = NULL;


@ -62,9 +62,6 @@ struct bts_buffer {
struct pmu bts_pmu;
void intel_pmu_enable_bts(u64 config);
void intel_pmu_disable_bts(void);
static size_t buf_size(struct page *page)
{
return 1 << (PAGE_SHIFT + page_private(page));


@ -224,6 +224,19 @@ union hsw_tsx_tuning {
#define PEBS_HSW_TSX_FLAGS 0xff00000000ULL
/* Same as HSW, plus TSC */
struct pebs_record_skl {
u64 flags, ip;
u64 ax, bx, cx, dx;
u64 si, di, bp, sp;
u64 r8, r9, r10, r11;
u64 r12, r13, r14, r15;
u64 status, dla, dse, lat;
u64 real_ip, tsx_tuning;
u64 tsc;
};
void init_debug_store_on_cpu(int cpu)
{
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@ -675,6 +688,28 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};
struct event_constraint intel_skl_pebs_event_constraints[] = {
INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x2), /* INST_RETIRED.PREC_DIST */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
INTEL_PLD_CONSTRAINT(0x1cd, 0xf), /* MEM_TRANS_RETIRED.* */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_INST_RETIRED.LOCK_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x22d0, 0xf), /* MEM_INST_RETIRED.LOCK_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_INST_RETIRED.SPLIT_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_INST_RETIRED.SPLIT_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_INST_RETIRED.ALL_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_INST_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd1, 0xf), /* MEM_LOAD_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd2, 0xf), /* MEM_LOAD_L3_HIT_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd3, 0xf), /* MEM_LOAD_L3_MISS_RETIRED.* */
/* Allow all events as PEBS with no flags */
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
EVENT_CONSTRAINT_END
};
struct event_constraint *intel_pebs_constraints(struct perf_event *event)
{
struct event_constraint *c;
@ -754,6 +789,11 @@ void intel_pmu_pebs_disable(struct perf_event *event)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
struct debug_store *ds = cpuc->ds;
bool large_pebs = ds->pebs_interrupt_threshold >
ds->pebs_buffer_base + x86_pmu.pebs_record_size;
if (large_pebs)
intel_pmu_drain_pebs_buffer();
cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
@ -762,12 +802,8 @@ void intel_pmu_pebs_disable(struct perf_event *event)
else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
cpuc->pebs_enabled &= ~(1ULL << 63);
if (ds->pebs_interrupt_threshold >
ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
intel_pmu_drain_pebs_buffer();
if (!pebs_is_enabled(cpuc))
perf_sched_cb_dec(event->ctx->pmu);
}
if (large_pebs && !pebs_is_enabled(cpuc))
perf_sched_cb_dec(event->ctx->pmu);
if (cpuc->enabled)
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
@ -885,7 +921,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
return 0;
}
static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs)
static inline u64 intel_hsw_weight(struct pebs_record_skl *pebs)
{
if (pebs->tsx_tuning) {
union hsw_tsx_tuning tsx = { .value = pebs->tsx_tuning };
@ -894,7 +930,7 @@ static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs)
return 0;
}
static inline u64 intel_hsw_transaction(struct pebs_record_hsw *pebs)
static inline u64 intel_hsw_transaction(struct pebs_record_skl *pebs)
{
u64 txn = (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
@ -918,7 +954,7 @@ static void setup_pebs_sample_data(struct perf_event *event,
* unconditionally access the 'extra' entries.
*/
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct pebs_record_hsw *pebs = __pebs;
struct pebs_record_skl *pebs = __pebs;
u64 sample_type;
int fll, fst, dsrc;
int fl = event->hw.flags;
@ -1016,6 +1052,16 @@ static void setup_pebs_sample_data(struct perf_event *event,
data->txn = intel_hsw_transaction(pebs);
}
/*
* v3 supplies an accurate time stamp, so we use that
* for the time stamp.
*
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
event->attr.use_clockid == 0)
data->time = native_sched_clock_from_tsc(pebs->tsc);
if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
}
@ -1142,6 +1188,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;
u64 pebs_status;
/* PEBS v3 has accurate status bits */
if (x86_pmu.intel_cap.pebs_format >= 3) {
@ -1152,12 +1199,17 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
continue;
}
bit = find_first_bit((unsigned long *)&p->status,
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
bit = find_first_bit((unsigned long *)&pebs_status,
x86_pmu.max_pebs_events);
if (bit >= x86_pmu.max_pebs_events)
continue;
if (!test_bit(bit, cpuc->active_mask))
if (WARN(bit >= x86_pmu.max_pebs_events,
"PEBS record without PEBS event! status=%Lx pebs_enabled=%Lx active_mask=%Lx",
(unsigned long long)p->status, (unsigned long long)cpuc->pebs_enabled,
*(unsigned long long *)cpuc->active_mask))
continue;
/*
* The PEBS hardware does not deal well with the situation
* when events happen near to each other and multiple bits
@ -1172,27 +1224,21 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
* one, and it's not possible to reconstruct all events
* that caused the PEBS record. It's called collision.
* If collision happened, the record will be dropped.
*
*/
if (p->status != (1 << bit)) {
u64 pebs_status;
/* slow path */
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
if (pebs_status != (1 << bit)) {
for_each_set_bit(i, (unsigned long *)&pebs_status,
MAX_PEBS_EVENTS)
error[i]++;
continue;
}
if (p->status != (1ULL << bit)) {
for_each_set_bit(i, (unsigned long *)&pebs_status,
x86_pmu.max_pebs_events)
error[i]++;
continue;
}
counts[bit]++;
}
for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
if ((counts[bit] == 0) && (error[bit] == 0))
continue;
event = cpuc->events[bit];
WARN_ON_ONCE(!event);
WARN_ON_ONCE(!event->attr.precise_ip);
@ -1245,6 +1291,14 @@ void __init intel_ds_init(void)
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
break;
case 3:
pr_cont("PEBS fmt3%c, ", pebs_type);
x86_pmu.pebs_record_size =
sizeof(struct pebs_record_skl);
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
x86_pmu.free_running_flags |= PERF_SAMPLE_TIME;
break;
default:
printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
x86_pmu.pebs = 0;


@ -13,7 +13,8 @@ enum {
LBR_FORMAT_EIP = 0x02,
LBR_FORMAT_EIP_FLAGS = 0x03,
LBR_FORMAT_EIP_FLAGS2 = 0x04,
LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_EIP_FLAGS2,
LBR_FORMAT_INFO = 0x05,
LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_INFO,
};
static enum {
@ -139,6 +140,13 @@ static void __intel_pmu_lbr_enable(bool pmi)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
u64 debugctl, lbr_select = 0, orig_debugctl;
/*
* No need to unfreeze manually, as v4 can do that as part
* of the GLOBAL_STATUS ack.
*/
if (pmi && x86_pmu.version >= 4)
return;
/*
* No need to reprogram LBR_SELECT in a PMI, as it
* did not change.
@ -186,6 +194,8 @@ static void intel_pmu_lbr_reset_64(void)
for (i = 0; i < x86_pmu.lbr_nr; i++) {
wrmsrl(x86_pmu.lbr_from + i, 0);
wrmsrl(x86_pmu.lbr_to + i, 0);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
wrmsrl(MSR_LBR_INFO_0 + i, 0);
}
}
@ -230,10 +240,12 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
for (i = 0; i < x86_pmu.lbr_nr; i++) {
for (i = 0; i < tos; i++) {
lbr_idx = (tos - i) & mask;
wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
wrmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
task_ctx->lbr_stack_state = LBR_NONE;
}
@ -251,10 +263,12 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
for (i = 0; i < x86_pmu.lbr_nr; i++) {
for (i = 0; i < tos; i++) {
lbr_idx = (tos - i) & mask;
rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
task_ctx->lbr_stack_state = LBR_VALID;
}
@ -411,16 +425,31 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
u64 tos = intel_pmu_lbr_tos();
int i;
int out = 0;
int num = x86_pmu.lbr_nr;
for (i = 0; i < x86_pmu.lbr_nr; i++) {
if (cpuc->lbr_sel->config & LBR_CALL_STACK)
num = tos;
for (i = 0; i < num; i++) {
unsigned long lbr_idx = (tos - i) & mask;
u64 from, to, mis = 0, pred = 0, in_tx = 0, abort = 0;
int skip = 0;
u16 cycles = 0;
int lbr_flags = lbr_desc[lbr_format];
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to + lbr_idx, to);
if (lbr_format == LBR_FORMAT_INFO) {
u64 info;
rdmsrl(MSR_LBR_INFO_0 + lbr_idx, info);
mis = !!(info & LBR_INFO_MISPRED);
pred = !mis;
in_tx = !!(info & LBR_INFO_IN_TX);
abort = !!(info & LBR_INFO_ABORT);
cycles = (info & LBR_INFO_CYCLES);
}
if (lbr_flags & LBR_EIP_FLAGS) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
@ -450,6 +479,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[out].predicted = pred;
cpuc->lbr_entries[out].in_tx = in_tx;
cpuc->lbr_entries[out].abort = abort;
cpuc->lbr_entries[out].cycles = cycles;
cpuc->lbr_entries[out].reserved = 0;
out++;
}
@ -947,6 +977,26 @@ void intel_pmu_lbr_init_hsw(void)
pr_cont("16-deep LBR, ");
}
/* skylake */
__init void intel_pmu_lbr_init_skl(void)
{
x86_pmu.lbr_nr = 32;
x86_pmu.lbr_tos = MSR_LBR_TOS;
x86_pmu.lbr_from = MSR_LBR_NHM_FROM;
x86_pmu.lbr_to = MSR_LBR_NHM_TO;
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = hsw_lbr_sel_map;
/*
* SW branch filter usage:
* - support syscall, sysret capture.
* That requires LBR_FAR but that means far
* jmp need to be filtered out
*/
pr_cont("32-deep LBR, ");
}
/* atom */
void __init intel_pmu_lbr_init_atom(void)
{


@ -65,15 +65,21 @@ static struct pt_cap_desc {
} pt_caps[] = {
PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
PT_CAP(cr3_filtering, 0, CR_EBX, BIT(0)),
PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
PT_CAP(mtc, 0, CR_EBX, BIT(3)),
PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
PT_CAP(topa_multiple_entries, 0, CR_ECX, BIT(1)),
PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
PT_CAP(payloads_lip, 0, CR_ECX, BIT(31)),
PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
PT_CAP(cycle_thresholds, 1, CR_EBX, 0xffff),
PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
};
static u32 pt_cap_get(enum pt_capabilities cap)
{
struct pt_cap_desc *cd = &pt_caps[cap];
u32 c = pt_pmu.caps[cd->leaf * 4 + cd->reg];
u32 c = pt_pmu.caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
unsigned int shift = __ffs(cd->mask);
return (c & cd->mask) >> shift;
@ -94,12 +100,22 @@ static struct attribute_group pt_cap_group = {
.name = "caps",
};
PMU_FORMAT_ATTR(cyc, "config:1" );
PMU_FORMAT_ATTR(mtc, "config:9" );
PMU_FORMAT_ATTR(tsc, "config:10" );
PMU_FORMAT_ATTR(noretcomp, "config:11" );
PMU_FORMAT_ATTR(mtc_period, "config:14-17" );
PMU_FORMAT_ATTR(cyc_thresh, "config:19-22" );
PMU_FORMAT_ATTR(psb_period, "config:24-27" );
static struct attribute *pt_formats_attr[] = {
&format_attr_cyc.attr,
&format_attr_mtc.attr,
&format_attr_tsc.attr,
&format_attr_noretcomp.attr,
&format_attr_mtc_period.attr,
&format_attr_cyc_thresh.attr,
&format_attr_psb_period.attr,
NULL,
};
@ -129,10 +145,10 @@ static int __init pt_pmu_hw_init(void)
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
&pt_pmu.caps[CR_EAX + i*4],
&pt_pmu.caps[CR_EBX + i*4],
&pt_pmu.caps[CR_ECX + i*4],
&pt_pmu.caps[CR_EDX + i*4]);
&pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
&pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
&pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
&pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
}
ret = -ENOMEM;
@ -170,15 +186,65 @@ fail:
return ret;
}
#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC)
#define RTIT_CTL_CYC_PSB (RTIT_CTL_CYCLEACC | \
RTIT_CTL_CYC_THRESH | \
RTIT_CTL_PSB_FREQ)
#define RTIT_CTL_MTC (RTIT_CTL_MTC_EN | \
RTIT_CTL_MTC_RANGE)
#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | \
RTIT_CTL_DISRETC | \
RTIT_CTL_CYC_PSB | \
RTIT_CTL_MTC)
static bool pt_event_valid(struct perf_event *event)
{
u64 config = event->attr.config;
u64 allowed, requested;
if ((config & PT_CONFIG_MASK) != config)
return false;
if (config & RTIT_CTL_CYC_PSB) {
if (!pt_cap_get(PT_CAP_psb_cyc))
return false;
allowed = pt_cap_get(PT_CAP_psb_periods);
requested = (config & RTIT_CTL_PSB_FREQ) >>
RTIT_CTL_PSB_FREQ_OFFSET;
if (requested && (!(allowed & BIT(requested))))
return false;
allowed = pt_cap_get(PT_CAP_cycle_thresholds);
requested = (config & RTIT_CTL_CYC_THRESH) >>
RTIT_CTL_CYC_THRESH_OFFSET;
if (requested && (!(allowed & BIT(requested))))
return false;
}
if (config & RTIT_CTL_MTC) {
/*
* In the unlikely case that CPUID lists valid mtc periods,
* but not the mtc capability, drop out here.
*
* Spec says that setting mtc period bits while mtc bit in
* CPUID is 0 will #GP, so better safe than sorry.
*/
if (!pt_cap_get(PT_CAP_mtc))
return false;
allowed = pt_cap_get(PT_CAP_mtc_periods);
if (!allowed)
return false;
requested = (config & RTIT_CTL_MTC_RANGE) >>
RTIT_CTL_MTC_RANGE_OFFSET;
if (!(allowed & BIT(requested)))
return false;
}
return true;
}
@ -191,6 +257,11 @@ static void pt_config(struct perf_event *event)
{
u64 reg;
if (!event->hw.itrace_started) {
event->hw.itrace_started = 1;
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
}
reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
if (!event->attr.exclude_kernel)
@ -910,7 +981,6 @@ void intel_pt_interrupt(void)
pt_config_buffer(buf->cur->table, buf->cur_idx,
buf->output_off);
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
pt_config(event);
}
}
@ -934,7 +1004,6 @@ static void pt_event_start(struct perf_event *event, int mode)
pt_config_buffer(buf->cur->table, buf->cur_idx,
buf->output_off);
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
pt_config(event);
}


@ -86,6 +86,10 @@ static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
1<<RAPL_IDX_RAM_NRG_STAT|\
1<<RAPL_IDX_PP1_NRG_STAT)
/* Knights Landing has PKG, RAM */
#define RAPL_IDX_KNL (1<<RAPL_IDX_PKG_NRG_STAT|\
1<<RAPL_IDX_RAM_NRG_STAT)
/*
* event code: LSB 8 bits, passed in attr->config
* any other bit is reserved
@ -486,6 +490,18 @@ static struct attribute *rapl_events_hsw_attr[] = {
NULL,
};
static struct attribute *rapl_events_knl_attr[] = {
EVENT_PTR(rapl_pkg),
EVENT_PTR(rapl_ram),
EVENT_PTR(rapl_pkg_unit),
EVENT_PTR(rapl_ram_unit),
EVENT_PTR(rapl_pkg_scale),
EVENT_PTR(rapl_ram_scale),
NULL,
};
static struct attribute_group rapl_pmu_events_group = {
.name = "events",
.attrs = NULL, /* patched at runtime */
@ -730,6 +746,10 @@ static int __init rapl_pmu_init(void)
rapl_cntr_mask = RAPL_IDX_SRV;
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
break;
case 87: /* Knights Landing */
rapl_add_quirk(rapl_hsw_server_quirk);
rapl_cntr_mask = RAPL_IDX_KNL;
rapl_pmu_events_group.attrs = rapl_events_knl_attr;
default:
/* unsupported */


@ -911,6 +911,9 @@ static int __init uncore_pci_init(void)
case 63: /* Haswell-EP */
ret = hswep_uncore_pci_init();
break;
case 86: /* BDX-DE */
ret = bdx_uncore_pci_init();
break;
case 42: /* Sandy Bridge */
ret = snb_uncore_pci_init();
break;
@ -1209,6 +1212,11 @@ static int __init uncore_cpu_init(void)
break;
case 42: /* Sandy Bridge */
case 58: /* Ivy Bridge */
case 60: /* Haswell */
case 69: /* Haswell */
case 70: /* Haswell */
case 61: /* Broadwell */
case 71: /* Broadwell */
snb_uncore_cpu_init();
break;
case 45: /* Sandy Bridge-EP */
@ -1224,6 +1232,9 @@ static int __init uncore_cpu_init(void)
case 63: /* Haswell-EP */
hswep_uncore_cpu_init();
break;
case 86: /* BDX-DE */
bdx_uncore_cpu_init();
break;
default:
return 0;
}


@ -336,6 +336,8 @@ int ivbep_uncore_pci_init(void);
void ivbep_uncore_cpu_init(void);
int hswep_uncore_pci_init(void);
void hswep_uncore_cpu_init(void);
int bdx_uncore_pci_init(void);
void bdx_uncore_cpu_init(void);
/* perf_event_intel_uncore_nhmex.c */
void nhmex_uncore_cpu_init(void);


@ -45,6 +45,11 @@
#define SNB_UNC_CBO_0_PER_CTR0 0x706
#define SNB_UNC_CBO_MSR_OFFSET 0x10
/* SNB ARB register */
#define SNB_UNC_ARB_PER_CTR0 0x3b0
#define SNB_UNC_ARB_PERFEVTSEL0 0x3b2
#define SNB_UNC_ARB_MSR_OFFSET 0x10
/* NHM global control register */
#define NHM_UNC_PERF_GLOBAL_CTL 0x391
#define NHM_UNC_FIXED_CTR 0x394
@ -115,7 +120,7 @@ static struct intel_uncore_ops snb_uncore_msr_ops = {
.read_counter = uncore_msr_read_counter,
};
static struct event_constraint snb_uncore_cbox_constraints[] = {
static struct event_constraint snb_uncore_arb_constraints[] = {
UNCORE_EVENT_CONSTRAINT(0x80, 0x1),
UNCORE_EVENT_CONSTRAINT(0x83, 0x1),
EVENT_CONSTRAINT_END
@ -134,14 +139,28 @@ static struct intel_uncore_type snb_uncore_cbox = {
.single_fixed = 1,
.event_mask = SNB_UNC_RAW_EVENT_MASK,
.msr_offset = SNB_UNC_CBO_MSR_OFFSET,
.constraints = snb_uncore_cbox_constraints,
.ops = &snb_uncore_msr_ops,
.format_group = &snb_uncore_format_group,
.event_descs = snb_uncore_events,
};
static struct intel_uncore_type snb_uncore_arb = {
.name = "arb",
.num_counters = 2,
.num_boxes = 1,
.perf_ctr_bits = 44,
.perf_ctr = SNB_UNC_ARB_PER_CTR0,
.event_ctl = SNB_UNC_ARB_PERFEVTSEL0,
.event_mask = SNB_UNC_RAW_EVENT_MASK,
.msr_offset = SNB_UNC_ARB_MSR_OFFSET,
.constraints = snb_uncore_arb_constraints,
.ops = &snb_uncore_msr_ops,
.format_group = &snb_uncore_format_group,
};
static struct intel_uncore_type *snb_msr_uncores[] = {
&snb_uncore_cbox,
&snb_uncore_arb,
NULL,
};


@ -2215,7 +2215,7 @@ static struct intel_uncore_type *hswep_pci_uncores[] = {
NULL,
};
static DEFINE_PCI_DEVICE_TABLE(hswep_uncore_pci_ids) = {
static const struct pci_device_id hswep_uncore_pci_ids[] = {
{ /* Home Agent 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x2f30),
.driver_data = UNCORE_PCI_DEV_DATA(HSWEP_PCI_UNCORE_HA, 0),
@ -2321,3 +2321,167 @@ int hswep_uncore_pci_init(void)
return 0;
}
/* end of Haswell-EP uncore support */
/* BDX-DE uncore support */
static struct intel_uncore_type bdx_uncore_ubox = {
.name = "ubox",
.num_counters = 2,
.num_boxes = 1,
.perf_ctr_bits = 48,
.fixed_ctr_bits = 48,
.perf_ctr = HSWEP_U_MSR_PMON_CTR0,
.event_ctl = HSWEP_U_MSR_PMON_CTL0,
.event_mask = SNBEP_U_MSR_PMON_RAW_EVENT_MASK,
.fixed_ctr = HSWEP_U_MSR_PMON_UCLK_FIXED_CTR,
.fixed_ctl = HSWEP_U_MSR_PMON_UCLK_FIXED_CTL,
.num_shared_regs = 1,
.ops = &ivbep_uncore_msr_ops,
.format_group = &ivbep_uncore_ubox_format_group,
};
static struct event_constraint bdx_uncore_cbox_constraints[] = {
UNCORE_EVENT_CONSTRAINT(0x09, 0x3),
UNCORE_EVENT_CONSTRAINT(0x11, 0x1),
UNCORE_EVENT_CONSTRAINT(0x36, 0x1),
EVENT_CONSTRAINT_END
};
static struct intel_uncore_type bdx_uncore_cbox = {
.name = "cbox",
.num_counters = 4,
.num_boxes = 8,
.perf_ctr_bits = 48,
.event_ctl = HSWEP_C0_MSR_PMON_CTL0,
.perf_ctr = HSWEP_C0_MSR_PMON_CTR0,
.event_mask = SNBEP_CBO_MSR_PMON_RAW_EVENT_MASK,
.box_ctl = HSWEP_C0_MSR_PMON_BOX_CTL,
.msr_offset = HSWEP_CBO_MSR_OFFSET,
.num_shared_regs = 1,
.constraints = bdx_uncore_cbox_constraints,
.ops = &hswep_uncore_cbox_ops,
.format_group = &hswep_uncore_cbox_format_group,
};
static struct intel_uncore_type *bdx_msr_uncores[] = {
&bdx_uncore_ubox,
&bdx_uncore_cbox,
&hswep_uncore_pcu,
NULL,
};
void bdx_uncore_cpu_init(void)
{
if (bdx_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
bdx_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
uncore_msr_uncores = bdx_msr_uncores;
}
static struct intel_uncore_type bdx_uncore_ha = {
.name = "ha",
.num_counters = 4,
.num_boxes = 1,
.perf_ctr_bits = 48,
SNBEP_UNCORE_PCI_COMMON_INIT(),
};
static struct intel_uncore_type bdx_uncore_imc = {
.name = "imc",
.num_counters = 5,
.num_boxes = 2,
.perf_ctr_bits = 48,
.fixed_ctr_bits = 48,
.fixed_ctr = SNBEP_MC_CHy_PCI_PMON_FIXED_CTR,
.fixed_ctl = SNBEP_MC_CHy_PCI_PMON_FIXED_CTL,
.event_descs = hswep_uncore_imc_events,
SNBEP_UNCORE_PCI_COMMON_INIT(),
};
static struct intel_uncore_type bdx_uncore_irp = {
.name = "irp",
.num_counters = 4,
.num_boxes = 1,
.perf_ctr_bits = 48,
.event_mask = SNBEP_PMON_RAW_EVENT_MASK,
.box_ctl = SNBEP_PCI_PMON_BOX_CTL,
.ops = &hswep_uncore_irp_ops,
.format_group = &snbep_uncore_format_group,
};
static struct event_constraint bdx_uncore_r2pcie_constraints[] = {
UNCORE_EVENT_CONSTRAINT(0x10, 0x3),
UNCORE_EVENT_CONSTRAINT(0x11, 0x3),
UNCORE_EVENT_CONSTRAINT(0x13, 0x1),
UNCORE_EVENT_CONSTRAINT(0x23, 0x1),
UNCORE_EVENT_CONSTRAINT(0x25, 0x1),
UNCORE_EVENT_CONSTRAINT(0x26, 0x3),
UNCORE_EVENT_CONSTRAINT(0x2d, 0x3),
EVENT_CONSTRAINT_END
};
static struct intel_uncore_type bdx_uncore_r2pcie = {
.name = "r2pcie",
.num_counters = 4,
.num_boxes = 1,
.perf_ctr_bits = 48,
.constraints = bdx_uncore_r2pcie_constraints,
SNBEP_UNCORE_PCI_COMMON_INIT(),
};
enum {
BDX_PCI_UNCORE_HA,
BDX_PCI_UNCORE_IMC,
BDX_PCI_UNCORE_IRP,
BDX_PCI_UNCORE_R2PCIE,
};
static struct intel_uncore_type *bdx_pci_uncores[] = {
[BDX_PCI_UNCORE_HA] = &bdx_uncore_ha,
[BDX_PCI_UNCORE_IMC] = &bdx_uncore_imc,
[BDX_PCI_UNCORE_IRP] = &bdx_uncore_irp,
[BDX_PCI_UNCORE_R2PCIE] = &bdx_uncore_r2pcie,
NULL,
};
static DEFINE_PCI_DEVICE_TABLE(bdx_uncore_pci_ids) = {
{ /* Home Agent 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f30),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_HA, 0),
},
{ /* MC0 Channel 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb0),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 0),
},
{ /* MC0 Channel 1 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb1),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 1),
},
{ /* IRP */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f39),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IRP, 0),
},
{ /* R2PCIe */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f34),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_R2PCIE, 0),
},
{ /* end: all zeroes */ }
};
static struct pci_driver bdx_uncore_pci_driver = {
.name = "bdx_uncore",
.id_table = bdx_uncore_pci_ids,
};
int bdx_uncore_pci_init(void)
{
int ret = snbep_pci2phy_map_init(0x6f1e);
if (ret)
return ret;
uncore_pci_uncores = bdx_pci_uncores;
uncore_pci_driver = &bdx_uncore_pci_driver;
return 0;
}
/* end of BDX-DE uncore support */


@ -0,0 +1,242 @@
#include <linux/perf_event.h>
enum perf_msr_id {
PERF_MSR_TSC = 0,
PERF_MSR_APERF = 1,
PERF_MSR_MPERF = 2,
PERF_MSR_PPERF = 3,
PERF_MSR_SMI = 4,
PERF_MSR_EVENT_MAX,
};
bool test_aperfmperf(int idx)
{
return boot_cpu_has(X86_FEATURE_APERFMPERF);
}
bool test_intel(int idx)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
boot_cpu_data.x86 != 6)
return false;
switch (boot_cpu_data.x86_model) {
case 30: /* 45nm Nehalem */
case 26: /* 45nm Nehalem-EP */
case 46: /* 45nm Nehalem-EX */
case 37: /* 32nm Westmere */
case 44: /* 32nm Westmere-EP */
case 47: /* 32nm Westmere-EX */
case 42: /* 32nm SandyBridge */
case 45: /* 32nm SandyBridge-E/EN/EP */
case 58: /* 22nm IvyBridge */
case 62: /* 22nm IvyBridge-EP/EX */
case 60: /* 22nm Haswell Core */
case 63: /* 22nm Haswell Server */
case 69: /* 22nm Haswell ULT */
case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */
case 61: /* 14nm Broadwell Core-M */
case 86: /* 14nm Broadwell Xeon D */
case 71: /* 14nm Broadwell + GT3e (Intel Iris Pro graphics) */
case 79: /* 14nm Broadwell Server */
case 55: /* 22nm Atom "Silvermont" */
case 77: /* 22nm Atom "Silvermont Avoton/Rangely" */
case 76: /* 14nm Atom "Airmont" */
if (idx == PERF_MSR_SMI)
return true;
break;
case 78: /* 14nm Skylake Mobile */
case 94: /* 14nm Skylake Desktop */
if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
return true;
break;
}
return false;
}
struct perf_msr {
u64 msr;
struct perf_pmu_events_attr *attr;
bool (*test)(int idx);
};
PMU_EVENT_ATTR_STRING(tsc, evattr_tsc, "event=0x00");
PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, "event=0x01");
PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, "event=0x02");
PMU_EVENT_ATTR_STRING(pperf, evattr_pperf, "event=0x03");
PMU_EVENT_ATTR_STRING(smi, evattr_smi, "event=0x04");
static struct perf_msr msr[] = {
[PERF_MSR_TSC] = { 0, &evattr_tsc, NULL, },
[PERF_MSR_APERF] = { MSR_IA32_APERF, &evattr_aperf, test_aperfmperf, },
[PERF_MSR_MPERF] = { MSR_IA32_MPERF, &evattr_mperf, test_aperfmperf, },
[PERF_MSR_PPERF] = { MSR_PPERF, &evattr_pperf, test_intel, },
[PERF_MSR_SMI] = { MSR_SMI_COUNT, &evattr_smi, test_intel, },
};
static struct attribute *events_attrs[PERF_MSR_EVENT_MAX + 1] = {
NULL,
};
static struct attribute_group events_attr_group = {
.name = "events",
.attrs = events_attrs,
};
PMU_FORMAT_ATTR(event, "config:0-63");
static struct attribute *format_attrs[] = {
&format_attr_event.attr,
NULL,
};
static struct attribute_group format_attr_group = {
.name = "format",
.attrs = format_attrs,
};
static const struct attribute_group *attr_groups[] = {
&events_attr_group,
&format_attr_group,
NULL,
};
static int msr_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config;
if (event->attr.type != event->pmu->type)
return -ENOENT;
if (cfg >= PERF_MSR_EVENT_MAX)
return -EINVAL;
/* unsupported modes and filters */
if (event->attr.exclude_user ||
event->attr.exclude_kernel ||
event->attr.exclude_hv ||
event->attr.exclude_idle ||
event->attr.exclude_host ||
event->attr.exclude_guest ||
event->attr.sample_period) /* no sampling */
return -EINVAL;
if (!msr[cfg].attr)
return -EINVAL;
event->hw.idx = -1;
event->hw.event_base = msr[cfg].msr;
event->hw.config = cfg;
return 0;
}
static inline u64 msr_read_counter(struct perf_event *event)
{
u64 now;
if (event->hw.event_base)
rdmsrl(event->hw.event_base, now);
else
rdtscll(now);
return now;
}
static void msr_event_update(struct perf_event *event)
{
u64 prev, now;
s64 delta;
/* Careful, an NMI might modify the previous event value. */
again:
prev = local64_read(&event->hw.prev_count);
now = msr_read_counter(event);
if (local64_cmpxchg(&event->hw.prev_count, prev, now) != prev)
goto again;
delta = now - prev;
if (unlikely(event->hw.event_base == MSR_SMI_COUNT)) {
delta <<= 32;
delta >>= 32; /* sign extend */
}
local64_add(now - prev, &event->count);
}
static void msr_event_start(struct perf_event *event, int flags)
{
u64 now;
now = msr_read_counter(event);
local64_set(&event->hw.prev_count, now);
}
static void msr_event_stop(struct perf_event *event, int flags)
{
msr_event_update(event);
}
static void msr_event_del(struct perf_event *event, int flags)
{
msr_event_stop(event, PERF_EF_UPDATE);
}
static int msr_event_add(struct perf_event *event, int flags)
{
if (flags & PERF_EF_START)
msr_event_start(event, flags);
return 0;
}
static struct pmu pmu_msr = {
.task_ctx_nr = perf_sw_context,
.attr_groups = attr_groups,
.event_init = msr_event_init,
.add = msr_event_add,
.del = msr_event_del,
.start = msr_event_start,
.stop = msr_event_stop,
.read = msr_event_update,
.capabilities = PERF_PMU_CAP_NO_INTERRUPT,
};
static int __init msr_init(void)
{
int i, j = 0;
if (!boot_cpu_has(X86_FEATURE_TSC)) {
pr_cont("no MSR PMU driver.\n");
return 0;
}
/* Probe the MSRs. */
for (i = PERF_MSR_TSC + 1; i < PERF_MSR_EVENT_MAX; i++) {
u64 val;
/*
* Virt sucks arse; you cannot tell if a R/O MSR is present :/
*/
if (!msr[i].test(i) || rdmsrl_safe(msr[i].msr, &val))
msr[i].attr = NULL;
}
/* List remaining MSRs in the sysfs attrs. */
for (i = 0; i < PERF_MSR_EVENT_MAX; i++) {
if (msr[i].attr)
events_attrs[j++] = &msr[i].attr->attr.attr;
}
events_attrs[j] = NULL;
perf_pmu_register(&pmu_msr, "msr", -1);
return 0;
}
device_initcall(msr_init);


@ -32,6 +32,7 @@
#include <linux/irqflags.h>
#include <linux/notifier.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/percpu.h>
#include <linux/kdebug.h>
#include <linux/kernel.h>
@ -179,7 +180,11 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp)
va = info->address;
len = bp->attr.bp_len;
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
/*
* We don't need to worry about va + len - 1 overflowing:
* we already require that va is aligned to a multiple of len.
*/
return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
}
int arch_bp_generic_fields(int x86_len, int x86_type,
@ -243,6 +248,20 @@ static int arch_build_bp_info(struct perf_event *bp)
info->type = X86_BREAKPOINT_RW;
break;
case HW_BREAKPOINT_X:
/*
* We don't allow kernel breakpoints in places that are not
* acceptable for kprobes. On non-kprobes kernels, we don't
* allow kernel breakpoints at all.
*/
if (bp->attr.bp_addr >= TASK_SIZE_MAX) {
#ifdef CONFIG_KPROBES
if (within_kprobe_blacklist(bp->attr.bp_addr))
return -EINVAL;
#else
return -EINVAL;
#endif
}
info->type = X86_BREAKPOINT_EXECUTE;
/*
* x86 inst breakpoints need to have a specific undefined len.
@ -276,8 +295,18 @@ static int arch_build_bp_info(struct perf_event *bp)
break;
#endif
default:
/* AMD range breakpoint */
if (!is_power_of_2(bp->attr.bp_len))
return -EINVAL;
if (bp->attr.bp_addr & (bp->attr.bp_len - 1))
return -EINVAL;
/*
* It's impossible to use a range breakpoint to fake out
* user vs kernel detection because bp_len - 1 can't
* have the high bit set. If we ever allow range instruction
* breakpoints, then we'll have to check for kprobe-blacklisted
* addresses anywhere in the range.
*/
if (!cpu_has_bpext)
return -EOPNOTSUPP;
info->mask = bp->attr.bp_len - 1;


@ -296,6 +296,14 @@ u64 native_sched_clock(void)
return cycles_2_ns(tsc_now);
}
/*
* Generate a sched_clock if you already have a TSC value.
*/
u64 native_sched_clock_from_tsc(u64 tsc)
{
return cycles_2_ns(tsc);
}
/* We need to define a real function for sched_clock, to override the
weak default version */
#ifdef CONFIG_PARAVIRT


@ -985,3 +985,12 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
return -1;
}
bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx,
struct pt_regs *regs)
{
if (ctx == RP_CHECK_CALL) /* sp was just decremented by "call" insn */
return regs->sp < ret->stack;
else
return regs->sp <= ret->stack;
}


@ -267,6 +267,8 @@ extern void show_registers(struct pt_regs *regs);
extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern bool within_kprobe_blacklist(unsigned long addr);
struct kprobe_insn_cache {
struct mutex mutex;
void *(*alloc)(void); /* allocate insn page */


@ -243,6 +243,7 @@ enum {
TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
TRACE_EVENT_FL_UPROBE_BIT,
};
/*
@ -257,6 +258,7 @@ enum {
* USE_CALL_FILTER - For trace internal events, don't use file filter
* TRACEPOINT - Event is a tracepoint
* KPROBE - Event is a kprobe
* UPROBE - Event is a uprobe
*/
enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@ -267,8 +269,11 @@ enum {
TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
};
#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
struct trace_event_call {
struct list_head list;
struct trace_event_class *class;
@ -542,7 +547,7 @@ event_trigger_unlock_commit_regs(struct trace_event_file *file,
event_triggers_post_call(file, tt);
}
#ifdef CONFIG_BPF_SYSCALL
#ifdef CONFIG_BPF_EVENTS
unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
#else
static inline unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)


@ -92,6 +92,22 @@ struct uprobe_task {
unsigned int depth;
};
struct return_instance {
struct uprobe *uprobe;
unsigned long func;
unsigned long stack; /* stack pointer */
unsigned long orig_ret_vaddr; /* original return address */
bool chained; /* true, if instance is nested */
struct return_instance *next; /* keep as stack */
};
enum rp_check {
RP_CHECK_CALL,
RP_CHECK_CHAIN_CALL,
RP_CHECK_RET,
};
struct xol_area;
struct uprobes_state {
@ -128,6 +144,7 @@ extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx, struct pt_regs *regs);
extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
void *src, unsigned long len);


@ -330,7 +330,8 @@ struct perf_event_attr {
mmap2 : 1, /* include mmap with inode data */
comm_exec : 1, /* flag comm events that are due to an exec */
use_clockid : 1, /* use @clockid for time fields */
__reserved_1 : 38;
context_switch : 1, /* context switch data */
__reserved_1 : 37;
union {
__u32 wakeup_events; /* wakeup every n events */
@ -572,9 +573,11 @@ struct perf_event_mmap_page {
/*
* PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on
* different events so can reuse the same bit position.
* Ditto PERF_RECORD_MISC_SWITCH_OUT.
*/
#define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
#define PERF_RECORD_MISC_COMM_EXEC (1 << 13)
#define PERF_RECORD_MISC_SWITCH_OUT (1 << 13)
/*
* Indicates that the content of PERF_SAMPLE_IP points to
* the actual instruction that triggered the event. See also
@ -818,6 +821,32 @@ enum perf_event_type {
*/
PERF_RECORD_LOST_SAMPLES = 13,
/*
* Records a context switch in or out (flagged by
* PERF_RECORD_MISC_SWITCH_OUT). See also
* PERF_RECORD_SWITCH_CPU_WIDE.
*
* struct {
* struct perf_event_header header;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_SWITCH = 14,
/*
* CPU-wide version of PERF_RECORD_SWITCH with next_prev_pid and
* next_prev_tid that are the next (switching out) or previous
* (switching in) pid/tid.
*
* struct {
* struct perf_event_header header;
* u32 next_prev_pid;
* u32 next_prev_tid;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_SWITCH_CPU_WIDE = 15,
PERF_RECORD_MAX, /* non-ABI */
};
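A consumer of the new record types might request and classify them roughly as in the sketch below (illustrative only, not part of this change; ring-buffer setup and error handling omitted):

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Ask for PERF_RECORD_SWITCH on the calling task via a dummy event. */
    int open_switch_events(void)
    {
            struct perf_event_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_SOFTWARE;
            attr.config = PERF_COUNT_SW_DUMMY;   /* side-band records only */
            attr.size = sizeof(attr);
            attr.sample_id_all = 1;
            attr.context_switch = 1;             /* new attribute bit */

            return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

    /* While walking the mmap'ed ring buffer: */
    static int is_switch_out(const struct perf_event_header *hdr)
    {
            return (hdr->type == PERF_RECORD_SWITCH ||
                    hdr->type == PERF_RECORD_SWITCH_CPU_WIDE) &&
                   (hdr->misc & PERF_RECORD_MISC_SWITCH_OUT);
    }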
@ -922,6 +951,7 @@ union perf_mem_data_src {
*
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@ -930,7 +960,8 @@ struct perf_branch_entry {
predicted:1,/* target predicted */
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
reserved:60;
cycles:16, /* cycle count to last branch */
reserved:44;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
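With the new 'cycles' field, a tool that walks a PERF_SAMPLE_BRANCH_STACK block can report per-branch cycle counts along the lines of the sketch below (illustrative only; locating the block inside the sample record is assumed to have been done already):

    #include <linux/perf_event.h>
    #include <stdio.h>

    /* 'bs' points at the branch-stack block: a __u64 count, then the entries. */
    static void print_branch_cycles(const void *bs)
    {
            const __u64 *nr = bs;
            const struct perf_branch_entry *entries = (const void *)(nr + 1);
            __u64 i;

            for (i = 0; i < *nr; i++)
                    printf("%#llx -> %#llx: %u cycles\n",
                           (unsigned long long)entries[i].from,
                           (unsigned long long)entries[i].to,
                           (unsigned int)entries[i].cycles);
    }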


@ -163,6 +163,7 @@ static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
static atomic_t nr_freq_events __read_mostly;
static atomic_t nr_switch_events __read_mostly;
static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
@ -2619,6 +2620,9 @@ static void perf_pmu_sched_task(struct task_struct *prev,
local_irq_restore(flags);
}
static void perf_event_switch(struct task_struct *task,
struct task_struct *next_prev, bool sched_in);
#define for_each_task_context_nr(ctxn) \
for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
@ -2641,6 +2645,9 @@ void __perf_event_task_sched_out(struct task_struct *task,
if (__this_cpu_read(perf_sched_cb_usages))
perf_pmu_sched_task(task, next, false);
if (atomic_read(&nr_switch_events))
perf_event_switch(task, next, false);
for_each_task_context_nr(ctxn)
perf_event_context_sched_out(task, ctxn, next);
@ -2831,6 +2838,9 @@ void __perf_event_task_sched_in(struct task_struct *prev,
if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
perf_cgroup_sched_in(prev, task);
if (atomic_read(&nr_switch_events))
perf_event_switch(task, prev, true);
if (__this_cpu_read(perf_sched_cb_usages))
perf_pmu_sched_task(prev, task, true);
}
@ -3454,6 +3464,10 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_task_events);
if (event->attr.freq)
atomic_dec(&nr_freq_events);
if (event->attr.context_switch) {
static_key_slow_dec_deferred(&perf_sched_events);
atomic_dec(&nr_switch_events);
}
if (is_cgroup_event(event))
static_key_slow_dec_deferred(&perf_sched_events);
if (has_branch_stack(event))
@ -6024,6 +6038,91 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
perf_output_end(&handle);
}
/*
* context_switch tracking
*/
struct perf_switch_event {
struct task_struct *task;
struct task_struct *next_prev;
struct {
struct perf_event_header header;
u32 next_prev_pid;
u32 next_prev_tid;
} event_id;
};
static int perf_event_switch_match(struct perf_event *event)
{
return event->attr.context_switch;
}
static void perf_event_switch_output(struct perf_event *event, void *data)
{
struct perf_switch_event *se = data;
struct perf_output_handle handle;
struct perf_sample_data sample;
int ret;
if (!perf_event_switch_match(event))
return;
/* Only CPU-wide events are allowed to see next/prev pid/tid */
if (event->ctx->task) {
se->event_id.header.type = PERF_RECORD_SWITCH;
se->event_id.header.size = sizeof(se->event_id.header);
} else {
se->event_id.header.type = PERF_RECORD_SWITCH_CPU_WIDE;
se->event_id.header.size = sizeof(se->event_id);
se->event_id.next_prev_pid =
perf_event_pid(event, se->next_prev);
se->event_id.next_prev_tid =
perf_event_tid(event, se->next_prev);
}
perf_event_header__init_id(&se->event_id.header, &sample, event);
ret = perf_output_begin(&handle, event, se->event_id.header.size);
if (ret)
return;
if (event->ctx->task)
perf_output_put(&handle, se->event_id.header);
else
perf_output_put(&handle, se->event_id);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
}
static void perf_event_switch(struct task_struct *task,
struct task_struct *next_prev, bool sched_in)
{
struct perf_switch_event switch_event;
/* N.B. caller checks nr_switch_events != 0 */
switch_event = (struct perf_switch_event){
.task = task,
.next_prev = next_prev,
.event_id = {
.header = {
/* .type */
.misc = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT,
/* .size */
},
/* .next_prev_pid */
/* .next_prev_tid */
},
};
perf_event_aux(perf_event_switch_output,
&switch_event,
NULL);
}
/*
* IRQ throttle logging
*/
@ -6083,8 +6182,6 @@ static void perf_log_itrace_start(struct perf_event *event)
event->hw.itrace_started)
return;
event->hw.itrace_started = 1;
rec.header.type = PERF_RECORD_ITRACE_START;
rec.header.misc = 0;
rec.header.size = sizeof(rec);
@ -6792,8 +6889,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
if (event->tp_event->prog)
return -EEXIST;
if (!(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
/* bpf programs can only be attached to kprobes */
if (!(event->tp_event->flags & TRACE_EVENT_FL_UKPROBE))
/* bpf programs can only be attached to u/kprobes */
return -EINVAL;
prog = bpf_prog_get(prog_fd);
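From userspace, the program fd typically reaches perf_event_set_bpf_prog() through the PERF_EVENT_IOC_SET_BPF ioctl on a perf event backed by a kprobe or, with this change, a uprobe trace event. A minimal sketch (illustrative only; obtaining the trace event id, e.g. from tracefs, is not shown):

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* 'id' is the trace event id of a kprobe or uprobe event. */
    int attach_bpf_to_probe(__u64 id, int prog_fd)
    {
            struct perf_event_attr attr;
            int fd;

            memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_TRACEPOINT;
            attr.size = sizeof(attr);
            attr.config = id;
            attr.sample_period = 1;

            fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
            if (fd < 0)
                    return -1;

            /* lands in perf_event_set_bpf_prog() above */
            return ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
    }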
@ -7522,6 +7619,10 @@ static void account_event(struct perf_event *event)
if (atomic_inc_return(&nr_freq_events) == 1)
tick_nohz_full_kick_all();
}
if (event->attr.context_switch) {
atomic_inc(&nr_switch_events);
static_key_slow_inc(&perf_sched_events.key);
}
if (has_branch_stack(event))
static_key_slow_inc(&perf_sched_events.key);
if (is_cgroup_event(event))


@ -437,7 +437,10 @@ static struct page *rb_alloc_aux_page(int node, int order)
if (page && order) {
/*
* Communicate the allocation size to the driver
* Communicate the allocation size to the driver:
* if we managed to secure a high-order allocation,
* set its first page's private to this order;
* !PagePrivate(page) means it's just a normal page.
*/
split_page(page, order);
SetPagePrivate(page);
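On the consuming side, a PMU driver can recover the recorded order from the page, assuming (as the comment above describes) that the first page's private field holds the allocation order, roughly as in this sketch (illustrative only, not from this change):

    /* Sketch: order of an AUX page as recorded by rb_alloc_aux_page(). */
    static int aux_page_order(struct page *page)
    {
            return PagePrivate(page) ? page_private(page) : 0;
    }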


@ -86,15 +86,6 @@ struct uprobe {
struct arch_uprobe arch;
};
struct return_instance {
struct uprobe *uprobe;
unsigned long func;
unsigned long orig_ret_vaddr; /* original return address */
bool chained; /* true, if instance is nested */
struct return_instance *next; /* keep as stack */
};
/*
* Execute out of line area: anonymous executable mapping installed
* by the probed task to execute the copy of the original instruction
@ -105,17 +96,18 @@ struct return_instance {
* allocated.
*/
struct xol_area {
wait_queue_head_t wq; /* if all slots are busy */
atomic_t slot_count; /* number of in-use slots */
unsigned long *bitmap; /* 0 = free slot */
struct page *page;
wait_queue_head_t wq; /* if all slots are busy */
atomic_t slot_count; /* number of in-use slots */
unsigned long *bitmap; /* 0 = free slot */
struct vm_special_mapping xol_mapping;
struct page *pages[2];
/*
* We keep the vma's vm_start rather than a pointer to the vma
* itself. The probed process or a naughty kernel module could make
* the vma go away, and we must handle that reasonably gracefully.
*/
unsigned long vaddr; /* Page(s) of instruction slots */
unsigned long vaddr; /* Page(s) of instruction slots */
};
/*
@ -366,6 +358,18 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
}
static struct uprobe *get_uprobe(struct uprobe *uprobe)
{
atomic_inc(&uprobe->ref);
return uprobe;
}
static void put_uprobe(struct uprobe *uprobe)
{
if (atomic_dec_and_test(&uprobe->ref))
kfree(uprobe);
}
static int match_uprobe(struct uprobe *l, struct uprobe *r)
{
if (l->inode < r->inode)
@ -393,10 +397,8 @@ static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
while (n) {
uprobe = rb_entry(n, struct uprobe, rb_node);
match = match_uprobe(&u, uprobe);
if (!match) {
atomic_inc(&uprobe->ref);
return uprobe;
}
if (!match)
return get_uprobe(uprobe);
if (match < 0)
n = n->rb_left;
@ -432,10 +434,8 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
parent = *p;
u = rb_entry(parent, struct uprobe, rb_node);
match = match_uprobe(uprobe, u);
if (!match) {
atomic_inc(&u->ref);
return u;
}
if (!match)
return get_uprobe(u);
if (match < 0)
p = &parent->rb_left;
@ -472,12 +472,6 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
return u;
}
static void put_uprobe(struct uprobe *uprobe)
{
if (atomic_dec_and_test(&uprobe->ref))
kfree(uprobe);
}
static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
{
struct uprobe *uprobe, *cur_uprobe;
@ -1039,14 +1033,14 @@ static void build_probe_list(struct inode *inode,
if (u->inode != inode || u->offset < min)
break;
list_add(&u->pending_list, head);
atomic_inc(&u->ref);
get_uprobe(u);
}
for (t = n; (t = rb_next(t)); ) {
u = rb_entry(t, struct uprobe, rb_node);
if (u->inode != inode || u->offset > max)
break;
list_add(&u->pending_list, head);
atomic_inc(&u->ref);
get_uprobe(u);
}
}
spin_unlock(&uprobes_treelock);
@ -1132,11 +1126,14 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
/* Slot allocation for XOL */
static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
{
int ret = -EALREADY;
struct vm_area_struct *vma;
int ret;
down_write(&mm->mmap_sem);
if (mm->uprobes_state.xol_area)
if (mm->uprobes_state.xol_area) {
ret = -EALREADY;
goto fail;
}
if (!area->vaddr) {
/* Try to map as high as possible, this is only a hint. */
@ -1148,11 +1145,15 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
}
}
ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE,
VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page);
if (ret)
vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
&area->xol_mapping);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
goto fail;
}
ret = 0;
smp_wmb(); /* pairs with get_xol_area() */
mm->uprobes_state.xol_area = area;
fail:
@ -1175,21 +1176,24 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
if (!area->bitmap)
goto free_area;
area->page = alloc_page(GFP_HIGHUSER);
if (!area->page)
area->xol_mapping.name = "[uprobes]";
area->xol_mapping.pages = area->pages;
area->pages[0] = alloc_page(GFP_HIGHUSER);
if (!area->pages[0])
goto free_bitmap;
area->pages[1] = NULL;
area->vaddr = vaddr;
init_waitqueue_head(&area->wq);
/* Reserve the 1st slot for get_trampoline_vaddr() */
set_bit(0, area->bitmap);
atomic_set(&area->slot_count, 1);
copy_to_page(area->page, 0, &insn, UPROBE_SWBP_INSN_SIZE);
copy_to_page(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
if (!xol_add_vma(mm, area))
return area;
__free_page(area->page);
__free_page(area->pages[0]);
free_bitmap:
kfree(area->bitmap);
free_area:
@ -1227,7 +1231,7 @@ void uprobe_clear_state(struct mm_struct *mm)
if (!area)
return;
put_page(area->page);
put_page(area->pages[0]);
kfree(area->bitmap);
kfree(area);
}
@ -1296,7 +1300,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
if (unlikely(!xol_vaddr))
return 0;
arch_uprobe_copy_ixol(area->page, xol_vaddr,
arch_uprobe_copy_ixol(area->pages[0], xol_vaddr,
&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
return xol_vaddr;
@ -1333,6 +1337,7 @@ static void xol_free_insn_slot(struct task_struct *tsk)
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
@ -1376,6 +1381,14 @@ unsigned long uprobe_get_trap_addr(struct pt_regs *regs)
return instruction_pointer(regs);
}
static struct return_instance *free_ret_instance(struct return_instance *ri)
{
struct return_instance *next = ri->next;
put_uprobe(ri->uprobe);
kfree(ri);
return next;
}
/*
* Called with no locks held.
* Called in context of a exiting or a exec-ing thread.
@ -1383,7 +1396,7 @@ unsigned long uprobe_get_trap_addr(struct pt_regs *regs)
void uprobe_free_utask(struct task_struct *t)
{
struct uprobe_task *utask = t->utask;
struct return_instance *ri, *tmp;
struct return_instance *ri;
if (!utask)
return;
@ -1392,13 +1405,8 @@ void uprobe_free_utask(struct task_struct *t)
put_uprobe(utask->active_uprobe);
ri = utask->return_instances;
while (ri) {
tmp = ri;
ri = ri->next;
put_uprobe(tmp->uprobe);
kfree(tmp);
}
while (ri)
ri = free_ret_instance(ri);
xol_free_insn_slot(t);
kfree(utask);
@ -1437,7 +1445,7 @@ static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
return -ENOMEM;
*n = *o;
atomic_inc(&n->uprobe->ref);
get_uprobe(n->uprobe);
n->next = NULL;
*p = n;
@ -1515,12 +1523,25 @@ static unsigned long get_trampoline_vaddr(void)
return trampoline_vaddr;
}
static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
struct pt_regs *regs)
{
struct return_instance *ri = utask->return_instances;
enum rp_check ctx = chained ? RP_CHECK_CHAIN_CALL : RP_CHECK_CALL;
while (ri && !arch_uretprobe_is_alive(ri, ctx, regs)) {
ri = free_ret_instance(ri);
utask->depth--;
}
utask->return_instances = ri;
}
static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
{
struct return_instance *ri;
struct uprobe_task *utask;
unsigned long orig_ret_vaddr, trampoline_vaddr;
bool chained = false;
bool chained;
if (!get_xol_area())
return;
@ -1536,49 +1557,47 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
return;
}
ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
ri = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
if (!ri)
goto fail;
return;
trampoline_vaddr = get_trampoline_vaddr();
orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
if (orig_ret_vaddr == -1)
goto fail;
/* drop the entries invalidated by longjmp() */
chained = (orig_ret_vaddr == trampoline_vaddr);
cleanup_return_instances(utask, chained, regs);
/*
* We don't want to keep trampoline address in stack, rather keep the
* original return address of first caller thru all the consequent
* instances. This also makes breakpoint unwrapping easier.
*/
if (orig_ret_vaddr == trampoline_vaddr) {
if (chained) {
if (!utask->return_instances) {
/*
* This situation is not possible. Likely we have an
* attack from user-space.
*/
pr_warn("uprobe: unable to set uretprobe pid/tgid=%d/%d\n",
current->pid, current->tgid);
uprobe_warn(current, "handle tail call");
goto fail;
}
chained = true;
orig_ret_vaddr = utask->return_instances->orig_ret_vaddr;
}
atomic_inc(&uprobe->ref);
ri->uprobe = uprobe;
ri->uprobe = get_uprobe(uprobe);
ri->func = instruction_pointer(regs);
ri->stack = user_stack_pointer(regs);
ri->orig_ret_vaddr = orig_ret_vaddr;
ri->chained = chained;
utask->depth++;
/* add instance to the stack */
ri->next = utask->return_instances;
utask->return_instances = ri;
return;
fail:
kfree(ri);
}
@ -1766,46 +1785,58 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
up_read(&uprobe->register_rwsem);
}
static bool handle_trampoline(struct pt_regs *regs)
static struct return_instance *find_next_ret_chain(struct return_instance *ri)
{
bool chained;
do {
chained = ri->chained;
ri = ri->next; /* can't be NULL if chained */
} while (chained);
return ri;
}
static void handle_trampoline(struct pt_regs *regs)
{
struct uprobe_task *utask;
struct return_instance *ri, *tmp;
bool chained;
struct return_instance *ri, *next;
bool valid;
utask = current->utask;
if (!utask)
return false;
goto sigill;
ri = utask->return_instances;
if (!ri)
return false;
goto sigill;
/*
* TODO: we should throw out return_instance's invalidated by
* longjmp(), currently we assume that the probed function always
* returns.
*/
instruction_pointer_set(regs, ri->orig_ret_vaddr);
do {
/*
* We should throw out the frames invalidated by longjmp().
* If this chain is valid, then the next one should be alive
* or NULL; the latter case means that nobody but ri->func
* could hit this trampoline on return. TODO: sigaltstack().
*/
next = find_next_ret_chain(ri);
valid = !next || arch_uretprobe_is_alive(next, RP_CHECK_RET, regs);
for (;;) {
handle_uretprobe_chain(ri, regs);
chained = ri->chained;
put_uprobe(ri->uprobe);
tmp = ri;
ri = ri->next;
kfree(tmp);
utask->depth--;
if (!chained)
break;
BUG_ON(!ri);
}
instruction_pointer_set(regs, ri->orig_ret_vaddr);
do {
if (valid)
handle_uretprobe_chain(ri, regs);
ri = free_ret_instance(ri);
utask->depth--;
} while (ri != next);
} while (!valid);
utask->return_instances = ri;
return;
sigill:
uprobe_warn(current, "handle uretprobe, sending SIGILL.");
force_sig_info(SIGILL, SEND_SIG_FORCED, current);
return true;
}
bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs)
@ -1813,6 +1844,12 @@ bool __weak arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs)
return false;
}
bool __weak arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx,
struct pt_regs *regs)
{
return true;
}
/*
* Run handler and ask thread to singlestep.
* Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@ -1824,13 +1861,8 @@ static void handle_swbp(struct pt_regs *regs)
int uninitialized_var(is_swbp);
bp_vaddr = uprobe_get_swbp_addr(regs);
if (bp_vaddr == get_trampoline_vaddr()) {
if (handle_trampoline(regs))
return;
pr_warn("uprobe: unable to handle uretprobe pid/tgid=%d/%d\n",
current->pid, current->tgid);
}
if (bp_vaddr == get_trampoline_vaddr())
return handle_trampoline(regs);
uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
if (!uprobe) {


@ -1332,7 +1332,7 @@ bool __weak arch_within_kprobe_blacklist(unsigned long addr)
addr < (unsigned long)__kprobes_text_end;
}
static bool within_kprobe_blacklist(unsigned long addr)
bool within_kprobe_blacklist(unsigned long addr)
{
struct kprobe_blacklist_entry *ent;


@ -434,7 +434,7 @@ config UPROBE_EVENT
config BPF_EVENTS
depends on BPF_SYSCALL
depends on KPROBE_EVENT
depends on KPROBE_EVENT || UPROBE_EVENT
bool
default y
help


@ -601,7 +601,22 @@ static int probes_seq_show(struct seq_file *m, void *v)
seq_printf(m, "%c:%s/%s", c, tu->tp.call.class->system,
trace_event_name(&tu->tp.call));
seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);
seq_printf(m, " %s:", tu->filename);
/* Don't print "0x (null)" when offset is 0 */
if (tu->offset) {
seq_printf(m, "0x%p", (void *)tu->offset);
} else {
switch (sizeof(void *)) {
case 4:
seq_printf(m, "0x00000000");
break;
case 8:
default:
seq_printf(m, "0x0000000000000000");
break;
}
}
for (i = 0; i < tu->tp.nr_args; i++)
seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
@ -1095,11 +1110,15 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
{
struct trace_event_call *call = &tu->tp.call;
struct uprobe_trace_entry_head *entry;
struct bpf_prog *prog = call->prog;
struct hlist_head *head;
void *data;
int size, esize;
int rctx;
if (prog && !trace_call_bpf(prog, regs))
return;
esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
size = esize + tu->tp.size + dsize;
@ -1289,6 +1308,7 @@ static int register_uprobe_event(struct trace_uprobe *tu)
return -ENODEV;
}
call->flags = TRACE_EVENT_FL_UPROBE;
call->class->reg = trace_uprobe_register;
call->data = tu;
ret = trace_add_event_call(call);


@ -66,6 +66,7 @@ To follow the above example, the user provides following 'Build' files:
ex/Build:
ex-y += a.o
ex-y += b.o
ex-y += b.o # duplicates in the lists are allowed
libex-y += c.o
libex-y += d.o


@ -57,11 +57,13 @@ quiet_cmd_cc_i_c = CPP $@
quiet_cmd_cc_s_c = AS $@
cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
quiet_cmd_gen = GEN $@
# Link aggregate command
# If there's nothing to link, create empty $@ object.
quiet_cmd_ld_multi = LD $@
cmd_ld_multi = $(if $(strip $(obj-y)),\
$(LD) -r -o $@ $(obj-y),rm -f $@; $(AR) rcs $@)
$(LD) -r -o $@ $(filter $(obj-y),$^),rm -f $@; $(AR) rcs $@)
# Build rules
$(OUTPUT)%.o: %.c FORCE


@ -33,7 +33,8 @@ FILES= \
test-compile-32.bin \
test-compile-x32.bin \
test-zlib.bin \
test-lzma.bin
test-lzma.bin \
test-bpf.bin
CC := $(CROSS_COMPILE)gcc -MD
PKG_CONFIG := $(CROSS_COMPILE)pkg-config
@ -69,8 +70,13 @@ test-libelf.bin:
test-glibc.bin:
$(BUILD)
DWARFLIBS := -ldw
ifeq ($(findstring -static,${LDFLAGS}),-static)
DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
endif
test-dwarf.bin:
$(BUILD) -ldw
$(BUILD) $(DWARFLIBS)
test-libelf-mmap.bin:
$(BUILD) -lelf
@ -156,6 +162,9 @@ test-zlib.bin:
test-lzma.bin:
$(BUILD) -llzma
test-bpf.bin:
$(BUILD)
-include *.d
###############################


@ -0,0 +1,18 @@
#include <linux/bpf.h>
int main(void)
{
union bpf_attr attr;
attr.prog_type = BPF_PROG_TYPE_KPROBE;
attr.insn_cnt = 0;
attr.insns = 0;
attr.license = 0;
attr.log_buf = 0;
attr.log_size = 0;
attr.log_level = 0;
attr.kern_version = 0;
attr = attr;
return 0;
}


@ -1,8 +1,19 @@
#include <stdlib.h>
#if !defined(__UCLIBC__)
#include <gnu/libc-version.h>
#else
#define XSTR(s) STR(s)
#define STR(s) #s
#endif
int main(void)
{
#if !defined(__UCLIBC__)
const char *version = gnu_get_libc_version();
#else
const char *version = XSTR(__GLIBC__) "." XSTR(__GLIBC_MINOR__);
#endif
return (long)version;
}


@ -1,6 +1,7 @@
ex-y += ex.o
ex-y += a.o
ex-y += b.o
ex-y += b.o
ex-y += empty/
ex-y += empty2/


@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include "debugfs.h"
#include "tracefs.h"
#ifndef DEBUGFS_DEFAULT_PATH
#define DEBUGFS_DEFAULT_PATH "/sys/kernel/debug"
@ -94,11 +95,21 @@ int debugfs__strerror_open(int err, char *buf, size_t size, const char *filename
"Hint:\tIs the debugfs filesystem mounted?\n"
"Hint:\tTry 'sudo mount -t debugfs nodev /sys/kernel/debug'");
break;
case EACCES:
case EACCES: {
const char *mountpoint = debugfs_mountpoint;
if (!access(debugfs_mountpoint, R_OK) && strncmp(filename, "tracing/", 8) == 0) {
const char *tracefs_mntpoint = tracefs_find_mountpoint();
if (tracefs_mntpoint)
mountpoint = tracefs_mntpoint;
}
snprintf(buf, size,
"Error:\tNo permissions to read %s/%s\n"
"Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n",
debugfs_mountpoint, filename, debugfs_mountpoint);
debugfs_mountpoint, filename, mountpoint);
}
break;
default:
snprintf(buf, size, "%s", strerror_r(err, sbuf, sizeof(sbuf)));

tools/lib/bpf/.gitignore

@ -0,0 +1,2 @@
libbpf_version.h
FEATURE-DUMP

tools/lib/bpf/Build

@ -0,0 +1 @@
libbpf-y := libbpf.o bpf.o

tools/lib/bpf/Makefile

@ -0,0 +1,195 @@
# Most of this file is copied from tools/lib/traceevent/Makefile
BPF_VERSION = 0
BPF_PATCHLEVEL = 0
BPF_EXTRAVERSION = 1
MAKEFLAGS += --no-print-directory
# Makefiles suck: This macro sets a default value of $(2) for the
# variable named by $(1), unless the variable has been set by
# environment or command line. This is necessary for CC and AR
# because make sets default values, so the simpler ?= approach
# won't work as expected.
define allow-override
$(if $(or $(findstring environment,$(origin $(1))),\
$(findstring command line,$(origin $(1)))),,\
$(eval $(1) = $(2)))
endef
# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
$(call allow-override,CC,$(CROSS_COMPILE)gcc)
$(call allow-override,AR,$(CROSS_COMPILE)ar)
INSTALL = install
# Use DESTDIR for installing into a different root directory.
# This is useful for building a package. The program will be
# installed in this directory as if it was the root directory.
# Then the build tool can move it later.
DESTDIR ?=
DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'
LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
ifeq ($(LP64), 1)
libdir_relative = lib64
else
libdir_relative = lib
endif
prefix ?= /usr/local
libdir = $(prefix)/$(libdir_relative)
man_dir = $(prefix)/share/man
man_dir_SQ = '$(subst ','\'',$(man_dir))'
export man_dir man_dir_SQ INSTALL
export DESTDIR DESTDIR_SQ
include ../../scripts/Makefile.include
# copy a bit from Linux kbuild
ifeq ("$(origin V)", "command line")
VERBOSE = $(V)
endif
ifndef VERBOSE
VERBOSE = 0
endif
ifeq ($(srctree),)
srctree := $(patsubst %/,%,$(dir $(shell pwd)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
#$(info Determined 'srctree' to be $(srctree))
endif
FEATURE_DISPLAY = libelf libelf-getphdrnum libelf-mmap bpf
FEATURE_TESTS = libelf bpf
INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
include $(srctree)/tools/build/Makefile.feature
export prefix libdir src obj
# Shell quotes
libdir_SQ = $(subst ','\'',$(libdir))
libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
LIB_FILE = libbpf.a libbpf.so
VERSION = $(BPF_VERSION)
PATCHLEVEL = $(BPF_PATCHLEVEL)
EXTRAVERSION = $(BPF_EXTRAVERSION)
OBJ = $@
N =
LIBBPF_VERSION = $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
# Set compile option CFLAGS
ifdef EXTRA_CFLAGS
CFLAGS := $(EXTRA_CFLAGS)
else
CFLAGS := -g -Wall
endif
ifeq ($(feature-libelf-mmap), 1)
override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
endif
ifeq ($(feature-libelf-getphdrnum), 1)
override CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
endif
# Append required CFLAGS
override CFLAGS += $(EXTRA_WARNINGS)
override CFLAGS += -Werror -Wall
override CFLAGS += -fPIC
override CFLAGS += $(INCLUDES)
ifeq ($(VERBOSE),1)
Q =
else
Q = @
endif
# Disable command line variables (CFLAGS) override from top
# level Makefile (perf), otherwise build Makefile will get
# the same command line setup.
MAKEOVERRIDES=
export srctree OUTPUT CC LD CFLAGS V
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
BPF_IN := $(OUTPUT)libbpf-in.o
LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
CMD_TARGETS = $(LIB_FILE)
TARGETS = $(CMD_TARGETS)
all: $(VERSION_FILES) all_cmd
all_cmd: $(CMD_TARGETS)
$(BPF_IN): force elfdep bpfdep
$(Q)$(MAKE) $(build)=libbpf
$(OUTPUT)libbpf.so: $(BPF_IN)
$(QUIET_LINK)$(CC) --shared $^ -o $@
$(OUTPUT)libbpf.a: $(BPF_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
define update_dir
(echo $1 > $@.tmp; \
if [ -r $@ ] && cmp -s $@ $@.tmp; then \
rm -f $@.tmp; \
else \
echo ' UPDATE $@'; \
mv -f $@.tmp $@; \
fi);
endef
define do_install
if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
fi; \
$(INSTALL) $1 '$(DESTDIR_SQ)$2'
endef
install_lib: all_cmd
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(libdir_SQ))
install: install_lib
### Cleaning rules
config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
clean:
$(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d \
$(RM) LIBBPF-CFLAGS
$(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP
PHONY += force elfdep bpfdep
force:
elfdep:
@if [ "$(feature-libelf)" != "1" ]; then echo "No libelf found"; exit -1 ; fi
bpfdep:
@if [ "$(feature-bpf)" != "1" ]; then echo "BPF API too old"; exit -1 ; fi
# Declare the contents of the .PHONY variable as phony. We keep that
# information in a variable so we can use it in if_changed and friends.
.PHONY: $(PHONY)

tools/lib/bpf/bpf.c

@ -0,0 +1,85 @@
/*
* common eBPF ELF operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <linux/bpf.h>
#include "bpf.h"
/*
* When building perf, unistd.h is overridden, so __NR_bpf
* needs to be defined explicitly.
*/
#ifndef __NR_bpf
# if defined(__i386__)
# define __NR_bpf 357
# elif defined(__x86_64__)
# define __NR_bpf 321
# elif defined(__aarch64__)
# define __NR_bpf 280
# else
# error __NR_bpf not defined. libbpf does not support your arch.
# endif
#endif
static __u64 ptr_to_u64(void *ptr)
{
return (__u64) (unsigned long) ptr;
}
static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
unsigned int size)
{
return syscall(__NR_bpf, cmd, attr, size);
}
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries)
{
union bpf_attr attr;
memset(&attr, '\0', sizeof(attr));
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.max_entries = max_entries;
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
u32 kern_version, char *log_buf, size_t log_buf_sz)
{
int fd;
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.prog_type = type;
attr.insn_cnt = (__u32)insns_cnt;
attr.insns = ptr_to_u64(insns);
attr.license = ptr_to_u64(license);
attr.log_buf = ptr_to_u64(NULL);
attr.log_size = 0;
attr.log_level = 0;
attr.kern_version = kern_version;
fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
if (fd >= 0 || !log_buf || !log_buf_sz)
return fd;
/* Try again with log */
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_buf_sz;
attr.log_level = 1;
log_buf[0] = 0;
return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}

tools/lib/bpf/bpf.h

@ -0,0 +1,23 @@
/*
* common eBPF ELF operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#ifndef __BPF_BPF_H
#define __BPF_BPF_H
#include <linux/bpf.h>
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries);
/* Recommended log buffer size */
#define BPF_LOG_BUF_SIZE 65536
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
u32 kern_version, char *log_buf,
size_t log_buf_sz);
#endif
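A minimal caller of the new API could look like the sketch below (illustrative only, not part of this change). It loads a trivial "return 0" program of type BPF_PROG_TYPE_KPROBE; note that kprobe programs are checked against the running kernel's version, for which LINUX_VERSION_CODE is only a compile-time approximation:

    #include <linux/bpf.h>
    #include <linux/version.h>
    #include <stdio.h>
    #include "bpf.h"

    int load_trivial_prog(void)
    {
            struct bpf_insn insns[] = {
                    { .code = BPF_ALU64 | BPF_MOV | BPF_K,
                      .dst_reg = BPF_REG_0, .imm = 0 },    /* r0 = 0 */
                    { .code = BPF_JMP | BPF_EXIT },        /* return r0 */
            };
            static char log[BPF_LOG_BUF_SIZE];
            char license[] = "GPL";
            int fd;

            fd = bpf_load_program(BPF_PROG_TYPE_KPROBE, insns,
                                  sizeof(insns) / sizeof(insns[0]), license,
                                  LINUX_VERSION_CODE, log, sizeof(log));
            if (fd < 0)
                    fprintf(stderr, "load failed:\n%s\n", log);
            return fd;
    }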

tools/lib/bpf/libbpf.c (diff suppressed because it is too large)

tools/lib/bpf/libbpf.h

@ -0,0 +1,81 @@
/*
* Common eBPF ELF object loading operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#ifndef __BPF_LIBBPF_H
#define __BPF_LIBBPF_H
#include <stdio.h>
#include <stdbool.h>
/*
* In include/linux/compiler-gcc.h, __printf is defined. However
* it is better if libbpf.h does not depend on Linux header files,
* so instead of __printf we use the gcc attribute directly here.
*/
typedef int (*libbpf_print_fn_t)(const char *, ...)
__attribute__((format(printf, 1, 2)));
void libbpf_set_print(libbpf_print_fn_t warn,
libbpf_print_fn_t info,
libbpf_print_fn_t debug);
/* Hide internal to user */
struct bpf_object;
struct bpf_object *bpf_object__open(const char *path);
struct bpf_object *bpf_object__open_buffer(void *obj_buf,
size_t obj_buf_sz);
void bpf_object__close(struct bpf_object *object);
/* Load/unload object into/from kernel */
int bpf_object__load(struct bpf_object *obj);
int bpf_object__unload(struct bpf_object *obj);
struct bpf_object *bpf_object__next(struct bpf_object *prev);
#define bpf_object__for_each_safe(pos, tmp) \
for ((pos) = bpf_object__next(NULL), \
(tmp) = bpf_object__next(pos); \
(pos) != NULL; \
(pos) = (tmp), (tmp) = bpf_object__next(tmp))
/* Accessors of bpf_program. */
struct bpf_program;
struct bpf_program *bpf_program__next(struct bpf_program *prog,
struct bpf_object *obj);
#define bpf_object__for_each_program(pos, obj) \
for ((pos) = bpf_program__next(NULL, (obj)); \
(pos) != NULL; \
(pos) = bpf_program__next((pos), (obj)))
typedef void (*bpf_program_clear_priv_t)(struct bpf_program *,
void *);
int bpf_program__set_private(struct bpf_program *prog, void *priv,
bpf_program_clear_priv_t clear_priv);
int bpf_program__get_private(struct bpf_program *prog,
void **ppriv);
const char *bpf_program__title(struct bpf_program *prog, bool dup);
int bpf_program__fd(struct bpf_program *prog);
/*
* We don't need __attribute__((packed)) for 'bpf_map_def':
* all of its fields are naturally aligned.
* In addition, using it will trigger -Wpacked warning message,
* and will be treated as an error due to -Werror.
*/
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
#endif
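Putting the object API together, a tool might drive it roughly as in the sketch below (illustrative only; the object file name "probe.o" is hypothetical and error handling is simplified):

    #include <stdio.h>
    #include "libbpf.h"

    int load_and_list(void)
    {
            struct bpf_object *obj;
            struct bpf_program *prog;

            obj = bpf_object__open("probe.o");
            if (!obj)
                    return -1;

            if (bpf_object__load(obj)) {
                    bpf_object__close(obj);
                    return -1;
            }

            bpf_object__for_each_program(prog, obj)
                    printf("section %s -> prog fd %d\n",
                           bpf_program__title(prog, false),
                           bpf_program__fd(prog));

            bpf_object__close(obj);
            return 0;
    }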


@ -418,7 +418,7 @@ static int func_map_init(struct pevent *pevent)
}
static struct func_map *
find_func(struct pevent *pevent, unsigned long long addr)
__find_func(struct pevent *pevent, unsigned long long addr)
{
struct func_map *func;
struct func_map key;
@ -434,6 +434,71 @@ find_func(struct pevent *pevent, unsigned long long addr)
return func;
}
struct func_resolver {
pevent_func_resolver_t *func;
void *priv;
struct func_map map;
};
/**
* pevent_set_function_resolver - set an alternative function resolver
* @pevent: handle for the pevent
* @resolver: function to be used
* @priv: resolver function private state.
*
* Some tools may already have a way to resolve kernel functions; allow them
* to keep using it instead of duplicating all the entries inside
* pevent->funclist.
*/
int pevent_set_function_resolver(struct pevent *pevent,
pevent_func_resolver_t *func, void *priv)
{
struct func_resolver *resolver = malloc(sizeof(*resolver));
if (resolver == NULL)
return -1;
resolver->func = func;
resolver->priv = priv;
free(pevent->func_resolver);
pevent->func_resolver = resolver;
return 0;
}
/**
* pevent_reset_function_resolver - reset alternative function resolver
* @pevent: handle for the pevent
*
* Stop using whatever alternative resolver was set, use the default
* one instead.
*/
void pevent_reset_function_resolver(struct pevent *pevent)
{
free(pevent->func_resolver);
pevent->func_resolver = NULL;
}
static struct func_map *
find_func(struct pevent *pevent, unsigned long long addr)
{
struct func_map *map;
if (!pevent->func_resolver)
return __find_func(pevent, addr);
map = &pevent->func_resolver->map;
map->mod = NULL;
map->addr = addr;
map->func = pevent->func_resolver->func(pevent->func_resolver->priv,
&map->addr, &map->mod);
if (map->func == NULL)
return NULL;
return map;
}
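A caller plugs in its own resolver along these lines (illustrative sketch, not part of this change; the trivial resolver below just gives every address the same label):

    #include "event-parse.h"

    static char *dummy_resolver(void *priv, unsigned long long *addrp,
                                char **modp)
    {
            (void)priv;
            (void)addrp;
            *modp = NULL;                 /* no module name */
            return (char *)"[unknown]";
    }

    /* ... after creating the pevent handle: */
    /* pevent_set_function_resolver(pevent, dummy_resolver, NULL); */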
/**
* pevent_find_function - find a function by a given address
* @pevent: handle for the pevent
@ -1680,6 +1745,9 @@ process_cond(struct event_format *event, struct print_arg *top, char **tok)
type = process_arg(event, left, &token);
again:
if (type == EVENT_ERROR)
goto out_free;
/* Handle other operations in the arguments */
if (type == EVENT_OP && strcmp(token, ":") != 0) {
type = process_op(event, left, &token);
@ -1939,6 +2007,12 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok)
goto out_warn_free;
type = process_arg_token(event, right, tok, type);
if (type == EVENT_ERROR) {
free_arg(right);
/* token was freed in process_arg_token() via *tok */
token = NULL;
goto out_free;
}
if (right->type == PRINT_OP &&
get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
@ -4754,6 +4828,7 @@ static void pretty_print(struct trace_seq *s, void *data, int size, struct event
case 'z':
case 'Z':
case '0' ... '9':
case '-':
goto cont_process;
case 'p':
if (pevent->long_size == 4)
@ -6564,6 +6639,7 @@ void pevent_free(struct pevent *pevent)
free(pevent->trace_clock);
free(pevent->events);
free(pevent->sort_events);
free(pevent->func_resolver);
free(pevent);
}


@ -453,6 +453,10 @@ struct cmdline_list;
struct func_map;
struct func_list;
struct event_handler;
struct func_resolver;
typedef char *(pevent_func_resolver_t)(void *priv,
unsigned long long *addrp, char **modp);
struct pevent {
int ref_count;
@ -481,6 +485,7 @@ struct pevent {
int cmdline_count;
struct func_map *func_map;
struct func_resolver *func_resolver;
struct func_list *funclist;
unsigned int func_count;
@ -611,6 +616,9 @@ enum trace_flag_type {
TRACE_FLAG_SOFTIRQ = 0x10,
};
int pevent_set_function_resolver(struct pevent *pevent,
pevent_func_resolver_t *func, void *priv);
void pevent_reset_function_resolver(struct pevent *pevent);
int pevent_register_comm(struct pevent *pevent, const char *comm, int pid);
int pevent_register_trace_clock(struct pevent *pevent, const char *trace_clock);
int pevent_register_function(struct pevent *pevent, char *name,


@ -29,3 +29,4 @@ config.mak.autogen
*.pyc
*.pyo
.config-detected
util/intel-pt-decoder/inat-tables.c


@ -35,6 +35,7 @@ paths += -DPERF_MAN_PATH="BUILD_STR($(mandir_SQ))"
CFLAGS_builtin-help.o += $(paths)
CFLAGS_builtin-timechart.o += $(paths)
CFLAGS_perf.o += -DPERF_HTML_PATH="BUILD_STR($(htmldir_SQ))" -include $(OUTPUT)PERF-VERSION-FILE
CFLAGS_builtin-trace.o += -DSTRACE_GROUPS_DIR="BUILD_STR($(STRACE_GROUPS_DIR_SQ))"
libperf-y += util/
libperf-y += arch/


@ -0,0 +1,86 @@
Intel Branch Trace Store
========================
Overview
========
Intel BTS could be regarded as a predecessor to Intel PT and has some
similarities because it can also identify every branch a program takes. A
notable difference is that Intel BTS has no timing information and as a
consequence the present implementation is limited to per-thread recording.
While decoding Intel BTS does not require walking the object code, the object
code is still needed to pair up calls and returns correctly, consequently much
of the Intel PT documentation applies also to Intel BTS. Refer to the Intel PT
documentation and consider that the PMU 'intel_bts' can usually be used in
place of 'intel_pt' in the examples provided, with the proviso that per-thread
recording must also be stipulated i.e. the --per-thread option for
'perf record'.
perf record
===========
new event
---------
The Intel BTS kernel driver creates a new PMU for Intel BTS. The perf record
option is:
-e intel_bts//
Currently Intel BTS is limited to per-thread tracing so the --per-thread option
is also needed.
snapshot option
---------------
The snapshot option is the same as Intel PT (refer Intel PT documentation).
auxtrace mmap size option
-----------------------
The mmap size option is the same as Intel PT (refer Intel PT documentation).
perf script
===========
By default, perf script will decode trace data found in the perf.data file.
This can be further controlled by option --itrace. The --itrace option is
the same as Intel PT (refer Intel PT documentation) except that neither
"instructions" events nor "transactions" events (and consequently call
chains) are supported.
To disable trace decoding entirely, use the option --no-itrace.
dump option
-----------
perf script has an option (-D) to "dump" the events i.e. display the binary
data.
When -D is used, Intel BTS packets are displayed.
To disable the display of Intel BTS packets, combine the -D option with
--no-itrace.
perf report
===========
By default, perf report will decode trace data found in the perf.data file.
This can be further controlled by new option --itrace exactly the same as
perf script.
perf inject
===========
perf inject also accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events. e.g.
perf inject --itrace -i perf.data -o perf.data.new


@ -0,0 +1,766 @@
Intel Processor Trace
=====================
Overview
========
Intel Processor Trace (Intel PT) is an extension of Intel Architecture that
collects information about software execution such as control flow, execution
modes and timings and formats it into highly compressed binary packets.
Technical details are documented in the Intel 64 and IA-32 Architectures
Software Developer Manuals, Chapter 36 Intel Processor Trace.
Intel PT is first supported in Intel Core M and 5th generation Intel Core
processors that are based on the Intel micro-architecture code name Broadwell.
Trace data is collected by 'perf record' and stored within the perf.data file.
See below for options to 'perf record'.
Trace data must be 'decoded' which involves walking the object code and matching
the trace data packets. For example a TNT packet only tells whether a
conditional branch was taken or not taken, so to make use of that packet the
decoder must know precisely which instruction was being executed.
Decoding is done on-the-fly. The decoder outputs samples in the same format as
samples output by perf hardware events, for example as though the "instructions"
or "branches" events had been recorded. Presently 3 tools support this:
'perf script', 'perf report' and 'perf inject'. See below for more information
on using those tools.
The main distinguishing feature of Intel PT is that the decoder can determine
the exact flow of software execution. Intel PT can be used to understand why
and how software got to a certain point, or behaved a certain way. The
software does not have to be recompiled, so Intel PT works with debug or release
builds, however the executed images are needed - which makes use in JIT-compiled
environments, or with self-modified code, a challenge. Also symbols need to be
provided to make sense of addresses.
A limitation of Intel PT is that it produces huge amounts of trace data
(hundreds of megabytes per second per core) which takes a long time to decode,
for example two or three orders of magnitude longer than it took to collect.
Another limitation is the performance impact of tracing, something that will
vary depending on the use-case and architecture.
Quickstart
==========
It is important to start small. That is because it is easy to capture vastly
more data than can possibly be processed.
The simplest thing to do with Intel PT is userspace profiling of small programs.
Data is captured with 'perf record' e.g. to trace 'ls' userspace-only:
perf record -e intel_pt//u ls
And profiled with 'perf report' e.g.
perf report
To also trace kernel space presents a problem, namely kernel self-modifying
code. A fairly good kernel image is available in /proc/kcore but to get an
accurate image a copy of /proc/kcore needs to be made under the same conditions
as the data capture. A script perf-with-kcore can do that, but beware that the
script makes use of 'sudo' to copy /proc/kcore. If you have perf installed
locally from the source tree you can do:
~/libexec/perf-core/perf-with-kcore record pt_ls -e intel_pt// -- ls
which will create a directory named 'pt_ls' and put the perf.data file and
copies of /proc/kcore, /proc/kallsyms and /proc/modules into it. Then to use
'perf report' becomes:
~/libexec/perf-core/perf-with-kcore report pt_ls
Because samples are synthesized after-the-fact, the sampling period can be
selected for reporting. e.g. sample every microsecond
~/libexec/perf-core/perf-with-kcore report pt_ls --itrace=i1usge
See the sections below for more information about the --itrace option.
Beware the smaller the period, the more samples that are produced, and the
longer it takes to process them.
Also note that the coarseness of Intel PT timing information will start to
distort the statistical value of the sampling as the sampling period becomes
smaller.
To represent software control flow, "branches" samples are produced. By default
a branch sample is synthesized for every single branch. To get an idea what
data is available you can use the 'perf script' tool with no parameters, which
will list all the samples.
perf record -e intel_pt//u ls
perf script
An interesting field that is not printed by default is 'flags' which can be
displayed as follows:
perf script -Fcomm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,flags
The flags are "bcrosyiABEx" which stand for branch, call, return, conditional,
system, asynchronous, interrupt, transaction abort, trace begin, trace end, and
in transaction, respectively.
While it is possible to create scripts to analyze the data, an alternative
approach is available to export the data to a postgresql database. Refer to
script export-to-postgresql.py for more details, and to script
call-graph-from-postgresql.py for an example of using the database.
As mentioned above, it is easy to capture too much data. One way to limit the
data captured is to use 'snapshot' mode which is explained further below.
Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
Another problem that will be experienced is decoder errors. They can be caused
by inability to access the executed image, self-modified or JIT-ed code, or the
inability to match side-band information (such as context switches and mmaps)
which results in the decoder not knowing what code was executed.
There is also the problem of perf not being able to copy the data fast enough,
resulting in data lost because the buffer was full. See 'Buffer handling' below
for more details.
perf record
===========
new event
---------
The Intel PT kernel driver creates a new PMU for Intel PT. PMU events are
selected by providing the PMU name followed by the "config" separated by slashes.
An enhancement has been made to allow default "config" e.g. the option
-e intel_pt//
will use a default config value. Currently that is the same as
-e intel_pt/tsc,noretcomp=0/
which is the same as
-e intel_pt/tsc=1,noretcomp=0/
Note there are now new config terms - see section 'config terms' further below.
The config terms are listed in /sys/devices/intel_pt/format. They are bit
fields within the config member of the struct perf_event_attr which is
passed to the kernel by the perf_event_open system call. They correspond to bit
fields in the IA32_RTIT_CTL MSR. Here is a list of them and their definitions:
$ grep -H . /sys/bus/event_source/devices/intel_pt/format/*
/sys/bus/event_source/devices/intel_pt/format/cyc:config:1
/sys/bus/event_source/devices/intel_pt/format/cyc_thresh:config:19-22
/sys/bus/event_source/devices/intel_pt/format/mtc:config:9
/sys/bus/event_source/devices/intel_pt/format/mtc_period:config:14-17
/sys/bus/event_source/devices/intel_pt/format/noretcomp:config:11
/sys/bus/event_source/devices/intel_pt/format/psb_period:config:24-27
/sys/bus/event_source/devices/intel_pt/format/tsc:config:10
Note that the default config must be overridden for each term i.e.
-e intel_pt/noretcomp=0/
is the same as:
-e intel_pt/tsc=1,noretcomp=0/
So, to disable TSC packets use:
-e intel_pt/tsc=0/
It is also possible to specify the config value explicitly:
-e intel_pt/config=0x400/
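For reference, a program calling perf_event_open() directly would assemble the
same value from the bit positions listed above. The sketch below is illustrative
only, not taken from the perf sources; the PMU type must be read at run time
from /sys/bus/event_source/devices/intel_pt/type:

    #include <linux/perf_event.h>
    #include <string.h>

    void setup_intel_pt_attr(struct perf_event_attr *attr, int intel_pt_type)
    {
            memset(attr, 0, sizeof(*attr));
            attr->size = sizeof(*attr);
            attr->type = intel_pt_type;     /* from .../intel_pt/type */
            attr->config = 1ULL << 10;      /* tsc=1, i.e. config=0x400 */
            attr->exclude_kernel = 1;       /* like the //u modifier */
            attr->disabled = 1;
    }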
Note that, as with all events, the event is suffixed with event modifiers:
u userspace
k kernel
h hypervisor
G guest
H host
p precise ip
'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
'p' is also not relevant to Intel PT. So only options 'u' and 'k' are
meaningful for Intel PT.
perf_event_attr is displayed if the -vv option is used e.g.
------------------------------------------------------------
perf_event_attr:
type 6
size 112
config 0x400
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
exclude_kernel 1
exclude_hv 1
enable_on_exec 1
sample_id_all 1
------------------------------------------------------------
sys_perf_event_open: pid 31104 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 2 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 3 group_fd -1 flags 0x8
------------------------------------------------------------
config terms
------------
The June 2015 version of Intel 64 and IA-32 Architectures Software Developer
Manuals, Chapter 36 Intel Processor Trace, defined new Intel PT features.
Some of the features are reflected in new config terms. All the config terms are
described below.
tsc Always supported. Produces TSC timestamp packets to provide
timing information. In some cases it is possible to decode
without timing information, for example a per-thread context
that does not overlap executable memory maps.
The default config selects tsc (i.e. tsc=1).
noretcomp Always supported. Disables "return compression" so a TIP packet
is produced when a function returns. Causes more packets to be
produced but might make decoding more reliable.
The default config does not select noretcomp (i.e. noretcomp=0).
psb_period Allows the frequency of PSB packets to be specified.
The PSB packet is a synchronization packet that provides a
starting point for decoding or recovery from errors.
Support for psb_period is indicated by:
/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
which contains "1" if the feature is supported and "0"
otherwise.
Valid values are given by:
/sys/bus/event_source/devices/intel_pt/caps/psb_periods
which contains a hexadecimal value, the bits of which represent
valid values e.g. bit 2 set means value 2 is valid.
The psb_period value is converted to the approximate number of
trace bytes between PSB packets as:
2 ^ (value + 11)
e.g. value 3 means 16KiB bytes between PSBs
If an invalid value is entered, the error message
will give a list of valid values e.g.
$ perf record -e intel_pt/psb_period=15/u uname
Invalid psb_period for intel_pt. Valid values are: 0-5
If MTC packets are selected, the default config selects a value
of 3 (i.e. psb_period=3) or the nearest lower value that is
supported (0 is always supported). Otherwise the default is 0.
If decoding is expected to be reliable and the buffer is large
then a large PSB period can be used.
Because a TSC packet is produced with PSB, the PSB period can
also affect the granularity of timing information in the absence
of MTC or CYC.
mtc Produces MTC timing packets.
MTC packets provide finer grain timestamp information than TSC
packets. MTC packets record time using the hardware crystal
clock (CTC) which is related to TSC packets using a TMA packet.
Support for this feature is indicated by:
/sys/bus/event_source/devices/intel_pt/caps/mtc
which contains "1" if the feature is supported and
"0" otherwise.
The frequency of MTC packets can also be specified - see
mtc_period below.
mtc_period Specifies how frequently MTC packets are produced - see mtc
above for how to determine if MTC packets are supported.
Valid values are given by:
/sys/bus/event_source/devices/intel_pt/caps/mtc_periods
which contains a hexadecimal value, the bits of which represent
valid values e.g. bit 2 set means value 2 is valid.
The mtc_period value is converted to the MTC frequency as:
CTC-frequency / (2 ^ value)
e.g. value 3 means one eighth of CTC-frequency
Where CTC is the hardware crystal clock, the frequency of which
can be related to TSC via values provided in cpuid leaf 0x15.
If an invalid value is entered, the error message
will give a list of valid values e.g.
$ perf record -e intel_pt/mtc_period=15/u uname
Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9
The default value is 3 or the nearest lower value
that is supported (0 is always supported).
cyc Produces CYC timing packets.
CYC packets provide even finer grain timestamp information than
MTC and TSC packets. A CYC packet contains the number of CPU
cycles since the last CYC packet. Unlike MTC and TSC packets,
CYC packets are only sent when another packet is also sent.
Support for this feature is indicated by:
/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
which contains "1" if the feature is supported and
"0" otherwise.
The number of CYC packets produced can be reduced by specifying
a threshold - see cyc_thresh below.
cyc_thresh Specifies how frequently CYC packets are produced - see cyc
above for how to determine if CYC packets are supported.
Valid cyc_thresh values are given by:
/sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds
which contains a hexadecimal value, the bits of which represent
valid values e.g. bit 2 set means value 2 is valid.
The cyc_thresh value represents the minimum number of CPU cycles
that must have passed before a CYC packet can be sent. The
number of CPU cycles is:
2 ^ (value - 1)
e.g. value 4 means 8 CPU cycles must pass before a CYC packet
can be sent. Note a CYC packet is still only sent when another
packet is sent, not at, e.g. every 8 CPU cycles.
If an invalid value is entered, the error message
will give a list of valid values e.g.
$ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
Invalid cyc_thresh for intel_pt. Valid values are: 0-12
CYC packets are not requested by default.
no_force_psb This is a driver option and is not in the IA32_RTIT_CTL MSR.
It stops the driver resetting the byte count to zero whenever
enabling the trace (for example on context switches) which in
turn results in no PSB being forced. However some processors
will produce a PSB anyway.
In any case, there is still a PSB when the trace is enabled for
the first time.
no_force_psb can be used to slightly decrease the trace size but
may make it harder for the decoder to recover from errors.
no_force_psb is not selected by default.
new snapshot option
-------------------
The difference between full trace and snapshot from the kernel's perspective is
that in full trace we don't overwrite trace data that the user hasn't collected
yet (and indicated that by advancing aux_tail), whereas in snapshot mode we let
the trace run and overwrite older data in the buffer so that whenever something
interesting happens, we can stop it and grab a snapshot of what was going on
around that interesting moment.
To select snapshot mode a new option has been added:
-S
Optionally it can be followed by the snapshot size e.g.
-S0x100000
The default snapshot size is the auxtrace mmap size. If neither auxtrace mmap size
nor snapshot size is specified, then the default is 4MiB for privileged users
(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
If an unprivileged user does not specify mmap pages, the mmap pages will be
reduced as described in the 'new auxtrace mmap size option' section below.
The snapshot size is displayed if the option -vv is used e.g.
Intel PT snapshot size: %zu
new auxtrace mmap size option
-----------------------------
Intel PT buffer size is specified by an addition to the -m option e.g.
-m,16
selects a buffer size of 16 pages i.e. 64KiB.
Note that the existing functionality of -m is unchanged. The auxtrace mmap size
is specified by the optional addition of a comma and the value.
The default auxtrace mmap size for Intel PT is 4MiB/page_size for privileged users
(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
If an unprivileged user does not specify mmap pages, the mmap pages will be
reduced from the default 512KiB/page_size to 256KiB/page_size, otherwise the
user is likely to get an error as they exceed their mlock limit (Max locked
memory as shown in /proc/self/limits). Note that perf does not count the first
512KiB (actually /proc/sys/kernel/perf_event_mlock_kb minus 1 page) per cpu
against the mlock limit so an unprivileged user is allowed 512KiB per cpu plus
their mlock limit (which defaults to 64KiB but is not multiplied by the number
of cpus).
In full-trace mode, powers of two are allowed for buffer size, with a minimum
size of 2 pages. In snapshot mode, it is the same but the minimum size is
1 page.
The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.
mmap length 528384
auxtrace mmap length 4198400
Intel PT modes of operation
---------------------------
Intel PT can be used in 2 modes:
full-trace mode
snapshot mode
Full-trace mode traces continuously e.g.
perf record -e intel_pt//u uname
Snapshot mode captures the available data when a signal is sent e.g.
perf record -v -e intel_pt//u -S ./loopy 1000000000 &
[1] 11435
kill -USR2 11435
Recording AUX area tracing snapshot
Note that the signal sent is SIGUSR2.
Note that "Recording AUX area tracing snapshot" is displayed because the -v
option is used.
The 2 modes cannot be used together.
Buffer handling
---------------
There may be buffer limitations (i.e. a single ToPA entry), which mean that actual
buffer sizes are limited to powers of 2 up to 4MiB (MAX_ORDER). In order to
provide other sizes, and in particular an arbitrarily large size, multiple
buffers are logically concatenated. However an interrupt must be used to switch
between buffers. That has two potential problems:
a) the interrupt may not be handled in time so that the current buffer
becomes full and some trace data is lost.
b) the interrupts may slow the system and affect the performance
results.
If trace data is lost, the driver sets 'truncated' in the PERF_RECORD_AUX event
which the tools report as an error.
In full-trace mode, the driver waits for data to be copied out before allowing
the (logical) buffer to wrap-around. If data is not copied out quickly enough,
again 'truncated' is set in the PERF_RECORD_AUX event. If the driver has to
wait, the intel_pt event gets disabled. Because it is difficult to know when
that happens, perf tools always re-enable the intel_pt event after copying out
data.
Intel PT and build ids
----------------------
By default "perf record" post-processes the event stream to find all build ids
for executables for all addresses sampled. Deliberately, Intel PT is not
decoded for that purpose (it would take too long). Instead the build ids for
all executables encountered (due to mmap, comm or task events) are included
in the perf.data file.
To see buildids included in the perf.data file use the command:
perf buildid-list
If the perf.data file contains Intel PT data, that is the same as:
perf buildid-list --with-hits
Snapshot mode and event disabling
---------------------------------
In order to make a snapshot, the intel_pt event is disabled using an IOCTL,
namely PERF_EVENT_IOC_DISABLE. However doing that can also disable the
collection of side-band information. In order to prevent that, a dummy
software event has been introduced that permits tracking events (like mmaps) to
continue to be recorded while intel_pt is disabled. That is important to ensure
there is complete side-band information to allow the decoding of subsequent
snapshots.
A test has been created for that. To find the test:
perf test list
...
23: Test using a dummy software event to keep tracking
To run the test:
perf test 23
23: Test using a dummy software event to keep tracking : Ok
perf record modes (nothing new here)
------------------------------------
perf record essentially operates in one of three modes:
per thread
per cpu
workload only
"per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
workload).
"per cpu" is selected by -C or -a.
"workload only" mode is selected by not using the other options but providing a
command to run (i.e. the workload).
In per-thread mode an exact list of threads is traced. There is no inheritance.
Each thread has its own event buffer.
In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
option, or processes selected with -p or -u) are traced. Each cpu has its own
buffer. Inheritance is allowed.
In workload-only mode, the workload is traced but with per-cpu buffers.
Inheritance is allowed. Note that you can now trace a workload in per-thread
mode by using the --per-thread option.
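Illustrative commands for the three modes (./workload is a placeholder, not a
command from the output above):
perf record --per-thread -e intel_pt//u ./workload    (per thread)
perf record -a -e intel_pt//u sleep 1                 (per cpu)
perf record -e intel_pt//u ./workload                 (workload only)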
Privileged vs non-privileged users
----------------------------------
Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
have memory limits imposed upon them. That affects what buffer sizes they can
have as outlined above.
Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
not permitted to use tracepoints, which means there is insufficient side-band
information to decode Intel PT in per-cpu mode, and potentially workload-only
mode too if the workload creates new processes.
Note also that, to use tracepoints, read access to debugfs is required. So if
debugfs is not mounted or the user does not have read access, it will again not
be possible to decode Intel PT in per-cpu mode.
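To check the current setting, or (as root) relax it, e.g.:
$ cat /proc/sys/kernel/perf_event_paranoid
# echo -1 > /proc/sys/kernel/perf_event_paranoid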
sched_switch tracepoint
-----------------------
The sched_switch tracepoint is used to provide side-band data for Intel PT
decoding. sched_switch events are automatically added, e.g. the second event
shown below:
$ perf record -vv -e intel_pt//u uname
------------------------------------------------------------
perf_event_attr:
type 6
size 112
config 0x400
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
exclude_kernel 1
exclude_hv 1
enable_on_exec 1
sample_id_all 1
------------------------------------------------------------
sys_perf_event_open: pid 31104 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 2 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 3 group_fd -1 flags 0x8
------------------------------------------------------------
perf_event_attr:
type 2
size 112
config 0x108
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8
------------------------------------------------------------
perf_event_attr:
type 1
size 112
config 0x9
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|IDENTIFIER
read_format ID
disabled 1
inherit 1
exclude_kernel 1
exclude_hv 1
mmap 1
comm 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 31104 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 2 group_fd -1 flags 0x8
sys_perf_event_open: pid 31104 cpu 3 group_fd -1 flags 0x8
mmap size 528384B
AUX area mmap length 4194304
perf event ring buffer mmapped per cpu
Synthesizing auxtrace information
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.042 MB perf.data ]
Note that the sched_switch event is only added if the user is permitted to use
it, and only in per-cpu mode.
Note also that the sched_switch event is only added if TSC packets are requested.
That is because, in the absence of timing information, the sched_switch events
cannot be matched against the Intel PT trace.
perf script
===========
By default, perf script will decode trace data found in the perf.data file.
This can be further controlled by the new --itrace option.
New --itrace option
-------------------
Having no option is the same as
--itrace
which, in turn, is the same as
--itrace=ibxe
The letters are:
i synthesize "instructions" events
b synthesize "branches" events
x synthesize "transactions" events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
e synthesize tracing error events
d create a debug log
g synthesize a call chain (use with i or x)
"Instructions" events look like they were recorded by "perf record -e
instructions".
"Branches" events look like they were recorded by "perf record -e branches". "c"
and "r" can be combined to get calls and returns.
"Transactions" events correspond to the start or end of transactions. The
'flags' field can be used in perf script to determine whether the event is a
transaction start, commit or abort.
Error events are new. They show where the decoder lost the trace. Error events
are quite important: users must know whether what they are seeing is a complete
picture or not.
The "d" option will cause the creation of a file "intel_pt.log" containing all
decoded packets and instructions. Note that this option slows down the decoder
and that the resulting file may be very large.
In addition, the period of the "instructions" event can be specified. e.g.
--itrace=i10us
sets the period to 10us i.e. one instruction sample is synthesized for each 10
microseconds of trace. Alternatives to "us" are "ms" (milliseconds),
"ns" (nanoseconds), "t" (TSC ticks) or "i" (instructions).
"ms", "us" and "ns" are converted to TSC ticks.
The timing information included with Intel PT does not give the time of every
instruction. Consequently, for the purpose of sampling, the decoder estimates
the time since the last timing packet based on 1 tick per instruction. The time
on the sample is *not* adjusted and reflects the last known value of TSC.
For Intel PT, the default period is 100us.
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified. e.g.
--itrace=ig32
--itrace=xg32
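The letters and their parameters can be combined, e.g. (illustrative):
--itrace=i100usg32
synthesizes instruction samples every 100 microseconds with call chains of up
to 32 entries.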
To disable trace decoding entirely, use the option --no-itrace.
dump option
-----------
perf script has an option (-D) to "dump" the events i.e. display the binary
data.
When -D is used, Intel PT packets are displayed. The packet decoder does not
pay attention to PSB packets, but just decodes the bytes - so the packets seen
by the actual decoder may not be identical in places where the data is corrupt.
One example of that would be when the buffer-switching interrupt has been too
slow, and the buffer has been filled completely. In that case, the last packet
in the buffer might be truncated and immediately followed by a PSB as the trace
continues in the next buffer.
To disable the display of Intel PT packets, combine the -D option with
--no-itrace.
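For example:
perf script -D                 (dump events including Intel PT packets)
perf script -D --no-itrace     (dump events without Intel PT packets)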
perf report
===========
By default, perf report will decode trace data found in the perf.data file.
This can be further controlled by the new --itrace option, exactly the same as
for perf script, except that the default is --itrace=igxe.
perf inject
===========
perf inject also accepts the --itrace option, in which case the tracing data is
removed and replaced with the synthesized events, e.g.
perf inject --itrace -i perf.data -o perf.data.new
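The new file can then be processed by other tools without decoding the trace
again, e.g. (illustrative):
perf report -i perf.data.new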

View File

@ -0,0 +1,22 @@
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.

View File

@ -216,6 +216,10 @@ Suite for evaluating parallel wake calls.
*requeue*::
Suite for evaluating requeue calls.
*lock-pi*::
Suite for evaluating futex lock_pi calls.
SEE ALSO
--------
linkperf:perf[1]

View File

@ -48,28 +48,7 @@ OPTIONS
Decode Instruction Tracing data, replacing it with synthesized events.
Options are:
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
include::itrace.txt[]
SEE ALSO
--------

View File

@ -45,6 +45,21 @@ OPTIONS
param1 and param2 are defined as formats for the PMU in:
/sys/bus/event_sources/devices/<pmu>/format/*
There are also some params which are not defined in .../<pmu>/format/*.
These params can be used to overload default config values per event.
Here is a list of the params.
- 'period': Set event sampling period
- 'freq': Set event sampling frequency
- 'time': Disable/enable time stamping. Acceptable values are 1 for
enabling time stamping. 0 for disabling time stamping.
The default is 1.
- 'call-graph': Disable/enable callgraph. Acceptable str are "fp" for
FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
"no" for disable callgraph.
- 'stack-size': user stack size for dwarf mode
Note: If user explicitly sets options which conflict with the params,
the value set by the params will be overridden.
- a hardware breakpoint event in the form of '\mem:addr[/len][:access]'
where addr is the address in memory you want to break in.
Access is the memory access type (read, write, execute) it can
@ -61,7 +76,16 @@ OPTIONS
"perf report" to view group events together.
--filter=<filter>::
Event filter.
Event filter. This option should follow a event selector (-e) which
selects tracepoint event(s). Multiple '--filter' options are combined
using '&&'.
--exclude-perf::
Don't record events issued by perf itself. This option should follow
a event selector (-e) which selects tracepoint event(s). It adds a
filter expression 'common_pid != $PERFPID' to filters. If other
'--filter' exists, the new filter expression will be combined with
them by '&&'.
-a::
--all-cpus::
@ -276,6 +300,10 @@ When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
because the file may be huge. A time out is needed in such cases.
This option sets the time out limit. The default value is 500 ms.
--switch-events::
Record context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]

View File

@ -81,6 +81,8 @@ OPTIONS
- cpu: cpu number the task ran at the time of sample
- srcline: filename and line number executed at the time of sample. The
DWARF debugging info must be provided.
- srcfile: file name of the source file of the same. Requires dwarf
information.
- weight: Event specific weight, e.g. memory latency or transaction
abort cost. This is the global weight.
- local_weight: Local weight version of the weight above.
@ -109,6 +111,7 @@ OPTIONS
- mispredict: "N" for predicted branch, "Y" for mispredicted branch
- in_tx: branch in TSX transaction
- abort: TSX transaction abort.
- cycles: Cycles in basic block
And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.
@ -328,31 +331,23 @@ OPTIONS
--itrace::
Options for decoding instruction tracing data. The options are:
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
include::itrace.txt[]
To disable decoding entirely, use --no-itrace.
--full-source-path::
Show the full path for source files for srcline output.
--show-ref-call-graph::
When multiple events are sampled, it may not be needed to collect
callgraphs for all of them. The sample sites are usually nearby,
and it's enough to collect the callgraphs on a reference event.
So user can use "call-graph=no" event modifier to disable callgraph
for other events to reduce the overhead.
However, perf report cannot show callgraphs for the event which
disable the callgraph.
This option extends the perf report to show reference callgraphs,
which collected by reference event, in no callgraph event.
include::callchain-overhead-calculation.txt[]

View File

@ -222,6 +222,17 @@ OPTIONS
--show-mmap-events
Display mmap related events (e.g. MMAP, MMAP2).
--show-switch-events
Display context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.
--demangle::
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
--demangle-kernel::
Demangle kernel symbol names to human readable form (for C++ kernels).
--header
Show perf.data header.
@ -231,31 +242,13 @@ OPTIONS
--itrace::
Options for decoding instruction tracing data. The options are:
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
include::itrace.txt[]
To disable decoding entirely, use --no-itrace.
--full-source-path::
Show the full path for source files for srcline output.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script-perl[1],

View File

@ -208,6 +208,27 @@ Default is to monitor all CPUS.
This option sets the time out limit. The default value is 500 ms.
-b::
--branch-any::
Enable taken branch stack sampling. Any type of taken branch may be sampled.
This is a shortcut for --branch-filter any. See --branch-filter for more infos.
-j::
--branch-filter::
Enable taken branch stack sampling. Each sample captures a series of consecutive
taken branches. The number of branches captured with each sample depends on the
underlying hardware, the type of branches of interest, and the executed code.
It is possible to select the types of branches captured by enabling filters.
For a full list of modifiers please see the perf record manpage.
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
The privilege levels may be omitted, in which case, the privilege levels of the associated
event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
levels are subject to permissions. When sampling on multiple events, branch stack sampling
is enabled for all the sampling events. The sampled branch type is the same for all events.
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.
INTERACTIVE PROMPTING KEYS
--------------------------

View File

@ -18,6 +18,7 @@ tools/arch/x86/include/asm/atomic.h
tools/arch/x86/include/asm/rmwcc.h
tools/lib/traceevent
tools/lib/api
tools/lib/bpf
tools/lib/hweight.c
tools/lib/rbtree.c
tools/lib/symbol/kallsyms.c
@ -40,7 +41,6 @@ tools/include/asm-generic/bitops.h
tools/include/linux/atomic.h
tools/include/linux/bitops.h
tools/include/linux/compiler.h
tools/include/linux/export.h
tools/include/linux/hash.h
tools/include/linux/kernel.h
tools/include/linux/list.h

View File

@ -76,6 +76,12 @@ include config/utilities.mak
#
# Define NO_AUXTRACE if you do not want AUX area tracing support
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
LC_COLLATE=C
LC_NUMERIC=C
export LC_COLLATE LC_NUMERIC
ifeq ($(srctree),)
srctree := $(patsubst %/,%,$(dir $(shell pwd)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
@ -135,6 +141,7 @@ INSTALL = install
FLEX = flex
BISON = bison
STRIP = strip
AWK = awk
LIB_DIR = $(srctree)/tools/lib/api/
TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@ -289,7 +296,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
PERF_IN := $(OUTPUT)perf-in.o
export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
$(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
@ -507,6 +514,11 @@ endif
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-with-kcore) \
$(INSTALL) $(OUTPUT)perf-with-kcore -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
ifndef NO_LIBAUDIT
$(call QUIET_INSTALL, strace/groups) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'; \
$(INSTALL) trace/strace/groups/* -t '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'
endif
ifndef NO_LIBPERL
$(call QUIET_INSTALL, perl-scripts) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \
@ -560,7 +572,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
$(call QUIET_CLEAN, core-gen) $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
$(call QUIET_CLEAN, core-gen) $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
$(OUTPUT)util/intel-pt-decoder/inat-tables.c
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
$(python-clean)

View File

@ -0,0 +1 @@
# empty

View File

@ -128,7 +128,7 @@ static const char *normalize_arch(char *arch)
return arch;
}
static int perf_session_env__lookup_binutils_path(struct perf_session_env *env,
static int perf_session_env__lookup_binutils_path(struct perf_env *env,
const char *name,
const char **path)
{
@ -206,7 +206,7 @@ out_error:
return -1;
}
int perf_session_env__lookup_objdump(struct perf_session_env *env)
int perf_session_env__lookup_objdump(struct perf_env *env)
{
/*
* For live mode, env->arch will be NULL and we can use

View File

@ -5,6 +5,6 @@
extern const char *objdump_path;
int perf_session_env__lookup_objdump(struct perf_session_env *env);
int perf_session_env__lookup_objdump(struct perf_env *env);
#endif /* ARCH_PERF_COMMON_H */

View File

@ -0,0 +1 @@
# empty

View File

@ -0,0 +1 @@
# empty

View File

@ -1,8 +1,13 @@
libperf-y += header.o
libperf-y += tsc.o
libperf-y += pmu.o
libperf-y += kvm-stat.o
libperf-$(CONFIG_DWARF) += dwarf-regs.o
libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
libperf-$(CONFIG_AUXTRACE) += auxtrace.o
libperf-$(CONFIG_AUXTRACE) += intel-pt.o
libperf-$(CONFIG_AUXTRACE) += intel-bts.o

View File

@ -0,0 +1,83 @@
/*
* auxtrace.c: AUX area tracing support
* Copyright (c) 2013-2014, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
*/
#include <stdbool.h>
#include "../../util/header.h"
#include "../../util/debug.h"
#include "../../util/pmu.h"
#include "../../util/auxtrace.h"
#include "../../util/intel-pt.h"
#include "../../util/intel-bts.h"
#include "../../util/evlist.h"
static
struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
int *err)
{
struct perf_pmu *intel_pt_pmu;
struct perf_pmu *intel_bts_pmu;
struct perf_evsel *evsel;
bool found_pt = false;
bool found_bts = false;
intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
if (evlist) {
evlist__for_each(evlist, evsel) {
if (intel_pt_pmu &&
evsel->attr.type == intel_pt_pmu->type)
found_pt = true;
if (intel_bts_pmu &&
evsel->attr.type == intel_bts_pmu->type)
found_bts = true;
}
}
if (found_pt && found_bts) {
pr_err("intel_pt and intel_bts may not be used together\n");
*err = -EINVAL;
return NULL;
}
if (found_pt)
return intel_pt_recording_init(err);
if (found_bts)
return intel_bts_recording_init(err);
return NULL;
}
struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
int *err)
{
char buffer[64];
int ret;
*err = 0;
ret = get_cpuid(buffer, sizeof(buffer));
if (ret) {
*err = ret;
return NULL;
}
if (!strncmp(buffer, "GenuineIntel,", 13))
return auxtrace_record__init_intel(evlist, err);
return NULL;
}

View File

@ -0,0 +1,458 @@
/*
* intel-bts.c: Intel Processor Trace support
* Copyright (c) 2013-2015, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
*/
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/bitops.h>
#include <linux/log2.h>
#include "../../util/cpumap.h"
#include "../../util/evsel.h"
#include "../../util/evlist.h"
#include "../../util/session.h"
#include "../../util/util.h"
#include "../../util/pmu.h"
#include "../../util/debug.h"
#include "../../util/tsc.h"
#include "../../util/auxtrace.h"
#include "../../util/intel-bts.h"
#define KiB(x) ((x) * 1024)
#define MiB(x) ((x) * 1024 * 1024)
#define KiB_MASK(x) (KiB(x) - 1)
#define MiB_MASK(x) (MiB(x) - 1)
#define INTEL_BTS_DFLT_SAMPLE_SIZE KiB(4)
#define INTEL_BTS_MAX_SAMPLE_SIZE KiB(60)
struct intel_bts_snapshot_ref {
void *ref_buf;
size_t ref_offset;
bool wrapped;
};
struct intel_bts_recording {
struct auxtrace_record itr;
struct perf_pmu *intel_bts_pmu;
struct perf_evlist *evlist;
bool snapshot_mode;
size_t snapshot_size;
int snapshot_ref_cnt;
struct intel_bts_snapshot_ref *snapshot_refs;
};
struct branch {
u64 from;
u64 to;
u64 misc;
};
static size_t intel_bts_info_priv_size(struct auxtrace_record *itr __maybe_unused)
{
return INTEL_BTS_AUXTRACE_PRIV_SIZE;
}
static int intel_bts_info_fill(struct auxtrace_record *itr,
struct perf_session *session,
struct auxtrace_info_event *auxtrace_info,
size_t priv_size)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
struct perf_event_mmap_page *pc;
struct perf_tsc_conversion tc = { .time_mult = 0, };
bool cap_user_time_zero = false;
int err;
if (priv_size != INTEL_BTS_AUXTRACE_PRIV_SIZE)
return -EINVAL;
if (!session->evlist->nr_mmaps)
return -EINVAL;
pc = session->evlist->mmap[0].base;
if (pc) {
err = perf_read_tsc_conversion(pc, &tc);
if (err) {
if (err != -EOPNOTSUPP)
return err;
} else {
cap_user_time_zero = tc.time_mult != 0;
}
if (!cap_user_time_zero)
ui__warning("Intel BTS: TSC not available\n");
}
auxtrace_info->type = PERF_AUXTRACE_INTEL_BTS;
auxtrace_info->priv[INTEL_BTS_PMU_TYPE] = intel_bts_pmu->type;
auxtrace_info->priv[INTEL_BTS_TIME_SHIFT] = tc.time_shift;
auxtrace_info->priv[INTEL_BTS_TIME_MULT] = tc.time_mult;
auxtrace_info->priv[INTEL_BTS_TIME_ZERO] = tc.time_zero;
auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO] = cap_user_time_zero;
auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE] = btsr->snapshot_mode;
return 0;
}
static int intel_bts_recording_options(struct auxtrace_record *itr,
struct perf_evlist *evlist,
struct record_opts *opts)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
struct perf_evsel *evsel, *intel_bts_evsel = NULL;
const struct cpu_map *cpus = evlist->cpus;
bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
btsr->evlist = evlist;
btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
evlist__for_each(evlist, evsel) {
if (evsel->attr.type == intel_bts_pmu->type) {
if (intel_bts_evsel) {
pr_err("There may be only one " INTEL_BTS_PMU_NAME " event\n");
return -EINVAL;
}
evsel->attr.freq = 0;
evsel->attr.sample_period = 1;
intel_bts_evsel = evsel;
opts->full_auxtrace = true;
}
}
if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
pr_err("Snapshot mode (-S option) requires " INTEL_BTS_PMU_NAME " PMU event (-e " INTEL_BTS_PMU_NAME ")\n");
return -EINVAL;
}
if (!opts->full_auxtrace)
return 0;
if (opts->full_auxtrace && !cpu_map__empty(cpus)) {
pr_err(INTEL_BTS_PMU_NAME " does not support per-cpu recording\n");
return -EINVAL;
}
/* Set default sizes for snapshot mode */
if (opts->auxtrace_snapshot_mode) {
if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
if (privileged) {
opts->auxtrace_mmap_pages = MiB(4) / page_size;
} else {
opts->auxtrace_mmap_pages = KiB(128) / page_size;
if (opts->mmap_pages == UINT_MAX)
opts->mmap_pages = KiB(256) / page_size;
}
} else if (!opts->auxtrace_mmap_pages && !privileged &&
opts->mmap_pages == UINT_MAX) {
opts->mmap_pages = KiB(256) / page_size;
}
if (!opts->auxtrace_snapshot_size)
opts->auxtrace_snapshot_size =
opts->auxtrace_mmap_pages * (size_t)page_size;
if (!opts->auxtrace_mmap_pages) {
size_t sz = opts->auxtrace_snapshot_size;
sz = round_up(sz, page_size) / page_size;
opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
}
if (opts->auxtrace_snapshot_size >
opts->auxtrace_mmap_pages * (size_t)page_size) {
pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
opts->auxtrace_snapshot_size,
opts->auxtrace_mmap_pages * (size_t)page_size);
return -EINVAL;
}
if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
return -EINVAL;
}
pr_debug2("Intel BTS snapshot size: %zu\n",
opts->auxtrace_snapshot_size);
}
/* Set default sizes for full trace mode */
if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
if (privileged) {
opts->auxtrace_mmap_pages = MiB(4) / page_size;
} else {
opts->auxtrace_mmap_pages = KiB(128) / page_size;
if (opts->mmap_pages == UINT_MAX)
opts->mmap_pages = KiB(256) / page_size;
}
}
/* Validate auxtrace_mmap_pages */
if (opts->auxtrace_mmap_pages) {
size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
size_t min_sz;
if (opts->auxtrace_snapshot_mode)
min_sz = KiB(4);
else
min_sz = KiB(8);
if (sz < min_sz || !is_power_of_2(sz)) {
pr_err("Invalid mmap size for Intel BTS: must be at least %zuKiB and a power of 2\n",
min_sz / 1024);
return -EINVAL;
}
}
if (intel_bts_evsel) {
/*
* To obtain the auxtrace buffer file descriptor, the auxtrace event
* must come first.
*/
perf_evlist__to_front(evlist, intel_bts_evsel);
/*
* In the case of per-cpu mmaps, we need the CPU on the
* AUX event.
*/
if (!cpu_map__empty(cpus))
perf_evsel__set_sample_bit(intel_bts_evsel, CPU);
}
/* Add dummy event to keep tracking */
if (opts->full_auxtrace) {
struct perf_evsel *tracking_evsel;
int err;
err = parse_events(evlist, "dummy:u", NULL);
if (err)
return err;
tracking_evsel = perf_evlist__last(evlist);
perf_evlist__set_tracking_event(evlist, tracking_evsel);
tracking_evsel->attr.freq = 0;
tracking_evsel->attr.sample_period = 1;
}
return 0;
}
static int intel_bts_parse_snapshot_options(struct auxtrace_record *itr,
struct record_opts *opts,
const char *str)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
unsigned long long snapshot_size = 0;
char *endptr;
if (str) {
snapshot_size = strtoull(str, &endptr, 0);
if (*endptr || snapshot_size > SIZE_MAX)
return -1;
}
opts->auxtrace_snapshot_mode = true;
opts->auxtrace_snapshot_size = snapshot_size;
btsr->snapshot_size = snapshot_size;
return 0;
}
static u64 intel_bts_reference(struct auxtrace_record *itr __maybe_unused)
{
return rdtsc();
}
static int intel_bts_alloc_snapshot_refs(struct intel_bts_recording *btsr,
int idx)
{
const size_t sz = sizeof(struct intel_bts_snapshot_ref);
int cnt = btsr->snapshot_ref_cnt, new_cnt = cnt * 2;
struct intel_bts_snapshot_ref *refs;
if (!new_cnt)
new_cnt = 16;
while (new_cnt <= idx)
new_cnt *= 2;
refs = calloc(new_cnt, sz);
if (!refs)
return -ENOMEM;
memcpy(refs, btsr->snapshot_refs, cnt * sz);
btsr->snapshot_refs = refs;
btsr->snapshot_ref_cnt = new_cnt;
return 0;
}
static void intel_bts_free_snapshot_refs(struct intel_bts_recording *btsr)
{
int i;
for (i = 0; i < btsr->snapshot_ref_cnt; i++)
zfree(&btsr->snapshot_refs[i].ref_buf);
zfree(&btsr->snapshot_refs);
}
static void intel_bts_recording_free(struct auxtrace_record *itr)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
intel_bts_free_snapshot_refs(btsr);
free(btsr);
}
static int intel_bts_snapshot_start(struct auxtrace_record *itr)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
struct perf_evsel *evsel;
evlist__for_each(btsr->evlist, evsel) {
if (evsel->attr.type == btsr->intel_bts_pmu->type)
return perf_evlist__disable_event(btsr->evlist, evsel);
}
return -EINVAL;
}
static int intel_bts_snapshot_finish(struct auxtrace_record *itr)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
struct perf_evsel *evsel;
evlist__for_each(btsr->evlist, evsel) {
if (evsel->attr.type == btsr->intel_bts_pmu->type)
return perf_evlist__enable_event(btsr->evlist, evsel);
}
return -EINVAL;
}
static bool intel_bts_first_wrap(u64 *data, size_t buf_size)
{
int i, a, b;
b = buf_size >> 3;
a = b - 512;
if (a < 0)
a = 0;
for (i = a; i < b; i++) {
if (data[i])
return true;
}
return false;
}
static int intel_bts_find_snapshot(struct auxtrace_record *itr, int idx,
struct auxtrace_mmap *mm, unsigned char *data,
u64 *head, u64 *old)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
bool wrapped;
int err;
pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
__func__, idx, (size_t)*old, (size_t)*head);
if (idx >= btsr->snapshot_ref_cnt) {
err = intel_bts_alloc_snapshot_refs(btsr, idx);
if (err)
goto out_err;
}
wrapped = btsr->snapshot_refs[idx].wrapped;
if (!wrapped && intel_bts_first_wrap((u64 *)data, mm->len)) {
btsr->snapshot_refs[idx].wrapped = true;
wrapped = true;
}
/*
* In full trace mode 'head' continually increases. However in snapshot
* mode 'head' is an offset within the buffer. Here 'old' and 'head'
* are adjusted to match the full trace case which expects that 'old' is
* always less than 'head'.
*/
if (wrapped) {
*old = *head;
*head += mm->len;
} else {
if (mm->mask)
*old &= mm->mask;
else
*old %= mm->len;
if (*old > *head)
*head += mm->len;
}
pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
__func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
return 0;
out_err:
pr_err("%s: failed, error %d\n", __func__, err);
return err;
}
static int intel_bts_read_finish(struct auxtrace_record *itr, int idx)
{
struct intel_bts_recording *btsr =
container_of(itr, struct intel_bts_recording, itr);
struct perf_evsel *evsel;
evlist__for_each(btsr->evlist, evsel) {
if (evsel->attr.type == btsr->intel_bts_pmu->type)
return perf_evlist__enable_event_idx(btsr->evlist,
evsel, idx);
}
return -EINVAL;
}
struct auxtrace_record *intel_bts_recording_init(int *err)
{
struct perf_pmu *intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
struct intel_bts_recording *btsr;
if (!intel_bts_pmu)
return NULL;
btsr = zalloc(sizeof(struct intel_bts_recording));
if (!btsr) {
*err = -ENOMEM;
return NULL;
}
btsr->intel_bts_pmu = intel_bts_pmu;
btsr->itr.recording_options = intel_bts_recording_options;
btsr->itr.info_priv_size = intel_bts_info_priv_size;
btsr->itr.info_fill = intel_bts_info_fill;
btsr->itr.free = intel_bts_recording_free;
btsr->itr.snapshot_start = intel_bts_snapshot_start;
btsr->itr.snapshot_finish = intel_bts_snapshot_finish;
btsr->itr.find_snapshot = intel_bts_find_snapshot;
btsr->itr.parse_snapshot_options = intel_bts_parse_snapshot_options;
btsr->itr.reference = intel_bts_reference;
btsr->itr.read_finish = intel_bts_read_finish;
btsr->itr.alignment = sizeof(struct branch);
return &btsr->itr;
}

File diff suppressed because it is too large

View File

@ -0,0 +1,18 @@
#include <string.h>
#include <linux/perf_event.h>
#include "../../util/intel-pt.h"
#include "../../util/intel-bts.h"
#include "../../util/pmu.h"
struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
{
#ifdef HAVE_AUXTRACE_SUPPORT
if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
return intel_pt_pmu_default_config(pmu);
if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
pmu->selectable = true;
#endif
return NULL;
}

View File

@ -5,6 +5,7 @@ perf-y += futex-hash.o
perf-y += futex-wake.o
perf-y += futex-wake-parallel.o
perf-y += futex-requeue.o
perf-y += futex-lock-pi.o
perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o

View File

@ -36,6 +36,8 @@ extern int bench_futex_wake(int argc, const char **argv, const char *prefix);
extern int bench_futex_wake_parallel(int argc, const char **argv,
const char *prefix);
extern int bench_futex_requeue(int argc, const char **argv, const char *prefix);
/* pi futexes */
extern int bench_futex_lock_pi(int argc, const char **argv, const char *prefix);
#define BENCH_FORMAT_DEFAULT_STR "default"
#define BENCH_FORMAT_DEFAULT 0

View File

@ -0,0 +1,219 @@
/*
* Copyright (C) 2015 Davidlohr Bueso.
*/
#include "../perf.h"
#include "../util/util.h"
#include "../util/stat.h"
#include "../util/parse-options.h"
#include "../util/header.h"
#include "bench.h"
#include "futex.h"
#include <err.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>
struct worker {
int tid;
u_int32_t *futex;
pthread_t thread;
unsigned long ops;
};
static u_int32_t global_futex = 0;
static struct worker *worker;
static unsigned int nsecs = 10;
static bool silent = false, multi = false;
static bool done = false, fshared = false;
static unsigned int ncpus, nthreads = 0;
static int futex_flag = 0;
struct timeval start, end, runtime;
static pthread_mutex_t thread_lock;
static unsigned int threads_starting;
static struct stats throughput_stats;
static pthread_cond_t thread_parent, thread_worker;
static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
OPT_UINTEGER('r', "runtime", &nsecs, "Specify runtime (in seconds)"),
OPT_BOOLEAN( 'M', "multi", &multi, "Use multiple futexes"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"),
OPT_END()
};
static const char * const bench_futex_lock_pi_usage[] = {
"perf bench futex requeue <options>",
NULL
};
static void print_summary(void)
{
unsigned long avg = avg_stats(&throughput_stats);
double stddev = stddev_stats(&throughput_stats);
printf("%sAveraged %ld operations/sec (+- %.2f%%), total secs = %d\n",
!silent ? "\n" : "", avg, rel_stddev_stats(stddev, avg),
(int) runtime.tv_sec);
}
static void toggle_done(int sig __maybe_unused,
siginfo_t *info __maybe_unused,
void *uc __maybe_unused)
{
/* inform all threads that we're done for the day */
done = true;
gettimeofday(&end, NULL);
timersub(&end, &start, &runtime);
}
static void *workerfn(void *arg)
{
struct worker *w = (struct worker *) arg;
pthread_mutex_lock(&thread_lock);
threads_starting--;
if (!threads_starting)
pthread_cond_signal(&thread_parent);
pthread_cond_wait(&thread_worker, &thread_lock);
pthread_mutex_unlock(&thread_lock);
do {
int ret;
again:
ret = futex_lock_pi(w->futex, NULL, 0, futex_flag);
if (ret) { /* handle lock acquisition */
if (!silent)
warn("thread %d: Could not lock pi-lock for %p (%d)",
w->tid, w->futex, ret);
if (done)
break;
goto again;
}
usleep(1);
ret = futex_unlock_pi(w->futex, futex_flag);
if (ret && !silent)
warn("thread %d: Could not unlock pi-lock for %p (%d)",
w->tid, w->futex, ret);
w->ops++; /* account for thread's share of work */
} while (!done);
return NULL;
}
static void create_threads(struct worker *w, pthread_attr_t thread_attr)
{
cpu_set_t cpu;
unsigned int i;
threads_starting = nthreads;
for (i = 0; i < nthreads; i++) {
worker[i].tid = i;
if (multi) {
worker[i].futex = calloc(1, sizeof(u_int32_t));
if (!worker[i].futex)
err(EXIT_FAILURE, "calloc");
} else
worker[i].futex = &global_futex;
CPU_ZERO(&cpu);
CPU_SET(i % ncpus, &cpu);
if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu))
err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i]))
err(EXIT_FAILURE, "pthread_create");
}
}
int bench_futex_lock_pi(int argc, const char **argv,
const char *prefix __maybe_unused)
{
int ret = 0;
unsigned int i;
struct sigaction act;
pthread_attr_t thread_attr;
argc = parse_options(argc, argv, options, bench_futex_lock_pi_usage, 0);
if (argc)
goto err;
ncpus = sysconf(_SC_NPROCESSORS_ONLN);
sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
sigaction(SIGINT, &act, NULL);
if (!nthreads)
nthreads = ncpus;
worker = calloc(nthreads, sizeof(*worker));
if (!worker)
err(EXIT_FAILURE, "calloc");
if (!fshared)
futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: %d threads doing pi lock/unlock pairing for %d secs.\n\n",
getpid(), nthreads, nsecs);
init_stats(&throughput_stats);
pthread_mutex_init(&thread_lock, NULL);
pthread_cond_init(&thread_parent, NULL);
pthread_cond_init(&thread_worker, NULL);
threads_starting = nthreads;
pthread_attr_init(&thread_attr);
gettimeofday(&start, NULL);
create_threads(worker, thread_attr);
pthread_attr_destroy(&thread_attr);
pthread_mutex_lock(&thread_lock);
while (threads_starting)
pthread_cond_wait(&thread_parent, &thread_lock);
pthread_cond_broadcast(&thread_worker);
pthread_mutex_unlock(&thread_lock);
sleep(nsecs);
toggle_done(0, NULL, NULL);
for (i = 0; i < nthreads; i++) {
ret = pthread_join(worker[i].thread, NULL);
if (ret)
err(EXIT_FAILURE, "pthread_join");
}
/* cleanup & report results */
pthread_cond_destroy(&thread_parent);
pthread_cond_destroy(&thread_worker);
pthread_mutex_destroy(&thread_lock);
for (i = 0; i < nthreads; i++) {
unsigned long t = worker[i].ops/runtime.tv_sec;
update_stats(&throughput_stats, t);
if (!silent)
printf("[thread %3d] futex: %p [ %ld ops/sec ]\n",
worker[i].tid, worker[i].futex, t);
if (multi)
free(worker[i].futex);
}
print_summary();
free(worker);
return ret;
err:
usage_with_options(bench_futex_lock_pi_usage, options);
exit(EXIT_FAILURE);
}

View File

@ -55,6 +55,26 @@ futex_wake(u_int32_t *uaddr, int nr_wake, int opflags)
return futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags);
}
/**
* futex_lock_pi() - block on uaddr as a PI mutex
* @detect: whether (1) or not (0) to perform deadlock detection
*/
static inline int
futex_lock_pi(u_int32_t *uaddr, struct timespec *timeout, int detect,
int opflags)
{
return futex(uaddr, FUTEX_LOCK_PI, detect, timeout, NULL, 0, opflags);
}
/**
* futex_unlock_pi() - release uaddr as a PI mutex, waking the top waiter
*/
static inline int
futex_unlock_pi(u_int32_t *uaddr, int opflags)
{
return futex(uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0, opflags);
}
/**
* futex_cmp_requeue() - requeue tasks from uaddr to uaddr2
* @nr_wake: wake up to this many tasks

View File

@ -67,6 +67,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
rb_erase(&al->sym->rb_node,
&al->map->dso->symbols[al->map->type]);
symbol__delete(al->sym);
dso__reset_find_symbol_cache(al->map->dso);
}
return 0;
}
@ -187,6 +188,7 @@ find_next:
* symbol, free he->ms.sym->src to signal we already
* processed this symbol.
*/
zfree(&notes->src->cycles_hist);
zfree(&notes->src);
}
}
@ -238,6 +240,8 @@ static int __cmd_annotate(struct perf_annotate *ann)
if (nr_samples > 0) {
total_nr_samples += nr_samples;
hists__collapse_resort(hists, NULL);
/* Don't sort callchain */
perf_evsel__reset_sample_bit(pos, CALLCHAIN);
hists__output_resort(hists, NULL);
if (symbol_conf.event_group &&

View File

@ -60,6 +60,8 @@ static struct bench futex_benchmarks[] = {
{ "wake", "Benchmark for futex wake calls", bench_futex_wake },
{ "wake-parallel", "Benchmark for parallel futex wake calls", bench_futex_wake_parallel },
{ "requeue", "Benchmark for futex requeue calls", bench_futex_requeue },
/* pi-futexes */
{ "lock-pi", "Benchmark for futex lock_pi calls", bench_futex_lock_pi },
{ "all", "Test all futex benchmarks", NULL },
{ NULL, NULL, NULL }
};

View File

@ -25,8 +25,6 @@
static int build_id_cache__kcore_buildid(const char *proc_dir, char *sbuildid)
{
char root_dir[PATH_MAX];
char notes[PATH_MAX];
u8 build_id[BUILD_ID_SIZE];
char *p;
strlcpy(root_dir, proc_dir, sizeof(root_dir));
@ -35,15 +33,7 @@ static int build_id_cache__kcore_buildid(const char *proc_dir, char *sbuildid)
if (!p)
return -1;
*p = '\0';
scnprintf(notes, sizeof(notes), "%s/sys/kernel/notes", root_dir);
if (sysfs__read_build_id(notes, build_id, sizeof(build_id)))
return -1;
build_id__sprintf(build_id, sizeof(build_id), sbuildid);
return 0;
return sysfs__sprintf_build_id(root_dir, sbuildid);
}
static int build_id_cache__kcore_dir(char *dir, size_t sz)
@ -127,7 +117,7 @@ static int build_id_cache__kcore_existing(const char *from_dir, char *to_dir,
static int build_id_cache__add_kcore(const char *filename, bool force)
{
char dir[32], sbuildid[BUILD_ID_SIZE * 2 + 1];
char dir[32], sbuildid[SBUILD_ID_SIZE];
char from_dir[PATH_MAX], to_dir[PATH_MAX];
char *p;
@ -138,7 +128,7 @@ static int build_id_cache__add_kcore(const char *filename, bool force)
return -1;
*p = '\0';
if (build_id_cache__kcore_buildid(from_dir, sbuildid))
if (build_id_cache__kcore_buildid(from_dir, sbuildid) < 0)
return -1;
scnprintf(to_dir, sizeof(to_dir), "%s/[kernel.kcore]/%s",
@ -184,7 +174,7 @@ static int build_id_cache__add_kcore(const char *filename, bool force)
static int build_id_cache__add_file(const char *filename)
{
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
char sbuild_id[SBUILD_ID_SIZE];
u8 build_id[BUILD_ID_SIZE];
int err;
@ -204,7 +194,7 @@ static int build_id_cache__add_file(const char *filename)
static int build_id_cache__remove_file(const char *filename)
{
u8 build_id[BUILD_ID_SIZE];
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
char sbuild_id[SBUILD_ID_SIZE];
int err;
@ -276,7 +266,7 @@ static int build_id_cache__fprintf_missing(struct perf_session *session, FILE *f
static int build_id_cache__update_file(const char *filename)
{
u8 build_id[BUILD_ID_SIZE];
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
char sbuild_id[SBUILD_ID_SIZE];
int err = 0;
@ -363,7 +353,7 @@ int cmd_buildid_cache(int argc, const char **argv,
setup_pager();
if (add_name_list_str) {
list = strlist__new(true, add_name_list_str);
list = strlist__new(add_name_list_str, NULL);
if (list) {
strlist__for_each(pos, list)
if (build_id_cache__add_file(pos->s)) {
@ -381,7 +371,7 @@ int cmd_buildid_cache(int argc, const char **argv,
}
if (remove_name_list_str) {
list = strlist__new(true, remove_name_list_str);
list = strlist__new(remove_name_list_str, NULL);
if (list) {
strlist__for_each(pos, list)
if (build_id_cache__remove_file(pos->s)) {
@ -399,7 +389,7 @@ int cmd_buildid_cache(int argc, const char **argv,
}
if (purge_name_list_str) {
list = strlist__new(true, purge_name_list_str);
list = strlist__new(purge_name_list_str, NULL);
if (list) {
strlist__for_each(pos, list)
if (build_id_cache__purge_path(pos->s)) {
@ -420,7 +410,7 @@ int cmd_buildid_cache(int argc, const char **argv,
ret = build_id_cache__fprintf_missing(session, stdout);
if (update_name_list_str) {
list = strlist__new(true, update_name_list_str);
list = strlist__new(update_name_list_str, NULL);
if (list) {
strlist__for_each(pos, list)
if (build_id_cache__update_file(pos->s)) {

View File

@ -19,29 +19,25 @@
static int sysfs__fprintf_build_id(FILE *fp)
{
u8 kallsyms_build_id[BUILD_ID_SIZE];
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
char sbuild_id[SBUILD_ID_SIZE];
int ret;
if (sysfs__read_build_id("/sys/kernel/notes", kallsyms_build_id,
sizeof(kallsyms_build_id)) != 0)
return -1;
ret = sysfs__sprintf_build_id("/", sbuild_id);
if (ret != sizeof(sbuild_id))
return ret < 0 ? ret : -EINVAL;
build_id__sprintf(kallsyms_build_id, sizeof(kallsyms_build_id),
sbuild_id);
fprintf(fp, "%s\n", sbuild_id);
return 0;
return fprintf(fp, "%s\n", sbuild_id);
}
static int filename__fprintf_build_id(const char *name, FILE *fp)
{
u8 build_id[BUILD_ID_SIZE];
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
char sbuild_id[SBUILD_ID_SIZE];
int ret;
if (filename__read_build_id(name, build_id,
sizeof(build_id)) != sizeof(build_id))
return 0;
ret = filename__sprintf_build_id(name, sbuild_id);
if (ret != sizeof(sbuild_id))
return ret < 0 ? ret : -EINVAL;
build_id__sprintf(build_id, sizeof(build_id), sbuild_id);
return fprintf(fp, "%s\n", sbuild_id);
}
@ -63,7 +59,7 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
/*
* See if this is an ELF file first:
*/
if (filename__fprintf_build_id(input_name, stdout))
if (filename__fprintf_build_id(input_name, stdout) > 0)
goto out;
session = perf_session__new(&file, false, &build_id__mark_dso_hit_ops);

View File

@ -722,6 +722,9 @@ static void data_process(void)
if (verbose || data__files_cnt > 2)
data__fprintf();
/* Don't sort callchain for perf diff */
perf_evsel__reset_sample_bit(evsel_base, CALLCHAIN);
hists__process(hists_base);
}
}

View File

@ -561,6 +561,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
.lost = perf_event__repipe,
.aux = perf_event__repipe,
.itrace_start = perf_event__repipe,
.context_switch = perf_event__repipe,
.read = perf_event__repipe_sample,
.throttle = perf_event__repipe,
.unthrottle = perf_event__repipe,

View File

@ -297,8 +297,7 @@ static void cleanup_params(void)
clear_perf_probe_event(params.events + i);
line_range__clear(&params.line_range);
free(params.target);
if (params.filter)
strfilter__delete(params.filter);
strfilter__delete(params.filter);
memset(&params, 0, sizeof(params));
}

View File

@ -771,12 +771,14 @@ static void callchain_debug(void)
callchain_param.dump_size);
}
int record_parse_callchain_opt(const struct option *opt __maybe_unused,
int record_parse_callchain_opt(const struct option *opt,
const char *arg,
int unset)
{
int ret;
struct record_opts *record = (struct record_opts *)opt->value;
record->callgraph_set = true;
callchain_param.enabled = !unset;
/* --no-call-graph */
@ -786,17 +788,20 @@ int record_parse_callchain_opt(const struct option *opt __maybe_unused,
return 0;
}
ret = parse_callchain_record_opt(arg);
ret = parse_callchain_record_opt(arg, &callchain_param);
if (!ret)
callchain_debug();
return ret;
}
int record_callchain_opt(const struct option *opt __maybe_unused,
int record_callchain_opt(const struct option *opt,
const char *arg __maybe_unused,
int unset __maybe_unused)
{
struct record_opts *record = (struct record_opts *)opt->value;
record->callgraph_set = true;
callchain_param.enabled = true;
if (callchain_param.record_mode == CALLCHAIN_NONE)
@ -1003,6 +1008,9 @@ struct option __record_options[] = {
parse_events_option),
OPT_CALLBACK(0, "filter", &record.evlist, "filter",
"event filter", parse_filter),
OPT_CALLBACK_NOOPT(0, "exclude-perf", &record.evlist,
NULL, "don't record events from perf itself",
exclude_perf),
OPT_STRING('p', "pid", &record.opts.target.pid, "pid",
"record events on existing process id"),
OPT_STRING('t', "tid", &record.opts.target.tid, "tid",
@ -1041,7 +1049,9 @@ struct option __record_options[] = {
OPT_BOOLEAN('s', "stat", &record.opts.inherit_stat,
"per thread counts"),
OPT_BOOLEAN('d', "data", &record.opts.sample_address, "Record the sample addresses"),
OPT_BOOLEAN('T', "timestamp", &record.opts.sample_time, "Record the sample timestamps"),
OPT_BOOLEAN_SET('T', "timestamp", &record.opts.sample_time,
&record.opts.sample_time_set,
"Record the sample timestamps"),
OPT_BOOLEAN('P', "period", &record.opts.period, "Record the sample period"),
OPT_BOOLEAN('n', "no-samples", &record.opts.no_samples,
"don't sample"),
@ -1081,6 +1091,8 @@ struct option __record_options[] = {
"opts", "AUX area tracing Snapshot Mode", ""),
OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
"per thread proc mmap processing timeout in ms"),
OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
"Record context switch events"),
OPT_END()
};
@ -1108,6 +1120,11 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
" system-wide mode\n");
usage_with_options(record_usage, record_options);
}
if (rec->opts.record_switch_events &&
!perf_can_record_switch_events()) {
ui__error("kernel does not support recording context switch events (--switch-events option)\n");
usage_with_options(record_usage, record_options);
}
if (!rec->itr) {
rec->itr = auxtrace_record__init(rec->evlist, &err);

View File

@ -53,6 +53,7 @@ struct report {
bool mem_mode;
bool header;
bool header_only;
bool nonany_branch_mode;
int max_stack;
struct perf_read_values show_threads_values;
const char *pretty_printing_style;
@ -102,6 +103,9 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
if (!ui__has_annotation())
return 0;
hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
rep->nonany_branch_mode);
if (sort__mode == SORT_MODE__BRANCH) {
bi = he->branch_info;
err = addr_map_symbol__inc_samples(&bi->from, evsel->idx);
@ -258,6 +262,12 @@ static int report__setup_sample_type(struct report *rep)
else
callchain_param.record_mode = CALLCHAIN_FP;
}
/* ??? handle more cases than just ANY? */
if (!(perf_evlist__combined_branch_type(session->evlist) &
PERF_SAMPLE_BRANCH_ANY))
rep->nonany_branch_mode = true;
return 0;
}
@ -306,6 +316,11 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
if (evname != NULL)
ret += fprintf(fp, " of event '%s'", evname);
if (symbol_conf.show_ref_callgraph &&
strstr(evname, "call-graph=no")) {
ret += fprintf(fp, ", show reference callgraph");
}
if (rep->mem_mode) {
ret += fprintf(fp, "\n# Total weight : %" PRIu64, nr_events);
ret += fprintf(fp, "\n# Sort order : %s", sort_order ? : default_mem_sort_order);
@ -728,6 +743,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
"Instruction Tracing options",
itrace_parse_synth_opts),
OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
"Show full source file name path for source lines"),
OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
"Show callgraph from reference event"),
OPT_END()
};
struct perf_data_file file = {

View File

@ -623,6 +623,7 @@ struct perf_script {
struct perf_session *session;
bool show_task_events;
bool show_mmap_events;
bool show_switch_events;
};
static int process_attr(struct perf_tool *tool, union perf_event *event,
@ -661,7 +662,7 @@ static int process_comm_event(struct perf_tool *tool,
struct thread *thread;
struct perf_script *script = container_of(tool, struct perf_script, tool);
struct perf_session *session = script->session;
struct perf_evsel *evsel = perf_evlist__first(session->evlist);
struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
int ret = -1;
thread = machine__findnew_thread(machine, event->comm.pid, event->comm.tid);
@ -695,7 +696,7 @@ static int process_fork_event(struct perf_tool *tool,
struct thread *thread;
struct perf_script *script = container_of(tool, struct perf_script, tool);
struct perf_session *session = script->session;
struct perf_evsel *evsel = perf_evlist__first(session->evlist);
struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
if (perf_event__process_fork(tool, event, sample, machine) < 0)
return -1;
@ -727,7 +728,7 @@ static int process_exit_event(struct perf_tool *tool,
struct thread *thread;
struct perf_script *script = container_of(tool, struct perf_script, tool);
struct perf_session *session = script->session;
struct perf_evsel *evsel = perf_evlist__first(session->evlist);
struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
thread = machine__findnew_thread(machine, event->fork.pid, event->fork.tid);
if (thread == NULL) {
@ -759,7 +760,7 @@ static int process_mmap_event(struct perf_tool *tool,
struct thread *thread;
struct perf_script *script = container_of(tool, struct perf_script, tool);
struct perf_session *session = script->session;
struct perf_evsel *evsel = perf_evlist__first(session->evlist);
struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
if (perf_event__process_mmap(tool, event, sample, machine) < 0)
return -1;
@ -790,7 +791,7 @@ static int process_mmap2_event(struct perf_tool *tool,
struct thread *thread;
struct perf_script *script = container_of(tool, struct perf_script, tool);
struct perf_session *session = script->session;
struct perf_evsel *evsel = perf_evlist__first(session->evlist);
struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
if (perf_event__process_mmap2(tool, event, sample, machine) < 0)
return -1;
@ -813,6 +814,32 @@ static int process_mmap2_event(struct perf_tool *tool,
return 0;
}
static int process_switch_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine)
{
struct thread *thread;
struct perf_script *script = container_of(tool, struct perf_script, tool);
struct perf_session *session = script->session;
struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
if (perf_event__process_switch(tool, event, sample, machine) < 0)
return -1;
thread = machine__findnew_thread(machine, sample->pid,
sample->tid);
if (thread == NULL) {
pr_debug("problem processing SWITCH event, skipping it.\n");
return -1;
}
print_sample_start(sample, thread, evsel);
perf_event__fprintf(event, stdout);
thread__put(thread);
return 0;
}
static void sig_handler(int sig __maybe_unused)
{
session_done = 1;
@ -834,6 +861,8 @@ static int __cmd_script(struct perf_script *script)
script->tool.mmap = process_mmap_event;
script->tool.mmap2 = process_mmap2_event;
}
if (script->show_switch_events)
script->tool.context_switch = process_switch_event;
ret = perf_session__process_events(script->session);
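A minimal usage sketch for the new flag (the matching record-side --switch-events option comes from this same series; exact spelling assumed):
    # perf record --switch-events -a sleep 1
    # perf script --show-switch-events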
@ -1532,6 +1561,22 @@ static int have_cmd(int argc, const char **argv)
return 0;
}
static void script__setup_sample_type(struct perf_script *script)
{
struct perf_session *session = script->session;
u64 sample_type = perf_evlist__combined_sample_type(session->evlist);
if (symbol_conf.use_callchain || symbol_conf.cumulate_callchain) {
if ((sample_type & PERF_SAMPLE_REGS_USER) &&
(sample_type & PERF_SAMPLE_STACK_USER))
callchain_param.record_mode = CALLCHAIN_DWARF;
else if (sample_type & PERF_SAMPLE_BRANCH_STACK)
callchain_param.record_mode = CALLCHAIN_LBR;
else
callchain_param.record_mode = CALLCHAIN_FP;
}
}
int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
{
bool show_full_info = false;
@ -1618,10 +1663,19 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
"Show the fork/comm/exit events"),
OPT_BOOLEAN('\0', "show-mmap-events", &script.show_mmap_events,
"Show the mmap events"),
OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
"Show context switch events (if recorded)"),
OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
"Instruction Tracing options",
itrace_parse_synth_opts),
OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
"Show full source file name path for source lines"),
OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
"Enable symbol demangling"),
OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
"Enable kernel symbol demangling"),
OPT_END()
};
const char * const script_subcommands[] = { "record", "report", NULL };
@ -1816,6 +1870,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_delete;
script.session = session;
script__setup_sample_type(&script);
session->itrace_synth_opts = &itrace_synth_opts;
@ -1830,6 +1885,14 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
else
symbol_conf.use_callchain = false;
if (session->tevent.pevent &&
pevent_set_function_resolver(session->tevent.pevent,
machine__resolve_kernel_addr,
&session->machines.host) < 0) {
pr_err("%s: failed to set libtraceevent function resolver\n", __func__);
return -1;
}
if (generate_script_lang) {
struct stat perf_stat;
int input;

View File

@ -58,6 +58,7 @@
#include "util/cpumap.h"
#include "util/thread.h"
#include "util/thread_map.h"
#include "util/counts.h"
#include <stdlib.h>
#include <sys/prctl.h>
@ -101,8 +102,6 @@ static struct target target = {
static int run_count = 1;
static bool no_inherit = false;
static bool scale = true;
static enum aggr_mode aggr_mode = AGGR_GLOBAL;
static volatile pid_t child_pid = -1;
static bool null_run = false;
static int detailed_run = 0;
@ -112,11 +111,9 @@ static int big_num_opt = -1;
static const char *csv_sep = NULL;
static bool csv_output = false;
static bool group = false;
static FILE *output = NULL;
static const char *pre_cmd = NULL;
static const char *post_cmd = NULL;
static bool sync_run = false;
static unsigned int interval = 0;
static unsigned int initial_delay = 0;
static unsigned int unit_width = 4; /* strlen("unit") */
static bool forever = false;
@ -126,6 +123,11 @@ static int (*aggr_get_id)(struct cpu_map *m, int cpu);
static volatile int done = 0;
static struct perf_stat_config stat_config = {
.aggr_mode = AGGR_GLOBAL,
.scale = true,
};
static inline void diff_timespec(struct timespec *r, struct timespec *a,
struct timespec *b)
{
@ -148,7 +150,7 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
{
struct perf_event_attr *attr = &evsel->attr;
if (scale)
if (stat_config.scale)
attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
PERF_FORMAT_TOTAL_TIME_RUNNING;
@ -178,142 +180,6 @@ static inline int nsec_counter(struct perf_evsel *evsel)
return 0;
}
static void zero_per_pkg(struct perf_evsel *counter)
{
if (counter->per_pkg_mask)
memset(counter->per_pkg_mask, 0, MAX_NR_CPUS);
}
static int check_per_pkg(struct perf_evsel *counter, int cpu, bool *skip)
{
unsigned long *mask = counter->per_pkg_mask;
struct cpu_map *cpus = perf_evsel__cpus(counter);
int s;
*skip = false;
if (!counter->per_pkg)
return 0;
if (cpu_map__empty(cpus))
return 0;
if (!mask) {
mask = zalloc(MAX_NR_CPUS);
if (!mask)
return -ENOMEM;
counter->per_pkg_mask = mask;
}
s = cpu_map__get_socket(cpus, cpu);
if (s < 0)
return -1;
*skip = test_and_set_bit(s, mask) == 1;
return 0;
}
static int
process_counter_values(struct perf_evsel *evsel, int cpu, int thread,
struct perf_counts_values *count)
{
struct perf_counts_values *aggr = &evsel->counts->aggr;
static struct perf_counts_values zero;
bool skip = false;
if (check_per_pkg(evsel, cpu, &skip)) {
pr_err("failed to read per-pkg counter\n");
return -1;
}
if (skip)
count = &zero;
switch (aggr_mode) {
case AGGR_THREAD:
case AGGR_CORE:
case AGGR_SOCKET:
case AGGR_NONE:
if (!evsel->snapshot)
perf_evsel__compute_deltas(evsel, cpu, thread, count);
perf_counts_values__scale(count, scale, NULL);
if (aggr_mode == AGGR_NONE)
perf_stat__update_shadow_stats(evsel, count->values, cpu);
break;
case AGGR_GLOBAL:
aggr->val += count->val;
if (scale) {
aggr->ena += count->ena;
aggr->run += count->run;
}
default:
break;
}
return 0;
}
static int process_counter_maps(struct perf_evsel *counter)
{
int nthreads = thread_map__nr(counter->threads);
int ncpus = perf_evsel__nr_cpus(counter);
int cpu, thread;
if (counter->system_wide)
nthreads = 1;
for (thread = 0; thread < nthreads; thread++) {
for (cpu = 0; cpu < ncpus; cpu++) {
if (process_counter_values(counter, cpu, thread,
perf_counts(counter->counts, cpu, thread)))
return -1;
}
}
return 0;
}
static int process_counter(struct perf_evsel *counter)
{
struct perf_counts_values *aggr = &counter->counts->aggr;
struct perf_stat *ps = counter->priv;
u64 *count = counter->counts->aggr.values;
int i, ret;
aggr->val = aggr->ena = aggr->run = 0;
init_stats(ps->res_stats);
if (counter->per_pkg)
zero_per_pkg(counter);
ret = process_counter_maps(counter);
if (ret)
return ret;
if (aggr_mode != AGGR_GLOBAL)
return 0;
if (!counter->snapshot)
perf_evsel__compute_deltas(counter, -1, -1, aggr);
perf_counts_values__scale(aggr, scale, &counter->counts->scaled);
for (i = 0; i < 3; i++)
update_stats(&ps->res_stats[i], count[i]);
if (verbose) {
fprintf(output, "%s: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
perf_evsel__name(counter), count[0], count[1], count[2]);
}
/*
* Save the full runtime - to allow normalization during printout:
*/
perf_stat__update_shadow_stats(counter, count, 0);
return 0;
}
/*
* Read out the results of a single counter:
* do not aggregate counts across CPUs in system-wide mode
@ -351,7 +217,7 @@ static void read_counters(bool close_counters)
if (read_counter(counter))
pr_warning("failed to read counter %s\n", counter->name);
if (process_counter(counter))
if (perf_stat_process_counter(&stat_config, counter))
pr_warning("failed to process counter %s\n", counter->name);
if (close_counters) {
@ -402,6 +268,7 @@ static void workload_exec_failed_signal(int signo __maybe_unused, siginfo_t *inf
static int __run_perf_stat(int argc, const char **argv)
{
int interval = stat_config.interval;
char msg[512];
unsigned long long t0, t1;
struct perf_evsel *counter;
@ -545,13 +412,13 @@ static int run_perf_stat(int argc, const char **argv)
static void print_running(u64 run, u64 ena)
{
if (csv_output) {
fprintf(output, "%s%" PRIu64 "%s%.2f",
fprintf(stat_config.output, "%s%" PRIu64 "%s%.2f",
csv_sep,
run,
csv_sep,
ena ? 100.0 * run / ena : 100.0);
} else if (run != ena) {
fprintf(output, " (%.2f%%)", 100.0 * run / ena);
fprintf(stat_config.output, " (%.2f%%)", 100.0 * run / ena);
}
}
@ -560,9 +427,9 @@ static void print_noise_pct(double total, double avg)
double pct = rel_stddev_stats(total, avg);
if (csv_output)
fprintf(output, "%s%.2f%%", csv_sep, pct);
fprintf(stat_config.output, "%s%.2f%%", csv_sep, pct);
else if (pct)
fprintf(output, " ( +-%6.2f%% )", pct);
fprintf(stat_config.output, " ( +-%6.2f%% )", pct);
}
static void print_noise(struct perf_evsel *evsel, double avg)
@ -578,9 +445,9 @@ static void print_noise(struct perf_evsel *evsel, double avg)
static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
{
switch (aggr_mode) {
switch (stat_config.aggr_mode) {
case AGGR_CORE:
fprintf(output, "S%d-C%*d%s%*d%s",
fprintf(stat_config.output, "S%d-C%*d%s%*d%s",
cpu_map__id_to_socket(id),
csv_output ? 0 : -8,
cpu_map__id_to_cpu(id),
@ -590,7 +457,7 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
csv_sep);
break;
case AGGR_SOCKET:
fprintf(output, "S%*d%s%*d%s",
fprintf(stat_config.output, "S%*d%s%*d%s",
csv_output ? 0 : -5,
id,
csv_sep,
@ -599,12 +466,12 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
csv_sep);
break;
case AGGR_NONE:
fprintf(output, "CPU%*d%s",
fprintf(stat_config.output, "CPU%*d%s",
csv_output ? 0 : -4,
perf_evsel__cpus(evsel)->map[id], csv_sep);
break;
case AGGR_THREAD:
fprintf(output, "%*s-%*d%s",
fprintf(stat_config.output, "%*s-%*d%s",
csv_output ? 0 : 16,
thread_map__comm(evsel->threads, id),
csv_output ? 0 : -8,
@ -619,6 +486,7 @@ static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
{
FILE *output = stat_config.output;
double msecs = avg / 1e6;
const char *fmt_v, *fmt_n;
char name[25];
@ -643,7 +511,7 @@ static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
if (evsel->cgrp)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
if (csv_output || interval)
if (csv_output || stat_config.interval)
return;
if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
@ -655,6 +523,7 @@ static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
{
FILE *output = stat_config.output;
double sc = evsel->scale;
const char *fmt;
int cpu = cpu_map__id_to_cpu(id);
@ -670,7 +539,7 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
aggr_printout(evsel, id, nr);
if (aggr_mode == AGGR_GLOBAL)
if (stat_config.aggr_mode == AGGR_GLOBAL)
cpu = 0;
fprintf(output, fmt, avg, csv_sep);
@ -685,16 +554,18 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
if (evsel->cgrp)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
if (csv_output || interval)
if (csv_output || stat_config.interval)
return;
perf_stat__print_shadow_stats(output, evsel, avg, cpu, aggr_mode);
perf_stat__print_shadow_stats(output, evsel, avg, cpu,
stat_config.aggr_mode);
}
static void print_aggr(char *prefix)
{
FILE *output = stat_config.output;
struct perf_evsel *counter;
int cpu, cpu2, s, s2, id, nr;
int cpu, s, s2, id, nr;
double uval;
u64 ena, run, val;
@ -707,8 +578,7 @@ static void print_aggr(char *prefix)
val = ena = run = 0;
nr = 0;
for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
cpu2 = perf_evsel__cpus(counter)->map[cpu];
s2 = aggr_get_id(evsel_list->cpus, cpu2);
s2 = aggr_get_id(perf_evsel__cpus(counter), cpu);
if (s2 != id)
continue;
val += perf_counts(counter->counts, cpu, 0)->val;
@ -761,6 +631,7 @@ static void print_aggr(char *prefix)
static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
{
FILE *output = stat_config.output;
int nthreads = thread_map__nr(counter->threads);
int ncpus = cpu_map__nr(counter->cpus);
int cpu, thread;
@ -799,6 +670,7 @@ static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
*/
static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
{
FILE *output = stat_config.output;
struct perf_stat *ps = counter->priv;
double avg = avg_stats(&ps->res_stats[0]);
int scaled = counter->counts->scaled;
@ -850,6 +722,7 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
*/
static void print_counter(struct perf_evsel *counter, char *prefix)
{
FILE *output = stat_config.output;
u64 ena, run, val;
double uval;
int cpu;
@ -904,12 +777,13 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
static void print_interval(char *prefix, struct timespec *ts)
{
FILE *output = stat_config.output;
static int num_print_interval;
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
if (num_print_interval == 0 && !csv_output) {
switch (aggr_mode) {
switch (stat_config.aggr_mode) {
case AGGR_SOCKET:
fprintf(output, "# time socket cpus counts %*s events\n", unit_width, "unit");
break;
@ -934,6 +808,7 @@ static void print_interval(char *prefix, struct timespec *ts)
static void print_header(int argc, const char **argv)
{
FILE *output = stat_config.output;
int i;
fflush(stdout);
@ -963,6 +838,8 @@ static void print_header(int argc, const char **argv)
static void print_footer(void)
{
FILE *output = stat_config.output;
if (!null_run)
fprintf(output, "\n");
fprintf(output, " %17.9f seconds time elapsed",
@ -977,6 +854,7 @@ static void print_footer(void)
static void print_counters(struct timespec *ts, int argc, const char **argv)
{
int interval = stat_config.interval;
struct perf_evsel *counter;
char buf[64], *prefix = NULL;
@ -985,7 +863,7 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
else
print_header(argc, argv);
switch (aggr_mode) {
switch (stat_config.aggr_mode) {
case AGGR_CORE:
case AGGR_SOCKET:
print_aggr(prefix);
@ -1009,14 +887,14 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
if (!interval && !csv_output)
print_footer();
fflush(output);
fflush(stat_config.output);
}
static volatile int signr = -1;
static void skip_signal(int signo)
{
if ((child_pid == -1) || interval)
if ((child_pid == -1) || stat_config.interval)
done = 1;
signr = signo;
@ -1064,7 +942,7 @@ static int stat__set_big_num(const struct option *opt __maybe_unused,
static int perf_stat_init_aggr_mode(void)
{
switch (aggr_mode) {
switch (stat_config.aggr_mode) {
case AGGR_SOCKET:
if (cpu_map__build_socket_map(evsel_list->cpus, &aggr_map)) {
perror("cannot build socket map");
@ -1270,7 +1148,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
"system-wide collection from all CPUs"),
OPT_BOOLEAN('g', "group", &group,
"put the counters into a counter group"),
OPT_BOOLEAN('c', "scale", &scale, "scale/normalize counters"),
OPT_BOOLEAN('c', "scale", &stat_config.scale, "scale/normalize counters"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show counter open errors, etc)"),
OPT_INTEGER('r', "repeat", &run_count,
@ -1286,7 +1164,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
stat__set_big_num),
OPT_STRING('C', "cpu", &target.cpu_list, "cpu",
"list of cpus to monitor in system-wide"),
OPT_SET_UINT('A', "no-aggr", &aggr_mode,
OPT_SET_UINT('A', "no-aggr", &stat_config.aggr_mode,
"disable CPU count aggregation", AGGR_NONE),
OPT_STRING('x', "field-separator", &csv_sep, "separator",
"print counts with custom separator"),
@ -1300,13 +1178,13 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
"command to run prior to the measured command"),
OPT_STRING(0, "post", &post_cmd, "command",
"command to run after to the measured command"),
OPT_UINTEGER('I', "interval-print", &interval,
OPT_UINTEGER('I', "interval-print", &stat_config.interval,
"print counts at regular interval in ms (>= 100)"),
OPT_SET_UINT(0, "per-socket", &aggr_mode,
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
"aggregate counts per processor socket", AGGR_SOCKET),
OPT_SET_UINT(0, "per-core", &aggr_mode,
OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
"aggregate counts per physical processor core", AGGR_CORE),
OPT_SET_UINT(0, "per-thread", &aggr_mode,
OPT_SET_UINT(0, "per-thread", &stat_config.aggr_mode,
"aggregate counts per thread", AGGR_THREAD),
OPT_UINTEGER('D', "delay", &initial_delay,
"ms to wait before starting measurement after program start"),
@ -1318,6 +1196,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
};
int status = -EINVAL, run_idx;
const char *mode;
FILE *output = stderr;
unsigned int interval;
setlocale(LC_ALL, "");
@ -1328,7 +1208,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
argc = parse_options(argc, argv, options, stat_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
output = stderr;
interval = stat_config.interval;
if (output_name && strcmp(output_name, "-"))
output = NULL;
@ -1365,6 +1246,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
}
}
stat_config.output = output;
if (csv_sep) {
csv_output = true;
if (!strcmp(csv_sep, "\\t"))
@ -1399,7 +1282,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
run_count = 1;
}
if ((aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
fprintf(stderr, "The --per-thread option is only available "
"when monitoring via -p -t options.\n");
parse_options_usage(NULL, options, "p", 1);
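A minimal usage sketch for per-thread aggregation (<pid> is a placeholder):
    # perf stat --per-thread -p <pid> sleep 1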
@ -1411,7 +1294,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
* no_aggr, cgroup are for system-wide only
* --per-thread is aggregated per thread, we dont mix it with cpu mode
*/
if (((aggr_mode != AGGR_GLOBAL && aggr_mode != AGGR_THREAD) || nr_cgroups) &&
if (((stat_config.aggr_mode != AGGR_GLOBAL &&
stat_config.aggr_mode != AGGR_THREAD) || nr_cgroups) &&
!target__has_cpu(&target)) {
fprintf(stderr, "both cgroup and no-aggregation "
"modes only available in system-wide mode\n");
@ -1444,7 +1328,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
* Initialize thread_map with comm names,
* so we could print it out on output.
*/
if (aggr_mode == AGGR_THREAD)
if (stat_config.aggr_mode == AGGR_THREAD)
thread_map__read_comms(evsel_list->threads);
if (interval && interval < 100) {

View File

@ -40,6 +40,7 @@
#include "util/xyarray.h"
#include "util/sort.h"
#include "util/intlist.h"
#include "util/parse-branch-options.h"
#include "arch/common.h"
#include "util/debug.h"
@ -695,6 +696,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter,
perf_top__record_precise_ip(top, he, evsel->idx, ip);
}
hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
!(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY));
return 0;
}
@ -1171,6 +1174,12 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
"don't try to adjust column width, use these fixed values"),
OPT_UINTEGER(0, "proc-map-timeout", &opts->proc_map_timeout,
"per thread proc mmap processing timeout in ms"),
OPT_CALLBACK_NOOPT('b', "branch-any", &opts->branch_stack,
"branch any", "sample any taken branches",
parse_branch_stack),
OPT_CALLBACK('j', "branch-filter", &opts->branch_stack,
"branch filter mask", "branch stack filter modes",
parse_branch_stack),
OPT_END()
};
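A minimal usage sketch for the new perf top branch-sampling flags (the "any_call" filter name follows parse_branch_stack and is only illustrative):
    # perf top -b
    # perf top -j any_call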
const char * const top_usage[] = {

View File

@ -1,8 +1,27 @@
/*
* builtin-trace.c
*
* Builtin 'trace' command:
*
* Display a continuously updated trace of any workload, CPU, specific PID,
* system wide, etc. Default format is loosely strace like, but any other
* event may be specified using --event.
*
* Copyright (C) 2012, 2013, 2014, 2015 Red Hat Inc, Arnaldo Carvalho de Melo <acme@redhat.com>
*
* Initially based on the 'trace' prototype by Thomas Gleixner:
*
* http://lwn.net/Articles/415728/ ("Announcing a new utility: 'trace'")
*
* Released under the GPL v2. (and only v2, not any later version)
*/
#include <traceevent/event-parse.h>
#include "builtin.h"
#include "util/color.h"
#include "util/debug.h"
#include "util/evlist.h"
#include "util/exec_cmd.h"
#include "util/machine.h"
#include "util/session.h"
#include "util/thread.h"
@ -26,6 +45,7 @@
#ifndef MADV_HWPOISON
# define MADV_HWPOISON 100
#endif
#ifndef MADV_MERGEABLE
@ -247,42 +267,6 @@ out_delete:
({ struct syscall_tp *fields = evsel->priv; \
fields->name.pointer(&fields->name, sample); })
static int perf_evlist__add_syscall_newtp(struct perf_evlist *evlist,
void *sys_enter_handler,
void *sys_exit_handler)
{
int ret = -1;
struct perf_evsel *sys_enter, *sys_exit;
sys_enter = perf_evsel__syscall_newtp("sys_enter", sys_enter_handler);
if (sys_enter == NULL)
goto out;
if (perf_evsel__init_sc_tp_ptr_field(sys_enter, args))
goto out_delete_sys_enter;
sys_exit = perf_evsel__syscall_newtp("sys_exit", sys_exit_handler);
if (sys_exit == NULL)
goto out_delete_sys_enter;
if (perf_evsel__init_sc_tp_uint_field(sys_exit, ret))
goto out_delete_sys_exit;
perf_evlist__add(evlist, sys_enter);
perf_evlist__add(evlist, sys_exit);
ret = 0;
out:
return ret;
out_delete_sys_exit:
perf_evsel__delete_priv(sys_exit);
out_delete_sys_enter:
perf_evsel__delete_priv(sys_enter);
goto out;
}
struct syscall_arg {
unsigned long val;
struct thread *thread;
@ -604,6 +588,15 @@ static DEFINE_STRARRAY_OFFSET(epoll_ctl_ops, 1);
static const char *itimers[] = { "REAL", "VIRTUAL", "PROF", };
static DEFINE_STRARRAY(itimers);
static const char *keyctl_options[] = {
"GET_KEYRING_ID", "JOIN_SESSION_KEYRING", "UPDATE", "REVOKE", "CHOWN",
"SETPERM", "DESCRIBE", "CLEAR", "LINK", "UNLINK", "SEARCH", "READ",
"INSTANTIATE", "NEGATE", "SET_REQKEY_KEYRING", "SET_TIMEOUT",
"ASSUME_AUTHORITY", "GET_SECURITY", "SESSION_TO_PARENT", "REJECT",
"INSTANTIATE_IOV", "INVALIDATE", "GET_PERSISTENT",
};
static DEFINE_STRARRAY(keyctl_options);
static const char *whences[] = { "SET", "CUR", "END",
#ifdef SEEK_DATA
"DATA",
@ -634,7 +627,8 @@ static DEFINE_STRARRAY(sighow);
static const char *clockid[] = {
"REALTIME", "MONOTONIC", "PROCESS_CPUTIME_ID", "THREAD_CPUTIME_ID",
"MONOTONIC_RAW", "REALTIME_COARSE", "MONOTONIC_COARSE",
"MONOTONIC_RAW", "REALTIME_COARSE", "MONOTONIC_COARSE", "BOOTTIME",
"REALTIME_ALARM", "BOOTTIME_ALARM", "SGI_CYCLE", "TAI"
};
static DEFINE_STRARRAY(clockid);
@ -779,6 +773,11 @@ static size_t syscall_arg__scnprintf_access_mode(char *bf, size_t size,
#define SCA_ACCMODE syscall_arg__scnprintf_access_mode
static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
struct syscall_arg *arg);
#define SCA_FILENAME syscall_arg__scnprintf_filename
static size_t syscall_arg__scnprintf_open_flags(char *bf, size_t size,
struct syscall_arg *arg)
{
@ -1006,14 +1005,23 @@ static struct syscall_fmt {
bool hexret;
} syscall_fmts[] = {
{ .name = "access", .errmsg = true,
.arg_scnprintf = { [1] = SCA_ACCMODE, /* mode */ }, },
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */
[1] = SCA_ACCMODE, /* mode */ }, },
{ .name = "arch_prctl", .errmsg = true, .alias = "prctl", },
{ .name = "brk", .hexret = true,
.arg_scnprintf = { [0] = SCA_HEX, /* brk */ }, },
{ .name = "chdir", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "chmod", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "chroot", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "clock_gettime", .errmsg = true, STRARRAY(0, clk_id, clockid), },
{ .name = "close", .errmsg = true,
.arg_scnprintf = { [0] = SCA_CLOSE_FD, /* fd */ }, },
{ .name = "connect", .errmsg = true, },
{ .name = "creat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "dup", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "dup2", .errmsg = true,
@ -1024,7 +1032,8 @@ static struct syscall_fmt {
{ .name = "eventfd2", .errmsg = true,
.arg_scnprintf = { [1] = SCA_EFD_FLAGS, /* flags */ }, },
{ .name = "faccessat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "fadvise64", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "fallocate", .errmsg = true,
@ -1034,11 +1043,13 @@ static struct syscall_fmt {
{ .name = "fchmod", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "fchmodat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "fchown", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "fchownat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "fcntl", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[1] = SCA_STRARRAY, /* cmd */ },
@ -1053,7 +1064,8 @@ static struct syscall_fmt {
{ .name = "fstat", .errmsg = true, .alias = "newfstat",
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "fstatat", .errmsg = true, .alias = "newfstatat",
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "fstatfs", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "fsync", .errmsg = true,
@ -1063,13 +1075,18 @@ static struct syscall_fmt {
{ .name = "futex", .errmsg = true,
.arg_scnprintf = { [1] = SCA_FUTEX_OP, /* op */ }, },
{ .name = "futimesat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "getdents", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "getdents64", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "getitimer", .errmsg = true, STRARRAY(0, which, itimers), },
{ .name = "getrlimit", .errmsg = true, STRARRAY(0, resource, rlimit_resources), },
{ .name = "getxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "inotify_add_watch", .errmsg = true,
.arg_scnprintf = { [1] = SCA_FILENAME, /* pathname */ }, },
{ .name = "ioctl", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */
#if defined(__i386__) || defined(__x86_64__)
@ -1082,22 +1099,44 @@ static struct syscall_fmt {
#else
[2] = SCA_HEX, /* arg */ }, },
#endif
{ .name = "keyctl", .errmsg = true, STRARRAY(0, option, keyctl_options), },
{ .name = "kill", .errmsg = true,
.arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, },
{ .name = "lchown", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "lgetxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "linkat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
{ .name = "listxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "llistxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "lremovexattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "lseek", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[2] = SCA_STRARRAY, /* whence */ },
.arg_parm = { [2] = &strarray__whences, /* whence */ }, },
{ .name = "lstat", .errmsg = true, .alias = "newlstat", },
{ .name = "lsetxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "lstat", .errmsg = true, .alias = "newlstat",
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "lsxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "madvise", .errmsg = true,
.arg_scnprintf = { [0] = SCA_HEX, /* start */
[2] = SCA_MADV_BHV, /* behavior */ }, },
{ .name = "mkdir", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "mkdirat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */
[1] = SCA_FILENAME, /* pathname */ }, },
{ .name = "mknod", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "mknodat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* fd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "mlock", .errmsg = true,
.arg_scnprintf = { [0] = SCA_HEX, /* addr */ }, },
{ .name = "mlockall", .errmsg = true,
@ -1110,6 +1149,8 @@ static struct syscall_fmt {
{ .name = "mprotect", .errmsg = true,
.arg_scnprintf = { [0] = SCA_HEX, /* start */
[2] = SCA_MMAP_PROT, /* prot */ }, },
{ .name = "mq_unlink", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* u_name */ }, },
{ .name = "mremap", .hexret = true,
.arg_scnprintf = { [0] = SCA_HEX, /* addr */
[3] = SCA_MREMAP_FLAGS, /* flags */
@ -1121,14 +1162,17 @@ static struct syscall_fmt {
{ .name = "name_to_handle_at", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
{ .name = "newfstatat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "open", .errmsg = true,
.arg_scnprintf = { [1] = SCA_OPEN_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */
[1] = SCA_OPEN_FLAGS, /* flags */ }, },
{ .name = "open_by_handle_at", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[2] = SCA_OPEN_FLAGS, /* flags */ }, },
{ .name = "openat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[1] = SCA_FILENAME, /* filename */
[2] = SCA_OPEN_FLAGS, /* flags */ }, },
{ .name = "perf_event_open", .errmsg = true,
.arg_scnprintf = { [1] = SCA_INT, /* pid */
@ -1150,18 +1194,28 @@ static struct syscall_fmt {
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "read", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "readlink", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* path */ }, },
{ .name = "readlinkat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[1] = SCA_FILENAME, /* pathname */ }, },
{ .name = "readv", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "recvfrom", .errmsg = true,
.arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[3] = SCA_MSG_FLAGS, /* flags */ }, },
{ .name = "recvmmsg", .errmsg = true,
.arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[3] = SCA_MSG_FLAGS, /* flags */ }, },
{ .name = "recvmsg", .errmsg = true,
.arg_scnprintf = { [2] = SCA_MSG_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[2] = SCA_MSG_FLAGS, /* flags */ }, },
{ .name = "removexattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "renameat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
{ .name = "rmdir", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "rt_sigaction", .errmsg = true,
.arg_scnprintf = { [0] = SCA_SIGNUM, /* sig */ }, },
{ .name = "rt_sigprocmask", .errmsg = true, STRARRAY(0, how, sighow), },
@ -1171,13 +1225,18 @@ static struct syscall_fmt {
.arg_scnprintf = { [2] = SCA_SIGNUM, /* sig */ }, },
{ .name = "select", .errmsg = true, .timeout = true, },
{ .name = "sendmmsg", .errmsg = true,
.arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[3] = SCA_MSG_FLAGS, /* flags */ }, },
{ .name = "sendmsg", .errmsg = true,
.arg_scnprintf = { [2] = SCA_MSG_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[2] = SCA_MSG_FLAGS, /* flags */ }, },
{ .name = "sendto", .errmsg = true,
.arg_scnprintf = { [3] = SCA_MSG_FLAGS, /* flags */ }, },
.arg_scnprintf = { [0] = SCA_FD, /* fd */
[3] = SCA_MSG_FLAGS, /* flags */ }, },
{ .name = "setitimer", .errmsg = true, STRARRAY(0, which, itimers), },
{ .name = "setrlimit", .errmsg = true, STRARRAY(0, resource, rlimit_resources), },
{ .name = "setxattr", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "shutdown", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "socket", .errmsg = true,
@ -1188,18 +1247,35 @@ static struct syscall_fmt {
.arg_scnprintf = { [0] = SCA_STRARRAY, /* family */
[1] = SCA_SK_TYPE, /* type */ },
.arg_parm = { [0] = &strarray__socket_families, /* family */ }, },
{ .name = "stat", .errmsg = true, .alias = "newstat", },
{ .name = "stat", .errmsg = true, .alias = "newstat",
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "statfs", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* pathname */ }, },
{ .name = "swapoff", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* specialfile */ }, },
{ .name = "swapon", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* specialfile */ }, },
{ .name = "symlinkat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
{ .name = "tgkill", .errmsg = true,
.arg_scnprintf = { [2] = SCA_SIGNUM, /* sig */ }, },
{ .name = "tkill", .errmsg = true,
.arg_scnprintf = { [1] = SCA_SIGNUM, /* sig */ }, },
{ .name = "truncate", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* path */ }, },
{ .name = "uname", .errmsg = true, .alias = "newuname", },
{ .name = "unlinkat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */
[1] = SCA_FILENAME, /* pathname */ }, },
{ .name = "utime", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "utimensat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */ }, },
.arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */
[1] = SCA_FILENAME, /* filename */ }, },
{ .name = "utimes", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FILENAME, /* filename */ }, },
{ .name = "vmsplice", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "write", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FD, /* fd */ }, },
{ .name = "writev", .errmsg = true,
@ -1223,7 +1299,6 @@ struct syscall {
int nr_args;
struct format_field *args;
const char *name;
bool filtered;
bool is_exit;
struct syscall_fmt *fmt;
size_t (**arg_scnprintf)(char *bf, size_t size, struct syscall_arg *arg);
@ -1244,6 +1319,11 @@ static size_t fprintf_duration(unsigned long t, FILE *fp)
return printed + fprintf(fp, "): ");
}
/**
* filename.ptr: The filename char pointer that will be vfs_getname'd
* filename.entry_str_pos: Where to insert the string translated from
* filename.ptr by the vfs_getname tracepoint/kprobe.
*/
struct thread_trace {
u64 entry_time;
u64 exit_time;
@ -1252,6 +1332,13 @@ struct thread_trace {
unsigned long pfmaj, pfmin;
char *entry_str;
double runtime_ms;
struct {
unsigned long ptr;
short int entry_str_pos;
bool pending_open;
unsigned int namelen;
char *name;
} filename;
struct {
int max;
char **table;
@ -1298,6 +1385,8 @@ fail:
#define TRACE_PFMAJ (1 << 0)
#define TRACE_PFMIN (1 << 1)
static const size_t trace__entry_str_size = 2048;
struct trace {
struct perf_tool tool;
struct {
@ -1307,6 +1396,10 @@ struct trace {
struct {
int max;
struct syscall *table;
struct {
struct perf_evsel *sys_enter,
*sys_exit;
} events;
} syscalls;
struct record_opts opts;
struct perf_evlist *evlist;
@ -1316,7 +1409,10 @@ struct trace {
FILE *output;
unsigned long nr_events;
struct strlist *ev_qualifier;
const char *last_vfs_getname;
struct {
size_t nr;
int *entries;
} ev_qualifier_ids;
struct intlist *tid_list;
struct intlist *pid_list;
struct {
@ -1340,6 +1436,7 @@ struct trace {
bool show_tool_stats;
bool trace_syscalls;
bool force;
bool vfs_getname;
int trace_pgfaults;
};
@ -1443,6 +1540,27 @@ static size_t syscall_arg__scnprintf_close_fd(char *bf, size_t size,
return printed;
}
static void thread__set_filename_pos(struct thread *thread, const char *bf,
unsigned long ptr)
{
struct thread_trace *ttrace = thread__priv(thread);
ttrace->filename.ptr = ptr;
ttrace->filename.entry_str_pos = bf - ttrace->entry_str;
}
static size_t syscall_arg__scnprintf_filename(char *bf, size_t size,
struct syscall_arg *arg)
{
unsigned long ptr = arg->val;
if (!arg->trace->vfs_getname)
return scnprintf(bf, size, "%#x", ptr);
thread__set_filename_pos(arg->thread, bf, ptr);
return 0;
}
static bool trace__filter_duration(struct trace *trace, double t)
{
return t < (trace->duration_filter * NSEC_PER_MSEC);
@ -1517,6 +1635,9 @@ static int trace__symbols_init(struct trace *trace, struct perf_evlist *evlist)
if (trace->host == NULL)
return -ENOMEM;
if (trace_event__register_resolver(trace->host, machine__resolve_kernel_addr) < 0)
return -errno;
err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
evlist->threads, trace__tool_process, false,
trace->opts.proc_map_timeout);
@ -1578,19 +1699,6 @@ static int trace__read_syscall_info(struct trace *trace, int id)
sc = trace->syscalls.table + id;
sc->name = name;
if (trace->ev_qualifier) {
bool in = strlist__find(trace->ev_qualifier, name) != NULL;
if (!(in ^ trace->not_ev_qualifier)) {
sc->filtered = true;
/*
* No need to do read tracepoint information since this will be
* filtered out.
*/
return 0;
}
}
sc->fmt = syscall_fmt__find(sc->name);
snprintf(tp_name, sizeof(tp_name), "sys_enter_%s", sc->name);
@ -1619,13 +1727,27 @@ static int trace__read_syscall_info(struct trace *trace, int id)
static int trace__validate_ev_qualifier(struct trace *trace)
{
int err = 0;
int err = 0, i;
struct str_node *pos;
trace->ev_qualifier_ids.nr = strlist__nr_entries(trace->ev_qualifier);
trace->ev_qualifier_ids.entries = malloc(trace->ev_qualifier_ids.nr *
sizeof(trace->ev_qualifier_ids.entries[0]));
if (trace->ev_qualifier_ids.entries == NULL) {
fputs("Error:\tNot enough memory for allocating events qualifier ids\n",
trace->output);
err = -EINVAL;
goto out;
}
i = 0;
strlist__for_each(pos, trace->ev_qualifier) {
const char *sc = pos->s;
int id = audit_name_to_syscall(sc, trace->audit.machine);
if (audit_name_to_syscall(sc, trace->audit.machine) < 0) {
if (id < 0) {
if (err == 0) {
fputs("Error:\tInvalid syscall ", trace->output);
err = -EINVAL;
@ -1635,13 +1757,17 @@ static int trace__validate_ev_qualifier(struct trace *trace)
fputs(sc, trace->output);
}
trace->ev_qualifier_ids.entries[i++] = id;
}
if (err < 0) {
fputs("\nHint:\ttry 'perf list syscalls:sys_enter_*'"
"\nHint:\tand: 'man syscalls'\n", trace->output);
zfree(&trace->ev_qualifier_ids.entries);
trace->ev_qualifier_ids.nr = 0;
}
out:
return err;
}
@ -1833,9 +1959,6 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
if (sc == NULL)
return -1;
if (sc->filtered)
return 0;
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
ttrace = thread__trace(thread, trace->output);
if (ttrace == NULL)
@ -1844,7 +1967,7 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
args = perf_evsel__sc_tp_ptr(evsel, args, sample);
if (ttrace->entry_str == NULL) {
ttrace->entry_str = malloc(1024);
ttrace->entry_str = malloc(trace__entry_str_size);
if (!ttrace->entry_str)
goto out_put;
}
@ -1854,9 +1977,9 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
ttrace->entry_time = sample->time;
msg = ttrace->entry_str;
printed += scnprintf(msg + printed, 1024 - printed, "%s(", sc->name);
printed += scnprintf(msg + printed, trace__entry_str_size - printed, "%s(", sc->name);
printed += syscall__scnprintf_args(sc, msg + printed, 1024 - printed,
printed += syscall__scnprintf_args(sc, msg + printed, trace__entry_str_size - printed,
args, trace, thread);
if (sc->is_exit) {
@ -1864,8 +1987,11 @@ static int trace__sys_enter(struct trace *trace, struct perf_evsel *evsel,
trace__fprintf_entry_head(trace, thread, 1, sample->time, trace->output);
fprintf(trace->output, "%-70s\n", ttrace->entry_str);
}
} else
} else {
ttrace->entry_pending = true;
/* See trace__vfs_getname & trace__sys_exit */
ttrace->filename.pending_open = false;
}
if (trace->current != thread) {
thread__put(trace->current);
@ -1891,9 +2017,6 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
if (sc == NULL)
return -1;
if (sc->filtered)
return 0;
thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
ttrace = thread__trace(thread, trace->output);
if (ttrace == NULL)
@ -1904,9 +2027,9 @@ static int trace__sys_exit(struct trace *trace, struct perf_evsel *evsel,
ret = perf_evsel__sc_tp_uint(evsel, ret, sample);
if (id == trace->audit.open_id && ret >= 0 && trace->last_vfs_getname) {
trace__set_fd_pathname(thread, ret, trace->last_vfs_getname);
trace->last_vfs_getname = NULL;
if (id == trace->audit.open_id && ret >= 0 && ttrace->filename.pending_open) {
trace__set_fd_pathname(thread, ret, ttrace->filename.name);
ttrace->filename.pending_open = false;
++trace->stats.vfs_getname;
}
@ -1961,7 +2084,56 @@ static int trace__vfs_getname(struct trace *trace, struct perf_evsel *evsel,
union perf_event *event __maybe_unused,
struct perf_sample *sample)
{
trace->last_vfs_getname = perf_evsel__rawptr(evsel, sample, "pathname");
struct thread *thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
struct thread_trace *ttrace;
size_t filename_len, entry_str_len, to_move;
ssize_t remaining_space;
char *pos;
const char *filename = perf_evsel__rawptr(evsel, sample, "pathname");
if (!thread)
goto out;
ttrace = thread__priv(thread);
if (!ttrace)
goto out;
filename_len = strlen(filename);
if (ttrace->filename.namelen < filename_len) {
char *f = realloc(ttrace->filename.name, filename_len + 1);
if (f == NULL)
goto out;
ttrace->filename.namelen = filename_len;
ttrace->filename.name = f;
}
strcpy(ttrace->filename.name, filename);
ttrace->filename.pending_open = true;
if (!ttrace->filename.ptr)
goto out;
entry_str_len = strlen(ttrace->entry_str);
remaining_space = trace__entry_str_size - entry_str_len - 1; /* \0 */
if (remaining_space <= 0)
goto out;
if (filename_len > (size_t)remaining_space) {
filename += filename_len - remaining_space;
filename_len = remaining_space;
}
to_move = entry_str_len - ttrace->filename.entry_str_pos + 1; /* \0 */
pos = ttrace->entry_str + ttrace->filename.entry_str_pos;
memmove(pos + filename_len, pos, to_move);
memcpy(pos, filename, filename_len);
ttrace->filename.ptr = 0;
ttrace->filename.entry_str_pos = 0;
out:
return 0;
}
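Net effect, assuming a probe:vfs_getname probe has been set up beforehand with 'perf probe' on the kernel's getname path: filename pointer arguments handled by SCA_FILENAME are shown as the resolved path in the strace-like output, roughly like:
    open(filename: /etc/passwd, flags: CLOEXEC) = 3
(illustrative line only; without the probe, syscall_arg__scnprintf_filename above falls back to printing the raw pointer in hex).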
@ -2214,19 +2386,20 @@ static int trace__record(struct trace *trace, int argc, const char **argv)
static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp);
static void perf_evlist__add_vfs_getname(struct perf_evlist *evlist)
static bool perf_evlist__add_vfs_getname(struct perf_evlist *evlist)
{
struct perf_evsel *evsel = perf_evsel__newtp("probe", "vfs_getname");
if (evsel == NULL)
return;
return false;
if (perf_evsel__field(evsel, "pathname") == NULL) {
perf_evsel__delete(evsel);
return;
return false;
}
evsel->handler = trace__vfs_getname;
perf_evlist__add(evlist, evsel);
return true;
}
static int perf_evlist__add_pgfault(struct perf_evlist *evlist,
@ -2283,9 +2456,68 @@ static void trace__handle_event(struct trace *trace, union perf_event *event, st
}
}
static int trace__add_syscall_newtp(struct trace *trace)
{
int ret = -1;
struct perf_evlist *evlist = trace->evlist;
struct perf_evsel *sys_enter, *sys_exit;
sys_enter = perf_evsel__syscall_newtp("sys_enter", trace__sys_enter);
if (sys_enter == NULL)
goto out;
if (perf_evsel__init_sc_tp_ptr_field(sys_enter, args))
goto out_delete_sys_enter;
sys_exit = perf_evsel__syscall_newtp("sys_exit", trace__sys_exit);
if (sys_exit == NULL)
goto out_delete_sys_enter;
if (perf_evsel__init_sc_tp_uint_field(sys_exit, ret))
goto out_delete_sys_exit;
perf_evlist__add(evlist, sys_enter);
perf_evlist__add(evlist, sys_exit);
trace->syscalls.events.sys_enter = sys_enter;
trace->syscalls.events.sys_exit = sys_exit;
ret = 0;
out:
return ret;
out_delete_sys_exit:
perf_evsel__delete_priv(sys_exit);
out_delete_sys_enter:
perf_evsel__delete_priv(sys_enter);
goto out;
}
static int trace__set_ev_qualifier_filter(struct trace *trace)
{
int err = -1;
char *filter = asprintf_expr_inout_ints("id", !trace->not_ev_qualifier,
trace->ev_qualifier_ids.nr,
trace->ev_qualifier_ids.entries);
if (filter == NULL)
goto out_enomem;
if (!perf_evsel__append_filter(trace->syscalls.events.sys_enter, "&&", filter))
err = perf_evsel__append_filter(trace->syscalls.events.sys_exit, "&&", filter);
free(filter);
out:
return err;
out_enomem:
errno = ENOMEM;
goto out;
}
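For illustration (syscall ids here are hypothetical): an '-e open,close' qualifier would make the helper above build a filter along the lines of "id == 2 || id == 3", or "id != 2 && id != 3" for a '!'-prefixed qualifier, which is then appended with "&&" to the sys_enter and sys_exit tracepoint filters.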
static int trace__run(struct trace *trace, int argc, const char **argv)
{
struct perf_evlist *evlist = trace->evlist;
struct perf_evsel *evsel;
int err = -1, i;
unsigned long before;
const bool forks = argc > 0;
@ -2293,13 +2525,11 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
trace->live = true;
if (trace->trace_syscalls &&
perf_evlist__add_syscall_newtp(evlist, trace__sys_enter,
trace__sys_exit))
if (trace->trace_syscalls && trace__add_syscall_newtp(trace))
goto out_error_raw_syscalls;
if (trace->trace_syscalls)
perf_evlist__add_vfs_getname(evlist);
trace->vfs_getname = perf_evlist__add_vfs_getname(evlist);
if ((trace->trace_pgfaults & TRACE_PFMAJ) &&
perf_evlist__add_pgfault(evlist, PERF_COUNT_SW_PAGE_FAULTS_MAJ)) {
@ -2356,11 +2586,22 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
else if (thread_map__pid(evlist->threads, 0) == -1)
err = perf_evlist__set_filter_pid(evlist, getpid());
if (err < 0) {
printf("err=%d,%s\n", -err, strerror(-err));
exit(1);
if (err < 0)
goto out_error_mem;
if (trace->ev_qualifier_ids.nr > 0) {
err = trace__set_ev_qualifier_filter(trace);
if (err < 0)
goto out_errno;
pr_debug("event qualifier tracepoint filter: %s\n",
trace->syscalls.events.sys_exit->filter);
}
err = perf_evlist__apply_filters(evlist, &evsel);
if (err < 0)
goto out_error_apply_filters;
err = perf_evlist__mmap(evlist, trace->opts.mmap_pages, false);
if (err < 0)
goto out_error_mmap;
@ -2462,10 +2703,21 @@ out_error_open:
out_error:
fprintf(trace->output, "%s\n", errbuf);
goto out_delete_evlist;
out_error_apply_filters:
fprintf(trace->output,
"Failed to set filter \"%s\" on event %s with %d (%s)\n",
evsel->filter, perf_evsel__name(evsel), errno,
strerror_r(errno, errbuf, sizeof(errbuf)));
goto out_delete_evlist;
}
out_error_mem:
fprintf(trace->output, "Not enough memory to run!\n");
goto out_delete_evlist;
out_errno:
fprintf(trace->output, "errno=%d,%s\n", errno, strerror(errno));
goto out_delete_evlist;
}
static int trace__replay(struct trace *trace)
@ -2586,9 +2838,9 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
printed += fprintf(fp, "\n");
printed += fprintf(fp, " syscall calls min avg max stddev\n");
printed += fprintf(fp, " (msec) (msec) (msec) (%%)\n");
printed += fprintf(fp, " --------------- -------- --------- --------- --------- ------\n");
printed += fprintf(fp, " syscall calls total min avg max stddev\n");
printed += fprintf(fp, " (msec) (msec) (msec) (msec) (%%)\n");
printed += fprintf(fp, " --------------- -------- --------- --------- --------- --------- ------\n");
/* each int_node is a syscall */
while (inode) {
@ -2605,8 +2857,8 @@ static size_t thread__dump_stats(struct thread_trace *ttrace,
sc = &trace->syscalls.table[inode->i];
printed += fprintf(fp, " %-15s", sc->name);
printed += fprintf(fp, " %8" PRIu64 " %9.3f %9.3f",
n, min, avg);
printed += fprintf(fp, " %8" PRIu64 " %9.3f %9.3f %9.3f",
n, avg * n, min, avg);
printed += fprintf(fp, " %9.3f %9.2f%%\n", max, pct);
}
@ -2778,7 +3030,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
.mmap_pages = UINT_MAX,
.proc_map_timeout = 500,
},
.output = stdout,
.output = stderr,
.show_comm = true,
.trace_syscalls = true,
};
@ -2879,11 +3131,14 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
if (ev_qualifier_str != NULL) {
const char *s = ev_qualifier_str;
struct strlist_config slist_config = {
.dirname = system_path(STRACE_GROUPS_DIR),
};
trace.not_ev_qualifier = *s == '!';
if (trace.not_ev_qualifier)
++s;
trace.ev_qualifier = strlist__new(true, s);
trace.ev_qualifier = strlist__new(s, &slist_config);
if (trace.ev_qualifier == NULL) {
fputs("Not enough memory to parse event qualifier",
trace.output);
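With the strlist lookup directory pointed at STRACE_GROUPS_DIR (added in the Makefile hunk below), a qualifier entry may also name a group file in that directory, e.g. (group name illustrative):
    # perf trace -e file ls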

View File

@ -11,7 +11,7 @@ ifneq ($(obj-perf),)
obj-perf := $(abspath $(obj-perf))/
endif
$(shell echo -n > $(OUTPUT).config-detected)
$(shell printf "" > $(OUTPUT).config-detected)
detected = $(shell echo "$(1)=y" >> $(OUTPUT).config-detected)
detected_var = $(shell echo "$(1)=$($(1))" >> $(OUTPUT).config-detected)
@ -297,7 +297,11 @@ ifndef NO_LIBELF
else
CFLAGS += -DHAVE_DWARF_SUPPORT $(LIBDW_CFLAGS)
LDFLAGS += $(LIBDW_LDFLAGS)
EXTLIBS += -ldw
DWARFLIBS := -ldw
ifeq ($(findstring -static,${LDFLAGS}),-static)
DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
endif
EXTLIBS += ${DWARFLIBS}
$(call detected,CONFIG_DWARF)
endif # PERF_HAVE_DWARF_REGS
endif # NO_DWARF
@ -644,6 +648,7 @@ infodir = share/info
perfexecdir = libexec/perf-core
sharedir = $(prefix)/share
template_dir = share/perf-core/templates
STRACE_GROUPS_DIR = share/perf-core/strace/groups
htmldir = share/doc/perf-doc
ifeq ($(prefix),/usr)
sysconfdir = /etc
@ -663,6 +668,7 @@ libdir = $(prefix)/$(lib)
# Shell quote (do not use $(call) to accommodate ancient setups);
ETC_PERFCONFIG_SQ = $(subst ','\'',$(ETC_PERFCONFIG))
STRACE_GROUPS_DIR_SQ = $(subst ','\'',$(STRACE_GROUPS_DIR))
DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
bindir_SQ = $(subst ','\'',$(bindir))
mandir_SQ = $(subst ','\'',$(mandir))
@ -676,10 +682,13 @@ libdir_SQ = $(subst ','\'',$(libdir))
ifneq ($(filter /%,$(firstword $(perfexecdir))),)
perfexec_instdir = $(perfexecdir)
STRACE_GROUPS_INSTDIR = $(STRACE_GROUPS_DIR)
else
perfexec_instdir = $(prefix)/$(perfexecdir)
STRACE_GROUPS_INSTDIR = $(prefix)/$(STRACE_GROUPS_DIR)
endif
perfexec_instdir_SQ = $(subst ','\'',$(perfexec_instdir))
STRACE_GROUPS_INSTDIR_SQ = $(subst ','\'',$(STRACE_GROUPS_INSTDIR))
# If we install to $(HOME) we keep the traceevent default:
# $(HOME)/.traceevent/plugins
@ -713,6 +722,7 @@ $(call detected_var,htmldir_SQ)
$(call detected_var,infodir_SQ)
$(call detected_var,mandir_SQ)
$(call detected_var,ETC_PERFCONFIG_SQ)
$(call detected_var,STRACE_GROUPS_DIR_SQ)
$(call detected_var,prefix_SQ)
$(call detected_var,perfexecdir_SQ)
$(call detected_var,LIBDIR)

View File

@ -50,7 +50,7 @@ copy_kcore()
fi
rm -f perf.data.junk
("$PERF" record -o perf.data.junk $PERF_OPTIONS -- sleep 60) >/dev/null 2>/dev/null &
("$PERF" record -o perf.data.junk "${PERF_OPTIONS[@]}" -- sleep 60) >/dev/null 2>/dev/null &
PERF_PID=$!
# Need to make sure that perf has started
@ -160,18 +160,18 @@ record()
echo "*** WARNING *** /proc/sys/kernel/kptr_restrict prevents access to kernel addresses" >&2
fi
if echo "$PERF_OPTIONS" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
if echo "${PERF_OPTIONS[@]}" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
echo "*** WARNING *** system-wide tracing without root access will not be able to read all necessary information from /proc" >&2
fi
if echo "$PERF_OPTIONS" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
if echo "${PERF_OPTIONS[@]}" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
if [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt -1 ] ; then
echo "*** WARNING *** /proc/sys/kernel/perf_event_paranoid restricts buffer size and tracepoint (sched_switch) use" >&2
fi
if echo "$PERF_OPTIONS" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
if echo "${PERF_OPTIONS[@]}" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
true
elif echo "$PERF_OPTIONS" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
elif echo "${PERF_OPTIONS[@]}" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
true
elif [ ! -r /sys/kernel/debug -o ! -x /sys/kernel/debug ] ; then
echo "*** WARNING *** /sys/kernel/debug permissions prevent tracepoint (sched_switch) use" >&2
@ -193,8 +193,8 @@ record()
mkdir "$PERF_DATA_DIR"
echo "$PERF record -o $PERF_DATA_DIR/perf.data $PERF_OPTIONS -- $*"
"$PERF" record -o "$PERF_DATA_DIR/perf.data" $PERF_OPTIONS -- $* || true
echo "$PERF record -o $PERF_DATA_DIR/perf.data ${PERF_OPTIONS[@]} -- $@"
"$PERF" record -o "$PERF_DATA_DIR/perf.data" "${PERF_OPTIONS[@]}" -- "$@" || true
if rmdir "$PERF_DATA_DIR" > /dev/null 2>/dev/null ; then
exit 1
@ -209,8 +209,8 @@ subcommand()
{
find_perf
check_buildid_cache_permissions
echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $*"
"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" $*
echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $@"
"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" "$@"
}
if [ "$1" = "fix_buildid_cache_permissions" ] ; then
@ -234,7 +234,7 @@ fi
case "$PERF_SUB_COMMAND" in
"record")
while [ "$1" != "--" ] ; do
PERF_OPTIONS+="$1 "
PERF_OPTIONS+=("$1")
shift || break
done
if [ "$1" != "--" ] ; then
@ -242,16 +242,16 @@ case "$PERF_SUB_COMMAND" in
usage
fi
shift
record $*
record "$@"
;;
"script")
subcommand $*
subcommand "$@"
;;
"report")
subcommand $*
subcommand "$@"
;;
"inject")
subcommand $*
subcommand "$@"
;;
*)
usage

View File

@ -231,7 +231,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
(*argc)--;
} else if (!prefixcmp(cmd, CMD_DEBUGFS_DIR)) {
perf_debugfs_set_path(cmd + strlen(CMD_DEBUGFS_DIR));
fprintf(stderr, "dir: %s\n", debugfs_mountpoint);
fprintf(stderr, "dir: %s\n", tracing_path);
if (envchanged)
*envchanged = 1;
} else if (!strcmp(cmd, "--list-cmds")) {

View File

@ -51,11 +51,14 @@ struct record_opts {
bool sample_address;
bool sample_weight;
bool sample_time;
bool sample_time_set;
bool callgraph_set;
bool period;
bool sample_intr_regs;
bool running_time;
bool full_auxtrace;
bool auxtrace_snapshot_mode;
bool record_switch_events;
unsigned int freq;
unsigned int mmap_pages;
unsigned int auxtrace_mmap_pages;

View File

@ -18,10 +18,20 @@ import perf
def main():
cpus = perf.cpu_map()
threads = perf.thread_map()
evsel = perf.evsel(task = 1, comm = 1, mmap = 0,
evsel = perf.evsel(type = perf.TYPE_SOFTWARE,
config = perf.COUNT_SW_DUMMY,
task = 1, comm = 1, mmap = 0, freq = 0,
wakeup_events = 1, watermark = 1,
sample_id_all = 1,
sample_type = perf.SAMPLE_PERIOD | perf.SAMPLE_TID | perf.SAMPLE_CPU)
"""What we want are just the PERF_RECORD_ lifetime events for threads,
using the default, PERF_TYPE_HARDWARE + PERF_COUNT_HW_CYCLES & freq=1
(the default), makes perf reenable irq_vectors:local_timer_entry, when
disabling nohz, not good for some use cases where all we want is to get
threads comes and goes... So use (perf.TYPE_SOFTWARE, perf_COUNT_SW_DUMMY,
freq=0) instead."""
evsel.open(cpus = cpus, threads = threads);
evlist = perf.evlist(cpus, threads)
evlist.add(evsel)

View File

@ -0,0 +1,2 @@
#!/bin/bash
perf record -e compaction:mm_compaction_begin -e compaction:mm_compaction_end -e compaction:mm_compaction_migratepages -e compaction:mm_compaction_isolate_migratepages -e compaction:mm_compaction_isolate_freepages $@

View File

@ -0,0 +1,4 @@
#!/bin/bash
#description: display time taken by mm compaction
#args: [-h] [-u] [-p|-pv] [-t | [-m] [-fs] [-ms]] [pid|pid-range|comm-regex]
perf script -s "$PERF_EXEC_PATH"/scripts/python/compaction-times.py $@
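A minimal usage sketch (the record/report flow is assumed to follow the usual perf-script convention for paired scripts):
    # perf script record compaction-times sleep 10
    # perf script report compaction-times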

View File

@ -0,0 +1,327 @@
#!/usr/bin/python2
# call-graph-from-postgresql.py: create call-graph from postgresql database
# Copyright (c) 2014, Intel Corporation.
#
# This program is free software; you can redistribute it and/or modify it
# under the terms and conditions of the GNU General Public License,
# version 2, as published by the Free Software Foundation.
#
# This program is distributed in the hope it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
# more details.
# To use this script you will need to have exported data using the
# export-to-postgresql.py script. Refer to that script for details.
#
# Following on from the example in the export-to-postgresql.py script, a
# call-graph can be displayed for the pt_example database like this:
#
# python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
#
# Note this script supports connecting to remote databases by setting hostname,
# port, username, password, and dbname e.g.
#
# python tools/perf/scripts/python/call-graph-from-postgresql.py "hostname=myhost username=myuser password=mypassword dbname=pt_example"
#
# The result is a GUI window with a tree representing a context-sensitive
# call-graph. Expanding a couple of levels of the tree and adjusting column
# widths to suit will display something like:
#
# Call Graph: pt_example
# Call Path Object Count Time(ns) Time(%) Branch Count Branch Count(%)
# v- ls
# v- 2638:2638
# v- _start ld-2.19.so 1 10074071 100.0 211135 100.0
# |- unknown unknown 1 13198 0.1 1 0.0
# >- _dl_start ld-2.19.so 1 1400980 13.9 19637 9.3
# >- _dl_init_internal ld-2.19.so 1 448152 4.4 11094 5.3
# v-__libc_start_main@plt ls 1 8211741 81.5 180397 85.4
# >- _dl_fixup ld-2.19.so 1 7607 0.1 108 0.1
# >- __cxa_atexit libc-2.19.so 1 11737 0.1 10 0.0
# >- __libc_csu_init ls 1 10354 0.1 10 0.0
# |- _setjmp libc-2.19.so 1 0 0.0 4 0.0
# v- main ls 1 8182043 99.6 180254 99.9
#
# Points to note:
# The top level is a command name (comm)
# The next level is a thread (pid:tid)
# Subsequent levels are functions
# 'Count' is the number of calls
# 'Time' is the elapsed time until the function returns
# Percentages are relative to the level above
# 'Branch Count' is the total number of branches for that function and all
# functions that it calls
import sys
from PySide.QtCore import *
from PySide.QtGui import *
from PySide.QtSql import *
from decimal import *
class TreeItem():
def __init__(self, db, row, parent_item):
self.db = db
self.row = row
self.parent_item = parent_item
self.query_done = False;
self.child_count = 0
self.child_items = []
self.data = ["", "", "", "", "", "", ""]
self.comm_id = 0
self.thread_id = 0
self.call_path_id = 1
self.branch_count = 0
self.time = 0
if not parent_item:
self.setUpRoot()
def setUpRoot(self):
self.query_done = True
query = QSqlQuery(self.db)
ret = query.exec_('SELECT id, comm FROM comms')
if not ret:
raise Exception("Query failed: " + query.lastError().text())
while query.next():
if not query.value(0):
continue
child_item = TreeItem(self.db, self.child_count, self)
self.child_items.append(child_item)
self.child_count += 1
child_item.setUpLevel1(query.value(0), query.value(1))
def setUpLevel1(self, comm_id, comm):
self.query_done = True;
self.comm_id = comm_id
self.data[0] = comm
self.child_items = []
self.child_count = 0
query = QSqlQuery(self.db)
ret = query.exec_('SELECT thread_id, ( SELECT pid FROM threads WHERE id = thread_id ), ( SELECT tid FROM threads WHERE id = thread_id ) FROM comm_threads WHERE comm_id = ' + str(comm_id))
if not ret:
raise Exception("Query failed: " + query.lastError().text())
while query.next():
child_item = TreeItem(self.db, self.child_count, self)
self.child_items.append(child_item)
self.child_count += 1
child_item.setUpLevel2(comm_id, query.value(0), query.value(1), query.value(2))
def setUpLevel2(self, comm_id, thread_id, pid, tid):
self.comm_id = comm_id
self.thread_id = thread_id
self.data[0] = str(pid) + ":" + str(tid)
def getChildItem(self, row):
return self.child_items[row]
def getParentItem(self):
return self.parent_item
def getRow(self):
return self.row
def timePercent(self, b):
if not self.time:
return "0.0"
x = (b * Decimal(100)) / self.time
return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
def branchPercent(self, b):
if not self.branch_count:
return "0.0"
x = (b * Decimal(100)) / self.branch_count
return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
def addChild(self, call_path_id, name, dso, count, time, branch_count):
child_item = TreeItem(self.db, self.child_count, self)
child_item.comm_id = self.comm_id
child_item.thread_id = self.thread_id
child_item.call_path_id = call_path_id
child_item.branch_count = branch_count
child_item.time = time
child_item.data[0] = name
if dso == "[kernel.kallsyms]":
dso = "[kernel]"
child_item.data[1] = dso
child_item.data[2] = str(count)
child_item.data[3] = str(time)
child_item.data[4] = self.timePercent(time)
child_item.data[5] = str(branch_count)
child_item.data[6] = self.branchPercent(branch_count)
self.child_items.append(child_item)
self.child_count += 1
def selectCalls(self):
self.query_done = True;
query = QSqlQuery(self.db)
ret = query.exec_('SELECT id, call_path_id, branch_count, call_time, return_time, '
'( SELECT name FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ), '
'( SELECT short_name FROM dsos WHERE id = ( SELECT dso_id FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ) ), '
'( SELECT ip FROM call_paths where id = call_path_id ) '
'FROM calls WHERE parent_call_path_id = ' + str(self.call_path_id) + ' AND comm_id = ' + str(self.comm_id) + ' AND thread_id = ' + str(self.thread_id) +
' ORDER BY call_path_id')
if not ret:
raise Exception("Query failed: " + query.lastError().text())
last_call_path_id = 0
name = ""
dso = ""
count = 0
branch_count = 0
total_branch_count = 0
time = 0
total_time = 0
while query.next():
if query.value(1) == last_call_path_id:
count += 1
branch_count += query.value(2)
time += query.value(4) - query.value(3)
else:
if count:
self.addChild(last_call_path_id, name, dso, count, time, branch_count)
last_call_path_id = query.value(1)
name = query.value(5)
dso = query.value(6)
count = 1
total_branch_count += branch_count
total_time += time
branch_count = query.value(2)
time = query.value(4) - query.value(3)
if count:
self.addChild(last_call_path_id, name, dso, count, time, branch_count)
total_branch_count += branch_count
total_time += time
# Top level does not have time or branch count, so fix that here
if total_branch_count > self.branch_count:
self.branch_count = total_branch_count
if self.branch_count:
for child_item in self.child_items:
child_item.data[6] = self.branchPercent(child_item.branch_count)
if total_time > self.time:
self.time = total_time
if self.time:
for child_item in self.child_items:
child_item.data[4] = self.timePercent(child_item.time)
def childCount(self):
if not self.query_done:
self.selectCalls()
return self.child_count
def columnCount(self):
return 7
def columnHeader(self, column):
headers = ["Call Path", "Object", "Count ", "Time (ns) ", "Time (%) ", "Branch Count ", "Branch Count (%) "]
return headers[column]
def getData(self, column):
return self.data[column]
class TreeModel(QAbstractItemModel):
def __init__(self, db, parent=None):
super(TreeModel, self).__init__(parent)
self.db = db
self.root = TreeItem(db, 0, None)
def columnCount(self, parent):
return self.root.columnCount()
def rowCount(self, parent):
if parent.isValid():
parent_item = parent.internalPointer()
else:
parent_item = self.root
return parent_item.childCount()
def headerData(self, section, orientation, role):
if role == Qt.TextAlignmentRole:
if section > 1:
return Qt.AlignRight
if role != Qt.DisplayRole:
return None
if orientation != Qt.Horizontal:
return None
return self.root.columnHeader(section)
def parent(self, child):
child_item = child.internalPointer()
if child_item is self.root:
return QModelIndex()
parent_item = child_item.getParentItem()
return self.createIndex(parent_item.getRow(), 0, parent_item)
def index(self, row, column, parent):
if parent.isValid():
parent_item = parent.internalPointer()
else:
parent_item = self.root
child_item = parent_item.getChildItem(row)
return self.createIndex(row, column, child_item)
def data(self, index, role):
if role == Qt.TextAlignmentRole:
if index.column() > 1:
return Qt.AlignRight
if role != Qt.DisplayRole:
return None
index_item = index.internalPointer()
return index_item.getData(index.column())
class MainWindow(QMainWindow):
def __init__(self, db, dbname, parent=None):
super(MainWindow, self).__init__(parent)
self.setObjectName("MainWindow")
self.setWindowTitle("Call Graph: " + dbname)
self.move(100, 100)
self.resize(800, 600)
style = self.style()
icon = style.standardIcon(QStyle.SP_MessageBoxInformation)
self.setWindowIcon(icon);
self.model = TreeModel(db)
self.view = QTreeView()
self.view.setModel(self.model)
self.setCentralWidget(self.view)
if __name__ == '__main__':
if (len(sys.argv) < 2):
print >> sys.stderr, "Usage is: call-graph-from-postgresql.py <database name>"
raise Exception("Too few arguments")
dbname = sys.argv[1]
db = QSqlDatabase.addDatabase('QPSQL')
opts = dbname.split()
for opt in opts:
if '=' in opt:
opt = opt.split('=')
if opt[0] == 'hostname':
db.setHostName(opt[1])
elif opt[0] == 'port':
db.setPort(int(opt[1]))
elif opt[0] == 'username':
db.setUserName(opt[1])
elif opt[0] == 'password':
db.setPassword(opt[1])
elif opt[0] == 'dbname':
dbname = opt[1]
else:
dbname = opt
db.setDatabaseName(dbname)
if not db.open():
raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
app = QApplication(sys.argv)
window = MainWindow(db, dbname)
window.show()
err = app.exec_()
db.close()
sys.exit(err)

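Following up on the 'Points to note' in the header comment, the percentage columns can be reproduced by hand from timePercent()/branchPercent(). A small sketch using the numbers from the example tree above:

    from decimal import Decimal, ROUND_HALF_UP
    parent_time = Decimal(10074071)   # Time(ns) of _start in the example tree
    child_time = Decimal(8211741)     # Time(ns) of __libc_start_main@plt
    pct = (child_time * 100 / parent_time).quantize(Decimal('.1'), rounding=ROUND_HALF_UP)
    print(pct)                        # 81.5, the Time(%) shown for __libc_start_main@plt
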
View File

@ -0,0 +1,311 @@
# report time spent in compaction
# Licensed under the terms of the GNU GPL License version 2
# testing:
# 'echo 1 > /proc/sys/vm/compact_memory' to force compaction of all zones
import os
import sys
import re
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
usage = "usage: perf script report compaction-times.py -- [-h] [-u] [-p|-pv] [-t | [-m] [-fs] [-ms]] [pid|pid-range|comm-regex]\n"
class popt:
DISP_DFL = 0
DISP_PROC = 1
DISP_PROC_VERBOSE=2
class topt:
DISP_TIME = 0
DISP_MIG = 1
DISP_ISOLFREE = 2
DISP_ISOLMIG = 4
DISP_ALL = 7
class comm_filter:
def __init__(self, re):
self.re = re
def filter(self, pid, comm):
m = self.re.search(comm)
return m == None or m.group() == ""
class pid_filter:
def __init__(self, low, high):
self.low = (0 if low == "" else int(low))
self.high = (0 if high == "" else int(high))
def filter(self, pid, comm):
return not (pid >= self.low and (self.high == 0 or pid <= self.high))
def set_type(t):
global opt_disp
opt_disp = (t if opt_disp == topt.DISP_ALL else opt_disp|t)
def ns(sec, nsec):
return (sec * 1000000000) + nsec
def time(ns):
return "%dns" % ns if opt_ns else "%dus" % (round(ns, -3) / 1000)
class pair:
def __init__(self, aval, bval, alabel = None, blabel = None):
self.alabel = alabel
self.blabel = blabel
self.aval = aval
self.bval = bval
def __add__(self, rhs):
self.aval += rhs.aval
self.bval += rhs.bval
return self
def __str__(self):
return "%s=%d %s=%d" % (self.alabel, self.aval, self.blabel, self.bval)
class cnode:
def __init__(self, ns):
self.ns = ns
self.migrated = pair(0, 0, "moved", "failed")
self.fscan = pair(0,0, "scanned", "isolated")
self.mscan = pair(0,0, "scanned", "isolated")
def __add__(self, rhs):
self.ns += rhs.ns
self.migrated += rhs.migrated
self.fscan += rhs.fscan
self.mscan += rhs.mscan
return self
def __str__(self):
prev = 0
s = "%s " % time(self.ns)
if (opt_disp & topt.DISP_MIG):
s += "migration: %s" % self.migrated
prev = 1
if (opt_disp & topt.DISP_ISOLFREE):
s += "%sfree_scanner: %s" % (" " if prev else "", self.fscan)
prev = 1
if (opt_disp & topt.DISP_ISOLMIG):
s += "%smigration_scanner: %s" % (" " if prev else "", self.mscan)
return s
def complete(self, secs, nsecs):
self.ns = ns(secs, nsecs) - self.ns
def increment(self, migrated, fscan, mscan):
if (migrated != None):
self.migrated += migrated
if (fscan != None):
self.fscan += fscan
if (mscan != None):
self.mscan += mscan
class chead:
heads = {}
val = cnode(0);
fobj = None
@classmethod
def add_filter(cls, filter):
cls.fobj = filter
@classmethod
def create_pending(cls, pid, comm, start_secs, start_nsecs):
filtered = 0
try:
head = cls.heads[pid]
filtered = head.is_filtered()
except KeyError:
if cls.fobj != None:
filtered = cls.fobj.filter(pid, comm)
head = cls.heads[pid] = chead(comm, pid, filtered)
if not filtered:
head.mark_pending(start_secs, start_nsecs)
@classmethod
def increment_pending(cls, pid, migrated, fscan, mscan):
head = cls.heads[pid]
if not head.is_filtered():
if head.is_pending():
head.do_increment(migrated, fscan, mscan)
else:
sys.stderr.write("missing start compaction event for pid %d\n" % pid)
@classmethod
def complete_pending(cls, pid, secs, nsecs):
head = cls.heads[pid]
if not head.is_filtered():
if head.is_pending():
head.make_complete(secs, nsecs)
else:
sys.stderr.write("missing start compaction event for pid %d\n" % pid)
@classmethod
def gen(cls):
if opt_proc != popt.DISP_DFL:
for i in cls.heads:
yield cls.heads[i]
@classmethod
def str(cls):
return cls.val
def __init__(self, comm, pid, filtered):
self.comm = comm
self.pid = pid
self.val = cnode(0)
self.pending = None
self.filtered = filtered
self.list = []
def __add__(self, rhs):
self.ns += rhs.ns
self.val += rhs.val
return self
def mark_pending(self, secs, nsecs):
self.pending = cnode(ns(secs, nsecs))
def do_increment(self, migrated, fscan, mscan):
self.pending.increment(migrated, fscan, mscan)
def make_complete(self, secs, nsecs):
self.pending.complete(secs, nsecs)
chead.val += self.pending
if opt_proc != popt.DISP_DFL:
self.val += self.pending
if opt_proc == popt.DISP_PROC_VERBOSE:
self.list.append(self.pending)
self.pending = None
def enumerate(self):
if opt_proc == popt.DISP_PROC_VERBOSE and not self.is_filtered():
for i, pelem in enumerate(self.list):
sys.stdout.write("%d[%s].%d: %s\n" % (self.pid, self.comm, i+1, pelem))
def is_pending(self):
return self.pending != None
def is_filtered(self):
return self.filtered
def display(self):
if not self.is_filtered():
sys.stdout.write("%d[%s]: %s\n" % (self.pid, self.comm, self.val))
def trace_end():
sys.stdout.write("total: %s\n" % chead.str())
for i in chead.gen():
i.display(),
i.enumerate()
def compaction__mm_compaction_migratepages(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
common_callchain, nr_migrated, nr_failed):
chead.increment_pending(common_pid,
pair(nr_migrated, nr_failed), None, None)
def compaction__mm_compaction_isolate_freepages(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
common_callchain, start_pfn, end_pfn, nr_scanned, nr_taken):
chead.increment_pending(common_pid,
None, pair(nr_scanned, nr_taken), None)
def compaction__mm_compaction_isolate_migratepages(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
common_callchain, start_pfn, end_pfn, nr_scanned, nr_taken):
chead.increment_pending(common_pid,
None, None, pair(nr_scanned, nr_taken))
def compaction__mm_compaction_end(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
common_callchain, zone_start, migrate_start, free_start, zone_end,
sync, status):
chead.complete_pending(common_pid, common_secs, common_nsecs)
def compaction__mm_compaction_begin(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
common_callchain, zone_start, migrate_start, free_start, zone_end,
sync):
chead.create_pending(common_pid, common_comm, common_secs, common_nsecs)
def pr_help():
global usage
sys.stdout.write(usage)
sys.stdout.write("\n")
sys.stdout.write("-h display this help\n")
sys.stdout.write("-p display by process\n")
sys.stdout.write("-pv display by process (verbose)\n")
sys.stdout.write("-t display stall times only\n")
sys.stdout.write("-m display stats for migration\n")
sys.stdout.write("-fs display stats for free scanner\n")
sys.stdout.write("-ms display stats for migration scanner\n")
sys.stdout.write("-u display results in microseconds (default nanoseconds)\n")
comm_re = None
pid_re = None
pid_regex = "^(\d*)-(\d*)$|^(\d*)$"
opt_proc = popt.DISP_DFL
opt_disp = topt.DISP_ALL
opt_ns = True
argc = len(sys.argv) - 1
if argc >= 1:
pid_re = re.compile(pid_regex)
for i, opt in enumerate(sys.argv[1:]):
if opt[0] == "-":
if opt == "-h":
pr_help()
exit(0);
elif opt == "-p":
opt_proc = popt.DISP_PROC
elif opt == "-pv":
opt_proc = popt.DISP_PROC_VERBOSE
elif opt == '-u':
opt_ns = False
elif opt == "-t":
set_type(topt.DISP_TIME)
elif opt == "-m":
set_type(topt.DISP_MIG)
elif opt == "-fs":
set_type(topt.DISP_ISOLFREE)
elif opt == "-ms":
set_type(topt.DISP_ISOLMIG)
else:
sys.exit(usage)
elif i == argc - 1:
m = pid_re.search(opt)
if m != None and m.group() != "":
if m.group(3) != None:
f = pid_filter(m.group(3), m.group(3))
else:
f = pid_filter(m.group(1), m.group(2))
else:
try:
comm_re=re.compile(opt)
except:
sys.stderr.write("invalid regex '%s'" % opt)
sys.exit(usage)
f = comm_filter(comm_re)
chead.add_filter(f)

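The last positional argument is classified by pid_regex above: an all-digit value or a digits-digits range becomes a pid_filter, anything else is compiled as a comm regex. A quick sketch with hypothetical arguments:

    import re
    pid_re = re.compile("^(\d*)-(\d*)$|^(\d*)$")
    for arg in ("1234", "1000-2000", "kswapd.*"):
        m = pid_re.search(arg)
        if m is not None and m.group() != "":
            print(arg + ": pid filter")
        else:
            print(arg + ": comm regex filter")
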
View File

@ -15,6 +15,53 @@ import sys
import struct
import datetime
# To use this script you will need to have installed package python-pyside which
# provides LGPL-licensed Python bindings for Qt. You will also need the package
# libqt4-sql-psql for Qt postgresql support.
#
# The script assumes postgresql is running on the local machine and that the
# user has postgresql permissions to create databases. Examples of installing
# postgresql and adding such a user are:
#
# fedora:
#
# $ sudo yum install postgresql postgresql-server python-pyside qt-postgresql
# $ sudo su - postgres -c initdb
# $ sudo service postgresql start
# $ sudo su - postgres
# $ createuser <your user id here>
# Shall the new role be a superuser? (y/n) y
#
# ubuntu:
#
# $ sudo apt-get install postgresql
# $ sudo su - postgres
# $ createuser <your user id here>
# Shall the new role be a superuser? (y/n) y
#
# An example of using this script with Intel PT:
#
# $ perf record -e intel_pt//u ls
# $ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
# 2015-05-29 12:49:23.464364 Creating database...
# 2015-05-29 12:49:26.281717 Writing to intermediate files...
# 2015-05-29 12:49:27.190383 Copying to database...
# 2015-05-29 12:49:28.140451 Removing intermediate files...
# 2015-05-29 12:49:28.147451 Adding primary keys
# 2015-05-29 12:49:28.655683 Adding foreign keys
# 2015-05-29 12:49:29.365350 Done
#
# To browse the database, psql can be used e.g.
#
# $ psql pt_example
# pt_example=# select * from samples_view where id < 100;
# pt_example=# \d+
# pt_example=# \d+ samples_view
# pt_example=# \q
#
# An example of using the database is provided by the script
# call-graph-from-postgresql.py. Refer to that script for details.
from PySide.QtSql import *
# Need to access PostgreSQL C library directly to use COPY FROM STDIN

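For ad-hoc programmatic access, the same PySide Qt SQL bindings used by call-graph-from-postgresql.py work outside a GUI as well; a minimal sketch, assuming the pt_example database created in the example above:

    from PySide.QtSql import QSqlDatabase, QSqlQuery
    db = QSqlDatabase.addDatabase('QPSQL')
    db.setDatabaseName('pt_example')
    if not db.open():
        raise Exception("Failed to open database: " + db.lastError().text())
    query = QSqlQuery(db)
    query.exec_('SELECT id, comm FROM comms')   # comms is one of the exported tables
    while query.next():
        print(str(query.value(0)) + " " + str(query.value(1)))
    db.close()
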
View File

@ -32,6 +32,7 @@ perf-y += sample-parsing.o
perf-y += parse-no-sample-id-all.o
perf-y += kmod-path.o
perf-y += thread-map.o
perf-y += llvm.o
perf-$(CONFIG_X86) += perf-time-to-tsc.o

View File

@ -174,6 +174,10 @@ static struct test {
.desc = "Test thread map",
.func = test__thread_map,
},
{
.desc = "Test LLVM searching and compiling",
.func = test__llvm,
},
{
.func = NULL,
},

View File

@ -279,6 +279,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
symbol_conf.use_callchain = false;
symbol_conf.cumulate_callchain = false;
perf_evsel__reset_sample_bit(evsel, CALLCHAIN);
setup_sorting();
callchain_register_param(&callchain_param);
@ -425,6 +426,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
symbol_conf.use_callchain = true;
symbol_conf.cumulate_callchain = false;
perf_evsel__set_sample_bit(evsel, CALLCHAIN);
setup_sorting();
callchain_register_param(&callchain_param);
@ -482,6 +484,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
symbol_conf.use_callchain = false;
symbol_conf.cumulate_callchain = true;
perf_evsel__reset_sample_bit(evsel, CALLCHAIN);
setup_sorting();
callchain_register_param(&callchain_param);
@ -665,6 +668,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
symbol_conf.use_callchain = true;
symbol_conf.cumulate_callchain = true;
perf_evsel__set_sample_bit(evsel, CALLCHAIN);
setup_sorting();
callchain_register_param(&callchain_param);

Some files were not shown because too many files have changed in this diff.