Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
This commit is contained in:
Linus Torvalds 2015-04-14 14:37:47 -07:00
commit 6c8a53c9e6
285 changed files with 13236 additions and 2905 deletions

View file

@ -648,7 +648,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
* Per-cpu breakpoints are not supported by our stepping
* mechanism.
*/
if (!bp->hw.bp_target)
if (!bp->hw.target)
return -EINVAL;
/*

View file

@ -527,7 +527,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
* Disallow per-task kernel breakpoints since these would
* complicate the stepping code.
*/
if (info->ctrl.privilege == AARCH64_BREAKPOINT_EL1 && bp->hw.bp_target)
if (info->ctrl.privilege == AARCH64_BREAKPOINT_EL1 && bp->hw.target)
return -EINVAL;
return 0;

View file

@ -124,7 +124,7 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
static void power_pmu_flush_branch_stack(void) {}
static void power_pmu_sched_task(struct perf_event_context *ctx, bool sched_in) {}
static inline void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) {}
static void pmao_restore_workaround(bool ebb) { }
#endif /* CONFIG_PPC32 */
@ -350,6 +350,7 @@ static void power_pmu_bhrb_enable(struct perf_event *event)
cpuhw->bhrb_context = event->ctx;
}
cpuhw->bhrb_users++;
perf_sched_cb_inc(event->ctx->pmu);
}
static void power_pmu_bhrb_disable(struct perf_event *event)
@ -361,6 +362,7 @@ static void power_pmu_bhrb_disable(struct perf_event *event)
cpuhw->bhrb_users--;
WARN_ON_ONCE(cpuhw->bhrb_users < 0);
perf_sched_cb_dec(event->ctx->pmu);
if (!cpuhw->disabled && !cpuhw->bhrb_users) {
/* BHRB cannot be turned off when other
@ -375,9 +377,12 @@ static void power_pmu_bhrb_disable(struct perf_event *event)
/* Called from ctxsw to prevent one process's branch entries to
* mingle with the other process's entries during context switch.
*/
static void power_pmu_flush_branch_stack(void)
static void power_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
{
if (ppmu->bhrb_nr)
if (!ppmu->bhrb_nr)
return;
if (sched_in)
power_pmu_bhrb_reset();
}
/* Calculate the to address for a branch */
@ -1901,7 +1906,7 @@ static struct pmu power_pmu = {
.cancel_txn = power_pmu_cancel_txn,
.commit_txn = power_pmu_commit_txn,
.event_idx = power_pmu_event_idx,
.flush_branch_stack = power_pmu_flush_branch_stack,
.sched_task = power_pmu_sched_task,
};
/*

View file

@ -12,7 +12,7 @@
#include <asm/disabled-features.h>
#endif
#define NCAPINTS 11 /* N 32-bit words worth of info */
#define NCAPINTS 13 /* N 32-bit words worth of info */
#define NBUGINTS 1 /* N 32-bit bug flags */
/*
@ -195,6 +195,7 @@
#define X86_FEATURE_HWP_ACT_WINDOW ( 7*32+ 12) /* Intel HWP_ACT_WINDOW */
#define X86_FEATURE_HWP_EPP ( 7*32+13) /* Intel HWP_EPP */
#define X86_FEATURE_HWP_PKG_REQ ( 7*32+14) /* Intel HWP_PKG_REQ */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@ -226,6 +227,7 @@
#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */
#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */
#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */
#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */
#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */
#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */
#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
@ -244,6 +246,12 @@
#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */
#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */
/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (edx), word 11 */
#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */
/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */
#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */
/*
* BUG word(s)
*/

View file

@ -109,6 +109,9 @@ struct cpuinfo_x86 {
/* in KB - valid for CPUS which support this call: */
int x86_cache_size;
int x86_cache_alignment; /* In bytes */
/* Cache QoS architectural values: */
int x86_cache_max_rmid; /* max index */
int x86_cache_occ_scale; /* scale to bytes */
int x86_power;
unsigned long loops_per_jiffy;
/* cpuid returned max cores value: */

View file

@ -74,6 +74,24 @@
#define MSR_IA32_PERF_CAPABILITIES 0x00000345
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
#define RTIT_CTL_OS BIT(2)
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_TSC_EN BIT(10)
#define RTIT_CTL_DISRETC BIT(11)
#define RTIT_CTL_BRANCH_EN BIT(13)
#define MSR_IA32_RTIT_STATUS 0x00000571
#define RTIT_STATUS_CONTEXTEN BIT(1)
#define RTIT_STATUS_TRIGGEREN BIT(2)
#define RTIT_STATUS_ERROR BIT(4)
#define RTIT_STATUS_STOPPED BIT(5)
#define MSR_IA32_RTIT_CR3_MATCH 0x00000572
#define MSR_IA32_RTIT_OUTPUT_BASE 0x00000560
#define MSR_IA32_RTIT_OUTPUT_MASK 0x00000561
#define MSR_MTRRfix64K_00000 0x00000250
#define MSR_MTRRfix16K_80000 0x00000258
#define MSR_MTRRfix16K_A0000 0x00000259

View file

@ -39,7 +39,8 @@ obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd_iommu.o
endif
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_p6.o perf_event_knc.o perf_event_p4.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_rapl.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_rapl.o perf_event_intel_cqm.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_pt.o perf_event_intel_bts.o
obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \
perf_event_intel_uncore_snb.o \

View file

@ -646,6 +646,30 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_capability[10] = eax;
}
/* Additional Intel-defined flags: level 0x0000000F */
if (c->cpuid_level >= 0x0000000F) {
u32 eax, ebx, ecx, edx;
/* QoS sub-leaf, EAX=0Fh, ECX=0 */
cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
c->x86_capability[11] = edx;
if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
/* will be overridden if occupancy monitoring exists */
c->x86_cache_max_rmid = ebx;
/* QoS sub-leaf, EAX=0Fh, ECX=1 */
cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
c->x86_capability[12] = edx;
if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) {
c->x86_cache_max_rmid = ecx;
c->x86_cache_occ_scale = ebx;
}
} else {
c->x86_cache_max_rmid = -1;
c->x86_cache_occ_scale = -1;
}
}
/* AMD-defined flags: level 0x80000001 */
xlvl = cpuid_eax(0x80000000);
c->extended_cpuid_level = xlvl;
@ -834,6 +858,20 @@ static void generic_identify(struct cpuinfo_x86 *c)
detect_nopl(c);
}
static void x86_init_cache_qos(struct cpuinfo_x86 *c)
{
/*
* The heavy lifting of max_rmid and cache_occ_scale are handled
* in get_cpu_cap(). Here we just set the max_rmid for the boot_cpu
* in case CQM bits really aren't there in this CPU.
*/
if (c != &boot_cpu_data) {
boot_cpu_data.x86_cache_max_rmid =
min(boot_cpu_data.x86_cache_max_rmid,
c->x86_cache_max_rmid);
}
}
/*
* This does the hard work of actually picking apart the CPU stuff...
*/
@ -923,6 +961,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
init_hypervisor(c);
x86_init_rdrand(c);
x86_init_cache_qos(c);
/*
* Clear/Set all flags overriden by options, need do it

View file

@ -0,0 +1,131 @@
/*
* Intel(R) Processor Trace PMU driver for perf
* Copyright (c) 2013-2014, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* Intel PT is specified in the Intel Architecture Instruction Set Extensions
* Programming Reference:
* http://software.intel.com/en-us/intel-isa-extensions
*/
#ifndef __INTEL_PT_H__
#define __INTEL_PT_H__
/*
* Single-entry ToPA: when this close to region boundary, switch
* buffers to avoid losing data.
*/
#define TOPA_PMI_MARGIN 512
/*
* Table of Physical Addresses bits
*/
enum topa_sz {
TOPA_4K = 0,
TOPA_8K,
TOPA_16K,
TOPA_32K,
TOPA_64K,
TOPA_128K,
TOPA_256K,
TOPA_512K,
TOPA_1MB,
TOPA_2MB,
TOPA_4MB,
TOPA_8MB,
TOPA_16MB,
TOPA_32MB,
TOPA_64MB,
TOPA_128MB,
TOPA_SZ_END,
};
static inline unsigned int sizes(enum topa_sz tsz)
{
return 1 << (tsz + 12);
};
struct topa_entry {
u64 end : 1;
u64 rsvd0 : 1;
u64 intr : 1;
u64 rsvd1 : 1;
u64 stop : 1;
u64 rsvd2 : 1;
u64 size : 4;
u64 rsvd3 : 2;
u64 base : 36;
u64 rsvd4 : 16;
};
#define TOPA_SHIFT 12
#define PT_CPUID_LEAVES 2
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
PT_CAP_topa_output,
PT_CAP_topa_multiple_entries,
PT_CAP_payloads_lip,
};
struct pt_pmu {
struct pmu pmu;
u32 caps[4 * PT_CPUID_LEAVES];
};
/**
* struct pt_buffer - buffer configuration; one buffer per task_struct or
* cpu, depending on perf event configuration
* @cpu: cpu for per-cpu allocation
* @tables: list of ToPA tables in this buffer
* @first: shorthand for first topa table
* @last: shorthand for last topa table
* @cur: current topa table
* @nr_pages: buffer size in pages
* @cur_idx: current output region's index within @cur table
* @output_off: offset within the current output region
* @data_size: running total of the amount of data in this buffer
* @lost: if data was lost/truncated
* @head: logical write offset inside the buffer
* @snapshot: if this is for a snapshot/overwrite counter
* @stop_pos: STOP topa entry in the buffer
* @intr_pos: INT topa entry in the buffer
* @data_pages: array of pages from perf
* @topa_index: table of topa entries indexed by page offset
*/
struct pt_buffer {
int cpu;
struct list_head tables;
struct topa *first, *last, *cur;
unsigned int cur_idx;
size_t output_off;
unsigned long nr_pages;
local_t data_size;
local_t lost;
local64_t head;
bool snapshot;
unsigned long stop_pos, intr_pos;
void **data_pages;
struct topa_entry *topa_index[0];
};
/**
* struct pt - per-cpu pt context
* @handle: perf output handle
* @handle_nmi: do handle PT PMI on this cpu, there's an active event
*/
struct pt {
struct perf_output_handle handle;
int handle_nmi;
};
#endif /* __INTEL_PT_H__ */

View file

@ -263,6 +263,14 @@ static void hw_perf_event_destroy(struct perf_event *event)
}
}
void hw_perf_lbr_event_destroy(struct perf_event *event)
{
hw_perf_event_destroy(event);
/* undo the lbr/bts event accounting */
x86_del_exclusive(x86_lbr_exclusive_lbr);
}
static inline int x86_pmu_initialized(void)
{
return x86_pmu.handle_irq != NULL;
@ -302,6 +310,35 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct perf_event *event)
return x86_pmu_extra_regs(val, event);
}
/*
* Check if we can create event of a certain type (that no conflicting events
* are present).
*/
int x86_add_exclusive(unsigned int what)
{
int ret = -EBUSY, i;
if (atomic_inc_not_zero(&x86_pmu.lbr_exclusive[what]))
return 0;
mutex_lock(&pmc_reserve_mutex);
for (i = 0; i < ARRAY_SIZE(x86_pmu.lbr_exclusive); i++)
if (i != what && atomic_read(&x86_pmu.lbr_exclusive[i]))
goto out;
atomic_inc(&x86_pmu.lbr_exclusive[what]);
ret = 0;
out:
mutex_unlock(&pmc_reserve_mutex);
return ret;
}
void x86_del_exclusive(unsigned int what)
{
atomic_dec(&x86_pmu.lbr_exclusive[what]);
}
int x86_setup_perfctr(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
@ -346,6 +383,12 @@ int x86_setup_perfctr(struct perf_event *event)
/* BTS is currently only allowed for user-mode. */
if (!attr->exclude_kernel)
return -EOPNOTSUPP;
/* disallow bts if conflicting events are present */
if (x86_add_exclusive(x86_lbr_exclusive_lbr))
return -EBUSY;
event->destroy = hw_perf_lbr_event_destroy;
}
hwc->config |= config;
@ -399,39 +442,41 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip > precise)
return -EOPNOTSUPP;
/*
* check that PEBS LBR correction does not conflict with
* whatever the user is asking with attr->branch_sample_type
*/
if (event->attr.precise_ip > 1 &&
x86_pmu.intel_cap.pebs_format < 2) {
u64 *br_type = &event->attr.branch_sample_type;
}
/*
* check that PEBS LBR correction does not conflict with
* whatever the user is asking with attr->branch_sample_type
*/
if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
u64 *br_type = &event->attr.branch_sample_type;
if (has_branch_stack(event)) {
if (!precise_br_compat(event))
return -EOPNOTSUPP;
if (has_branch_stack(event)) {
if (!precise_br_compat(event))
return -EOPNOTSUPP;
/* branch_sample_type is compatible */
/* branch_sample_type is compatible */
} else {
/*
* user did not specify branch_sample_type
*
* For PEBS fixups, we capture all
* the branches at the priv level of the
* event.
*/
*br_type = PERF_SAMPLE_BRANCH_ANY;
} else {
/*
* user did not specify branch_sample_type
*
* For PEBS fixups, we capture all
* the branches at the priv level of the
* event.
*/
*br_type = PERF_SAMPLE_BRANCH_ANY;
if (!event->attr.exclude_user)
*br_type |= PERF_SAMPLE_BRANCH_USER;
if (!event->attr.exclude_user)
*br_type |= PERF_SAMPLE_BRANCH_USER;
if (!event->attr.exclude_kernel)
*br_type |= PERF_SAMPLE_BRANCH_KERNEL;
}
if (!event->attr.exclude_kernel)
*br_type |= PERF_SAMPLE_BRANCH_KERNEL;
}
}
if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)
event->attach_state |= PERF_ATTACH_TASK_DATA;
/*
* Generate PMC IRQs:
* (keep 'enabled' bit clear for now)
@ -449,6 +494,12 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.type == PERF_TYPE_RAW)
event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
if (event->attr.sample_period && x86_pmu.limit_period) {
if (x86_pmu.limit_period(event, event->attr.sample_period) >
event->attr.sample_period)
return -EINVAL;
}
return x86_setup_perfctr(event);
}
@ -728,14 +779,17 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
struct event_constraint *c;
unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
struct perf_event *e;
int i, wmin, wmax, num = 0;
int i, wmin, wmax, unsched = 0;
struct hw_perf_event *hwc;
bitmap_zero(used_mask, X86_PMC_IDX_MAX);
if (x86_pmu.start_scheduling)
x86_pmu.start_scheduling(cpuc);
for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
hwc = &cpuc->event_list[i]->hw;
c = x86_pmu.get_event_constraints(cpuc, cpuc->event_list[i]);
c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
hwc->constraint = c;
wmin = min(wmin, c->weight);
@ -768,24 +822,30 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
/* slow path */
if (i != n)
num = perf_assign_events(cpuc->event_list, n, wmin,
wmax, assign);
unsched = perf_assign_events(cpuc->event_list, n, wmin,
wmax, assign);
/*
* Mark the event as committed, so we do not put_constraint()
* in case new events are added and fail scheduling.
* In case of success (unsched = 0), mark events as committed,
* so we do not put_constraint() in case new events are added
* and fail to be scheduled
*
* We invoke the lower level commit callback to lock the resource
*
* We do not need to do all of this in case we are called to
* validate an event group (assign == NULL)
*/
if (!num && assign) {
if (!unsched && assign) {
for (i = 0; i < n; i++) {
e = cpuc->event_list[i];
e->hw.flags |= PERF_X86_EVENT_COMMITTED;
if (x86_pmu.commit_scheduling)
x86_pmu.commit_scheduling(cpuc, e, assign[i]);
}
}
/*
* scheduling failed or is just a simulation,
* free resources if necessary
*/
if (!assign || num) {
if (!assign || unsched) {
for (i = 0; i < n; i++) {
e = cpuc->event_list[i];
/*
@ -795,11 +855,18 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
if ((e->hw.flags & PERF_X86_EVENT_COMMITTED))
continue;
/*
* release events that failed scheduling
*/
if (x86_pmu.put_event_constraints)
x86_pmu.put_event_constraints(cpuc, e);
}
}
return num ? -EINVAL : 0;
if (x86_pmu.stop_scheduling)
x86_pmu.stop_scheduling(cpuc);
return unsched ? -EINVAL : 0;
}
/*
@ -986,6 +1053,9 @@ int x86_perf_event_set_period(struct perf_event *event)
if (left > x86_pmu.max_period)
left = x86_pmu.max_period;
if (x86_pmu.limit_period)
left = x86_pmu.limit_period(event, left);
per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;
/*
@ -1033,7 +1103,6 @@ static int x86_pmu_add(struct perf_event *event, int flags)
hwc = &event->hw;
perf_pmu_disable(event->pmu);
n0 = cpuc->n_events;
ret = n = collect_events(cpuc, event, false);
if (ret < 0)
@ -1071,7 +1140,6 @@ static int x86_pmu_add(struct perf_event *event, int flags)
ret = 0;
out:
perf_pmu_enable(event->pmu);
return ret;
}
@ -1103,7 +1171,7 @@ static void x86_pmu_start(struct perf_event *event, int flags)
void perf_event_print_debug(void)
{
u64 ctrl, status, overflow, pmc_ctrl, pmc_count, prev_left, fixed;
u64 pebs;
u64 pebs, debugctl;
struct cpu_hw_events *cpuc;
unsigned long flags;
int cpu, idx;
@ -1121,14 +1189,20 @@ void perf_event_print_debug(void)
rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
rdmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, overflow);
rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, fixed);
rdmsrl(MSR_IA32_PEBS_ENABLE, pebs);
pr_info("\n");
pr_info("CPU#%d: ctrl: %016llx\n", cpu, ctrl);
pr_info("CPU#%d: status: %016llx\n", cpu, status);
pr_info("CPU#%d: overflow: %016llx\n", cpu, overflow);
pr_info("CPU#%d: fixed: %016llx\n", cpu, fixed);
pr_info("CPU#%d: pebs: %016llx\n", cpu, pebs);
if (x86_pmu.pebs_constraints) {
rdmsrl(MSR_IA32_PEBS_ENABLE, pebs);
pr_info("CPU#%d: pebs: %016llx\n", cpu, pebs);
}
if (x86_pmu.lbr_nr) {
rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
pr_info("CPU#%d: debugctl: %016llx\n", cpu, debugctl);
}
}
pr_info("CPU#%d: active: %016llx\n", cpu, *(u64 *)cpuc->active_mask);
@ -1321,11 +1395,12 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
{
unsigned int cpu = (long)hcpu;
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
int ret = NOTIFY_OK;
int i, ret = NOTIFY_OK;
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
cpuc->kfree_on_online = NULL;
for (i = 0 ; i < X86_PERF_KFREE_MAX; i++)
cpuc->kfree_on_online[i] = NULL;
if (x86_pmu.cpu_prepare)
ret = x86_pmu.cpu_prepare(cpu);
break;
@ -1336,7 +1411,10 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
break;
case CPU_ONLINE:
kfree(cpuc->kfree_on_online);
for (i = 0 ; i < X86_PERF_KFREE_MAX; i++) {
kfree(cpuc->kfree_on_online[i]);
cpuc->kfree_on_online[i] = NULL;
}
break;
case CPU_DYING:
@ -1712,7 +1790,7 @@ static int validate_event(struct perf_event *event)
if (IS_ERR(fake_cpuc))
return PTR_ERR(fake_cpuc);
c = x86_pmu.get_event_constraints(fake_cpuc, event);
c = x86_pmu.get_event_constraints(fake_cpuc, -1, event);
if (!c || !c->weight)
ret = -EINVAL;
@ -1914,10 +1992,10 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
NULL,
};
static void x86_pmu_flush_branch_stack(void)
static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
{
if (x86_pmu.flush_branch_stack)
x86_pmu.flush_branch_stack();
if (x86_pmu.sched_task)
x86_pmu.sched_task(ctx, sched_in);
}
void perf_check_microcode(void)
@ -1949,7 +2027,8 @@ static struct pmu pmu = {
.commit_txn = x86_pmu_commit_txn,
.event_idx = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
.sched_task = x86_pmu_sched_task,
.task_ctx_size = sizeof(struct x86_perf_task_context),
};
void arch_perf_update_userpage(struct perf_event *event,
@ -1968,13 +2047,23 @@ void arch_perf_update_userpage(struct perf_event *event,
data = cyc2ns_read_begin();
/*
* Internal timekeeping for enabled/running/stopped times
* is always in the local_clock domain.
*/
userpg->cap_user_time = 1;
userpg->time_mult = data->cyc2ns_mul;
userpg->time_shift = data->cyc2ns_shift;
userpg->time_offset = data->cyc2ns_offset - now;
userpg->cap_user_time_zero = 1;
userpg->time_zero = data->cyc2ns_offset;
/*
* cap_user_time_zero doesn't make sense when we're using a different
* time base for the records.
*/
if (event->clock == &local_clock) {
userpg->cap_user_time_zero = 1;
userpg->time_zero = data->cyc2ns_offset;
}
cyc2ns_read_end(data);
}

View file

@ -71,6 +71,8 @@ struct event_constraint {
#define PERF_X86_EVENT_COMMITTED 0x8 /* event passed commit_txn */
#define PERF_X86_EVENT_PEBS_LD_HSW 0x10 /* haswell style datala, load */
#define PERF_X86_EVENT_PEBS_NA_HSW 0x20 /* haswell style datala, unknown */
#define PERF_X86_EVENT_EXCL 0x40 /* HT exclusivity on counter */
#define PERF_X86_EVENT_DYNAMIC 0x80 /* dynamic alloc'd constraint */
#define PERF_X86_EVENT_RDPMC_ALLOWED 0x40 /* grant rdpmc permission */
@ -123,8 +125,37 @@ struct intel_shared_regs {
unsigned core_id; /* per-core: core id */
};
enum intel_excl_state_type {
INTEL_EXCL_UNUSED = 0, /* counter is unused */
INTEL_EXCL_SHARED = 1, /* counter can be used by both threads */
INTEL_EXCL_EXCLUSIVE = 2, /* counter can be used by one thread only */
};
struct intel_excl_states {
enum intel_excl_state_type init_state[X86_PMC_IDX_MAX];
enum intel_excl_state_type state[X86_PMC_IDX_MAX];
int num_alloc_cntrs;/* #counters allocated */
int max_alloc_cntrs;/* max #counters allowed */
bool sched_started; /* true if scheduling has started */
};
struct intel_excl_cntrs {
raw_spinlock_t lock;
struct intel_excl_states states[2];
int refcnt; /* per-core: #HT threads */
unsigned core_id; /* per-core: core id */
};
#define MAX_LBR_ENTRIES 16
enum {
X86_PERF_KFREE_SHARED = 0,
X86_PERF_KFREE_EXCL = 1,
X86_PERF_KFREE_MAX
};
struct cpu_hw_events {
/*
* Generic x86 PMC bits
@ -179,6 +210,12 @@ struct cpu_hw_events {
* used on Intel NHM/WSM/SNB
*/
struct intel_shared_regs *shared_regs;
/*
* manage exclusive counter access between hyperthread
*/
struct event_constraint *constraint_list; /* in enable order */
struct intel_excl_cntrs *excl_cntrs;
int excl_thread_id; /* 0 or 1 */
/*
* AMD specific bits
@ -187,7 +224,7 @@ struct cpu_hw_events {
/* Inverted mask of bits to clear in the perf_ctr ctrl registers */
u64 perf_ctr_virt_mask;
void *kfree_on_online;
void *kfree_on_online[X86_PERF_KFREE_MAX];
};
#define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
@ -202,6 +239,10 @@ struct cpu_hw_events {
#define EVENT_CONSTRAINT(c, n, m) \
__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
#define INTEL_EXCLEVT_CONSTRAINT(c, n) \
__EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT, HWEIGHT(n),\
0, PERF_X86_EVENT_EXCL)
/*
* The overlap flag marks event constraints with overlapping counter
* masks. This is the case if the counter mask of such an event is not
@ -259,6 +300,10 @@ struct cpu_hw_events {
#define INTEL_FLAGS_UEVENT_CONSTRAINT(c, n) \
EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS)
#define INTEL_EXCLUEVT_CONSTRAINT(c, n) \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
HWEIGHT(n), 0, PERF_X86_EVENT_EXCL)
#define INTEL_PLD_CONSTRAINT(c, n) \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
@ -283,22 +328,40 @@ struct cpu_hw_events {
/* Check flags and event code, and set the HSW load flag */
#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(code, n) \
__EVENT_CONSTRAINT(code, n, \
__EVENT_CONSTRAINT(code, n, \
ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)
#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_XLD(code, n) \
__EVENT_CONSTRAINT(code, n, \
ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, \
PERF_X86_EVENT_PEBS_LD_HSW|PERF_X86_EVENT_EXCL)
/* Check flags and event code/umask, and set the HSW store flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST_HSW)
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XST(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, \
PERF_X86_EVENT_PEBS_ST_HSW|PERF_X86_EVENT_EXCL)
/* Check flags and event code/umask, and set the HSW load flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, \
PERF_X86_EVENT_PEBS_LD_HSW|PERF_X86_EVENT_EXCL)
/* Check flags and event code/umask, and set the HSW N/A flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(code, n) \
__EVENT_CONSTRAINT(code, n, \
@ -408,6 +471,13 @@ union x86_pmu_config {
#define X86_CONFIG(args...) ((union x86_pmu_config){.bits = {args}}).value
enum {
x86_lbr_exclusive_lbr,
x86_lbr_exclusive_bts,
x86_lbr_exclusive_pt,
x86_lbr_exclusive_max,
};
/*
* struct x86_pmu - generic x86 pmu
*/
@ -443,14 +513,25 @@ struct x86_pmu {
u64 max_period;
struct event_constraint *
(*get_event_constraints)(struct cpu_hw_events *cpuc,
int idx,
struct perf_event *event);
void (*put_event_constraints)(struct cpu_hw_events *cpuc,
struct perf_event *event);
void (*commit_scheduling)(struct cpu_hw_events *cpuc,
struct perf_event *event,
int cntr);
void (*start_scheduling)(struct cpu_hw_events *cpuc);
void (*stop_scheduling)(struct cpu_hw_events *cpuc);
struct event_constraint *event_constraints;
struct x86_pmu_quirk *quirks;
int perfctr_second_write;
bool late_ack;
unsigned (*limit_period)(struct perf_event *event, unsigned l);
/*
* sysfs attrs
@ -472,7 +553,8 @@ struct x86_pmu {
void (*cpu_dead)(int cpu);
void (*check_microcode)(void);
void (*flush_branch_stack)(void);
void (*sched_task)(struct perf_event_context *ctx,
bool sched_in);
/*
* Intel Arch Perfmon v2+
@ -503,11 +585,16 @@ struct x86_pmu {
const int *lbr_sel_map; /* lbr_select mappings */
bool lbr_double_abort; /* duplicated lbr aborts */
/*
* Intel PT/LBR/BTS are exclusive
*/
atomic_t lbr_exclusive[x86_lbr_exclusive_max];
/*
* Extra registers for events
*/
struct extra_reg *extra_regs;
unsigned int er_flags;
unsigned int flags;
/*
* Intel host/guest support (KVM)
@ -515,6 +602,13 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
};
struct x86_perf_task_context {
u64 lbr_from[MAX_LBR_ENTRIES];
u64 lbr_to[MAX_LBR_ENTRIES];
int lbr_callstack_users;
int lbr_stack_state;
};
#define x86_add_quirk(func_) \
do { \
static struct x86_pmu_quirk __quirk __initdata = { \
@ -524,8 +618,13 @@ do { \
x86_pmu.quirks = &__quirk; \
} while (0)
#define ERF_NO_HT_SHARING 1
#define ERF_HAS_RSP_1 2
/*
* x86_pmu flags
*/
#define PMU_FL_NO_HT_SHARING 0x1 /* no hyper-threading resource sharing */
#define PMU_FL_HAS_RSP_1 0x2 /* has 2 equivalent offcore_rsp regs */
#define PMU_FL_EXCL_CNTRS 0x4 /* has exclusive counter requirements */
#define PMU_FL_EXCL_ENABLED 0x8 /* exclusive counter active */
#define EVENT_VAR(_id) event_attr_##_id
#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
@ -546,6 +645,12 @@ static struct perf_pmu_events_attr event_attr_##v = { \
extern struct x86_pmu x86_pmu __read_mostly;
static inline bool x86_pmu_has_lbr_callstack(void)
{
return x86_pmu.lbr_sel_map &&
x86_pmu.lbr_sel_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0;
}
DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
int x86_perf_event_set_period(struct perf_event *event);
@ -588,6 +693,12 @@ static inline int x86_pmu_rdpmc_index(int index)
return x86_pmu.rdpmc_index ? x86_pmu.rdpmc_index(index) : index;
}
int x86_add_exclusive(unsigned int what);
void x86_del_exclusive(unsigned int what);
void hw_perf_lbr_event_destroy(struct perf_event *event);
int x86_setup_perfctr(struct perf_event *event);
int x86_pmu_hw_config(struct perf_event *event);
@ -674,10 +785,34 @@ static inline int amd_pmu_init(void)
#ifdef CONFIG_CPU_SUP_INTEL
static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
{
/* user explicitly requested branch sampling */
if (has_branch_stack(event))
return true;
/* implicit branch sampling to correct PEBS skid */
if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
x86_pmu.intel_cap.pebs_format < 2)
return true;
return false;
}
static inline bool intel_pmu_has_bts(struct perf_event *event)
{
if (event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
!event->attr.freq && event->hw.sample_period == 1)
return true;
return false;
}
int intel_pmu_save_and_restart(struct perf_event *event);
struct event_constraint *
x86_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event);
x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
struct perf_event *event);
struct intel_shared_regs *allocate_shared_regs(int cpu);
@ -727,13 +862,15 @@ void intel_pmu_pebs_disable_all(void);
void intel_ds_init(void);
void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
void intel_pmu_lbr_reset(void);
void intel_pmu_lbr_enable(struct perf_event *event);
void intel_pmu_lbr_disable(struct perf_event *event);
void intel_pmu_lbr_enable_all(void);
void intel_pmu_lbr_enable_all(bool pmi);
void intel_pmu_lbr_disable_all(void);
@ -747,8 +884,18 @@ void intel_pmu_lbr_init_atom(void);
void intel_pmu_lbr_init_snb(void);
void intel_pmu_lbr_init_hsw(void);
int intel_pmu_setup_lbr_filter(struct perf_event *event);
void intel_pt_interrupt(void);
int intel_bts_interrupt(void);
void intel_bts_enable_local(void);
void intel_bts_disable_local(void);
int p4_pmu_init(void);
int p6_pmu_init(void);
@ -758,6 +905,10 @@ int knc_pmu_init(void);
ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
char *page);
static inline int is_ht_workaround_enabled(void)
{
return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
}
#else /* CONFIG_CPU_SUP_INTEL */
static inline void reserve_ds_buffers(void)

View file

@ -382,6 +382,7 @@ static int amd_pmu_cpu_prepare(int cpu)
static void amd_pmu_cpu_starting(int cpu)
{
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
void **onln = &cpuc->kfree_on_online[X86_PERF_KFREE_SHARED];
struct amd_nb *nb;
int i, nb_id;
@ -399,7 +400,7 @@ static void amd_pmu_cpu_starting(int cpu)
continue;
if (nb->nb_id == nb_id) {
cpuc->kfree_on_online = cpuc->amd_nb;
*onln = cpuc->amd_nb;
cpuc->amd_nb = nb;
break;
}
@ -429,7 +430,8 @@ static void amd_pmu_cpu_dead(int cpu)
}
static struct event_constraint *
amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
struct perf_event *event)
{
/*
* if not NB event or no NB, then no constraints
@ -537,7 +539,8 @@ static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
static struct event_constraint *
amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, int idx,
struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
unsigned int event_code = amd_get_event_code(hwc);

View file

@ -796,7 +796,7 @@ static int setup_ibs_ctl(int ibs_eilvt_off)
* the IBS interrupt vector is handled by perf_ibs_cpu_notifier that
* is using the new offset.
*/
static int force_ibs_eilvt_setup(void)
static void force_ibs_eilvt_setup(void)
{
int offset;
int ret;
@ -811,26 +811,24 @@ static int force_ibs_eilvt_setup(void)
if (offset == APIC_EILVT_NR_MAX) {
printk(KERN_DEBUG "No EILVT entry available\n");
return -EBUSY;
return;
}
ret = setup_ibs_ctl(offset);
if (ret)
goto out;
if (!ibs_eilvt_valid()) {
ret = -EFAULT;
if (!ibs_eilvt_valid())
goto out;
}
pr_info("IBS: LVT offset %d assigned\n", offset);
return 0;
return;
out:
preempt_disable();
put_eilvt(offset);
preempt_enable();
return ret;
return;
}
static void ibs_eilvt_setup(void)

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,525 @@
/*
* BTS PMU driver for perf
* Copyright (c) 2013-2014, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#undef DEBUG
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/bitops.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/debugfs.h>
#include <linux/device.h>
#include <linux/coredump.h>
#include <asm-generic/sizes.h>
#include <asm/perf_event.h>
#include "perf_event.h"
struct bts_ctx {
struct perf_output_handle handle;
struct debug_store ds_back;
int started;
};
static DEFINE_PER_CPU(struct bts_ctx, bts_ctx);
#define BTS_RECORD_SIZE 24
#define BTS_SAFETY_MARGIN 4080
struct bts_phys {
struct page *page;
unsigned long size;
unsigned long offset;
unsigned long displacement;
};
struct bts_buffer {
size_t real_size; /* multiple of BTS_RECORD_SIZE */
unsigned int nr_pages;
unsigned int nr_bufs;
unsigned int cur_buf;
bool snapshot;
local_t data_size;
local_t lost;
local_t head;
unsigned long end;
void **data_pages;
struct bts_phys buf[0];
};
struct pmu bts_pmu;
void intel_pmu_enable_bts(u64 config);
void intel_pmu_disable_bts(void);
static size_t buf_size(struct page *page)
{
return 1 << (PAGE_SHIFT + page_private(page));
}
static void *
bts_buffer_setup_aux(int cpu, void **pages, int nr_pages, bool overwrite)
{
struct bts_buffer *buf;
struct page *page;
int node = (cpu == -1) ? cpu : cpu_to_node(cpu);
unsigned long offset;
size_t size = nr_pages << PAGE_SHIFT;
int pg, nbuf, pad;
/* count all the high order buffers */
for (pg = 0, nbuf = 0; pg < nr_pages;) {
page = virt_to_page(pages[pg]);
if (WARN_ON_ONCE(!PagePrivate(page) && nr_pages > 1))
return NULL;
pg += 1 << page_private(page);
nbuf++;
}
/*
* to avoid interrupts in overwrite mode, only allow one physical
*/
if (overwrite && nbuf > 1)
return NULL;
buf = kzalloc_node(offsetof(struct bts_buffer, buf[nbuf]), GFP_KERNEL, node);
if (!buf)
return NULL;
buf->nr_pages = nr_pages;
buf->nr_bufs = nbuf;
buf->snapshot = overwrite;
buf->data_pages = pages;
buf->real_size = size - size % BTS_RECORD_SIZE;
for (pg = 0, nbuf = 0, offset = 0, pad = 0; nbuf < buf->nr_bufs; nbuf++) {
unsigned int __nr_pages;
page = virt_to_page(pages[pg]);
__nr_pages = PagePrivate(page) ? 1 << page_private(page) : 1;
buf->buf[nbuf].page = page;
buf->buf[nbuf].offset = offset;
buf->buf[nbuf].displacement = (pad ? BTS_RECORD_SIZE - pad : 0);
buf->buf[nbuf].size = buf_size(page) - buf->buf[nbuf].displacement;
pad = buf->buf[nbuf].size % BTS_RECORD_SIZE;
buf->buf[nbuf].size -= pad;
pg += __nr_pages;
offset += __nr_pages << PAGE_SHIFT;
}
return buf;
}
static void bts_buffer_free_aux(void *data)
{
kfree(data);
}
static unsigned long bts_buffer_offset(struct bts_buffer *buf, unsigned int idx)
{
return buf->buf[idx].offset + buf->buf[idx].displacement;
}
static void
bts_config_buffer(struct bts_buffer *buf)
{
int cpu = raw_smp_processor_id();
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
struct bts_phys *phys = &buf->buf[buf->cur_buf];
unsigned long index, thresh = 0, end = phys->size;
struct page *page = phys->page;
index = local_read(&buf->head);
if (!buf->snapshot) {
if (buf->end < phys->offset + buf_size(page))
end = buf->end - phys->offset - phys->displacement;
index -= phys->offset + phys->displacement;
if (end - index > BTS_SAFETY_MARGIN)
thresh = end - BTS_SAFETY_MARGIN;
else if (end - index > BTS_RECORD_SIZE)
thresh = end - BTS_RECORD_SIZE;
else
thresh = end;
}
ds->bts_buffer_base = (u64)(long)page_address(page) + phys->displacement;
ds->bts_index = ds->bts_buffer_base + index;
ds->bts_absolute_maximum = ds->bts_buffer_base + end;
ds->bts_interrupt_threshold = !buf->snapshot
? ds->bts_buffer_base + thresh
: ds->bts_absolute_maximum + BTS_RECORD_SIZE;
}
static void bts_buffer_pad_out(struct bts_phys *phys, unsigned long head)
{
unsigned long index = head - phys->offset;
memset(page_address(phys->page) + index, 0, phys->size - index);
}
static bool bts_buffer_is_full(struct bts_buffer *buf, struct bts_ctx *bts)
{
if (buf->snapshot)
return false;
if (local_read(&buf->data_size) >= bts->handle.size ||
bts->handle.size - local_read(&buf->data_size) < BTS_RECORD_SIZE)
return true;
return false;
}
static void bts_update(struct bts_ctx *bts)
{
int cpu = raw_smp_processor_id();
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
struct bts_buffer *buf = perf_get_aux(&bts->handle);
unsigned long index = ds->bts_index - ds->bts_buffer_base, old, head;
if (!buf)
return;
head = index + bts_buffer_offset(buf, buf->cur_buf);
old = local_xchg(&buf->head, head);
if (!buf->snapshot) {
if (old == head)
return;
if (ds->bts_index >= ds->bts_absolute_maximum)
local_inc(&buf->lost);
/*
* old and head are always in the same physical buffer, so we
* can subtract them to get the data size.
*/
local_add(head - old, &buf->data_size);
} else {
local_set(&buf->data_size, head);
}
}
static void __bts_event_start(struct perf_event *event)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_buffer *buf = perf_get_aux(&bts->handle);
u64 config = 0;
if (!buf || bts_buffer_is_full(buf, bts))
return;
event->hw.state = 0;
if (!buf->snapshot)
config |= ARCH_PERFMON_EVENTSEL_INT;
if (!event->attr.exclude_kernel)
config |= ARCH_PERFMON_EVENTSEL_OS;
if (!event->attr.exclude_user)
config |= ARCH_PERFMON_EVENTSEL_USR;
bts_config_buffer(buf);
/*
* local barrier to make sure that ds configuration made it
* before we enable BTS
*/
wmb();
intel_pmu_enable_bts(config);
}
static void bts_event_start(struct perf_event *event, int flags)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
__bts_event_start(event);
/* PMI handler: this counter is running and likely generating PMIs */
ACCESS_ONCE(bts->started) = 1;
}
static void __bts_event_stop(struct perf_event *event)
{
/*
* No extra synchronization is mandated by the documentation to have
* BTS data stores globally visible.
*/
intel_pmu_disable_bts();
if (event->hw.state & PERF_HES_STOPPED)
return;
ACCESS_ONCE(event->hw.state) |= PERF_HES_STOPPED;
}
static void bts_event_stop(struct perf_event *event, int flags)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
/* PMI handler: don't restart this counter */
ACCESS_ONCE(bts->started) = 0;
__bts_event_stop(event);
if (flags & PERF_EF_UPDATE)
bts_update(bts);
}
void intel_bts_enable_local(void)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
if (bts->handle.event && bts->started)
__bts_event_start(bts->handle.event);
}
void intel_bts_disable_local(void)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
if (bts->handle.event)
__bts_event_stop(bts->handle.event);
}
static int
bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle)
{
unsigned long head, space, next_space, pad, gap, skip, wakeup;
unsigned int next_buf;
struct bts_phys *phys, *next_phys;
int ret;
if (buf->snapshot)
return 0;
head = handle->head & ((buf->nr_pages << PAGE_SHIFT) - 1);
if (WARN_ON_ONCE(head != local_read(&buf->head)))
return -EINVAL;
phys = &buf->buf[buf->cur_buf];
space = phys->offset + phys->displacement + phys->size - head;
pad = space;
if (space > handle->size) {
space = handle->size;
space -= space % BTS_RECORD_SIZE;
}
if (space <= BTS_SAFETY_MARGIN) {
/* See if next phys buffer has more space */
next_buf = buf->cur_buf + 1;
if (next_buf >= buf->nr_bufs)
next_buf = 0;
next_phys = &buf->buf[next_buf];
gap = buf_size(phys->page) - phys->displacement - phys->size +
next_phys->displacement;
skip = pad + gap;
if (handle->size >= skip) {
next_space = next_phys->size;
if (next_space + skip > handle->size) {
next_space = handle->size - skip;
next_space -= next_space % BTS_RECORD_SIZE;
}
if (next_space > space || !space) {
if (pad)
bts_buffer_pad_out(phys, head);
ret = perf_aux_output_skip(handle, skip);
if (ret)
return ret;
/* Advance to next phys buffer */
phys = next_phys;
space = next_space;
head = phys->offset + phys->displacement;
/*
* After this, cur_buf and head won't match ds
* anymore, so we must not be racing with
* bts_update().
*/
buf->cur_buf = next_buf;
local_set(&buf->head, head);
}
}
}
/* Don't go far beyond wakeup watermark */
wakeup = BTS_SAFETY_MARGIN + BTS_RECORD_SIZE + handle->wakeup -
handle->head;
if (space > wakeup) {
space = wakeup;
space -= space % BTS_RECORD_SIZE;
}
buf->end = head + space;
/*
* If we have no space, the lost notification would have been sent when
* we hit absolute_maximum - see bts_update()
*/
if (!space)
return -ENOSPC;
return 0;
}
int intel_bts_interrupt(void)
{
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct perf_event *event = bts->handle.event;
struct bts_buffer *buf;
s64 old_head;
int err;
if (!event || !bts->started)
return 0;
buf = perf_get_aux(&bts->handle);
/*
* Skip snapshot counters: they don't use the interrupt, but
* there's no other way of telling, because the pointer will
* keep moving
*/
if (!buf || buf->snapshot)
return 0;
old_head = local_read(&buf->head);
bts_update(bts);
/* no new data */
if (old_head == local_read(&buf->head))
return 0;
perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
!!local_xchg(&buf->lost, 0));
buf = perf_aux_output_begin(&bts->handle, event);
if (!buf)
return 1;
err = bts_buffer_reset(buf, &bts->handle);
if (err)
perf_aux_output_end(&bts->handle, 0, false);
return 1;
}
static void bts_event_del(struct perf_event *event, int mode)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct bts_buffer *buf = perf_get_aux(&bts->handle);
bts_event_stop(event, PERF_EF_UPDATE);
if (buf) {
if (buf->snapshot)
bts->handle.head =
local_xchg(&buf->data_size,
buf->nr_pages << PAGE_SHIFT);
perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
!!local_xchg(&buf->lost, 0));
}
cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
}
static int bts_event_add(struct perf_event *event, int mode)
{
struct bts_buffer *buf;
struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
int ret = -EBUSY;
event->hw.state = PERF_HES_STOPPED;
if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask))
return -EBUSY;
if (bts->handle.event)
return -EBUSY;
buf = perf_aux_output_begin(&bts->handle, event);
if (!buf)
return -EINVAL;
ret = bts_buffer_reset(buf, &bts->handle);
if (ret) {
perf_aux_output_end(&bts->handle, 0, false);
return ret;
}
bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
if (mode & PERF_EF_START) {
bts_event_start(event, 0);
if (hwc->state & PERF_HES_STOPPED) {
bts_event_del(event, 0);
return -EBUSY;
}
}
return 0;
}
static void bts_event_destroy(struct perf_event *event)
{
x86_del_exclusive(x86_lbr_exclusive_bts);
}
static int bts_event_init(struct perf_event *event)
{
if (event->attr.type != bts_pmu.type)
return -ENOENT;
if (x86_add_exclusive(x86_lbr_exclusive_bts))
return -EBUSY;
event->destroy = bts_event_destroy;
return 0;
}
static void bts_event_read(struct perf_event *event)
{
}
static __init int bts_init(void)
{
if (!boot_cpu_has(X86_FEATURE_DTES64) || !x86_pmu.bts)
return -ENODEV;
bts_pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE;
bts_pmu.task_ctx_nr = perf_sw_context;
bts_pmu.event_init = bts_event_init;
bts_pmu.add = bts_event_add;
bts_pmu.del = bts_event_del;
bts_pmu.start = bts_event_start;
bts_pmu.stop = bts_event_stop;
bts_pmu.read = bts_event_read;
bts_pmu.setup_aux = bts_buffer_setup_aux;
bts_pmu.free_aux = bts_buffer_free_aux;
return perf_pmu_register(&bts_pmu, "intel_bts", -1);
}
module_init(bts_init);

File diff suppressed because it is too large Load diff

View file

@ -461,7 +461,8 @@ void intel_pmu_enable_bts(u64 config)
debugctlmsr |= DEBUGCTLMSR_TR;
debugctlmsr |= DEBUGCTLMSR_BTS;
debugctlmsr |= DEBUGCTLMSR_BTINT;
if (config & ARCH_PERFMON_EVENTSEL_INT)
debugctlmsr |= DEBUGCTLMSR_BTINT;
if (!(config & ARCH_PERFMON_EVENTSEL_OS))
debugctlmsr |= DEBUGCTLMSR_BTS_OFF_OS;
@ -611,6 +612,10 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */
INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
/* Allow all events as PEBS with no flags */
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
EVENT_CONSTRAINT_END
@ -622,6 +627,10 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
INTEL_EXCLEVT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */
INTEL_EXCLEVT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_EXCLEVT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
INTEL_EXCLEVT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
/* Allow all events as PEBS with no flags */
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
EVENT_CONSTRAINT_END
@ -633,16 +642,16 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd2, 0xf), /* MEM_LOAD_UOPS_L3_HIT_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd3, 0xf), /* MEM_LOAD_UOPS_L3_MISS_RETIRED.* */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XLD(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XST(0x12d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XST(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_XST(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_XLD(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_XLD(0xd2, 0xf), /* MEM_LOAD_UOPS_L3_HIT_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_XLD(0xd3, 0xf), /* MEM_LOAD_UOPS_L3_MISS_RETIRED.* */
/* Allow all events as PEBS with no flags */
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
EVENT_CONSTRAINT_END

View file

@ -39,6 +39,7 @@ static enum {
#define LBR_IND_JMP_BIT 6 /* do not capture indirect jumps */
#define LBR_REL_JMP_BIT 7 /* do not capture relative jumps */
#define LBR_FAR_BIT 8 /* do not capture far branches */
#define LBR_CALL_STACK_BIT 9 /* enable call stack */
#define LBR_KERNEL (1 << LBR_KERNEL_BIT)
#define LBR_USER (1 << LBR_USER_BIT)
@ -49,6 +50,7 @@ static enum {
#define LBR_REL_JMP (1 << LBR_REL_JMP_BIT)
#define LBR_IND_JMP (1 << LBR_IND_JMP_BIT)
#define LBR_FAR (1 << LBR_FAR_BIT)
#define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT)
#define LBR_PLM (LBR_KERNEL | LBR_USER)
@ -69,33 +71,31 @@ static enum {
#define LBR_FROM_FLAG_IN_TX (1ULL << 62)
#define LBR_FROM_FLAG_ABORT (1ULL << 61)
#define for_each_branch_sample_type(x) \
for ((x) = PERF_SAMPLE_BRANCH_USER; \
(x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
/*
* x86control flow change classification
* x86control flow changes include branches, interrupts, traps, faults
*/
enum {
X86_BR_NONE = 0, /* unknown */
X86_BR_NONE = 0, /* unknown */
X86_BR_USER = 1 << 0, /* branch target is user */
X86_BR_KERNEL = 1 << 1, /* branch target is kernel */
X86_BR_USER = 1 << 0, /* branch target is user */
X86_BR_KERNEL = 1 << 1, /* branch target is kernel */
X86_BR_CALL = 1 << 2, /* call */
X86_BR_RET = 1 << 3, /* return */
X86_BR_SYSCALL = 1 << 4, /* syscall */
X86_BR_SYSRET = 1 << 5, /* syscall return */
X86_BR_INT = 1 << 6, /* sw interrupt */
X86_BR_IRET = 1 << 7, /* return from interrupt */
X86_BR_JCC = 1 << 8, /* conditional */
X86_BR_JMP = 1 << 9, /* jump */
X86_BR_IRQ = 1 << 10,/* hw interrupt or trap or fault */
X86_BR_IND_CALL = 1 << 11,/* indirect calls */
X86_BR_ABORT = 1 << 12,/* transaction abort */
X86_BR_IN_TX = 1 << 13,/* in transaction */
X86_BR_NO_TX = 1 << 14,/* not in transaction */
X86_BR_CALL = 1 << 2, /* call */
X86_BR_RET = 1 << 3, /* return */
X86_BR_SYSCALL = 1 << 4, /* syscall */
X86_BR_SYSRET = 1 << 5, /* syscall return */
X86_BR_INT = 1 << 6, /* sw interrupt */
X86_BR_IRET = 1 << 7, /* return from interrupt */
X86_BR_JCC = 1 << 8, /* conditional */
X86_BR_JMP = 1 << 9, /* jump */
X86_BR_IRQ = 1 << 10,/* hw interrupt or trap or fault */
X86_BR_IND_CALL = 1 << 11,/* indirect calls */
X86_BR_ABORT = 1 << 12,/* transaction abort */
X86_BR_IN_TX = 1 << 13,/* in transaction */
X86_BR_NO_TX = 1 << 14,/* not in transaction */
X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
X86_BR_CALL_STACK = 1 << 16,/* call stack */
};
#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@ -112,13 +112,15 @@ enum {
X86_BR_JMP |\
X86_BR_IRQ |\
X86_BR_ABORT |\
X86_BR_IND_CALL)
X86_BR_IND_CALL |\
X86_BR_ZERO_CALL)
#define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
#define X86_BR_ANY_CALL \
(X86_BR_CALL |\
X86_BR_IND_CALL |\
X86_BR_ZERO_CALL |\
X86_BR_SYSCALL |\
X86_BR_IRQ |\
X86_BR_INT)
@ -130,17 +132,32 @@ static void intel_pmu_lbr_filter(struct cpu_hw_events *cpuc);
* otherwise it becomes near impossible to get a reliable stack.
*/
static void __intel_pmu_lbr_enable(void)
static void __intel_pmu_lbr_enable(bool pmi)
{
u64 debugctl;
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
u64 debugctl, lbr_select = 0, orig_debugctl;
if (cpuc->lbr_sel)
wrmsrl(MSR_LBR_SELECT, cpuc->lbr_sel->config);
/*
* No need to reprogram LBR_SELECT in a PMI, as it
* did not change.
*/
if (cpuc->lbr_sel && !pmi) {
lbr_select = cpuc->lbr_sel->config;
wrmsrl(MSR_LBR_SELECT, lbr_select);
}
rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
debugctl |= (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
orig_debugctl = debugctl;
debugctl |= DEBUGCTLMSR_LBR;
/*
* LBR callstack does not work well with FREEZE_LBRS_ON_PMI.
* If FREEZE_LBRS_ON_PMI is set, PMI near call/return instructions
* may cause superfluous increase/decrease of LBR_TOS.
*/
if (!(lbr_select & LBR_CALL_STACK))
debugctl |= DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
if (orig_debugctl != debugctl)
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
}
static void __intel_pmu_lbr_disable(void)
@ -181,9 +198,116 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
}
/*
* TOS = most recently recorded branch
*/
static inline u64 intel_pmu_lbr_tos(void)
{
u64 tos;
rdmsrl(x86_pmu.lbr_tos, tos);
return tos;
}
enum {
LBR_NONE,
LBR_VALID,
};
static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
{
int i;
unsigned lbr_idx, mask;
u64 tos;
if (task_ctx->lbr_callstack_users == 0 ||
task_ctx->lbr_stack_state == LBR_NONE) {
intel_pmu_lbr_reset();
return;
}
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
for (i = 0; i < x86_pmu.lbr_nr; i++) {
lbr_idx = (tos - i) & mask;
wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
}
task_ctx->lbr_stack_state = LBR_NONE;
}
static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
{
int i;
unsigned lbr_idx, mask;
u64 tos;
if (task_ctx->lbr_callstack_users == 0) {
task_ctx->lbr_stack_state = LBR_NONE;
return;
}
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
for (i = 0; i < x86_pmu.lbr_nr; i++) {
lbr_idx = (tos - i) & mask;
rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
}
task_ctx->lbr_stack_state = LBR_VALID;
}
void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct x86_perf_task_context *task_ctx;
if (!x86_pmu.lbr_nr)
return;
/*
* If LBR callstack feature is enabled and the stack was saved when
* the task was scheduled out, restore the stack. Otherwise flush
* the LBR stack.
*/
task_ctx = ctx ? ctx->task_ctx_data : NULL;
if (task_ctx) {
if (sched_in) {
__intel_pmu_lbr_restore(task_ctx);
cpuc->lbr_context = ctx;
} else {
__intel_pmu_lbr_save(task_ctx);
}
return;
}
/*
* When sampling the branck stack in system-wide, it may be
* necessary to flush the stack on context switch. This happens
* when the branch stack does not tag its entries with the pid
* of the current task. Otherwise it becomes impossible to
* associate a branch entry with a task. This ambiguity is more
* likely to appear when the branch stack supports priv level
* filtering and the user sets it to monitor only at the user
* level (which could be a useful measurement in system-wide
* mode). In that case, the risk is high of having a branch
* stack with branch from multiple tasks.
*/
if (sched_in) {
intel_pmu_lbr_reset();
cpuc->lbr_context = ctx;
}
}
static inline bool branch_user_callstack(unsigned br_sel)
{
return (br_sel & X86_BR_USER) && (br_sel & X86_BR_CALL_STACK);
}
void intel_pmu_lbr_enable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct x86_perf_task_context *task_ctx;
if (!x86_pmu.lbr_nr)
return;
@ -198,18 +322,33 @@ void intel_pmu_lbr_enable(struct perf_event *event)
}
cpuc->br_sel = event->hw.branch_reg.reg;
if (branch_user_callstack(cpuc->br_sel) && event->ctx &&
event->ctx->task_ctx_data) {
task_ctx = event->ctx->task_ctx_data;
task_ctx->lbr_callstack_users++;
}
cpuc->lbr_users++;
perf_sched_cb_inc(event->ctx->pmu);
}
void intel_pmu_lbr_disable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct x86_perf_task_context *task_ctx;
if (!x86_pmu.lbr_nr)
return;
if (branch_user_callstack(cpuc->br_sel) && event->ctx &&
event->ctx->task_ctx_data) {
task_ctx = event->ctx->task_ctx_data;
task_ctx->lbr_callstack_users--;
}
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
perf_sched_cb_dec(event->ctx->pmu);
if (cpuc->enabled && !cpuc->lbr_users) {
__intel_pmu_lbr_disable();
@ -218,12 +357,12 @@ void intel_pmu_lbr_disable(struct perf_event *event)
}
}
void intel_pmu_lbr_enable_all(void)
void intel_pmu_lbr_enable_all(bool pmi)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
if (cpuc->lbr_users)
__intel_pmu_lbr_enable();
__intel_pmu_lbr_enable(pmi);
}
void intel_pmu_lbr_disable_all(void)
@ -234,18 +373,6 @@ void intel_pmu_lbr_disable_all(void)
__intel_pmu_lbr_disable();
}
/*
* TOS = most recently recorded branch
*/
static inline u64 intel_pmu_lbr_tos(void)
{
u64 tos;
rdmsrl(x86_pmu.lbr_tos, tos);
return tos;
}
static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
{
unsigned long mask = x86_pmu.lbr_nr - 1;
@ -350,7 +477,7 @@ void intel_pmu_lbr_read(void)
* - in case there is no HW filter
* - in case the HW filter has errata or limitations
*/
static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
{
u64 br_type = event->attr.branch_sample_type;
int mask = 0;
@ -387,11 +514,21 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
if (br_type & PERF_SAMPLE_BRANCH_COND)
mask |= X86_BR_JCC;
if (br_type & PERF_SAMPLE_BRANCH_CALL_STACK) {
if (!x86_pmu_has_lbr_callstack())
return -EOPNOTSUPP;
if (mask & ~(X86_BR_USER | X86_BR_KERNEL))
return -EINVAL;
mask |= X86_BR_CALL | X86_BR_IND_CALL | X86_BR_RET |
X86_BR_CALL_STACK;
}
/*
* stash actual user request into reg, it may
* be used by fixup code for some CPU
*/
event->hw.branch_reg.reg = mask;
return 0;
}
/*
@ -403,14 +540,14 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
{
struct hw_perf_event_extra *reg;
u64 br_type = event->attr.branch_sample_type;
u64 mask = 0, m;
u64 v;
u64 mask = 0, v;
int i;
for_each_branch_sample_type(m) {
if (!(br_type & m))
for (i = 0; i < PERF_SAMPLE_BRANCH_MAX_SHIFT; i++) {
if (!(br_type & (1ULL << i)))
continue;
v = x86_pmu.lbr_sel_map[m];
v = x86_pmu.lbr_sel_map[i];
if (v == LBR_NOT_SUPP)
return -EOPNOTSUPP;
@ -420,8 +557,12 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
reg = &event->hw.branch_reg;
reg->idx = EXTRA_REG_LBR;
/* LBR_SELECT operates in suppress mode so invert mask */
reg->config = ~mask & x86_pmu.lbr_sel_mask;
/*
* The first 9 bits (LBR_SEL_MASK) in LBR_SELECT operate
* in suppress mode. So LBR_SELECT should be set to
* (~mask & LBR_SEL_MASK) | (mask & ~LBR_SEL_MASK)
*/
reg->config = mask ^ x86_pmu.lbr_sel_mask;
return 0;
}
@ -439,7 +580,9 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
/*
* setup SW LBR filter
*/
intel_pmu_setup_sw_lbr_filter(event);
ret = intel_pmu_setup_sw_lbr_filter(event);
if (ret)
return ret;
/*
* setup HW LBR filter, if any
@ -568,6 +711,12 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
ret = X86_BR_INT;
break;
case 0xe8: /* call near rel */
insn_get_immediate(&insn);
if (insn.immediate1.value == 0) {
/* zero length call */
ret = X86_BR_ZERO_CALL;
break;
}
case 0x9a: /* call far absolute */
ret = X86_BR_CALL;
break;
@ -678,35 +827,49 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
/*
* Map interface branch filters onto LBR filters
*/
static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
[PERF_SAMPLE_BRANCH_ANY] = LBR_ANY,
[PERF_SAMPLE_BRANCH_USER] = LBR_USER,
[PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
[PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
[PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_REL_JMP
| LBR_IND_JMP | LBR_FAR,
static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
[PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
[PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_KERNEL,
[PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGN,
[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN | LBR_REL_JMP
| LBR_IND_JMP | LBR_FAR,
/*
* NHM/WSM erratum: must include REL_JMP+IND_JMP to get CALL branches
*/
[PERF_SAMPLE_BRANCH_ANY_CALL] =
[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] =
LBR_REL_CALL | LBR_IND_CALL | LBR_REL_JMP | LBR_IND_JMP | LBR_FAR,
/*
* NHM/WSM erratum: must include IND_JMP to capture IND_CALL
*/
[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL | LBR_IND_JMP,
[PERF_SAMPLE_BRANCH_COND_SHIFT] = LBR_JCC,
};
static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
[PERF_SAMPLE_BRANCH_ANY] = LBR_ANY,
[PERF_SAMPLE_BRANCH_USER] = LBR_USER,
[PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
[PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
[PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_FAR,
[PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL
| LBR_FAR,
[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL,
[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
[PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
[PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_KERNEL,
[PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGN,
[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN | LBR_FAR,
[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
| LBR_FAR,
[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL,
[PERF_SAMPLE_BRANCH_COND_SHIFT] = LBR_JCC,
};
static const int hsw_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
[PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
[PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_KERNEL,
[PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGN,
[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN | LBR_FAR,
[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
| LBR_FAR,
[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL,
[PERF_SAMPLE_BRANCH_COND_SHIFT] = LBR_JCC,
[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
| LBR_RETURN | LBR_CALL_STACK,
};
/* core */
@ -765,6 +928,20 @@ void __init intel_pmu_lbr_init_snb(void)
pr_cont("16-deep LBR, ");
}
/* haswell */
void intel_pmu_lbr_init_hsw(void)
{
x86_pmu.lbr_nr = 16;
x86_pmu.lbr_tos = MSR_LBR_TOS;
x86_pmu.lbr_from = MSR_LBR_NHM_FROM;
x86_pmu.lbr_to = MSR_LBR_NHM_TO;
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = hsw_lbr_sel_map;
pr_cont("16-deep LBR, ");
}
/* atom */
void __init intel_pmu_lbr_init_atom(void)
{

File diff suppressed because it is too large Load diff

View file

@ -1132,8 +1132,7 @@ static int snbep_pci2phy_map_init(int devid)
}
}
if (ubox_dev)
pci_dev_put(ubox_dev);
pci_dev_put(ubox_dev);
return err ? pcibios_err_to_errno(err) : 0;
}

View file

@ -41,6 +41,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
{ X86_FEATURE_HWP_ACT_WINDOW, CR_EAX, 9, 0x00000006, 0 },
{ X86_FEATURE_HWP_EPP, CR_EAX,10, 0x00000006, 0 },
{ X86_FEATURE_HWP_PKG_REQ, CR_EAX,11, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PT, CR_EBX,25, 0x00000007, 0 },
{ X86_FEATURE_APERFMPERF, CR_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CR_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 },

View file

@ -354,6 +354,7 @@ int __copy_instruction(u8 *dest, u8 *src)
{
struct insn insn;
kprobe_opcode_t buf[MAX_INSN_SIZE];
int length;
unsigned long recovered_insn =
recover_probed_instruction(buf, (unsigned long)src);
@ -361,16 +362,18 @@ int __copy_instruction(u8 *dest, u8 *src)
return 0;
kernel_insn_init(&insn, (void *)recovered_insn, MAX_INSN_SIZE);
insn_get_length(&insn);
length = insn.length;
/* Another subsystem puts a breakpoint, failed to recover */
if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
return 0;
memcpy(dest, insn.kaddr, insn.length);
memcpy(dest, insn.kaddr, length);
#ifdef CONFIG_X86_64
if (insn_rip_relative(&insn)) {
s64 newdisp;
u8 *disp;
kernel_insn_init(&insn, dest, insn.length);
kernel_insn_init(&insn, dest, length);
insn_get_displacement(&insn);
/*
* The copied instruction uses the %rip-relative addressing
@ -394,7 +397,7 @@ int __copy_instruction(u8 *dest, u8 *src)
*(s32 *) disp = (s32) newdisp;
}
#endif
return insn.length;
return length;
}
static int arch_copy_kprobe(struct kprobe *p)

View file

@ -113,8 +113,6 @@ struct bpf_prog_type_list {
enum bpf_prog_type type;
};
void bpf_register_prog_type(struct bpf_prog_type_list *tl);
struct bpf_prog;
struct bpf_prog_aux {
@ -129,11 +127,25 @@ struct bpf_prog_aux {
};
#ifdef CONFIG_BPF_SYSCALL
void bpf_register_prog_type(struct bpf_prog_type_list *tl);
void bpf_prog_put(struct bpf_prog *prog);
#else
static inline void bpf_prog_put(struct bpf_prog *prog) {}
#endif
struct bpf_prog *bpf_prog_get(u32 ufd);
#else
static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
{
}
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline void bpf_prog_put(struct bpf_prog *prog)
{
}
#endif
/* verify correctness of eBPF program */
int bpf_check(struct bpf_prog *fp, union bpf_attr *attr);

View file

@ -13,6 +13,7 @@ struct trace_array;
struct trace_buffer;
struct tracer;
struct dentry;
struct bpf_prog;
struct trace_print_flags {
unsigned long mask;
@ -252,6 +253,7 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED_BIT,
TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
};
/*
@ -265,6 +267,7 @@ enum {
* it is best to clear the buffers that used it).
* USE_CALL_FILTER - For ftrace internal events, don't use file filter
* TRACEPOINT - Event is a tracepoint
* KPROBE - Event is a kprobe
*/
enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@ -274,6 +277,7 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
};
struct ftrace_event_call {
@ -303,6 +307,7 @@ struct ftrace_event_call {
#ifdef CONFIG_PERF_EVENTS
int perf_refcount;
struct hlist_head __percpu *perf_events;
struct bpf_prog *prog;
int (*perf_perm)(struct ftrace_event_call *,
struct perf_event *);
@ -548,6 +553,15 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
event_triggers_post_call(file, tt);
}
#ifdef CONFIG_BPF_SYSCALL
unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
#else
static inline unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
{
return 1;
}
#endif
enum {
FILTER_OTHER = 0,
FILTER_STATIC_STRING,

View file

@ -53,6 +53,7 @@ struct perf_guest_info_callbacks {
#include <linux/sysfs.h>
#include <linux/perf_regs.h>
#include <linux/workqueue.h>
#include <linux/cgroup.h>
#include <asm/local.h>
struct perf_callchain_entry {
@ -118,10 +119,19 @@ struct hw_perf_event {
struct hrtimer hrtimer;
};
struct { /* tracepoint */
struct task_struct *tp_target;
/* for tp_event->class */
struct list_head tp_list;
};
struct { /* intel_cqm */
int cqm_state;
int cqm_rmid;
struct list_head cqm_events_entry;
struct list_head cqm_groups_entry;
struct list_head cqm_group_entry;
};
struct { /* itrace */
int itrace_started;
};
#ifdef CONFIG_HAVE_HW_BREAKPOINT
struct { /* breakpoint */
/*
@ -129,12 +139,12 @@ struct hw_perf_event {
* problem hw_breakpoint has with context
* creation and event initalization.
*/
struct task_struct *bp_target;
struct arch_hw_breakpoint info;
struct list_head bp_list;
};
#endif
};
struct task_struct *target;
int state;
local64_t prev_count;
u64 sample_period;
@ -166,6 +176,11 @@ struct perf_event;
* pmu::capabilities flags
*/
#define PERF_PMU_CAP_NO_INTERRUPT 0x01
#define PERF_PMU_CAP_NO_NMI 0x02
#define PERF_PMU_CAP_AUX_NO_SG 0x04
#define PERF_PMU_CAP_AUX_SW_DOUBLEBUF 0x08
#define PERF_PMU_CAP_EXCLUSIVE 0x10
#define PERF_PMU_CAP_ITRACE 0x20
/**
* struct pmu - generic performance monitoring unit
@ -186,6 +201,7 @@ struct pmu {
int * __percpu pmu_disable_count;
struct perf_cpu_context * __percpu pmu_cpu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
int hrtimer_interval_ms;
@ -262,9 +278,32 @@ struct pmu {
int (*event_idx) (struct perf_event *event); /*optional */
/*
* flush branch stack on context-switches (needed in cpu-wide mode)
* context-switches callback
*/
void (*flush_branch_stack) (void);
void (*sched_task) (struct perf_event_context *ctx,
bool sched_in);
/*
* PMU specific data size
*/
size_t task_ctx_size;
/*
* Return the count value for a counter.
*/
u64 (*count) (struct perf_event *event); /*optional*/
/*
* Set up pmu-private data structures for an AUX area
*/
void *(*setup_aux) (int cpu, void **pages,
int nr_pages, bool overwrite);
/* optional */
/*
* Free pmu-private AUX data structures
*/
void (*free_aux) (void *aux); /* optional */
};
/**
@ -300,6 +339,7 @@ struct swevent_hlist {
#define PERF_ATTACH_CONTEXT 0x01
#define PERF_ATTACH_GROUP 0x02
#define PERF_ATTACH_TASK 0x04
#define PERF_ATTACH_TASK_DATA 0x08
struct perf_cgroup;
struct ring_buffer;
@ -438,6 +478,7 @@ struct perf_event {
struct pid_namespace *ns;
u64 id;
u64 (*clock)(void);
perf_overflow_handler_t overflow_handler;
void *overflow_handler_context;
@ -504,7 +545,7 @@ struct perf_event_context {
u64 generation;
int pin_count;
int nr_cgroups; /* cgroup evts */
int nr_branch_stack; /* branch_stack evt */
void *task_ctx_data; /* pmu specific data */
struct rcu_head rcu_head;
struct delayed_work orphans_remove;
@ -536,12 +577,52 @@ struct perf_output_handle {
struct ring_buffer *rb;
unsigned long wakeup;
unsigned long size;
void *addr;
union {
void *addr;
unsigned long head;
};
int page;
};
#ifdef CONFIG_CGROUP_PERF
/*
* perf_cgroup_info keeps track of time_enabled for a cgroup.
* This is a per-cpu dynamically allocated data structure.
*/
struct perf_cgroup_info {
u64 time;
u64 timestamp;
};
struct perf_cgroup {
struct cgroup_subsys_state css;
struct perf_cgroup_info __percpu *info;
};
/*
* Must ensure cgroup is pinned (css_get) before calling
* this function. In other words, we cannot call this function
* if there is no cgroup event for the current CPU context.
*/
static inline struct perf_cgroup *
perf_cgroup_from_task(struct task_struct *task)
{
return container_of(task_css(task, perf_event_cgrp_id),
struct perf_cgroup, css);
}
#endif /* CONFIG_CGROUP_PERF */
#ifdef CONFIG_PERF_EVENTS
extern void *perf_aux_output_begin(struct perf_output_handle *handle,
struct perf_event *event);
extern void perf_aux_output_end(struct perf_output_handle *handle,
unsigned long size, bool truncated);
extern int perf_aux_output_skip(struct perf_output_handle *handle,
unsigned long size);
extern void *perf_get_aux(struct perf_output_handle *handle);
extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
extern void perf_pmu_unregister(struct pmu *pmu);
@ -558,6 +639,8 @@ extern void perf_event_delayed_put(struct task_struct *task);
extern void perf_event_print_debug(void);
extern void perf_pmu_disable(struct pmu *pmu);
extern void perf_pmu_enable(struct pmu *pmu);
extern void perf_sched_cb_dec(struct pmu *pmu);
extern void perf_sched_cb_inc(struct pmu *pmu);
extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
extern int perf_event_refresh(struct perf_event *event, int refresh);
@ -731,6 +814,11 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
__perf_event_task_sched_out(prev, next);
}
static inline u64 __perf_event_count(struct perf_event *event)
{
return local64_read(&event->count) + atomic64_read(&event->child_count);
}
extern void perf_event_mmap(struct vm_area_struct *vma);
extern struct perf_guest_info_callbacks *perf_guest_cbs;
extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
@ -800,6 +888,16 @@ static inline bool has_branch_stack(struct perf_event *event)
return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
}
static inline bool needs_branch_stack(struct perf_event *event)
{
return event->attr.branch_sample_type != 0;
}
static inline bool has_aux(struct perf_event *event)
{
return event->pmu->setup_aux;
}
extern int perf_output_begin(struct perf_output_handle *handle,
struct perf_event *event, unsigned int size);
extern void perf_output_end(struct perf_output_handle *handle);
@ -815,6 +913,17 @@ extern void perf_event_disable(struct perf_event *event);
extern int __perf_event_disable(void *info);
extern void perf_event_task_tick(void);
#else /* !CONFIG_PERF_EVENTS: */
static inline void *
perf_aux_output_begin(struct perf_output_handle *handle,
struct perf_event *event) { return NULL; }
static inline void
perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
bool truncated) { }
static inline int
perf_aux_output_skip(struct perf_output_handle *handle,
unsigned long size) { return -EINVAL; }
static inline void *
perf_get_aux(struct perf_output_handle *handle) { return NULL; }
static inline void
perf_event_task_sched_in(struct task_struct *prev,
struct task_struct *task) { }

View file

@ -137,4 +137,12 @@ extern int watchdog_init_timeout(struct watchdog_device *wdd,
extern int watchdog_register_device(struct watchdog_device *);
extern void watchdog_unregister_device(struct watchdog_device *);
#ifdef CONFIG_HARDLOCKUP_DETECTOR
void watchdog_nmi_disable_all(void);
void watchdog_nmi_enable_all(void);
#else
static inline void watchdog_nmi_disable_all(void) {}
static inline void watchdog_nmi_enable_all(void) {}
#endif
#endif /* ifndef _LINUX_WATCHDOG_H */

View file

@ -118,6 +118,7 @@ enum bpf_map_type {
enum bpf_prog_type {
BPF_PROG_TYPE_UNSPEC,
BPF_PROG_TYPE_SOCKET_FILTER,
BPF_PROG_TYPE_KPROBE,
};
/* flags for BPF_MAP_UPDATE_ELEM command */
@ -151,6 +152,7 @@ union bpf_attr {
__u32 log_level; /* verbosity level of verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied buffer */
__u32 kern_version; /* checked when prog_type=kprobe */
};
} __attribute__((aligned(8)));
@ -162,6 +164,9 @@ enum bpf_func_id {
BPF_FUNC_map_lookup_elem, /* void *map_lookup_elem(&map, &key) */
BPF_FUNC_map_update_elem, /* int map_update_elem(&map, &key, &value, flags) */
BPF_FUNC_map_delete_elem, /* int map_delete_elem(&map, &key) */
BPF_FUNC_probe_read, /* int bpf_probe_read(void *dst, int size, void *src) */
BPF_FUNC_ktime_get_ns, /* u64 bpf_ktime_get_ns(void) */
BPF_FUNC_trace_printk, /* int bpf_trace_printk(const char *fmt, int fmt_size, ...) */
__BPF_FUNC_MAX_ID,
};

View file

@ -152,21 +152,42 @@ enum perf_event_sample_format {
* The branch types can be combined, however BRANCH_ANY covers all types
* of branches and therefore it supersedes all the other types.
*/
enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_USER_SHIFT = 0, /* user branches */
PERF_SAMPLE_BRANCH_KERNEL_SHIFT = 1, /* kernel branches */
PERF_SAMPLE_BRANCH_HV_SHIFT = 2, /* hypervisor branches */
PERF_SAMPLE_BRANCH_ANY_SHIFT = 3, /* any branch types */
PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT = 4, /* any call branch */
PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT = 5, /* any return branch */
PERF_SAMPLE_BRANCH_IND_CALL_SHIFT = 6, /* indirect calls */
PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT = 7, /* transaction aborts */
PERF_SAMPLE_BRANCH_IN_TX_SHIFT = 8, /* in transaction */
PERF_SAMPLE_BRANCH_NO_TX_SHIFT = 9, /* not in transaction */
PERF_SAMPLE_BRANCH_COND_SHIFT = 10, /* conditional branches */
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = 11, /* call/ret stack */
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_USER = 1U << 0, /* user branches */
PERF_SAMPLE_BRANCH_KERNEL = 1U << 1, /* kernel branches */
PERF_SAMPLE_BRANCH_HV = 1U << 2, /* hypervisor branches */
PERF_SAMPLE_BRANCH_USER = 1U << PERF_SAMPLE_BRANCH_USER_SHIFT,
PERF_SAMPLE_BRANCH_KERNEL = 1U << PERF_SAMPLE_BRANCH_KERNEL_SHIFT,
PERF_SAMPLE_BRANCH_HV = 1U << PERF_SAMPLE_BRANCH_HV_SHIFT,
PERF_SAMPLE_BRANCH_ANY = 1U << 3, /* any branch types */
PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 4, /* any call branch */
PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << 5, /* any return branch */
PERF_SAMPLE_BRANCH_IND_CALL = 1U << 6, /* indirect calls */
PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */
PERF_SAMPLE_BRANCH_IN_TX = 1U << 8, /* in transaction */
PERF_SAMPLE_BRANCH_NO_TX = 1U << 9, /* not in transaction */
PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */
PERF_SAMPLE_BRANCH_ANY = 1U << PERF_SAMPLE_BRANCH_ANY_SHIFT,
PERF_SAMPLE_BRANCH_ANY_CALL = 1U << PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT,
PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT,
PERF_SAMPLE_BRANCH_IND_CALL = 1U << PERF_SAMPLE_BRANCH_IND_CALL_SHIFT,
PERF_SAMPLE_BRANCH_ABORT_TX = 1U << PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT,
PERF_SAMPLE_BRANCH_IN_TX = 1U << PERF_SAMPLE_BRANCH_IN_TX_SHIFT,
PERF_SAMPLE_BRANCH_NO_TX = 1U << PERF_SAMPLE_BRANCH_NO_TX_SHIFT,
PERF_SAMPLE_BRANCH_COND = 1U << PERF_SAMPLE_BRANCH_COND_SHIFT,
PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */
PERF_SAMPLE_BRANCH_CALL_STACK = 1U << PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
#define PERF_SAMPLE_BRANCH_PLM_ALL \
@ -240,6 +261,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER3 96 /* add: sample_regs_user */
/* add: sample_stack_user */
#define PERF_ATTR_SIZE_VER4 104 /* add: sample_regs_intr */
#define PERF_ATTR_SIZE_VER5 112 /* add: aux_watermark */
/*
* Hardware event_id to monitor via a performance monitoring event:
@ -305,7 +327,8 @@ struct perf_event_attr {
exclude_callchain_user : 1, /* exclude user callchains */
mmap2 : 1, /* include mmap with inode data */
comm_exec : 1, /* flag comm events that are due to an exec */
__reserved_1 : 39;
use_clockid : 1, /* use @clockid for time fields */
__reserved_1 : 38;
union {
__u32 wakeup_events; /* wakeup every n events */
@ -334,8 +357,7 @@ struct perf_event_attr {
*/
__u32 sample_stack_user;
/* Align to u64. */
__u32 __reserved_2;
__s32 clockid;
/*
* Defines set of regs to dump for each sample
* state captured on:
@ -345,6 +367,12 @@ struct perf_event_attr {
* See asm/perf_regs.h for details.
*/
__u64 sample_regs_intr;
/*
* Wakeup watermark for AUX area
*/
__u32 aux_watermark;
__u32 __reserved_2; /* align to __u64 */
};
#define perf_flags(attr) (*(&(attr)->read_format + 1))
@ -360,6 +388,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *)
#define PERF_EVENT_IOC_SET_BPF _IOW('$', 8, __u32)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
@ -500,9 +529,30 @@ struct perf_event_mmap_page {
* In this case the kernel will not over-write unread data.
*
* See perf_output_put_handle() for the data ordering.
*
* data_{offset,size} indicate the location and size of the perf record
* buffer within the mmapped area.
*/
__u64 data_head; /* head in the data section */
__u64 data_tail; /* user-space written tail */
__u64 data_offset; /* where the buffer starts */
__u64 data_size; /* data buffer size */
/*
* AUX area is defined by aux_{offset,size} fields that should be set
* by the userspace, so that
*
* aux_offset >= data_offset + data_size
*
* prior to mmap()ing it. Size of the mmap()ed area should be aux_size.
*
* Ring buffer pointers aux_{head,tail} have the same semantics as
* data_{head,tail} and same ordering rules apply.
*/
__u64 aux_head;
__u64 aux_tail;
__u64 aux_offset;
__u64 aux_size;
};
#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0)
@ -725,6 +775,31 @@ enum perf_event_type {
*/
PERF_RECORD_MMAP2 = 10,
/*
* Records that new data landed in the AUX buffer part.
*
* struct {
* struct perf_event_header header;
*
* u64 aux_offset;
* u64 aux_size;
* u64 flags;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_AUX = 11,
/*
* Indicates that instruction trace has started
*
* struct {
* struct perf_event_header header;
* u32 pid;
* u32 tid;
* };
*/
PERF_RECORD_ITRACE_START = 12,
PERF_RECORD_MAX, /* non-ABI */
};
@ -742,6 +817,12 @@ enum perf_callchain_context {
PERF_CONTEXT_MAX = (__u64)-4095,
};
/**
* PERF_RECORD_AUX::flags bits
*/
#define PERF_AUX_FLAG_TRUNCATED 0x01 /* record was truncated to fit */
#define PERF_AUX_FLAG_OVERWRITE 0x02 /* snapshot from overwrite mode */
#define PERF_FLAG_FD_NO_GROUP (1UL << 0)
#define PERF_FLAG_FD_OUTPUT (1UL << 1)
#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup id, per-cpu mode only */

View file

@ -1526,7 +1526,7 @@ config EVENTFD
# syscall, maps, verifier
config BPF_SYSCALL
bool "Enable bpf() system call" if EXPERT
bool "Enable bpf() system call"
select ANON_INODES
select BPF
default n

View file

@ -16,6 +16,7 @@
#include <linux/file.h>
#include <linux/license.h>
#include <linux/filter.h>
#include <linux/version.h>
static LIST_HEAD(bpf_map_types);
@ -467,7 +468,7 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
}
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD log_buf
#define BPF_PROG_LOAD_LAST_FIELD kern_version
static int bpf_prog_load(union bpf_attr *attr)
{
@ -492,6 +493,10 @@ static int bpf_prog_load(union bpf_attr *attr)
if (attr->insn_cnt >= BPF_MAXINSNS)
return -EINVAL;
if (type == BPF_PROG_TYPE_KPROBE &&
attr->kern_version != LINUX_VERSION_CODE)
return -EINVAL;
/* plain bpf_prog allocation */
prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
if (!prog)

File diff suppressed because it is too large Load diff

View file

@ -116,12 +116,12 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
*/
static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
{
struct task_struct *tsk = bp->hw.bp_target;
struct task_struct *tsk = bp->hw.target;
struct perf_event *iter;
int count = 0;
list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
if (iter->hw.bp_target == tsk &&
if (iter->hw.target == tsk &&
find_slot_idx(iter) == type &&
(iter->cpu < 0 || cpu == iter->cpu))
count += hw_breakpoint_weight(iter);
@ -153,7 +153,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
int nr;
nr = info->cpu_pinned;
if (!bp->hw.bp_target)
if (!bp->hw.target)
nr += max_task_bp_pinned(cpu, type);
else
nr += task_bp_pinned(cpu, bp, type);
@ -210,7 +210,7 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
weight = -weight;
/* Pinned counter cpu profiling */
if (!bp->hw.bp_target) {
if (!bp->hw.target) {
get_bp_info(bp->cpu, type)->cpu_pinned += weight;
return;
}

View file

@ -27,6 +27,7 @@ struct ring_buffer {
local_t lost; /* nr records lost */
long watermark; /* wakeup watermark */
long aux_watermark;
/* poll crap */
spinlock_t event_lock;
struct list_head event_list;
@ -35,6 +36,20 @@ struct ring_buffer {
unsigned long mmap_locked;
struct user_struct *mmap_user;
/* AUX area */
local_t aux_head;
local_t aux_nest;
local_t aux_wakeup;
unsigned long aux_pgoff;
int aux_nr_pages;
int aux_overwrite;
atomic_t aux_mmap_count;
unsigned long aux_mmap_locked;
void (*free_aux)(void *);
atomic_t aux_refcount;
void **aux_pages;
void *aux_priv;
struct perf_event_mmap_page *user_page;
void *data_pages[0];
};
@ -43,6 +58,19 @@ extern void rb_free(struct ring_buffer *rb);
extern struct ring_buffer *
rb_alloc(int nr_pages, long watermark, int cpu, int flags);
extern void perf_event_wakeup(struct perf_event *event);
extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
pgoff_t pgoff, int nr_pages, long watermark, int flags);
extern void rb_free_aux(struct ring_buffer *rb);
extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
extern void ring_buffer_put(struct ring_buffer *rb);
static inline bool rb_has_aux(struct ring_buffer *rb)
{
return !!rb->aux_nr_pages;
}
void perf_event_aux_event(struct perf_event *event, unsigned long head,
unsigned long size, u64 flags);
extern void
perf_event_header__init_id(struct perf_event_header *header,
@ -81,6 +109,11 @@ static inline unsigned long perf_data_size(struct ring_buffer *rb)
return rb->nr_pages << (PAGE_SHIFT + page_order(rb));
}
static inline unsigned long perf_aux_size(struct ring_buffer *rb)
{
return rb->aux_nr_pages << PAGE_SHIFT;
}
#define DEFINE_OUTPUT_COPY(func_name, memcpy_func) \
static inline unsigned long \
func_name(struct perf_output_handle *handle, \

View file

@ -243,14 +243,317 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
spin_lock_init(&rb->event_lock);
}
/*
* This is called before hardware starts writing to the AUX area to
* obtain an output handle and make sure there's room in the buffer.
* When the capture completes, call perf_aux_output_end() to commit
* the recorded data to the buffer.
*
* The ordering is similar to that of perf_output_{begin,end}, with
* the exception of (B), which should be taken care of by the pmu
* driver, since ordering rules will differ depending on hardware.
*/
void *perf_aux_output_begin(struct perf_output_handle *handle,
struct perf_event *event)
{
struct perf_event *output_event = event;
unsigned long aux_head, aux_tail;
struct ring_buffer *rb;
if (output_event->parent)
output_event = output_event->parent;
/*
* Since this will typically be open across pmu::add/pmu::del, we
* grab ring_buffer's refcount instead of holding rcu read lock
* to make sure it doesn't disappear under us.
*/
rb = ring_buffer_get(output_event);
if (!rb)
return NULL;
if (!rb_has_aux(rb) || !atomic_inc_not_zero(&rb->aux_refcount))
goto err;
/*
* Nesting is not supported for AUX area, make sure nested
* writers are caught early
*/
if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
goto err_put;
aux_head = local_read(&rb->aux_head);
handle->rb = rb;
handle->event = event;
handle->head = aux_head;
handle->size = 0;
/*
* In overwrite mode, AUX data stores do not depend on aux_tail,
* therefore (A) control dependency barrier does not exist. The
* (B) <-> (C) ordering is still observed by the pmu driver.
*/
if (!rb->aux_overwrite) {
aux_tail = ACCESS_ONCE(rb->user_page->aux_tail);
handle->wakeup = local_read(&rb->aux_wakeup) + rb->aux_watermark;
if (aux_head - aux_tail < perf_aux_size(rb))
handle->size = CIRC_SPACE(aux_head, aux_tail, perf_aux_size(rb));
/*
* handle->size computation depends on aux_tail load; this forms a
* control dependency barrier separating aux_tail load from aux data
* store that will be enabled on successful return
*/
if (!handle->size) { /* A, matches D */
event->pending_disable = 1;
perf_output_wakeup(handle);
local_set(&rb->aux_nest, 0);
goto err_put;
}
}
return handle->rb->aux_priv;
err_put:
rb_free_aux(rb);
err:
ring_buffer_put(rb);
handle->event = NULL;
return NULL;
}
/*
* Commit the data written by hardware into the ring buffer by adjusting
* aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
* pmu driver's responsibility to observe ordering rules of the hardware,
* so that all the data is externally visible before this is called.
*/
void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
bool truncated)
{
struct ring_buffer *rb = handle->rb;
unsigned long aux_head;
u64 flags = 0;
if (truncated)
flags |= PERF_AUX_FLAG_TRUNCATED;
/* in overwrite mode, driver provides aux_head via handle */
if (rb->aux_overwrite) {
flags |= PERF_AUX_FLAG_OVERWRITE;
aux_head = handle->head;
local_set(&rb->aux_head, aux_head);
} else {
aux_head = local_read(&rb->aux_head);
local_add(size, &rb->aux_head);
}
if (size || flags) {
/*
* Only send RECORD_AUX if we have something useful to communicate
*/
perf_event_aux_event(handle->event, aux_head, size, flags);
}
aux_head = rb->user_page->aux_head = local_read(&rb->aux_head);
if (aux_head - local_read(&rb->aux_wakeup) >= rb->aux_watermark) {
perf_output_wakeup(handle);
local_add(rb->aux_watermark, &rb->aux_wakeup);
}
handle->event = NULL;
local_set(&rb->aux_nest, 0);
rb_free_aux(rb);
ring_buffer_put(rb);
}
/*
* Skip over a given number of bytes in the AUX buffer, due to, for example,
* hardware's alignment constraints.
*/
int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size)
{
struct ring_buffer *rb = handle->rb;
unsigned long aux_head;
if (size > handle->size)
return -ENOSPC;
local_add(size, &rb->aux_head);
aux_head = rb->user_page->aux_head = local_read(&rb->aux_head);
if (aux_head - local_read(&rb->aux_wakeup) >= rb->aux_watermark) {
perf_output_wakeup(handle);
local_add(rb->aux_watermark, &rb->aux_wakeup);
handle->wakeup = local_read(&rb->aux_wakeup) +
rb->aux_watermark;
}
handle->head = aux_head;
handle->size -= size;
return 0;
}
void *perf_get_aux(struct perf_output_handle *handle)
{
/* this is only valid between perf_aux_output_begin and *_end */
if (!handle->event)
return NULL;
return handle->rb->aux_priv;
}
#define PERF_AUX_GFP (GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)
static struct page *rb_alloc_aux_page(int node, int order)
{
struct page *page;
if (order > MAX_ORDER)
order = MAX_ORDER;
do {
page = alloc_pages_node(node, PERF_AUX_GFP, order);
} while (!page && order--);
if (page && order) {
/*
* Communicate the allocation size to the driver
*/
split_page(page, order);
SetPagePrivate(page);
set_page_private(page, order);
}
return page;
}
static void rb_free_aux_page(struct ring_buffer *rb, int idx)
{
struct page *page = virt_to_page(rb->aux_pages[idx]);
ClearPagePrivate(page);
page->mapping = NULL;
__free_page(page);
}
int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
pgoff_t pgoff, int nr_pages, long watermark, int flags)
{
bool overwrite = !(flags & RING_BUFFER_WRITABLE);
int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
int ret = -ENOMEM, max_order = 0;
if (!has_aux(event))
return -ENOTSUPP;
if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
/*
* We need to start with the max_order that fits in nr_pages,
* not the other way around, hence ilog2() and not get_order.
*/
max_order = ilog2(nr_pages);
/*
* PMU requests more than one contiguous chunks of memory
* for SW double buffering
*/
if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
!overwrite) {
if (!max_order)
return -EINVAL;
max_order--;
}
}
rb->aux_pages = kzalloc_node(nr_pages * sizeof(void *), GFP_KERNEL, node);
if (!rb->aux_pages)
return -ENOMEM;
rb->free_aux = event->pmu->free_aux;
for (rb->aux_nr_pages = 0; rb->aux_nr_pages < nr_pages;) {
struct page *page;
int last, order;
order = min(max_order, ilog2(nr_pages - rb->aux_nr_pages));
page = rb_alloc_aux_page(node, order);
if (!page)
goto out;
for (last = rb->aux_nr_pages + (1 << page_private(page));
last > rb->aux_nr_pages; rb->aux_nr_pages++)
rb->aux_pages[rb->aux_nr_pages] = page_address(page++);
}
rb->aux_priv = event->pmu->setup_aux(event->cpu, rb->aux_pages, nr_pages,
overwrite);
if (!rb->aux_priv)
goto out;
ret = 0;
/*
* aux_pages (and pmu driver's private data, aux_priv) will be
* referenced in both producer's and consumer's contexts, thus
* we keep a refcount here to make sure either of the two can
* reference them safely.
*/
atomic_set(&rb->aux_refcount, 1);
rb->aux_overwrite = overwrite;
rb->aux_watermark = watermark;
if (!rb->aux_watermark && !rb->aux_overwrite)
rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
out:
if (!ret)
rb->aux_pgoff = pgoff;
else
rb_free_aux(rb);
return ret;
}
static void __rb_free_aux(struct ring_buffer *rb)
{
int pg;
if (rb->aux_priv) {
rb->free_aux(rb->aux_priv);
rb->free_aux = NULL;
rb->aux_priv = NULL;
}
for (pg = 0; pg < rb->aux_nr_pages; pg++)
rb_free_aux_page(rb, pg);
kfree(rb->aux_pages);
rb->aux_nr_pages = 0;
}
void rb_free_aux(struct ring_buffer *rb)
{
if (atomic_dec_and_test(&rb->aux_refcount))
__rb_free_aux(rb);
}
#ifndef CONFIG_PERF_USE_VMALLOC
/*
* Back perf_mmap() with regular GFP_KERNEL-0 pages.
*/
struct page *
perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
static struct page *
__perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
{
if (pgoff > rb->nr_pages)
return NULL;
@ -340,8 +643,8 @@ static int data_page_nr(struct ring_buffer *rb)
return rb->nr_pages << page_order(rb);
}
struct page *
perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
static struct page *
__perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
{
/* The '>' counts in the user page. */
if (pgoff > data_page_nr(rb))
@ -416,3 +719,19 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
}
#endif
struct page *
perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
{
if (rb->aux_nr_pages) {
/* above AUX space */
if (pgoff > rb->aux_pgoff + rb->aux_nr_pages)
return NULL;
/* AUX space */
if (pgoff >= rb->aux_pgoff)
return virt_to_page(rb->aux_pages[pgoff - rb->aux_pgoff]);
}
return __perf_mmap_to_page(rb, pgoff);
}

View file

@ -432,6 +432,14 @@ config UPROBE_EVENT
This option is required if you plan to use perf-probe subcommand
of perf tools on user space applications.
config BPF_EVENTS
depends on BPF_SYSCALL
depends on KPROBE_EVENT
bool
default y
help
This allows the user to attach BPF programs to kprobe events.
config PROBE_EVENTS
def_bool n

View file

@ -53,6 +53,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
endif
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
obj-$(CONFIG_TRACEPOINTS) += power-traces.o
ifeq ($(CONFIG_PM),y)

222
kernel/trace/bpf_trace.c Normal file
View file

@ -0,0 +1,222 @@
/* Copyright (c) 2011-2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/uaccess.h>
#include <linux/ctype.h>
#include "trace.h"
static DEFINE_PER_CPU(int, bpf_prog_active);
/**
* trace_call_bpf - invoke BPF program
* @prog: BPF program
* @ctx: opaque context pointer
*
* kprobe handlers execute BPF programs via this helper.
* Can be used from static tracepoints in the future.
*
* Return: BPF programs always return an integer which is interpreted by
* kprobe handler as:
* 0 - return from kprobe (event is filtered out)
* 1 - store kprobe event into ring buffer
* Other values are reserved and currently alias to 1
*/
unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
{
unsigned int ret;
if (in_nmi()) /* not supported yet */
return 1;
preempt_disable();
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
/*
* since some bpf program is already running on this cpu,
* don't call into another bpf program (same or different)
* and don't send kprobe event into ring-buffer,
* so return zero here
*/
ret = 0;
goto out;
}
rcu_read_lock();
ret = BPF_PROG_RUN(prog, ctx);
rcu_read_unlock();
out:
__this_cpu_dec(bpf_prog_active);
preempt_enable();
return ret;
}
EXPORT_SYMBOL_GPL(trace_call_bpf);
static u64 bpf_probe_read(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
{
void *dst = (void *) (long) r1;
int size = (int) r2;
void *unsafe_ptr = (void *) (long) r3;
return probe_kernel_read(dst, unsafe_ptr, size);
}
static const struct bpf_func_proto bpf_probe_read_proto = {
.func = bpf_probe_read,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_STACK,
.arg2_type = ARG_CONST_STACK_SIZE,
.arg3_type = ARG_ANYTHING,
};
static u64 bpf_ktime_get_ns(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
{
/* NMI safe access to clock monotonic */
return ktime_get_mono_fast_ns();
}
static const struct bpf_func_proto bpf_ktime_get_ns_proto = {
.func = bpf_ktime_get_ns,
.gpl_only = true,
.ret_type = RET_INTEGER,
};
/*
* limited trace_printk()
* only %d %u %x %ld %lu %lx %lld %llu %llx %p conversion specifiers allowed
*/
static u64 bpf_trace_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
{
char *fmt = (char *) (long) r1;
int mod[3] = {};
int fmt_cnt = 0;
int i;
/*
* bpf_check()->check_func_arg()->check_stack_boundary()
* guarantees that fmt points to bpf program stack,
* fmt_size bytes of it were initialized and fmt_size > 0
*/
if (fmt[--fmt_size] != 0)
return -EINVAL;
/* check format string for allowed specifiers */
for (i = 0; i < fmt_size; i++) {
if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i]))
return -EINVAL;
if (fmt[i] != '%')
continue;
if (fmt_cnt >= 3)
return -EINVAL;
/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
i++;
if (fmt[i] == 'l') {
mod[fmt_cnt]++;
i++;
} else if (fmt[i] == 'p') {
mod[fmt_cnt]++;
i++;
if (!isspace(fmt[i]) && !ispunct(fmt[i]) && fmt[i] != 0)
return -EINVAL;
fmt_cnt++;
continue;
}
if (fmt[i] == 'l') {
mod[fmt_cnt]++;
i++;
}
if (fmt[i] != 'd' && fmt[i] != 'u' && fmt[i] != 'x')
return -EINVAL;
fmt_cnt++;
}
return __trace_printk(1/* fake ip will not be printed */, fmt,
mod[0] == 2 ? r3 : mod[0] == 1 ? (long) r3 : (u32) r3,
mod[1] == 2 ? r4 : mod[1] == 1 ? (long) r4 : (u32) r4,
mod[2] == 2 ? r5 : mod[2] == 1 ? (long) r5 : (u32) r5);
}
static const struct bpf_func_proto bpf_trace_printk_proto = {
.func = bpf_trace_printk,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_STACK,
.arg2_type = ARG_CONST_STACK_SIZE,
};
static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
{
switch (func_id) {
case BPF_FUNC_map_lookup_elem:
return &bpf_map_lookup_elem_proto;
case BPF_FUNC_map_update_elem:
return &bpf_map_update_elem_proto;
case BPF_FUNC_map_delete_elem:
return &bpf_map_delete_elem_proto;
case BPF_FUNC_probe_read:
return &bpf_probe_read_proto;
case BPF_FUNC_ktime_get_ns:
return &bpf_ktime_get_ns_proto;
case BPF_FUNC_trace_printk:
/*
* this program might be calling bpf_trace_printk,
* so allocate per-cpu printk buffers
*/
trace_printk_init_buffers();
return &bpf_trace_printk_proto;
default:
return NULL;
}
}
/* bpf+kprobe programs can access fields of 'struct pt_regs' */
static bool kprobe_prog_is_valid_access(int off, int size, enum bpf_access_type type)
{
/* check bounds */
if (off < 0 || off >= sizeof(struct pt_regs))
return false;
/* only read is allowed */
if (type != BPF_READ)
return false;
/* disallow misaligned access */
if (off % size != 0)
return false;
return true;
}
static struct bpf_verifier_ops kprobe_prog_ops = {
.get_func_proto = kprobe_prog_func_proto,
.is_valid_access = kprobe_prog_is_valid_access,
};
static struct bpf_prog_type_list kprobe_tl = {
.ops = &kprobe_prog_ops,
.type = BPF_PROG_TYPE_KPROBE,
};
static int __init register_kprobe_prog_ops(void)
{
bpf_register_prog_type(&kprobe_tl);
return 0;
}
late_initcall(register_kprobe_prog_ops);

View file

@ -1135,11 +1135,15 @@ static void
kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
{
struct ftrace_event_call *call = &tk->tp.call;
struct bpf_prog *prog = call->prog;
struct kprobe_trace_entry_head *entry;
struct hlist_head *head;
int size, __size, dsize;
int rctx;
if (prog && !trace_call_bpf(prog, regs))
return;
head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
return;
@ -1166,11 +1170,15 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
struct pt_regs *regs)
{
struct ftrace_event_call *call = &tk->tp.call;
struct bpf_prog *prog = call->prog;
struct kretprobe_trace_entry_head *entry;
struct hlist_head *head;
int size, __size, dsize;
int rctx;
if (prog && !trace_call_bpf(prog, regs))
return;
head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
return;
@ -1287,7 +1295,7 @@ static int register_kprobe_event(struct trace_kprobe *tk)
kfree(call->print_fmt);
return -ENODEV;
}
call->flags = 0;
call->flags = TRACE_EVENT_FL_KPROBE;
call->class->reg = kprobe_register;
call->data = tk;
ret = trace_add_event_call(call);

View file

@ -1006,7 +1006,7 @@ __uprobe_perf_filter(struct trace_uprobe_filter *filter, struct mm_struct *mm)
return true;
list_for_each_entry(event, &filter->perf_events, hw.tp_list) {
if (event->hw.tp_target->mm == mm)
if (event->hw.target->mm == mm)
return true;
}
@ -1016,7 +1016,7 @@ __uprobe_perf_filter(struct trace_uprobe_filter *filter, struct mm_struct *mm)
static inline bool
uprobe_filter_event(struct trace_uprobe *tu, struct perf_event *event)
{
return __uprobe_perf_filter(&tu->filter, event->hw.tp_target->mm);
return __uprobe_perf_filter(&tu->filter, event->hw.target->mm);
}
static int uprobe_perf_close(struct trace_uprobe *tu, struct perf_event *event)
@ -1024,10 +1024,10 @@ static int uprobe_perf_close(struct trace_uprobe *tu, struct perf_event *event)
bool done;
write_lock(&tu->filter.rwlock);
if (event->hw.tp_target) {
if (event->hw.target) {
list_del(&event->hw.tp_list);
done = tu->filter.nr_systemwide ||
(event->hw.tp_target->flags & PF_EXITING) ||
(event->hw.target->flags & PF_EXITING) ||
uprobe_filter_event(tu, event);
} else {
tu->filter.nr_systemwide--;
@ -1047,7 +1047,7 @@ static int uprobe_perf_open(struct trace_uprobe *tu, struct perf_event *event)
int err;
write_lock(&tu->filter.rwlock);
if (event->hw.tp_target) {
if (event->hw.target) {
/*
* event->parent != NULL means copy_process(), we can avoid
* uprobe_apply(). current->mm must be probed and we can rely

View file

@ -567,9 +567,37 @@ static void watchdog_nmi_disable(unsigned int cpu)
cpu0_err = 0;
}
}
void watchdog_nmi_enable_all(void)
{
int cpu;
if (!watchdog_user_enabled)
return;
get_online_cpus();
for_each_online_cpu(cpu)
watchdog_nmi_enable(cpu);
put_online_cpus();
}
void watchdog_nmi_disable_all(void)
{
int cpu;
if (!watchdog_running)
return;
get_online_cpus();
for_each_online_cpu(cpu)
watchdog_nmi_disable(cpu);
put_online_cpus();
}
#else
static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
static void watchdog_nmi_disable(unsigned int cpu) { return; }
void watchdog_nmi_enable_all(void) {}
void watchdog_nmi_disable_all(void) {}
#endif /* CONFIG_HARDLOCKUP_DETECTOR */
static struct smp_hotplug_thread watchdog_threads = {

View file

@ -6,23 +6,39 @@ hostprogs-y := test_verifier test_maps
hostprogs-y += sock_example
hostprogs-y += sockex1
hostprogs-y += sockex2
hostprogs-y += tracex1
hostprogs-y += tracex2
hostprogs-y += tracex3
hostprogs-y += tracex4
test_verifier-objs := test_verifier.o libbpf.o
test_maps-objs := test_maps.o libbpf.o
sock_example-objs := sock_example.o libbpf.o
sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
always += sockex1_kern.o
always += sockex2_kern.o
always += tracex1_kern.o
always += tracex2_kern.o
always += tracex3_kern.o
always += tracex4_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
HOSTLOADLIBES_sockex1 += -lelf
HOSTLOADLIBES_sockex2 += -lelf
HOSTLOADLIBES_tracex1 += -lelf
HOSTLOADLIBES_tracex2 += -lelf
HOSTLOADLIBES_tracex3 += -lelf
HOSTLOADLIBES_tracex4 += -lelf -lrt
# point this to your LLVM backend with bpf support
LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc

View file

@ -15,6 +15,12 @@ static int (*bpf_map_update_elem)(void *map, void *key, void *value,
(void *) BPF_FUNC_map_update_elem;
static int (*bpf_map_delete_elem)(void *map, void *key) =
(void *) BPF_FUNC_map_delete_elem;
static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) =
(void *) BPF_FUNC_probe_read;
static unsigned long long (*bpf_ktime_get_ns)(void) =
(void *) BPF_FUNC_ktime_get_ns;
static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
(void *) BPF_FUNC_trace_printk;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions

View file

@ -8,29 +8,70 @@
#include <unistd.h>
#include <string.h>
#include <stdbool.h>
#include <stdlib.h>
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <poll.h>
#include "libbpf.h"
#include "bpf_helpers.h"
#include "bpf_load.h"
#define DEBUGFS "/sys/kernel/debug/tracing/"
static char license[128];
static int kern_version;
static bool processed_sec[128];
int map_fd[MAX_MAPS];
int prog_fd[MAX_PROGS];
int event_fd[MAX_PROGS];
int prog_cnt;
static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
{
int fd;
bool is_socket = strncmp(event, "socket", 6) == 0;
bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
enum bpf_prog_type prog_type;
char buf[256];
int fd, efd, err, id;
struct perf_event_attr attr = {};
if (!is_socket)
/* tracing events tbd */
attr.type = PERF_TYPE_TRACEPOINT;
attr.sample_type = PERF_SAMPLE_RAW;
attr.sample_period = 1;
attr.wakeup_events = 1;
if (is_socket) {
prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
} else if (is_kprobe || is_kretprobe) {
prog_type = BPF_PROG_TYPE_KPROBE;
} else {
printf("Unknown event '%s'\n", event);
return -1;
}
fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
prog, size, license);
if (is_kprobe || is_kretprobe) {
if (is_kprobe)
event += 7;
else
event += 10;
snprintf(buf, sizeof(buf),
"echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
is_kprobe ? 'p' : 'r', event, event);
err = system(buf);
if (err < 0) {
printf("failed to create kprobe '%s' error '%s'\n",
event, strerror(errno));
return -1;
}
}
fd = bpf_prog_load(prog_type, prog, size, license, kern_version);
if (fd < 0) {
printf("bpf_prog_load() err=%d\n%s", errno, bpf_log_buf);
@ -39,6 +80,41 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_fd[prog_cnt++] = fd;
if (is_socket)
return 0;
strcpy(buf, DEBUGFS);
strcat(buf, "events/kprobes/");
strcat(buf, event);
strcat(buf, "/id");
efd = open(buf, O_RDONLY, 0);
if (efd < 0) {
printf("failed to open event %s\n", event);
return -1;
}
err = read(efd, buf, sizeof(buf));
if (err < 0 || err >= sizeof(buf)) {
printf("read from '%s' failed '%s'\n", event, strerror(errno));
return -1;
}
close(efd);
buf[err] = 0;
id = atoi(buf);
attr.config = id;
efd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
if (efd < 0) {
printf("event %d fd %d err %s\n", id, efd, strerror(errno));
return -1;
}
event_fd[prog_cnt - 1] = efd;
ioctl(efd, PERF_EVENT_IOC_ENABLE, 0);
ioctl(efd, PERF_EVENT_IOC_SET_BPF, fd);
return 0;
}
@ -135,6 +211,9 @@ int load_bpf_file(char *path)
if (gelf_getehdr(elf, &ehdr) != &ehdr)
return 1;
/* clear all kprobes */
i = system("echo \"\" > /sys/kernel/debug/tracing/kprobe_events");
/* scan over all elf sections to get license and map info */
for (i = 1; i < ehdr.e_shnum; i++) {
@ -149,6 +228,14 @@ int load_bpf_file(char *path)
if (strcmp(shname, "license") == 0) {
processed_sec[i] = true;
memcpy(license, data->d_buf, data->d_size);
} else if (strcmp(shname, "version") == 0) {
processed_sec[i] = true;
if (data->d_size != sizeof(int)) {
printf("invalid size of version section %zd\n",
data->d_size);
return 1;
}
memcpy(&kern_version, data->d_buf, sizeof(int));
} else if (strcmp(shname, "maps") == 0) {
processed_sec[i] = true;
if (load_maps(data->d_buf, data->d_size))
@ -178,7 +265,8 @@ int load_bpf_file(char *path)
if (parse_relo_and_apply(data, symbols, &shdr, insns))
continue;
if (memcmp(shname_prog, "events/", 7) == 0 ||
if (memcmp(shname_prog, "kprobe/", 7) == 0 ||
memcmp(shname_prog, "kretprobe/", 10) == 0 ||
memcmp(shname_prog, "socket", 6) == 0)
load_and_attach(shname_prog, insns, data_prog->d_size);
}
@ -193,7 +281,8 @@ int load_bpf_file(char *path)
if (get_sec(elf, i, &ehdr, &shname, &shdr, &data))
continue;
if (memcmp(shname, "events/", 7) == 0 ||
if (memcmp(shname, "kprobe/", 7) == 0 ||
memcmp(shname, "kretprobe/", 10) == 0 ||
memcmp(shname, "socket", 6) == 0)
load_and_attach(shname, data->d_buf, data->d_size);
}
@ -201,3 +290,23 @@ int load_bpf_file(char *path)
close(fd);
return 0;
}
void read_trace_pipe(void)
{
int trace_fd;
trace_fd = open(DEBUGFS "trace_pipe", O_RDONLY, 0);
if (trace_fd < 0)
return;
while (1) {
static char buf[4096];
ssize_t sz;
sz = read(trace_fd, buf, sizeof(buf));
if (sz > 0) {
buf[sz] = 0;
puts(buf);
}
}
}

View file

@ -6,6 +6,7 @@
extern int map_fd[MAX_MAPS];
extern int prog_fd[MAX_PROGS];
extern int event_fd[MAX_PROGS];
/* parses elf file compiled by llvm .c->.o
* . parses 'maps' section and creates maps via BPF syscall
@ -21,4 +22,6 @@ extern int prog_fd[MAX_PROGS];
*/
int load_bpf_file(char *path);
void read_trace_pipe(void);
#endif

View file

@ -81,7 +81,7 @@ char bpf_log_buf[LOG_BUF_SIZE];
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int prog_len,
const char *license)
const char *license, int kern_version)
{
union bpf_attr attr = {
.prog_type = prog_type,
@ -93,6 +93,11 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
.log_level = 1,
};
/* assign one field outside of struct init to make sure any
* padding is zero initialized
*/
attr.kern_version = kern_version;
bpf_log_buf[0] = 0;
return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
@ -121,3 +126,10 @@ int open_raw_sock(const char *name)
return sock;
}
int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu,
group_fd, flags);
}

View file

@ -13,7 +13,7 @@ int bpf_get_next_key(int fd, void *key, void *next_key);
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int insn_len,
const char *license);
const char *license, int kern_version);
#define LOG_BUF_SIZE 65536
extern char bpf_log_buf[LOG_BUF_SIZE];
@ -182,4 +182,7 @@ extern char bpf_log_buf[LOG_BUF_SIZE];
/* create RAW socket and bind to interface 'name' */
int open_raw_sock(const char *name);
struct perf_event_attr;
int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
int group_fd, unsigned long flags);
#endif

View file

@ -56,7 +56,7 @@ static int test_sock(void)
};
prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
"GPL");
"GPL", 0);
if (prog_fd < 0) {
printf("failed to load prog '%s'\n", strerror(errno));
goto cleanup;

View file

@ -689,7 +689,7 @@ static int test(void)
prog_fd = bpf_prog_load(BPF_PROG_TYPE_UNSPEC, prog,
prog_len * sizeof(struct bpf_insn),
"GPL");
"GPL", 0);
if (tests[i].result == ACCEPT) {
if (prog_fd < 0) {

View file

@ -0,0 +1,50 @@
/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <uapi/linux/bpf.h>
#include <linux/version.h>
#include "bpf_helpers.h"
#define _(P) ({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;})
/* kprobe is NOT a stable ABI
* kernel functions can be removed, renamed or completely change semantics.
* Number of arguments and their positions can change, etc.
* In such case this bpf+kprobe example will no longer be meaningful
*/
SEC("kprobe/__netif_receive_skb_core")
int bpf_prog1(struct pt_regs *ctx)
{
/* attaches to kprobe netif_receive_skb,
* looks for packets on loobpack device and prints them
*/
char devname[IFNAMSIZ] = {};
struct net_device *dev;
struct sk_buff *skb;
int len;
/* non-portable! works for the given kernel only */
skb = (struct sk_buff *) ctx->di;
dev = _(skb->dev);
len = _(skb->len);
bpf_probe_read(devname, sizeof(devname), dev->name);
if (devname[0] == 'l' && devname[1] == 'o') {
char fmt[] = "skb %p len %d\n";
/* using bpf_trace_printk() for DEBUG ONLY */
bpf_trace_printk(fmt, sizeof(fmt), skb, len);
}
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View file

@ -0,0 +1,25 @@
#include <stdio.h>
#include <linux/bpf.h>
#include <unistd.h>
#include "libbpf.h"
#include "bpf_load.h"
int main(int ac, char **argv)
{
FILE *f;
char filename[256];
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
}
f = popen("taskset 1 ping -c5 localhost", "r");
(void) f;
read_trace_pipe();
return 0;
}

View file

@ -0,0 +1,86 @@
/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(long),
.value_size = sizeof(long),
.max_entries = 1024,
};
/* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe
* example will no longer be meaningful
*/
SEC("kprobe/kfree_skb")
int bpf_prog2(struct pt_regs *ctx)
{
long loc = 0;
long init_val = 1;
long *value;
/* x64 specific: read ip of kfree_skb caller.
* non-portable version of __builtin_return_address(0)
*/
bpf_probe_read(&loc, sizeof(loc), (void *)ctx->sp);
value = bpf_map_lookup_elem(&my_map, &loc);
if (value)
*value += 1;
else
bpf_map_update_elem(&my_map, &loc, &init_val, BPF_ANY);
return 0;
}
static unsigned int log2(unsigned int v)
{
unsigned int r;
unsigned int shift;
r = (v > 0xFFFF) << 4; v >>= r;
shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
shift = (v > 0xF) << 2; v >>= shift; r |= shift;
shift = (v > 0x3) << 1; v >>= shift; r |= shift;
r |= (v >> 1);
return r;
}
static unsigned int log2l(unsigned long v)
{
unsigned int hi = v >> 32;
if (hi)
return log2(hi) + 32;
else
return log2(v);
}
struct bpf_map_def SEC("maps") my_hist_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 64,
};
SEC("kprobe/sys_write")
int bpf_prog3(struct pt_regs *ctx)
{
long write_size = ctx->dx; /* arg3 */
long init_val = 1;
long *value;
u32 index = log2l(write_size);
value = bpf_map_lookup_elem(&my_hist_map, &index);
if (value)
__sync_fetch_and_add(value, 1);
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View file

@ -0,0 +1,95 @@
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
#define MAX_INDEX 64
#define MAX_STARS 38
static void stars(char *str, long val, long max, int width)
{
int i;
for (i = 0; i < (width * val / max) - 1 && i < width - 1; i++)
str[i] = '*';
if (val > max)
str[i - 1] = '+';
str[i] = '\0';
}
static void print_hist(int fd)
{
int key;
long value;
long data[MAX_INDEX] = {};
char starstr[MAX_STARS];
int i;
int max_ind = -1;
long max_value = 0;
for (key = 0; key < MAX_INDEX; key++) {
bpf_lookup_elem(fd, &key, &value);
data[key] = value;
if (value && key > max_ind)
max_ind = key;
if (value > max_value)
max_value = value;
}
printf(" syscall write() stats\n");
printf(" byte_size : count distribution\n");
for (i = 1; i <= max_ind + 1; i++) {
stars(starstr, data[i - 1], max_value, MAX_STARS);
printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
(1l << i) >> 1, (1l << i) - 1, data[i - 1],
MAX_STARS, starstr);
}
}
static void int_exit(int sig)
{
print_hist(map_fd[1]);
exit(0);
}
int main(int ac, char **argv)
{
char filename[256];
long key, next_key, value;
FILE *f;
int i;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
signal(SIGINT, int_exit);
/* start 'ping' in the background to have some kfree_skb events */
f = popen("ping -c5 localhost", "r");
(void) f;
/* start 'dd' in the background to have plenty of 'write' syscalls */
f = popen("dd if=/dev/zero of=/dev/null count=5000000", "r");
(void) f;
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
}
for (i = 0; i < 5; i++) {
key = 0;
while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
bpf_lookup_elem(map_fd[0], &next_key, &value);
printf("location 0x%lx count %ld\n", next_key, value);
key = next_key;
}
if (key)
printf("\n");
sleep(1);
}
print_hist(map_fd[1]);
return 0;
}

View file

@ -0,0 +1,89 @@
/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(long),
.value_size = sizeof(u64),
.max_entries = 4096,
};
/* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe
* example will no longer be meaningful
*/
SEC("kprobe/blk_mq_start_request")
int bpf_prog1(struct pt_regs *ctx)
{
long rq = ctx->di;
u64 val = bpf_ktime_get_ns();
bpf_map_update_elem(&my_map, &rq, &val, BPF_ANY);
return 0;
}
static unsigned int log2l(unsigned long long n)
{
#define S(k) if (n >= (1ull << k)) { i += k; n >>= k; }
int i = -(n == 0);
S(32); S(16); S(8); S(4); S(2); S(1);
return i;
#undef S
}
#define SLOTS 100
struct bpf_map_def SEC("maps") lat_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = SLOTS,
};
SEC("kprobe/blk_update_request")
int bpf_prog2(struct pt_regs *ctx)
{
long rq = ctx->di;
u64 *value, l, base;
u32 index;
value = bpf_map_lookup_elem(&my_map, &rq);
if (!value)
return 0;
u64 cur_time = bpf_ktime_get_ns();
u64 delta = cur_time - *value;
bpf_map_delete_elem(&my_map, &rq);
/* the lines below are computing index = log10(delta)*10
* using integer arithmetic
* index = 29 ~ 1 usec
* index = 59 ~ 1 msec
* index = 89 ~ 1 sec
* index = 99 ~ 10sec or more
* log10(x)*10 = log2(x)*10/log2(10) = log2(x)*3
*/
l = log2l(delta);
base = 1ll << l;
index = (l * 64 + (delta - base) * 64 / base) * 3 / 64;
if (index >= SLOTS)
index = SLOTS - 1;
value = bpf_map_lookup_elem(&lat_map, &index);
if (value)
__sync_fetch_and_add((long *)value, 1);
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

150
samples/bpf/tracex3_user.c Normal file
View file

@ -0,0 +1,150 @@
/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <stdbool.h>
#include <string.h>
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
#define SLOTS 100
static void clear_stats(int fd)
{
__u32 key;
__u64 value = 0;
for (key = 0; key < SLOTS; key++)
bpf_update_elem(fd, &key, &value, BPF_ANY);
}
const char *color[] = {
"\033[48;5;255m",
"\033[48;5;252m",
"\033[48;5;250m",
"\033[48;5;248m",
"\033[48;5;246m",
"\033[48;5;244m",
"\033[48;5;242m",
"\033[48;5;240m",
"\033[48;5;238m",
"\033[48;5;236m",
"\033[48;5;234m",
"\033[48;5;232m",
};
const int num_colors = ARRAY_SIZE(color);
const char nocolor[] = "\033[00m";
const char *sym[] = {
" ",
" ",
".",
".",
"*",
"*",
"o",
"o",
"O",
"O",
"#",
"#",
};
bool full_range = false;
bool text_only = false;
static void print_banner(void)
{
if (full_range)
printf("|1ns |10ns |100ns |1us |10us |100us"
" |1ms |10ms |100ms |1s |10s\n");
else
printf("|1us |10us |100us |1ms |10ms "
"|100ms |1s |10s\n");
}
static void print_hist(int fd)
{
__u32 key;
__u64 value;
__u64 cnt[SLOTS];
__u64 max_cnt = 0;
__u64 total_events = 0;
for (key = 0; key < SLOTS; key++) {
value = 0;
bpf_lookup_elem(fd, &key, &value);
cnt[key] = value;
total_events += value;
if (value > max_cnt)
max_cnt = value;
}
clear_stats(fd);
for (key = full_range ? 0 : 29; key < SLOTS; key++) {
int c = num_colors * cnt[key] / (max_cnt + 1);
if (text_only)
printf("%s", sym[c]);
else
printf("%s %s", color[c], nocolor);
}
printf(" # %lld\n", total_events);
}
int main(int ac, char **argv)
{
char filename[256];
int i;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
}
for (i = 1; i < ac; i++) {
if (strcmp(argv[i], "-a") == 0) {
full_range = true;
} else if (strcmp(argv[i], "-t") == 0) {
text_only = true;
} else if (strcmp(argv[i], "-h") == 0) {
printf("Usage:\n"
" -a display wider latency range\n"
" -t text only\n");
return 1;
}
}
printf(" heatmap of IO latency\n");
if (text_only)
printf(" %s", sym[num_colors - 1]);
else
printf(" %s %s", color[num_colors - 1], nocolor);
printf(" - many events with this latency\n");
if (text_only)
printf(" %s", sym[0]);
else
printf(" %s %s", color[0], nocolor);
printf(" - few events\n");
for (i = 0; ; i++) {
if (i % 20 == 0)
print_banner();
print_hist(map_fd[1]);
sleep(2);
}
return 0;
}

View file

@ -0,0 +1,54 @@
/* Copyright (c) 2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <linux/ptrace.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"
struct pair {
u64 val;
u64 ip;
};
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(long),
.value_size = sizeof(struct pair),
.max_entries = 1000000,
};
/* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe
* example will no longer be meaningful
*/
SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{
long ptr = ctx->si;
bpf_map_delete_elem(&my_map, &ptr);
return 0;
}
SEC("kretprobe/kmem_cache_alloc_node")
int bpf_prog2(struct pt_regs *ctx)
{
long ptr = ctx->ax;
long ip = 0;
/* get ip address of kmem_cache_alloc_node() caller */
bpf_probe_read(&ip, sizeof(ip), (void *)(ctx->bp + sizeof(ip)));
struct pair v = {
.val = bpf_ktime_get_ns(),
.ip = ip,
};
bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View file

@ -0,0 +1,69 @@
/* Copyright (c) 2015 PLUMgrid, http://plumgrid.com
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"
struct pair {
long long val;
__u64 ip;
};
static __u64 time_get_ns(void)
{
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec * 1000000000ull + ts.tv_nsec;
}
static void print_old_objects(int fd)
{
long long val = time_get_ns();
__u64 key, next_key;
struct pair v;
key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
key = -1;
while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
bpf_lookup_elem(map_fd[0], &next_key, &v);
key = next_key;
if (val - v.val < 1000000000ll)
/* object was allocated more then 1 sec ago */
continue;
printf("obj 0x%llx is %2lldsec old was allocated at ip %llx\n",
next_key, (val - v.val) / 1000000000ll, v.ip);
}
}
int main(int ac, char **argv)
{
char filename[256];
int i;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
}
for (i = 0; ; i++) {
print_old_objects(map_fd[1]);
sleep(1);
}
return 0;
}

81
tools/build/Build.include Normal file
View file

@ -0,0 +1,81 @@
###
# build: Generic definitions
#
# Lots of this code have been borrowed or heavily inspired from parts
# of kbuild code, which is not credited, but mostly developed by:
#
# Copyright (C) Sam Ravnborg <sam@mars.ravnborg.org>, 2015
# Copyright (C) Linus Torvalds <torvalds@linux-foundation.org>, 2015
#
###
# Convenient variables
comma := ,
squote := '
###
# Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o
dot-target = $(dir $@).$(notdir $@)
###
# filename of target with directory and extension stripped
basetarget = $(basename $(notdir $@))
###
# The temporary file to save gcc -MD generated dependencies must not
# contain a comma
depfile = $(subst $(comma),_,$(dot-target).d)
###
# Check if both arguments has same arguments. Result is empty string if equal.
arg-check = $(strip $(filter-out $(cmd_$(1)), $(cmd_$@)) \
$(filter-out $(cmd_$@), $(cmd_$(1))) )
###
# Escape single quote for use in echo statements
escsq = $(subst $(squote),'\$(squote)',$1)
# Echo command
# Short version is used, if $(quiet) equals `quiet_', otherwise full one.
echo-cmd = $(if $($(quiet)cmd_$(1)),\
echo ' $(call escsq,$($(quiet)cmd_$(1)))';)
###
# Replace >$< with >$$< to preserve $ when reloading the .cmd file
# (needed for make)
# Replace >#< with >\#< to avoid starting a comment in the .cmd file
# (needed for make)
# Replace >'< with >'\''< to be able to enclose the whole string in '...'
# (needed for the shell)
make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,$$$$,$(cmd_$(1)))))
###
# Find any prerequisites that is newer than target or that does not exist.
# PHONY targets skipped in both cases.
any-prereq = $(filter-out $(PHONY),$?) $(filter-out $(PHONY) $(wildcard $^),$^)
###
# if_changed_dep - execute command if any prerequisite is newer than
# target, or command line has changed and update
# dependencies in the cmd file
if_changed_dep = $(if $(strip $(any-prereq) $(arg-check)), \
@set -e; \
$(echo-cmd) $(cmd_$(1)); \
cat $(depfile) > $(dot-target).cmd; \
printf '%s\n' 'cmd_$@ := $(make-cmd)' >> $(dot-target).cmd)
# if_changed - execute command if any prerequisite is newer than
# target, or command line has changed
if_changed = $(if $(strip $(any-prereq) $(arg-check)), \
@set -e; \
$(echo-cmd) $(cmd_$(1)); \
printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd)
###
# C flags to be used in rule definitions, includes:
# - depfile generation
# - global $(CFLAGS)
# - per target C flags
# - per object C flags
# - BUILD_STR macro to allow '-D"$(variable)"' constructs
c_flags = -Wp,-MD,$(depfile),-MT,$@ $(CFLAGS) -D"BUILD_STR(s)=\#s" $(CFLAGS_$(basetarget).o) $(CFLAGS_$(obj))

View file

@ -0,0 +1,139 @@
Build Framework
===============
The perf build framework was adopted from the kernel build system, hence the
idea and the way how objects are built is the same.
Basically the user provides set of 'Build' files that list objects and
directories to nest for specific target to be build.
Unlike the kernel we don't have a single build object 'obj-y' list that where
we setup source objects, but we support more. This allows one 'Build' file to
carry a sources list for multiple build objects.
a) Build framework makefiles
----------------------------
The build framework consists of 2 Makefiles:
Build.include
Makefile.build
While the 'Build.include' file contains just some generic definitions, the
'Makefile.build' file is the makefile used from the outside. It's
interface/usage is following:
$ make -f tools/build/Makefile srctree=$(KSRC) dir=$(DIR) obj=$(OBJECT)
where:
KSRC - is the path to kernel sources
DIR - is the path to the project to be built
OBJECT - is the name of the build object
When succefully finished the $(DIR) directory contains the final object file
called $(OBJECT)-in.o:
$ ls $(DIR)/$(OBJECT)-in.o
which includes all compiled sources described in 'Build' makefiles.
a) Build makefiles
------------------
The user supplies 'Build' makefiles that contains a objects list, and connects
the build to nested directories.
Assume we have the following project structure:
ex/a.c
/b.c
/c.c
/d.c
/arch/e.c
/arch/f.c
Out of which you build the 'ex' binary ' and the 'libex.a' library:
'ex' - consists of 'a.o', 'b.o' and libex.a
'libex.a' - consists of 'c.o', 'd.o', 'e.o' and 'f.o'
The build framework does not create the 'ex' and 'libex.a' binaries for you, it
only prepares proper objects to be compiled and grouped together.
To follow the above example, the user provides following 'Build' files:
ex/Build:
ex-y += a.o
ex-y += b.o
libex-y += c.o
libex-y += d.o
libex-y += arch/
ex/arch/Build:
libex-y += e.o
libex-y += f.o
and runs:
$ make -f tools/build/Makefile.build dir=. obj=ex
$ make -f tools/build/Makefile.build dir=. obj=libex
which creates the following objects:
ex/ex-in.o
ex/libex-in.o
that contain request objects names in Build files.
It's only a matter of 2 single commands to create the final binaries:
$ ar rcs libex.a libex-in.o
$ gcc -o ex ex-in.o libex.a
You can check the 'ex' example in 'tools/build/tests/ex' for more details.
b) Rules
--------
The build framework provides standard compilation rules to handle .S and .c
compilation.
It's possible to include special rule if needed (like we do for flex or bison
code generation).
c) CFLAGS
---------
It's possible to alter the standard object C flags in the following way:
CFLAGS_perf.o += '...' - alters CFLAGS for perf.o object
CFLAGS_gtk += '...' - alters CFLAGS for gtk build object
This C flags changes has the scope of the Build makefile they are defined in.
d) Dependencies
---------------
For each built object file 'a.o' the '.a.cmd' is created and holds:
- Command line used to built that object
(for each object)
- Dependency rules generated by 'gcc -Wp,-MD,...'
(for compiled object)
All existing '.cmd' files are included in the Build process to follow properly
the dependencies and trigger a rebuild when necessary.
e) Single rules
---------------
It's possible to build single object file by choice, like:
$ make util/map.o # objects
$ make util/map.i # preprocessor
$ make util/map.s # assembly

130
tools/build/Makefile.build Normal file
View file

@ -0,0 +1,130 @@
###
# Main build makefile.
#
# Lots of this code have been borrowed or heavily inspired from parts
# of kbuild code, which is not credited, but mostly developed by:
#
# Copyright (C) Sam Ravnborg <sam@mars.ravnborg.org>, 2015
# Copyright (C) Linus Torvalds <torvalds@linux-foundation.org>, 2015
#
PHONY := __build
__build:
ifeq ($(V),1)
quiet =
Q =
else
quiet=quiet_
Q=@
endif
build-dir := $(srctree)/tools/build
# Generic definitions
include $(build-dir)/Build.include
# do not force detected configuration
-include .config-detected
# Init all relevant variables used in build files so
# 1) they have correct type
# 2) they do not inherit any value from the environment
subdir-y :=
obj-y :=
subdir-y :=
subdir-obj-y :=
# Build definitions
build-file := $(dir)/Build
include $(build-file)
quiet_cmd_flex = FLEX $@
quiet_cmd_bison = BISON $@
# Create directory unless it exists
quiet_cmd_mkdir = MKDIR $(dir $@)
cmd_mkdir = mkdir -p $(dir $@)
rule_mkdir = $(if $(wildcard $(dir $@)),,@$(call echo-cmd,mkdir) $(cmd_mkdir))
# Compile command
quiet_cmd_cc_o_c = CC $@
cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $<
quiet_cmd_cc_i_c = CPP $@
cmd_cc_i_c = $(CC) $(c_flags) -E -o $@ $<
quiet_cmd_cc_s_c = AS $@
cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
# Link agregate command
# If there's nothing to link, create empty $@ object.
quiet_cmd_ld_multi = LD $@
cmd_ld_multi = $(if $(strip $(obj-y)),\
$(LD) -r -o $@ $(obj-y),rm -f $@; $(AR) rcs $@)
# Build rules
$(OUTPUT)%.o: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_o_c)
$(OUTPUT)%.o: %.S FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_o_c)
$(OUTPUT)%.i: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_i_c)
$(OUTPUT)%.i: %.S FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_i_c)
$(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
# subdir-obj-y - list of directories objects 'dir/$(obj)-in.o'
obj-y := $($(obj)-y)
subdir-y := $(patsubst %/,%,$(filter %/, $(obj-y)))
obj-y := $(patsubst %/, %/$(obj)-in.o, $(obj-y))
subdir-obj-y := $(filter %/$(obj)-in.o, $(obj-y))
# '$(OUTPUT)/dir' prefix to all objects
prefix := $(subst ./,,$(OUTPUT)$(dir)/)
obj-y := $(addprefix $(prefix),$(obj-y))
subdir-obj-y := $(addprefix $(prefix),$(subdir-obj-y))
# Final '$(obj)-in.o' object
in-target := $(prefix)$(obj)-in.o
PHONY += $(subdir-y)
$(subdir-y):
$(Q)$(MAKE) -f $(build-dir)/Makefile.build dir=$(dir)/$@ obj=$(obj)
$(sort $(subdir-obj-y)): $(subdir-y) ;
$(in-target): $(obj-y) FORCE
$(call rule_mkdir)
$(call if_changed,ld_multi)
__build: $(in-target)
@:
PHONY += FORCE
FORCE:
# Include all cmd files to get all the dependency rules
# for all objects included
targets := $(wildcard $(sort $(obj-y) $(in-target) $(MAKECMDGOALS)))
cmd_files := $(wildcard $(foreach f,$(targets),$(dir $(f)).$(notdir $(f)).cmd))
ifneq ($(cmd_files),)
include $(cmd_files)
endif
.PHONY: $(PHONY)

View file

@ -0,0 +1,171 @@
feature_dir := $(srctree)/tools/build/feature
ifneq ($(OUTPUT),)
OUTPUT_FEATURES = $(OUTPUT)feature/
$(shell mkdir -p $(OUTPUT_FEATURES))
endif
feature_check = $(eval $(feature_check_code))
define feature_check_code
feature-$(1) := $(shell $(MAKE) OUTPUT=$(OUTPUT_FEATURES) CFLAGS="$(EXTRA_CFLAGS) $(FEATURE_CHECK_CFLAGS-$(1))" LDFLAGS="$(LDFLAGS) $(FEATURE_CHECK_LDFLAGS-$(1))" -C $(feature_dir) test-$1.bin >/dev/null 2>/dev/null && echo 1 || echo 0)
endef
feature_set = $(eval $(feature_set_code))
define feature_set_code
feature-$(1) := 1
endef
#
# Build the feature check binaries in parallel, ignore errors, ignore return value and suppress output:
#
#
# Note that this is not a complete list of all feature tests, just
# those that are typically built on a fully configured system.
#
# [ Feature tests not mentioned here have to be built explicitly in
# the rule that uses them - an example for that is the 'bionic'
# feature check. ]
#
FEATURE_TESTS = \
backtrace \
dwarf \
fortify-source \
sync-compare-and-swap \
glibc \
gtk2 \
gtk2-infobar \
libaudit \
libbfd \
libelf \
libelf-getphdrnum \
libelf-mmap \
libnuma \
libperl \
libpython \
libpython-version \
libslang \
libunwind \
pthread-attr-setaffinity-np \
stackprotector-all \
timerfd \
libdw-dwarf-unwind \
zlib \
lzma
FEATURE_DISPLAY = \
dwarf \
glibc \
gtk2 \
libaudit \
libbfd \
libelf \
libnuma \
libperl \
libpython \
libslang \
libunwind \
libdw-dwarf-unwind \
zlib \
lzma
# Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
# If in the future we need per-feature checks/flags for features not
# mentioned in this list we need to refactor this ;-).
set_test_all_flags = $(eval $(set_test_all_flags_code))
define set_test_all_flags_code
FEATURE_CHECK_CFLAGS-all += $(FEATURE_CHECK_CFLAGS-$(1))
FEATURE_CHECK_LDFLAGS-all += $(FEATURE_CHECK_LDFLAGS-$(1))
endef
$(foreach feat,$(FEATURE_TESTS),$(call set_test_all_flags,$(feat)))
#
# Special fast-path for the 'all features are available' case:
#
$(call feature_check,all,$(MSG))
#
# Just in case the build freshly failed, make sure we print the
# feature matrix:
#
ifeq ($(feature-all), 1)
#
# test-all.c passed - just set all the core feature flags to 1:
#
$(foreach feat,$(FEATURE_TESTS),$(call feature_set,$(feat)))
else
$(shell $(MAKE) OUTPUT=$(OUTPUT_FEATURES) CFLAGS="$(EXTRA_CFLAGS)" LDFLAGS=$(LDFLAGS) -i -j -C $(feature_dir) $(addsuffix .bin,$(FEATURE_TESTS)) >/dev/null 2>&1)
$(foreach feat,$(FEATURE_TESTS),$(call feature_check,$(feat)))
endif
#
# Print the result of the feature test:
#
feature_print_status = $(eval $(feature_print_status_code)) $(info $(MSG))
define feature_print_status_code
ifeq ($(feature-$(1)), 1)
MSG = $(shell printf '...%30s: [ \033[32mon\033[m ]' $(1))
else
MSG = $(shell printf '...%30s: [ \033[31mOFF\033[m ]' $(1))
endif
endef
feature_print_text = $(eval $(feature_print_text_code)) $(info $(MSG))
define feature_print_text_code
MSG = $(shell printf '...%30s: %s' $(1) $(2))
endef
FEATURE_DUMP := $(foreach feat,$(FEATURE_DISPLAY),feature-$(feat)($(feature-$(feat))))
FEATURE_DUMP_FILE := $(shell touch $(OUTPUT)FEATURE-DUMP; cat $(OUTPUT)FEATURE-DUMP)
ifeq ($(dwarf-post-unwind),1)
FEATURE_DUMP += dwarf-post-unwind($(dwarf-post-unwind-text))
endif
# The $(feature_display) controls the default detection message
# output. It's set if:
# - detected features differes from stored features from
# last build (in FEATURE-DUMP file)
# - one of the $(FEATURE_DISPLAY) is not detected
# - VF is enabled
ifneq ("$(FEATURE_DUMP)","$(FEATURE_DUMP_FILE)")
$(shell echo "$(FEATURE_DUMP)" > $(OUTPUT)FEATURE-DUMP)
feature_display := 1
endif
feature_display_check = $(eval $(feature_check_code))
define feature_display_check_code
ifneq ($(feature-$(1)), 1)
feature_display := 1
endif
endef
$(foreach feat,$(FEATURE_DISPLAY),$(call feature_display_check,$(feat)))
ifeq ($(VF),1)
feature_display := 1
feature_verbose := 1
endif
ifeq ($(feature_display),1)
$(info )
$(info Auto-detecting system features:)
$(foreach feat,$(FEATURE_DISPLAY),$(call feature_print_status,$(feat),))
ifeq ($(dwarf-post-unwind),1)
$(call feature_print_text,"DWARF post unwind library", $(dwarf-post-unwind-text))
endif
ifneq ($(feature_verbose),1)
$(info )
endif
endif
ifeq ($(feature_verbose),1)
TMP := $(filter-out $(FEATURE_DISPLAY),$(FEATURE_TESTS))
$(foreach feat,$(TMP),$(call feature_print_status,$(feat),))
$(info )
endif

View file

@ -1,2 +1,3 @@
*.d
*.bin
*.output

View file

@ -29,33 +29,36 @@ FILES= \
test-stackprotector-all.bin \
test-timerfd.bin \
test-libdw-dwarf-unwind.bin \
test-libbabeltrace.bin \
test-compile-32.bin \
test-compile-x32.bin \
test-zlib.bin
test-zlib.bin \
test-lzma.bin
CC := $(CROSS_COMPILE)gcc -MD
PKG_CONFIG := $(CROSS_COMPILE)pkg-config
all: $(FILES)
BUILD = $(CC) $(CFLAGS) -o $(OUTPUT)$@ $(patsubst %.bin,%.c,$@) $(LDFLAGS)
__BUILD = $(CC) $(CFLAGS) -Wall -Werror -o $(OUTPUT)$@ $(patsubst %.bin,%.c,$@) $(LDFLAGS)
BUILD = $(__BUILD) > $(OUTPUT)$(@:.bin=.make.output) 2>&1
###############################
test-all.bin:
$(BUILD) -Werror -fstack-protector-all -O2 -Werror -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -laudit -I/usr/include/slang -lslang $(shell $(PKG_CONFIG) --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz
$(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -laudit -I/usr/include/slang -lslang $(shell $(PKG_CONFIG) --libs --cflags gtk+-2.0 2>/dev/null) $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma
test-hello.bin:
$(BUILD)
test-pthread-attr-setaffinity-np.bin:
$(BUILD) -D_GNU_SOURCE -Werror -lpthread
$(BUILD) -D_GNU_SOURCE -lpthread
test-stackprotector-all.bin:
$(BUILD) -Werror -fstack-protector-all
$(BUILD) -fstack-protector-all
test-fortify-source.bin:
$(BUILD) -O2 -Werror -D_FORTIFY_SOURCE=2
$(BUILD) -O2 -D_FORTIFY_SOURCE=2
test-bionic.bin:
$(BUILD)
@ -118,10 +121,10 @@ test-libbfd.bin:
$(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl
test-liberty.bin:
$(CC) -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty
$(CC) -Wall -Werror -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty
test-liberty-z.bin:
$(CC) -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty -lz
$(CC) -Wall -Werror -o $(OUTPUT)$@ test-libbfd.c -DPACKAGE='"perf"' -lbfd -ldl -liberty -lz
test-cplus-demangle.bin:
$(BUILD) -liberty
@ -133,10 +136,13 @@ test-timerfd.bin:
$(BUILD)
test-libdw-dwarf-unwind.bin:
$(BUILD)
$(BUILD) # -ldw provided by $(FEATURE_CHECK_LDFLAGS-libdw-dwarf-unwind)
test-libbabeltrace.bin:
$(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace)
test-sync-compare-and-swap.bin:
$(BUILD) -Werror
$(BUILD)
test-compile-32.bin:
$(CC) -m32 -o $(OUTPUT)$@ test-compile.c
@ -147,9 +153,12 @@ test-compile-x32.bin:
test-zlib.bin:
$(BUILD) -lz
test-lzma.bin:
$(BUILD) -llzma
-include *.d
###############################
clean:
rm -f $(FILES) *.d
rm -f $(FILES) *.d $(FILES:.bin=.make.output)

View file

@ -98,7 +98,23 @@
#undef main
#define main main_test_pthread_attr_setaffinity_np
# include "test-pthread_attr_setaffinity_np.c"
# include "test-pthread-attr-setaffinity-np.c"
#undef main
# if 0
/*
* Disable libbabeltrace check for test-all, because the requested
* library version is not released yet in most distributions. Will
* reenable later.
*/
#define main main_test_libbabeltrace
# include "test-libbabeltrace.c"
#undef main
#endif
#define main main_test_lzma
# include "test-lzma.c"
#undef main
int main(int argc, char *argv[])
@ -126,6 +142,7 @@ int main(int argc, char *argv[])
main_test_sync_compare_and_swap(argc, argv);
main_test_zlib();
main_test_pthread_attr_setaffinity_np();
main_test_lzma();
return 0;
}

View file

@ -0,0 +1,9 @@
#include <babeltrace/ctf-writer/writer.h>
#include <babeltrace/ctf-ir/stream-class.h>
int main(void)
{
bt_ctf_stream_class_get_packet_context_type((void *) 0);
return 0;
}

View file

@ -0,0 +1,10 @@
#include <lzma.h>
int main(void)
{
lzma_stream strm = LZMA_STREAM_INIT;
int ret;
ret = lzma_stream_decoder(&strm, UINT64_MAX, LZMA_CONCATENATED);
return ret ? -1 : 0;
}

View file

@ -1,5 +1,6 @@
#include <stdint.h>
#include <pthread.h>
#include <sched.h>
int main(void)
{
@ -8,7 +9,8 @@ int main(void)
cpu_set_t cs;
pthread_attr_init(&thread_attr);
/* don't care abt exact args, just the API itself in libpthread */
CPU_ZERO(&cs);
ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cs), &cs);
return ret;

View file

@ -0,0 +1,8 @@
ex-y += ex.o
ex-y += a.o
ex-y += b.o
ex-y += empty/
libex-y += c.o
libex-y += d.o
libex-y += arch/

View file

@ -0,0 +1,23 @@
export srctree := ../../../..
export CC := gcc
export LD := ld
export AR := ar
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
ex: ex-in.o libex-in.o
gcc -o $@ $^
ex.%: FORCE
make -f $(srctree)/tools/build/Makefile.build dir=. $@
ex-in.o: FORCE
make $(build)=ex
libex-in.o: FORCE
make $(build)=libex
clean:
find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
rm -f ex ex.i ex.s
.PHONY: FORCE

5
tools/build/tests/ex/a.c Normal file
View file

@ -0,0 +1,5 @@
int a(void)
{
return 0;
}

View file

@ -0,0 +1,2 @@
libex-y += e.o
libex-y += f.o

View file

@ -0,0 +1,5 @@
int e(void)
{
return 0;
}

View file

@ -0,0 +1,5 @@
int f(void)
{
return 0;
}

5
tools/build/tests/ex/b.c Normal file
View file

@ -0,0 +1,5 @@
int b(void)
{
return 0;
}

Some files were not shown because too many files have changed in this diff Show more