linux-stable/tools/testing/selftests/bpf/benchs/bench_trigger.c

312 lines
7.2 KiB
C
Raw Normal View History

selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include "bench.h"
#include "trigger_bench.skel.h"
2021-11-16 01:30:41 +00:00
#include "trace_helpers.h"
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
/* BPF triggering benchmarks */
static struct trigger_ctx {
struct trigger_bench *skel;
} ctx;
static struct counter base_hits;
static void trigger_validate(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
if (env.consumer_cnt != 0) {
fprintf(stderr, "benchmark doesn't support consumer!\n");
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
exit(1);
}
}
static void *trigger_base_producer(void *input)
{
while (true) {
(void)syscall(__NR_getpgid);
atomic_inc(&base_hits.value);
}
return NULL;
}
static void trigger_base_measure(struct bench_res *res)
{
res->hits = atomic_swap(&base_hits.value, 0);
}
static void *trigger_producer(void *input)
{
while (true)
(void)syscall(__NR_getpgid);
return NULL;
}
static void trigger_measure(struct bench_res *res)
{
res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
}
static void setup_ctx(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
setup_libbpf();
ctx.skel = trigger_bench__open_and_load();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
}
static void attach_bpf(struct bpf_program *prog)
{
struct bpf_link *link;
link = bpf_program__attach(prog);
if (!link) {
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
fprintf(stderr, "failed to attach program!\n");
exit(1);
}
}
static void trigger_tp_setup(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_tp);
}
static void trigger_rawtp_setup(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_raw_tp);
}
static void trigger_kprobe_setup(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_kprobe);
}
static void trigger_fentry_setup(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry);
}
static void trigger_fentry_sleep_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry_sleep);
}
static void trigger_fmodret_setup(void)
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fmodret);
}
2021-11-16 01:30:41 +00:00
/* make sure call is not inlined and not avoided by compiler, so __weak and
* inline asm volatile in the body of the function
*
* There is a performance difference between uprobing at nop location vs other
* instructions. So use two different targets, one of which starts with nop
* and another doesn't.
*
* GCC doesn't generate stack setup preample for these functions due to them
* having no input arguments and doing nothing in the body.
*/
__weak void uprobe_target_with_nop(void)
{
asm volatile ("nop");
}
__weak void uprobe_target_without_nop(void)
{
asm volatile ("");
}
static void *uprobe_base_producer(void *input)
{
while (true) {
uprobe_target_with_nop();
atomic_inc(&base_hits.value);
}
return NULL;
}
static void *uprobe_producer_with_nop(void *input)
{
while (true)
uprobe_target_with_nop();
return NULL;
}
static void *uprobe_producer_without_nop(void *input)
{
while (true)
uprobe_target_without_nop();
return NULL;
}
static void usetup(bool use_retprobe, bool use_nop)
{
size_t uprobe_offset;
struct bpf_link *link;
setup_libbpf();
ctx.skel = trigger_bench__open_and_load();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
if (use_nop)
uprobe_offset = get_uprobe_offset(&uprobe_target_with_nop);
2021-11-16 01:30:41 +00:00
else
uprobe_offset = get_uprobe_offset(&uprobe_target_without_nop);
2021-11-16 01:30:41 +00:00
link = bpf_program__attach_uprobe(ctx.skel->progs.bench_trigger_uprobe,
use_retprobe,
-1 /* all PIDs */,
"/proc/self/exe",
uprobe_offset);
if (!link) {
fprintf(stderr, "failed to attach uprobe!\n");
exit(1);
}
ctx.skel->links.bench_trigger_uprobe = link;
}
static void uprobe_setup_with_nop(void)
2021-11-16 01:30:41 +00:00
{
usetup(false, true);
}
static void uretprobe_setup_with_nop(void)
2021-11-16 01:30:41 +00:00
{
usetup(true, true);
}
static void uprobe_setup_without_nop(void)
2021-11-16 01:30:41 +00:00
{
usetup(false, false);
}
static void uretprobe_setup_without_nop(void)
2021-11-16 01:30:41 +00:00
{
usetup(true, false);
}
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
const struct bench bench_trig_base = {
.name = "trig-base",
.validate = trigger_validate,
.producer_thread = trigger_base_producer,
.measure = trigger_base_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_tp = {
.name = "trig-tp",
.validate = trigger_validate,
.setup = trigger_tp_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_rawtp = {
.name = "trig-rawtp",
.validate = trigger_validate,
.setup = trigger_rawtp_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_kprobe = {
.name = "trig-kprobe",
.validate = trigger_validate,
.setup = trigger_kprobe_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fentry = {
.name = "trig-fentry",
.validate = trigger_validate,
.setup = trigger_fentry_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fentry_sleep = {
.name = "trig-fentry-sleep",
.validate = trigger_validate,
.setup = trigger_fentry_sleep_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
selftest/bpf: Add BPF triggering benchmark It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
2020-05-12 19:24:45 +00:00
const struct bench bench_trig_fmodret = {
.name = "trig-fmodret",
.validate = trigger_validate,
.setup = trigger_fmodret_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
2021-11-16 01:30:41 +00:00
const struct bench bench_trig_uprobe_base = {
.name = "trig-uprobe-base",
.setup = NULL, /* no uprobe/uretprobe is attached */
.producer_thread = uprobe_base_producer,
.measure = trigger_base_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_with_nop = {
.name = "trig-uprobe-with-nop",
.setup = uprobe_setup_with_nop,
.producer_thread = uprobe_producer_with_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_with_nop = {
.name = "trig-uretprobe-with-nop",
.setup = uretprobe_setup_with_nop,
.producer_thread = uprobe_producer_with_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_without_nop = {
.name = "trig-uprobe-without-nop",
.setup = uprobe_setup_without_nop,
.producer_thread = uprobe_producer_without_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_without_nop = {
.name = "trig-uretprobe-without-nop",
.setup = uretprobe_setup_without_nop,
.producer_thread = uprobe_producer_without_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};