2020-03-03 13:35:59 +00:00
# SPDX-License-Identifier: GPL-2.0-only

2021-03-02 17:19:43 +00:00
bpf-helpers*

2021-03-02 17:19:45 +00:00
bpf-syscall*

2016-10-17 12:28:36 +00:00
test_verifier
test_maps

2016-12-03 20:31:33 +00:00
test_lru_map

2017-01-21 16:26:12 +00:00
test_lpm_map

2017-02-09 23:21:45 +00:00
test_tag

2018-01-19 00:36:24 +00:00
FEATURE-DUMP.libbpf
fixdep
test_dev_cgroup

2021-09-14 16:22:28 +00:00
/test_progs
/test_progs-no_alu32
/test_progs-bpf_gcc

2018-01-19 00:36:24 +00:00
test_verifier_log
feature

2018-04-10 12:24:21 +00:00
test_sock
test_sock_addr
urandom_read

2018-05-08 13:36:37 +00:00
test_sockmap

2018-05-27 11:24:10 +00:00
test_lirc_mode2_user

2018-06-03 22:59:43 +00:00
get_cgroup_id_user

2018-09-03 16:05:27 +00:00
test_skb_cgroup_id_user
test_cgroup_storage

2018-09-14 14:46:22 +00:00
test_flow_dissector
flow_dissector_load

2018-12-20 06:13:09 +00:00
test_tcpnotify_user

2019-01-09 00:07:28 +00:00
test_libbpf

2019-03-22 01:54:06 +00:00
test_tcp_check_syncookie_user

2019-05-16 16:46:57 +00:00
test_sysctl

selftests/bpf: measure RTT from xdp using xdping
xdping allows us to get latency estimates from XDP. Output looks
like this:
./xdping -I eth4 192.168.55.8
Setting up XDP for eth4, please wait...
XDP setup disrupts network connectivity, hit Ctrl+C to quit
Normal ping RTT data
[Ignore final RTT; it is distorted by XDP using the reply]
PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data.
64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms
64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms
64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms
4 packets transmitted, 4 received, 0% packet loss, time 3079ms
rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms
XDP RTT data:
64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms
64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms
64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms
64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms
The xdping program loads the associated xdping_kern.o BPF program
and attaches it to the specified interface. If run in client
mode (the default), it will add a map entry keyed by the
target IP address; this map stores RTT measurements, the current
sequence number, etc. Finally, in client mode the ping command
is executed, and the xdping BPF program will use the last ICMP
reply, reformulate it as an ICMP request with the next sequence
number and XDP_TX it. After the reply to that request is received,
we can measure RTT and repeat until the desired number of
measurements is made. This is why the sequence numbers in the
normal ping are 1, 2, 3 and 8: we XDP_TX a modified version
of ICMP reply 4 and keep doing this until we get the 4 replies
we need; hence the networking stack only sees reply 8, which
we XDP_PASS upstream since we are done.
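The sequence-number arithmetic above can be checked with a small C
model. This is a hypothetical sketch, not xdping code. The name
simulate_xdping() is invented, and the model assumes the ping count and
the number of XDP measurements are equal (4 in the output above).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model (not xdping source) of which ICMP replies the
 * networking stack sees. Assumes ping and XDP both use `count`.
 * stack_seen and xdp_seen each need room for `count` entries.
 * Returns the number of replies delivered to the stack.
 */
static size_t simulate_xdping(unsigned count, unsigned *stack_seen,
                              unsigned *xdp_seen)
{
    size_t n_stack = 0;

    assert(count > 0);

    /* Replies 1..count-1 are passed up to ping untouched. */
    for (unsigned seq = 1; seq < count; seq++)
        stack_seen[n_stack++] = seq;

    /*
     * Reply `count` is rewritten into request count+1 and sent with
     * XDP_TX; each following reply is timed and reused the same way
     * until `count` XDP RTT samples have been recorded.
     */
    for (unsigned i = 0; i < count; i++)
        xdp_seen[i] = count + 1 + i;

    /* Only the last recorded reply is XDP_PASSed to the stack. */
    stack_seen[n_stack++] = xdp_seen[count - 1];
    return n_stack;
}
```

With a count of 4, the stack sees replies 1, 2, 3 and 8, while XDP
records 5 through 8. This matches the output above.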
In server mode (-s), xdping simply takes ICMP requests and replies
to them in XDP rather than passing the request up to the networking
stack. No map entry is required.
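Below is a minimal user-space sketch in C of the server-side rewrite.
It assumes an IPv4 packet already parsed into glibc's struct iphdr and
struct icmphdr. The helper names are hypothetical. A real XDP program
would also swap the Ethernet addresses and bounds-check every access
against data_end. Swapping saddr and daddr leaves the IPv4 header
checksum valid, because the one's-complement sum does not depend on
word order. For that reason, only the ICMP checksum (RFC 1071) is
recomputed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum; the result is in network byte order. */
static uint16_t inet_csum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    /* Sum 16-bit big-endian words. */
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    /* Pad an odd trailing byte with zero. */
    if (len)
        sum += (uint32_t)p[0] << 8;
    /* Fold carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return htons((uint16_t)~sum);
}

/*
 * Hypothetical helper: turn an ICMP echo request into the matching echo
 * reply in place. icmp_len covers the ICMP header plus payload.
 * Returns 0 on success, -1 if the message is not an echo request.
 */
static int echo_request_to_reply(struct iphdr *iph, struct icmphdr *icmph,
                                 size_t icmp_len)
{
    uint32_t addr;

    assert(icmp_len >= sizeof(*icmph));
    if (icmph->type != ICMP_ECHO || icmph->code != 0)
        return -1;

    /* Swap addresses; the IPv4 header checksum stays valid. */
    addr = iph->saddr;
    iph->saddr = iph->daddr;
    iph->daddr = addr;

    /* Flip the type and recompute the ICMP checksum from scratch. */
    icmph->type = ICMP_ECHOREPLY;
    icmph->checksum = 0;
    icmph->checksum = inet_csum(icmph, icmp_len);
    return 0;
}
```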
xdping can be run in native XDP mode (the default, or specified
via -N) or in skb mode (-S).
A test program test_xdping.sh exercises some of these options.
Note that native XDP does not seem to XDP_TX for veths, hence -N
is not tested. Looking at the code, XDP_TX appears to be
supported, so I'm not sure whether that's expected. Running xdping in
native mode for ixgbe as both client and server works fine.
Changes since v4
- close fds on cleanup (Song Liu)
Changes since v3
- fixed seq to be __be16 (Song Liu)
- fixed fd checks in xdping.c (Song Liu)
Changes since v2
- updated commit message to explain why seq number of last
ICMP reply is 8 not 4 (Song Liu)
- updated types of seq number, raddr and eliminated csum variable
in xdpclient/xdpserver functions as it was not needed (Song Liu)
- added XDPING_DEFAULT_COUNT definition and usage specification of
default/max counts (Song Liu)
Changes since v1
- moved from RFC to PATCH
- removed unused variable in ipv4_csum() (Song Liu)
- refactored ICMP checks into icmp_check() function called by client
and server programs and reworked client and server programs due
to lack of shared code (Song Liu)
- added checks to ensure that SKB and native mode are not requested
together (Song Liu)
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-31 17:47:14 +00:00
xdping

2019-12-02 21:59:31 +00:00
test_cpp

2019-12-14 01:43:38 +00:00
*.skel.h

2021-05-14 00:36:20 +00:00
*.lskel.h

2019-10-25 04:55:03 +00:00
/no_alu32

selftests/bpf: Replace test_progs and test_maps w/ general rule
Define test runner generation meta-rule that codifies dependencies
between test runner, its tests, and its dependent BPF programs. Use that
for defining test_progs and test_maps test-runners. Additionally, define
2 flavors of test_progs:
- alu32, which builds BPF programs with 32-bit register codegen;
- bpf_gcc, which builds BPF programs using GCC, if it supports the BPF target.
Overall, this is accomplished through $(eval)'ing a set of generic
rules, which define Makefile targets dynamically at runtime. See
comments explaining the need for 2 $(evals), though.
For each test runner (currently test_maps and test_progs) and,
optionally, its flavors, the logic of the build process is modeled as
follows (using test_progs as an example):
- all BPF objects are in progs/:
  - each BPF object's .o file is built into the output directory from
    the corresponding progs/.c file;
  - all BPF objects in progs/*.c depend on all progs/*.h headers;
  - all BPF objects depend on the bpf_*.h helpers from libbpf (but not
    on the libbpf archive); there is an extra rule to trigger a
    bpf_helper_defs.h (re-)build if it is missing or outdated;
  - the build recipe for BPF objects can be redefined per test
    runner/flavor;
- test files are built from prog_tests/*.c:
  - all such test file objects are built on a per-file basis;
  - currently, every single test file depends on all BPF object files;
    this might be improved in follow-up patches to use 1-to-1
    dependencies, while still allowing this to be customized per test;
  - each test runner definition can specify a list of extra .c and .h
    files to be built along with the test files and the test runner
    binary; all such headers automatically become dependencies of each
    test .c file;
  - because test files sometimes embed (using the .incbin assembly
    directive) the contents of some BPF objects at compilation time,
    and those objects are expected to be in the compiler's CWD, the
    compilation of a test file object first does a cd into the test
    runner's output directory; to support this mode, all include paths
    are turned into absolute paths using the $(abspath) make function;
  - prog_tests/test.h is automatically (re-)generated with an entry for
    each .c file in prog_tests/;
- the final test runner binary is linked from the test object files and
  extra object files, together with libbpf's archive;
- it's possible to specify extra "resource" files/targets, which will be
  copied into the test runner's output directory if it differs from the
  Makefile-wide $(OUTPUT). This is used to ensure that the btf_dump test
  cases and the urandom_read binary are put into a test runner's CWD so
  that tests can find them at runtime.
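One common way to consume a generated list header like
prog_tests/test.h in C is the X-macro pattern sketched below. The
DEFINE_TEST name, the test_foo/test_bar functions and the table layout
are assumptions made for illustration. A self-contained example cannot
#include a generated file, so a TEST_LIST macro stands in for the
header contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the generated header: the real one would be produced
 * from the file names in prog_tests/. Entries here are hypothetical.
 */
#define TEST_LIST(X) \
    X(test_foo)      \
    X(test_bar)

static int foo_runs, bar_runs;

/* Hypothetical test bodies, one per prog_tests/ file. */
static void test_foo(void) { foo_runs++; }
static void test_bar(void) { bar_runs++; }

struct prog_test_def {
    const char *name;
    void (*run)(void);
};

/* Expand each entry into a table row: name string + function pointer. */
#define DEFINE_TEST(name) { #name, name },
static const struct prog_test_def prog_test_defs[] = {
    TEST_LIST(DEFINE_TEST)
};
#undef DEFINE_TEST

#define PROG_TEST_CNT (sizeof(prog_test_defs) / sizeof(prog_test_defs[0]))

/* Look a test up by name (e.g. for a name filter); NULL if absent. */
static const struct prog_test_def *find_test(const char *name)
{
    for (size_t i = 0; i < PROG_TEST_CNT; i++)
        if (strcmp(prog_test_defs[i].name, name) == 0)
            return &prog_test_defs[i];
    return NULL;
}

/* Run every registered test and return how many ran. */
static size_t run_all_tests(void)
{
    for (size_t i = 0; i < PROG_TEST_CNT; i++) {
        assert(prog_test_defs[i].run != NULL);
        prog_test_defs[i].run();
    }
    return PROG_TEST_CNT;
}
```

Because the list is regenerated from prog_tests/*.c, adding a test file
only requires writing the file itself. The runner's table then picks up
the new entry on the next build.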
For flavored test runners, the output directory is a subdirectory of
the common Makefile-wide $(OUTPUT) directory, with the flavor name used
as the subdirectory name.
BPF object targets might be reused between different test runners, so
extra checks are employed to avoid defining them twice. Similarly, we
have redefinition guards for output directories and test headers.
test_verifier follows slightly different patterns and is simple enough
that it does not justify generalizing
TEST_RUNNER_DEFINE/TEST_RUNNER_DEFINE_RULES further to accommodate these
differences. Instead, rules for test_verifier are minimized and
simplified, while preserving correctness of dependencies.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191016060051.2024182-6-andriin@fb.com
2019-10-16 06:00:49 +00:00
/bpf_gcc

2019-12-14 01:43:38 +00:00
/tools

2020-04-29 01:21:11 +00:00
/runqslower

selftests/bpf: Add benchmark runner infrastructure
While working on BPF ringbuf implementation, testing, and benchmarking, I've
developed a pretty generic and modular benchmark runner, which seems to be
broadly useful, as I've already used it for another purpose (testing the
fastest way to trigger a BPF program, to minimize the overhead of in-kernel
code).
This patch adds the generic part of the benchmark runner and sets up the
Makefile for extending it with more sets of benchmarks.
The benchmarker itself operates by spinning up the specified number of
producer and consumer threads and setting up an interval timer that sends
a SIGALRM signal to the application once a second. Every second, a
snapshot of the hits/drops counters is collected and stored in an array.
Drops are useful for producer/consumer benchmarks in which producers might
overwhelm consumers.
Once the test finishes after the given number of warm-up and testing
seconds, the mean and stddev are calculated (ignoring warm-up results)
and printed to stdout. This setup seems to give consistent and accurate
results.
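The final statistics step can be sketched as follows. The sketch
assumes the per-second samples are doubles and uses the sample (n - 1)
standard deviation. The function names are hypothetical. newton_sqrt()
stands in for sqrt() from <math.h> only so that this sketch links
without -lm.

```c
#include <assert.h>
#include <stddef.h>

struct bench_summary {
    double mean;
    double stddev;
};

/* Newton's method square root; used instead of sqrt() from <math.h>
 * only so that this sketch links without -lm. */
static double newton_sqrt(double x)
{
    double r;

    if (x <= 0.0)
        return 0.0;
    r = x > 1.0 ? x : 1.0; /* start at or above sqrt(x) */
    for (int i = 0; i < 200; i++) {
        double next = 0.5 * (r + x / r);
        if (next >= r)   /* stopped decreasing: converged */
            break;
        r = next;
    }
    return r;
}

/*
 * Summarize per-second samples, skipping the first `warmup` ones.
 * Uses the sample (n - 1) standard deviation.
 */
static struct bench_summary summarize(const double *samples, size_t n,
                                      size_t warmup)
{
    struct bench_summary s = { 0.0, 0.0 };
    size_t cnt;

    assert(n > warmup);
    cnt = n - warmup;

    for (size_t i = warmup; i < n; i++)
        s.mean += samples[i];
    s.mean /= (double)cnt;

    if (cnt > 1) {
        double sq = 0.0;

        for (size_t i = warmup; i < n; i++)
            sq += (samples[i] - s.mean) * (samples[i] - s.mean);
        s.stddev = newton_sqrt(sq / (double)(cnt - 1));
    }
    return s;
}
```

Skipping warm-up samples keeps start-up effects from inflating the
spread. Such effects include cold caches and threads that are still
being scheduled.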
To validate behavior, I added two atomic counting tests: global and local.
For the global one, all producer threads atomically increment the same
counter as fast as possible. This, of course, leads to a huge drop in
performance once there is more than one producer thread, due to CPUs
fighting over the same memory location.
Local counting, on the other hand, maintains one counter per producer
thread, incremented independently. Once per second, all counters are read
and added together to form the final "counting throughput" measurement. As
expected, such a setup demonstrates linear scalability with the number of
producers (as long as there are enough physical CPU cores, of course). See
example output below.
Also, this setup can nicely demonstrate the disastrous effects of false
sharing if care is not taken to place those per-producer counters in
separate cache lines.
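Below is a minimal sketch of the per-producer layout in C11 with
pthreads. The 64-byte cache line, MAX_PRODUCERS and all names are
assumptions. Each counter is aligned to its own cache line, so
producers never write to a shared line. Dropping the _Alignas would
pack eight counters per line and reintroduce false sharing. The real
benchmark reads the counters once a second while producers keep
running. This sketch instead sums them after the threads are joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64     /* assumed cache-line size */
#define MAX_PRODUCERS 64

/* One counter per producer, each on its own cache line. */
struct padded_counter {
    _Alignas(CACHE_LINE) atomic_ulong value;
};

static struct padded_counter counters[MAX_PRODUCERS];

struct producer_arg {
    struct padded_counter *counter;
    unsigned long iters;
};

static void *producer(void *p)
{
    struct producer_arg *arg = p;

    /* Only this thread writes this counter, so relaxed is enough. */
    for (unsigned long i = 0; i < arg->iters; i++)
        atomic_fetch_add_explicit(&arg->counter->value, 1,
                                  memory_order_relaxed);
    return NULL;
}

/*
 * Run `nr` producers for `iters` increments each, then add the
 * per-producer counters together. Returns the total count.
 */
static unsigned long run_local_count(int nr, unsigned long iters)
{
    pthread_t tids[MAX_PRODUCERS];
    struct producer_arg args[MAX_PRODUCERS];
    unsigned long total = 0;
    int started = 0;

    assert(nr > 0 && nr <= MAX_PRODUCERS);
    for (; started < nr; started++) {
        atomic_store_explicit(&counters[started].value, 0,
                              memory_order_relaxed);
        args[started].counter = &counters[started];
        args[started].iters = iters;
        if (pthread_create(&tids[started], NULL, producer,
                           &args[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    for (int i = 0; i < started; i++)
        total += atomic_load_explicit(&counters[i].value,
                                      memory_order_relaxed);
    return total;
}
```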
Demo output shows the global counter first with 1 producer, then with 4.
Both total and per-producer performance drop significantly. The last run
is the local counter with 4 producers, demonstrating near-perfect
scalability.
$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s
Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s
Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s
Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s
Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s
Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s
Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s
Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s
Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s
Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s
Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s
Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s
Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s
The benchmark runner supports setting thread affinity for producer and
consumer threads. You can use the -a flag for the default CPU selection
scheme, where the first consumer gets CPU #0, the next one gets CPU #1,
and so on; producer threads then pick up the next CPU and increment
one-by-one as well. Alternatively, the user can specify a set of CPUs
independently for producers and consumers with --prod-affinity
1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows forcing
producers and consumers to share the same set of CPUs, if necessary.
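The CPU-set syntax and default assignment described above could be
handled roughly as follows. This is a sketch, not bench's actual
parser. MAX_CPUS, parse_cpu_list() and the default_*_cpu() helpers are
hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CPUS 256 /* hypothetical upper bound */

/*
 * Parse a CPU list such as "1,2-10,15" into cpus[]. Returns the number
 * of distinct CPUs selected, or -1 on malformed input.
 */
static int parse_cpu_list(const char *s, bool cpus[MAX_CPUS])
{
    int count = 0;

    assert(s != NULL);
    for (int i = 0; i < MAX_CPUS; i++)
        cpus[i] = false;

    while (*s) {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*s))
            return -1;
        lo = hi = strtol(s, &end, 10);
        if (*end == '-') {          /* range "a-b" */
            s = end + 1;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
        }
        if (lo > hi || hi >= MAX_CPUS)
            return -1;
        for (long c = lo; c <= hi; c++) {
            if (!cpus[c]) {
                cpus[c] = true;
                count++;
            }
        }
        if (*end == ',') {          /* more items must follow */
            s = end + 1;
            if (*s == '\0')
                return -1;
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1;
        }
    }
    return count;
}

/* Default -a scheme: consumers take CPUs 0, 1, ...; producers follow. */
static int default_consumer_cpu(int idx)
{
    return idx;
}

static int default_producer_cpu(int nr_consumers, int idx)
{
    return nr_consumers + idx;
}
```

With one consumer and four producers, this default scheme pins the
consumer to CPU 0 and the producers to CPUs 1 through 4.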
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com
2020-05-12 19:24:43 +00:00
/bench

2020-12-03 20:46:26 +00:00
*.ko

2021-06-08 01:57:56 +00:00
*.tmp

2020-12-10 11:54:35 +00:00
xdpxceiver

2021-06-03 00:40:26 +00:00
xdp_redirect_multi