linux-stable/tools/testing/selftests/bpf
Andrii Nakryiko 8e7c2a023a selftests/bpf: Add benchmark runner infrastructure
While working on the BPF ringbuf implementation, testing, and benchmarking,
I've developed a fairly generic and modular benchmark runner, which seems to
be broadly useful, as I've already used it for one more purpose (finding the
fastest way to trigger a BPF program, to minimize the overhead of in-kernel
code).

This patch adds the generic part of the benchmark runner and sets up the
Makefile for extending it with more sets of benchmarks.

The benchmarker itself operates by spinning up the specified number of
producer and consumer threads and setting up an interval timer that sends a
SIGALRM signal to the application once a second. Every second, a snapshot of
the current hits/drops counters is collected and stored in an array. Drops
are useful for producer/consumer benchmarks in which producers might
overwhelm consumers.
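
In sketch form, the timer-driven snapshot collection looks roughly like the
following (a minimal reconstruction, not the actual bench.c code; the single
global hits counter is an assumption, the real runner aggregates per-thread
state):

  #include <signal.h>
  #include <sys/time.h>

  #define MAX_ITERS 64

  static long hits;                  /* bumped by producer threads */
  static long snapshots[MAX_ITERS];  /* one hits sample per second */
  static volatile sig_atomic_t iter;

  static void sigalarm_handler(int signo)
  {
          if (iter < MAX_ITERS)
                  snapshots[iter++] = __atomic_exchange_n(&hits, 0,
                                                          __ATOMIC_RELAXED);
  }

  static void setup_timer(void)
  {
          struct itimerval timer = {
                  .it_value    = { .tv_sec = 1 },  /* first tick after 1s */
                  .it_interval = { .tv_sec = 1 },  /* then every second */
          };

          signal(SIGALRM, sigalarm_handler);
          setitimer(ITIMER_REAL, &timer, NULL);
  }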

Once the test finishes after the given number of warm-up and testing seconds,
the mean and stddev are calculated (ignoring warm-up results) and printed to
stdout. This setup seems to give consistent and accurate results.
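
The summary math is just the sample mean and standard deviation over the
post-warm-up snapshots; something along these lines (illustrative names;
whether bench.c uses the sample or population stddev is not shown here):

  #include <math.h>

  static void summarize(const double *samples, int warmup, int total,
                        double *mean, double *stddev)
  {
          int n = total - warmup;
          double sum = 0, var = 0;

          for (int i = warmup; i < total; i++)
                  sum += samples[i];
          *mean = sum / n;

          for (int i = warmup; i < total; i++)
                  var += (samples[i] - *mean) * (samples[i] - *mean);
          *stddev = n > 1 ? sqrt(var / (n - 1)) : 0.0;
  }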

To validate the behavior, I added two atomic counting tests: global and
local. For the global one, all producer threads atomically increment the same
counter as fast as possible. This, of course, leads to a huge drop in
performance once there is more than one producer thread, due to CPUs fighting
over the same memory location.
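
Each producer's hot loop is essentially a single atomic increment of one
shared location (a sketch, not the actual benchmark code):

  static long global_hits;  /* one cache line contended by all producers */

  static void *producer_global(void *arg)
  {
          for (;;)  /* runs until the benchmark tears the thread down */
                  __atomic_add_fetch(&global_hits, 1, __ATOMIC_RELAXED);
          return NULL;
  }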

Local counting, on the other hand, maintains one counter per producer thread,
incremented independently. Once per second, all counters are read and added
together to form the final "counting throughput" measurement. As expected,
this setup demonstrates linear scalability with the number of producers (as
long as there are enough physical CPU cores, of course). See the example
output below. This setup can also nicely demonstrate the disastrous effects
of false sharing, if care is not taken to separate those per-producer
counters into independent cache lines.
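
The cache-line separation can be sketched like this (the 128-byte alignment
and the MAX_PRODUCERS bound are assumptions for illustration):

  #define CACHELINE     128  /* assumed; covers adjacent-line prefetching */
  #define MAX_PRODUCERS 128  /* assumed upper bound on -p */

  struct counter {
          long value;
  } __attribute__((aligned(CACHELINE)));

  static struct counter local_hits[MAX_PRODUCERS];

  static void *producer_local(void *arg)
  {
          struct counter *c = &local_hits[(long)arg];

          for (;;)  /* private cache line: no cross-CPU contention */
                  __atomic_fetch_add(&c->value, 1, __ATOMIC_RELAXED);
          return NULL;
  }

  static long sum_local_hits(int nprod)  /* called once a second */
  {
          long sum = 0;

          for (int i = 0; i < nprod; i++)
                  sum += __atomic_load_n(&local_hits[i].value,
                                         __ATOMIC_RELAXED);
          return sum;
  }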

The demo output shows the global counter first with 1 producer, then with 4.
Both total and per-producer performance drop significantly. The last run is
the local counter with 4 producers, demonstrating near-perfect scalability.

$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter   0 ( 24.822us): hits  148.179M/s (148.179M/prod), drops    0.000M/s
Iter   1 ( 37.939us): hits  149.308M/s (149.308M/prod), drops    0.000M/s
Iter   2 (-10.774us): hits  150.717M/s (150.717M/prod), drops    0.000M/s
Iter   3 (  3.807us): hits  151.435M/s (151.435M/prod), drops    0.000M/s
Summary: hits  150.488 ± 1.079M/s (150.488M/prod), drops    0.000 ± 0.000M/s

$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter   0 ( 60.659us): hits   53.910M/s ( 13.477M/prod), drops    0.000M/s
Iter   1 (-17.658us): hits   53.722M/s ( 13.431M/prod), drops    0.000M/s
Iter   2 (  5.865us): hits   53.495M/s ( 13.374M/prod), drops    0.000M/s
Iter   3 (  0.104us): hits   53.606M/s ( 13.402M/prod), drops    0.000M/s
Summary: hits   53.608 ± 0.113M/s ( 13.402M/prod), drops    0.000 ± 0.000M/s

$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter   0 ( 23.388us): hits  640.450M/s (160.113M/prod), drops    0.000M/s
Iter   1 (  2.291us): hits  605.661M/s (151.415M/prod), drops    0.000M/s
Iter   2 ( -6.415us): hits  607.092M/s (151.773M/prod), drops    0.000M/s
Iter   3 ( -1.361us): hits  601.796M/s (150.449M/prod), drops    0.000M/s
Summary: hits  604.849 ± 2.739M/s (151.212M/prod), drops    0.000 ± 0.000M/s

The benchmark runner supports setting thread affinity for producer and
consumer threads. You can use the -a flag for the default CPU selection
scheme, where the first consumer gets CPU #0, the next one gets CPU #1, and
so on. Producer threads then pick up the next CPU and increment one by one as
well. But the user can also specify a set of CPUs independently for producers
and consumers with --prod-affinity 1,2-10,15 and --cons-affinity
<set-of-cpus>. The latter allows forcing producers and consumers to share the
same set of CPUs, if necessary.
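
Under the hood, the default -a scheme amounts to round-robin pinning via the
standard pthread affinity API; roughly (a reconstruction with assumed names,
not the actual bench.c code):

  #define _GNU_SOURCE  /* for pthread_setaffinity_np() */
  #include <pthread.h>
  #include <sched.h>

  static void pin_thread(pthread_t thread, int cpu)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          pthread_setaffinity_np(thread, sizeof(set), &set);
  }

  static void pin_default(pthread_t *cons, int ncons,
                          pthread_t *prod, int nprod)
  {
          int next_cpu = 0;

          /* consumers first (CPU #0, #1, ...); producers continue */
          for (int i = 0; i < ncons; i++)
                  pin_thread(cons[i], next_cpu++);
          for (int i = 0; i < nprod; i++)
                  pin_thread(prod[i], next_cpu++);
  }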

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com
2020-05-13 12:19:38 -07:00
benchs selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
gnu
map_tests .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
prog_tests tools/bpf: selftests: Add bpf_iter selftests 2020-05-09 17:05:27 -07:00
progs bpf, libbpf: Replace zero-length array with flexible-array 2020-05-11 16:56:47 +02:00
verifier selftests/bpf: Test allowed maps for bpf_sk_select_reuseport 2020-04-30 16:21:14 +02:00
.gitignore selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
bench.c selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
bench.h selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
bpf_legacy.h selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h 2019-10-08 23:16:03 +02:00
bpf_rand.h
bpf_rlimit.h
bpf_tcp_helpers.h libbpf: Merge selftests' bpf_trace_helpers.h into libbpf's bpf_tracing.h 2020-03-02 16:25:14 -08:00
bpf_util.h selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
cgroup_helpers.c selftests/bpf: Correct path to include msg + path 2019-10-03 17:21:57 +02:00
cgroup_helpers.h
config selftests/bpf: Use reno instead of dctcp 2020-05-01 16:51:07 -07:00
flow_dissector_load.c
flow_dissector_load.h
get_cgroup_id_user.c
Makefile selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
netcnt_common.h
network_helpers.c selftests/bpf: Move existing common networking parts into network_helpers 2020-05-09 00:48:20 +02:00
network_helpers.h selftests/bpf: Move existing common networking parts into network_helpers 2020-05-09 00:48:20 +02:00
tcp_client.py
tcp_server.py
test_align.c
test_bpftool.py selftests/bpf: Add test for "bpftool feature" command 2020-02-26 18:34:34 +01:00
test_bpftool.sh selftests/bpf: Add test for "bpftool feature" command 2020-02-26 18:34:34 +01:00
test_bpftool_build.sh selftests, bpftool: Skip the build test if not in tree 2019-11-24 16:58:45 -08:00
test_btf.c selftests/bpf: Fix a couple of broken test_btf cases 2020-04-24 17:47:40 -07:00
test_btf.h
test_cgroup_storage.c selftests/bpf: fix test_cgroup_storage on s390 2019-08-21 16:55:01 +02:00
test_cpp.cpp selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_current_pid_tgid_new_ns.c tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid. 2020-03-12 17:40:47 -07:00
test_dev_cgroup.c
test_flow_dissector.c
test_flow_dissector.sh selftests/bpf: Add test based on port range for BPF flow dissector 2020-01-27 11:25:07 +01:00
test_ftrace.sh selftests/bpf: Test function_graph tracer and bpf trampoline together 2019-12-11 15:19:29 -08:00
test_iptunnel_common.h
test_kmod.sh
test_lirc_mode2.sh
test_lirc_mode2_user.c
test_lpm_map.c
test_lru_map.c
test_lwt_ip_encap.sh selftests/bpf: More compatible nc options in test_lwt_ip_encap 2019-10-08 23:59:22 +02:00
test_lwt_seg6local.sh
test_maps.c bpf, sockmap: Allow inserting listening TCP sockets into sockmap 2020-02-21 22:29:45 +01:00
test_maps.h
test_netcnt.c
test_offload.py selftests: bpf: log direct file writes 2019-11-06 09:59:58 -08:00
test_progs.c selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
test_progs.h selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
test_select_reuseport_common.h
test_skb_cgroup_id.sh
test_skb_cgroup_id_user.c selftests/bpf: Don't hard-code root cgroup id 2019-12-04 17:56:22 -08:00
test_sock.c selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_sock_addr.c
test_sock_addr.sh
test_sock_fields.c
test_socket_cookie.c
test_sockmap.c selftests: bpf: Use a temporary file in test_sockmap 2020-01-24 22:12:13 +01:00
test_sockmap_kern.h selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_stub.c selftests/bpf: Integrate verbose verifier log into test_progs 2019-11-24 16:58:45 -08:00
test_sysctl.c selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_tag.c
test_tc_edt.sh selftests/bpf: More compatible nc options in test_tc_edt 2019-10-18 22:33:57 +02:00
test_tc_tunnel.sh selftests, bpf: Fix test_tc_tunnel hanging 2019-11-18 21:31:49 +01:00
test_tcp_check_syncookie.sh
test_tcp_check_syncookie_user.c
test_tcpbpf.h selftests/bpf: De-flake test_tcpbpf 2019-12-04 18:01:05 -08:00
test_tcpbpf_user.c selftests/bpf: De-flake test_tcpbpf 2019-12-04 18:01:05 -08:00
test_tcpnotify.h
test_tcpnotify_user.c
test_tunnel.sh
test_verifier.c selftests/bpf: Test allowed maps for bpf_sk_select_reuseport 2020-04-30 16:21:14 +02:00
test_verifier_log.c
test_xdp_meta.sh
test_xdp_redirect.sh
test_xdp_veth.sh
test_xdp_vlan.sh
test_xdp_vlan_mode_generic.sh
test_xdp_vlan_mode_native.sh
test_xdping.sh
testing_helpers.c selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
testing_helpers.h selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
trace_helpers.c samples, bpf: Move read_trace_pipe to trace_helpers 2020-03-23 22:27:51 +01:00
trace_helpers.h samples, bpf: Move read_trace_pipe to trace_helpers 2020-03-23 22:27:51 +01:00
urandom_read.c
with_addr.sh
with_tunnels.sh
xdping.c selftests: bpf: correct perror strings 2019-11-28 22:40:30 -08:00
xdping.h