linux-stable/tools/testing/selftests/bpf
Andrii Nakryiko 8e7c2a023a selftests/bpf: Add benchmark runner infrastructure
While working on the BPF ringbuf implementation, testing, and benchmarking,
I've developed a fairly generic and modular benchmark runner, which seems to
be broadly useful, as I've already used it for one more purpose (finding the
fastest way to trigger a BPF program, to minimize the overhead of in-kernel
code).

This patch adds the generic part of the benchmark runner and sets up the
Makefile for extending it with more sets of benchmarks.

The benchmarker itself operates by spinning up the specified number of
producer and consumer threads and setting up an interval timer that sends a
SIGALRM signal to the application once a second. Every second, a snapshot of
the current hits/drops counters is collected and stored in an array. Drops
are useful for producer/consumer benchmarks in which producers might
overwhelm consumers.
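
In sketch form, the timer-driven snapshot collection looks roughly like the
following (a minimal reconstruction, not the actual bench.c code; the single
global hits counter is an assumption, the real runner aggregates per-thread
state):

  #include <signal.h>
  #include <sys/time.h>

  #define MAX_ITERS 64

  static long hits;                  /* bumped by producer threads */
  static long snapshots[MAX_ITERS];  /* one hits sample per second */
  static volatile sig_atomic_t iter;

  static void sigalarm_handler(int signo)
  {
          if (iter < MAX_ITERS)
                  snapshots[iter++] = __atomic_exchange_n(&hits, 0,
                                                          __ATOMIC_RELAXED);
  }

  static void setup_timer(void)
  {
          struct itimerval timer = {
                  .it_value    = { .tv_sec = 1 },  /* first tick after 1s */
                  .it_interval = { .tv_sec = 1 },  /* then every second */
          };

          signal(SIGALRM, sigalarm_handler);
          setitimer(ITIMER_REAL, &timer, NULL);
  }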

Once the test finishes after the given number of warm-up and testing seconds,
the mean and stddev are calculated (ignoring warm-up results) and printed to
stdout. This setup seems to give consistent and accurate results.
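
The summary math is just the sample mean and standard deviation over the
post-warm-up snapshots; something along these lines (illustrative names;
whether bench.c uses the sample or population stddev is not shown here):

  #include <math.h>

  static void summarize(const double *samples, int warmup, int total,
                        double *mean, double *stddev)
  {
          int n = total - warmup;
          double sum = 0, var = 0;

          for (int i = warmup; i < total; i++)
                  sum += samples[i];
          *mean = sum / n;

          for (int i = warmup; i < total; i++)
                  var += (samples[i] - *mean) * (samples[i] - *mean);
          *stddev = n > 1 ? sqrt(var / (n - 1)) : 0.0;
  }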

To validate the behavior, I added two atomic counting tests: global and
local. For the global one, all producer threads atomically increment the same
counter as fast as possible. This, of course, leads to a huge drop in
performance once there is more than one producer thread, due to CPUs fighting
over the same memory location.
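
Each producer's hot loop is essentially a single atomic increment of one
shared location (a sketch, not the actual benchmark code):

  static long global_hits;  /* one cache line contended by all producers */

  static void *producer_global(void *arg)
  {
          for (;;)  /* runs until the benchmark tears the thread down */
                  __atomic_add_fetch(&global_hits, 1, __ATOMIC_RELAXED);
          return NULL;
  }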

Local counting, on the other hand, maintains one counter per producer thread,
incremented independently. Once per second, all counters are read and added
together to form the final "counting throughput" measurement. As expected,
this setup demonstrates linear scalability with the number of producers (as
long as there are enough physical CPU cores, of course). See the example
output below. This setup can also nicely demonstrate the disastrous effects
of false sharing, if care is not taken to separate those per-producer
counters into independent cache lines.
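
The cache-line separation can be sketched like this (the 128-byte alignment
and the MAX_PRODUCERS bound are assumptions for illustration):

  #define CACHELINE     128  /* assumed; covers adjacent-line prefetching */
  #define MAX_PRODUCERS 128  /* assumed upper bound on -p */

  struct counter {
          long value;
  } __attribute__((aligned(CACHELINE)));

  static struct counter local_hits[MAX_PRODUCERS];

  static void *producer_local(void *arg)
  {
          struct counter *c = &local_hits[(long)arg];

          for (;;)  /* private cache line: no cross-CPU contention */
                  __atomic_fetch_add(&c->value, 1, __ATOMIC_RELAXED);
          return NULL;
  }

  static long sum_local_hits(int nprod)  /* called once a second */
  {
          long sum = 0;

          for (int i = 0; i < nprod; i++)
                  sum += __atomic_load_n(&local_hits[i].value,
                                         __ATOMIC_RELAXED);
          return sum;
  }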

The demo output shows the global counter first with 1 producer, then with 4.
Both total and per-producer performance drop significantly. The last run is
the local counter with 4 producers, demonstrating near-perfect scalability.

$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter   0 ( 24.822us): hits  148.179M/s (148.179M/prod), drops    0.000M/s
Iter   1 ( 37.939us): hits  149.308M/s (149.308M/prod), drops    0.000M/s
Iter   2 (-10.774us): hits  150.717M/s (150.717M/prod), drops    0.000M/s
Iter   3 (  3.807us): hits  151.435M/s (151.435M/prod), drops    0.000M/s
Summary: hits  150.488 ± 1.079M/s (150.488M/prod), drops    0.000 ± 0.000M/s

$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter   0 ( 60.659us): hits   53.910M/s ( 13.477M/prod), drops    0.000M/s
Iter   1 (-17.658us): hits   53.722M/s ( 13.431M/prod), drops    0.000M/s
Iter   2 (  5.865us): hits   53.495M/s ( 13.374M/prod), drops    0.000M/s
Iter   3 (  0.104us): hits   53.606M/s ( 13.402M/prod), drops    0.000M/s
Summary: hits   53.608 ± 0.113M/s ( 13.402M/prod), drops    0.000 ± 0.000M/s

$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter   0 ( 23.388us): hits  640.450M/s (160.113M/prod), drops    0.000M/s
Iter   1 (  2.291us): hits  605.661M/s (151.415M/prod), drops    0.000M/s
Iter   2 ( -6.415us): hits  607.092M/s (151.773M/prod), drops    0.000M/s
Iter   3 ( -1.361us): hits  601.796M/s (150.449M/prod), drops    0.000M/s
Summary: hits  604.849 ± 2.739M/s (151.212M/prod), drops    0.000 ± 0.000M/s

The benchmark runner supports setting thread affinity for producer and
consumer threads. You can use the -a flag for the default CPU selection
scheme, where the first consumer gets CPU #0, the next one gets CPU #1, and
so on. Producer threads then pick up the next CPU and increment one by one as
well. But the user can also specify a set of CPUs independently for producers
and consumers with --prod-affinity 1,2-10,15 and --cons-affinity
<set-of-cpus>. The latter allows forcing producers and consumers to share the
same set of CPUs, if necessary.
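
Under the hood, the default -a scheme amounts to round-robin pinning via the
standard pthread affinity API; roughly (a reconstruction with assumed names,
not the actual bench.c code):

  #define _GNU_SOURCE  /* for pthread_setaffinity_np() */
  #include <pthread.h>
  #include <sched.h>

  static void pin_thread(pthread_t thread, int cpu)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          pthread_setaffinity_np(thread, sizeof(set), &set);
  }

  static void pin_default(pthread_t *cons, int ncons,
                          pthread_t *prod, int nprod)
  {
          int next_cpu = 0;

          /* consumers first (CPU #0, #1, ...); producers continue */
          for (int i = 0; i < ncons; i++)
                  pin_thread(cons[i], next_cpu++);
          for (int i = 0; i < nprod; i++)
                  pin_thread(prod[i], next_cpu++);
  }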

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com
2020-05-13 12:19:38 -07:00
benchs selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
gnu
map_tests .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
prog_tests tools/bpf: selftests: Add bpf_iter selftests 2020-05-09 17:05:27 -07:00
progs bpf, libbpf: Replace zero-length array with flexible-array 2020-05-11 16:56:47 +02:00
verifier selftests/bpf: Test allowed maps for bpf_sk_select_reuseport 2020-04-30 16:21:14 +02:00
.gitignore selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
bench.c selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
bench.h selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
bpf_legacy.h selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h 2019-10-08 23:16:03 +02:00
bpf_rand.h
bpf_rlimit.h
bpf_tcp_helpers.h libbpf: Merge selftests' bpf_trace_helpers.h into libbpf's bpf_tracing.h 2020-03-02 16:25:14 -08:00
bpf_util.h selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
cgroup_helpers.c selftests/bpf: Correct path to include msg + path 2019-10-03 17:21:57 +02:00
cgroup_helpers.h
config selftests/bpf: Use reno instead of dctcp 2020-05-01 16:51:07 -07:00
flow_dissector_load.c
flow_dissector_load.h
get_cgroup_id_user.c
Makefile selftests/bpf: Add benchmark runner infrastructure 2020-05-13 12:19:38 -07:00
netcnt_common.h
network_helpers.c selftests/bpf: Move existing common networking parts into network_helpers 2020-05-09 00:48:20 +02:00
network_helpers.h selftests/bpf: Move existing common networking parts into network_helpers 2020-05-09 00:48:20 +02:00
tcp_client.py
tcp_server.py
test_align.c
test_bpftool.py selftests/bpf: Add test for "bpftool feature" command 2020-02-26 18:34:34 +01:00
test_bpftool.sh selftests/bpf: Add test for "bpftool feature" command 2020-02-26 18:34:34 +01:00
test_bpftool_build.sh selftests, bpftool: Skip the build test if not in tree 2019-11-24 16:58:45 -08:00
test_btf.c selftests/bpf: Fix a couple of broken test_btf cases 2020-04-24 17:47:40 -07:00
test_btf.h
test_cgroup_storage.c selftests/bpf: fix test_cgroup_storage on s390 2019-08-21 16:55:01 +02:00
test_cpp.cpp selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_current_pid_tgid_new_ns.c tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid. 2020-03-12 17:40:47 -07:00
test_dev_cgroup.c
test_flow_dissector.c
test_flow_dissector.sh selftests/bpf: Add test based on port range for BPF flow dissector 2020-01-27 11:25:07 +01:00
test_ftrace.sh selftests/bpf: Test function_graph tracer and bpf trampoline together 2019-12-11 15:19:29 -08:00
test_iptunnel_common.h
test_kmod.sh
test_lirc_mode2.sh
test_lirc_mode2_user.c
test_lpm_map.c
test_lru_map.c
test_lwt_ip_encap.sh selftests/bpf: More compatible nc options in test_lwt_ip_encap 2019-10-08 23:59:22 +02:00
test_lwt_seg6local.sh
test_maps.c bpf, sockmap: Allow inserting listening TCP sockets into sockmap 2020-02-21 22:29:45 +01:00
test_maps.h
test_netcnt.c
test_offload.py selftests: bpf: log direct file writes 2019-11-06 09:59:58 -08:00
test_progs.c selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
test_progs.h selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
test_select_reuseport_common.h
test_skb_cgroup_id.sh
test_skb_cgroup_id_user.c selftests/bpf: Don't hard-code root cgroup id 2019-12-04 17:56:22 -08:00
test_sock.c selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_sock_addr.c
test_sock_addr.sh
test_sock_fields.c
test_socket_cookie.c
test_sockmap.c selftests: bpf: Use a temporary file in test_sockmap 2020-01-24 22:12:13 +01:00
test_sockmap_kern.h selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_stub.c selftests/bpf: Integrate verbose verifier log into test_progs 2019-11-24 16:58:45 -08:00
test_sysctl.c selftests: Use consistent include paths for libbpf 2020-01-20 16:37:45 -08:00
test_tag.c
test_tc_edt.sh selftests/bpf: More compatible nc options in test_tc_edt 2019-10-18 22:33:57 +02:00
test_tc_tunnel.sh selftests, bpf: Fix test_tc_tunnel hanging 2019-11-18 21:31:49 +01:00
test_tcp_check_syncookie.sh
test_tcp_check_syncookie_user.c
test_tcpbpf.h selftests/bpf: De-flake test_tcpbpf 2019-12-04 18:01:05 -08:00
test_tcpbpf_user.c selftests/bpf: De-flake test_tcpbpf 2019-12-04 18:01:05 -08:00
test_tcpnotify.h
test_tcpnotify_user.c
test_tunnel.sh
test_verifier.c selftests/bpf: Test allowed maps for bpf_sk_select_reuseport 2020-04-30 16:21:14 +02:00
test_verifier_log.c
test_xdp_meta.sh
test_xdp_redirect.sh
test_xdp_veth.sh
test_xdp_vlan.sh
test_xdp_vlan_mode_generic.sh
test_xdp_vlan_mode_native.sh
test_xdping.sh
testing_helpers.c selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
testing_helpers.h selftests/bpf: Extract parse_num_list into generic testing_helpers.c 2020-05-13 12:19:38 -07:00
trace_helpers.c samples, bpf: Move read_trace_pipe to trace_helpers 2020-03-23 22:27:51 +01:00
trace_helpers.h samples, bpf: Move read_trace_pipe to trace_helpers 2020-03-23 22:27:51 +01:00
urandom_read.c
with_addr.sh
with_tunnels.sh
xdping.c selftests: bpf: correct perror strings 2019-11-28 22:40:30 -08:00
xdping.h