Commit graph

1044427 commits

Author SHA1 Message Date
Joanne Koong
57fd1c63c9 bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
This patch adds benchmark tests for the throughput (for lookups + updates)
and the false positive rate of bloom filter lookups, as well as some
minor refactoring of the bash script for running the benchmarks.

These benchmarks show that as the number of hash functions increases,
the throughput and the false positive rate of the bloom filter decreases.
>From the benchmark data, the approximate average false-positive rates
are roughly as follows:

1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions  = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%

For reference data, the benchmarks run on one thread on a machine
with one numa node for 1 to 5 hash functions for 8-byte and 64-byte
values are as follows:

1 hash function:
  50k entries
	8-byte value
	    Lookups - 51.1 M/s operations
	    Updates - 33.6 M/s operations
	    False positive rate: 24.15%
	64-byte value
	    Lookups - 15.7 M/s operations
	    Updates - 15.1 M/s operations
	    False positive rate: 24.2%
  100k entries
	8-byte value
	    Lookups - 51.0 M/s operations
	    Updates - 33.4 M/s operations
	    False positive rate: 24.04%
	64-byte value
	    Lookups - 15.6 M/s operations
	    Updates - 14.6 M/s operations
	    False positive rate: 24.06%
  500k entries
	8-byte value
	    Lookups - 50.5 M/s operations
	    Updates - 33.1 M/s operations
	    False positive rate: 27.45%
	64-byte value
	    Lookups - 15.6 M/s operations
	    Updates - 14.2 M/s operations
	    False positive rate: 27.42%
  1 mil entries
	8-byte value
	    Lookups - 49.7 M/s operations
	    Updates - 32.9 M/s operations
	    False positive rate: 27.45%
	64-byte value
	    Lookups - 15.4 M/s operations
	    Updates - 13.7 M/s operations
	    False positive rate: 27.58%
  2.5 mil entries
	8-byte value
	    Lookups - 47.2 M/s operations
	    Updates - 31.8 M/s operations
	    False positive rate: 30.94%
	64-byte value
	    Lookups - 15.3 M/s operations
	    Updates - 13.2 M/s operations
	    False positive rate: 30.95%
  5 mil entries
	8-byte value
	    Lookups - 41.1 M/s operations
	    Updates - 28.1 M/s operations
	    False positive rate: 31.01%
	64-byte value
	    Lookups - 13.3 M/s operations
	    Updates - 11.4 M/s operations
	    False positive rate: 30.98%

2 hash functions:
  50k entries
	8-byte value
	    Lookups - 34.1 M/s operations
	    Updates - 20.1 M/s operations
	    False positive rate: 9.13%
	64-byte value
	    Lookups - 8.4 M/s operations
	    Updates - 7.9 M/s operations
	    False positive rate: 9.21%
  100k entries
	8-byte value
	    Lookups - 33.7 M/s operations
	    Updates - 18.9 M/s operations
	    False positive rate: 9.13%
	64-byte value
	    Lookups - 8.4 M/s operations
	    Updates - 7.7 M/s operations
	    False positive rate: 9.19%
  500k entries
	8-byte value
	    Lookups - 32.7 M/s operations
	    Updates - 18.1 M/s operations
	    False positive rate: 12.61%
	64-byte value
	    Lookups - 8.4 M/s operations
	    Updates - 7.5 M/s operations
	    False positive rate: 12.61%
  1 mil entries
	8-byte value
	    Lookups - 30.6 M/s operations
	    Updates - 18.9 M/s operations
	    False positive rate: 12.54%
	64-byte value
	    Lookups - 8.0 M/s operations
	    Updates - 7.0 M/s operations
	    False positive rate: 12.52%
  2.5 mil entries
	8-byte value
	    Lookups - 25.3 M/s operations
	    Updates - 16.7 M/s operations
	    False positive rate: 16.77%
	64-byte value
	    Lookups - 7.9 M/s operations
	    Updates - 6.5 M/s operations
	    False positive rate: 16.88%
  5 mil entries
	8-byte value
	    Lookups - 20.8 M/s operations
	    Updates - 14.7 M/s operations
	    False positive rate: 16.78%
	64-byte value
	    Lookups - 7.0 M/s operations
	    Updates - 6.0 M/s operations
	    False positive rate: 16.78%

3 hash functions:
  50k entries
	8-byte value
	    Lookups - 25.1 M/s operations
	    Updates - 14.6 M/s operations
	    False positive rate: 7.65%
	64-byte value
	    Lookups - 5.8 M/s operations
	    Updates - 5.5 M/s operations
	    False positive rate: 7.58%
  100k entries
	8-byte value
	    Lookups - 24.7 M/s operations
	    Updates - 14.1 M/s operations
	    False positive rate: 7.71%
	64-byte value
	    Lookups - 5.8 M/s operations
	    Updates - 5.3 M/s operations
	    False positive rate: 7.62%
  500k entries
	8-byte value
	    Lookups - 22.9 M/s operations
	    Updates - 13.9 M/s operations
	    False positive rate: 2.62%
	64-byte value
	    Lookups - 5.6 M/s operations
	    Updates - 4.8 M/s operations
	    False positive rate: 2.7%
  1 mil entries
	8-byte value
	    Lookups - 19.8 M/s operations
	    Updates - 12.6 M/s operations
	    False positive rate: 2.60%
	64-byte value
	    Lookups - 5.3 M/s operations
	    Updates - 4.4 M/s operations
	    False positive rate: 2.69%
  2.5 mil entries
	8-byte value
	    Lookups - 16.2 M/s operations
	    Updates - 10.7 M/s operations
	    False positive rate: 4.49%
	64-byte value
	    Lookups - 4.9 M/s operations
	    Updates - 4.1 M/s operations
	    False positive rate: 4.41%
  5 mil entries
	8-byte value
	    Lookups - 18.8 M/s operations
	    Updates - 9.2 M/s operations
	    False positive rate: 4.45%
	64-byte value
	    Lookups - 5.2 M/s operations
	    Updates - 3.9 M/s operations
	    False positive rate: 4.54%

4 hash functions:
  50k entries
	8-byte value
	    Lookups - 19.7 M/s operations
	    Updates - 11.1 M/s operations
	    False positive rate: 1.01%
	64-byte value
	    Lookups - 4.4 M/s operations
	    Updates - 4.0 M/s operations
	    False positive rate: 1.00%
  100k entries
	8-byte value
	    Lookups - 19.5 M/s operations
	    Updates - 10.9 M/s operations
	    False positive rate: 1.00%
	64-byte value
	    Lookups - 4.3 M/s operations
	    Updates - 3.9 M/s operations
	    False positive rate: 0.97%
  500k entries
	8-byte value
	    Lookups - 18.2 M/s operations
	    Updates - 10.6 M/s operations
	    False positive rate: 2.05%
	64-byte value
	    Lookups - 4.3 M/s operations
	    Updates - 3.7 M/s operations
	    False positive rate: 2.05%
  1 mil entries
	8-byte value
	    Lookups - 15.5 M/s operations
	    Updates - 9.6 M/s operations
	    False positive rate: 1.99%
	64-byte value
	    Lookups - 4.0 M/s operations
	    Updates - 3.4 M/s operations
	    False positive rate: 1.99%
  2.5 mil entries
	8-byte value
	    Lookups - 13.8 M/s operations
	    Updates - 7.7 M/s operations
	    False positive rate: 3.91%
	64-byte value
	    Lookups - 3.7 M/s operations
	    Updates - 3.6 M/s operations
	    False positive rate: 3.78%
  5 mil entries
	8-byte value
	    Lookups - 13.0 M/s operations
	    Updates - 6.9 M/s operations
	    False positive rate: 3.93%
	64-byte value
	    Lookups - 3.5 M/s operations
	    Updates - 3.7 M/s operations
	    False positive rate: 3.39%

5 hash functions:
  50k entries
	8-byte value
	    Lookups - 16.4 M/s operations
	    Updates - 9.1 M/s operations
	    False positive rate: 0.78%
	64-byte value
	    Lookups - 3.5 M/s operations
	    Updates - 3.2 M/s operations
	    False positive rate: 0.77%
  100k entries
	8-byte value
	    Lookups - 16.3 M/s operations
	    Updates - 9.0 M/s operations
	    False positive rate: 0.79%
	64-byte value
	    Lookups - 3.5 M/s operations
	    Updates - 3.2 M/s operations
	    False positive rate: 0.78%
  500k entries
	8-byte value
	    Lookups - 15.1 M/s operations
	    Updates - 8.8 M/s operations
	    False positive rate: 1.82%
	64-byte value
	    Lookups - 3.4 M/s operations
	    Updates - 3.0 M/s operations
	    False positive rate: 1.78%
  1 mil entries
	8-byte value
	    Lookups - 13.2 M/s operations
	    Updates - 7.8 M/s operations
	    False positive rate: 1.81%
	64-byte value
	    Lookups - 3.2 M/s operations
	    Updates - 2.8 M/s operations
	    False positive rate: 1.80%
  2.5 mil entries
	8-byte value
	    Lookups - 10.5 M/s operations
	    Updates - 5.9 M/s operations
	    False positive rate: 0.29%
	64-byte value
	    Lookups - 3.2 M/s operations
	    Updates - 2.4 M/s operations
	    False positive rate: 0.28%
  5 mil entries
	8-byte value
	    Lookups - 9.6 M/s operations
	    Updates - 5.7 M/s operations
	    False positive rate: 0.30%
	64-byte value
	    Lookups - 3.2 M/s operations
	    Updates - 2.7 M/s operations
	    False positive rate: 0.30%

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-5-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
ed9109ad64 selftests/bpf: Add bloom filter map test cases
This patch adds test cases for bpf bloom filter maps. They include tests
checking against invalid operations by userspace, tests for using the
bloom filter map as an inner map, and a bpf program that queries the
bloom filter map for values added by a userspace program.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-4-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
47512102cd libbpf: Add "map_extra" as a per-map-type extra flag
This patch adds the libbpf infrastructure for supporting a
per-map-type "map_extra" field, whose definition will be
idiosyncratic depending on map type.

For example, for the bloom filter map, the lower 4 bits of
map_extra is used to denote the number of hash functions.

Please note that until libbpf 1.0 is here, the
"bpf_create_map_params" struct is used as a temporary
means for propagating the map_extra field to the kernel.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
9330986c03 bpf: Add bloom filter map implementation
This patch adds the kernel-side changes for the implementation of
a bpf bloom filter map.

The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations.These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push

The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.

For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value, with a NULL key. In the
verifier layer, this requires us to modify the argument type of
a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
as well, in the syscall layer, we need to copy over the user value
so that in bpf_map_peek_elem, we know which specific value to query.

A few things to please take note of:
 * If there are any concurrent lookups + updates, the user is
responsible for synchronizing this to ensure no false negative lookups
occur.
 * The number of hashes to use for the bloom filter is configurable from
userspace. If no number is specified, the default used will be 5 hash
functions. The benchmarks later in this patchset can help compare the
performance of using different number of hashes on different entry
sizes. In general, using more hashes decreases both the false positive
rate and the speed of a lookup.
 * Deleting an element in the bloom filter map is not supported.
 * The bloom filter map may be used as an inner map.
 * The "max_entries" size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than "max_entries", they may do so but they should
be aware that this may lead to a higher false positive rate.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Tiezhu Yang
b066abba3e bpf, tests: Add module parameter test_suite to test_bpf module
After commit 9298e63eaf ("bpf/tests: Add exhaustive tests of ALU
operand magnitudes"), when modprobe test_bpf.ko with JIT on mips64,
there exists segment fault due to the following reason:

  [...]
  ALU64_MOV_X: all register value magnitudes jited:1
  Break instruction in kernel code[#1]
  [...]

It seems that the related JIT implementations of some test cases
in test_bpf() have problems. At this moment, I do not care about
the segment fault while I just want to verify the test cases of
tail calls.

Based on the above background and motivation, add the following
module parameter test_suite to the test_bpf.ko:

  test_suite=<string>: only the specified test suite will be run, the
  string can be "test_bpf", "test_tail_calls" or "test_skb_segment".

If test_suite is not specified, but test_id, test_name or test_range
is specified, set 'test_bpf' as the default test suite. This is useful
to only test the corresponding test suite when specifying the valid
test_suite string.

Any invalid test suite will result in -EINVAL being returned and no
tests being run. If the test_suite is not specified or specified as
empty string, it does not change the current logic, all of the test
cases will be run.

Here are some test results:

 # dmesg -c
 # modprobe test_bpf
 # dmesg | grep Summary
 test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
 test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_suite=test_bpf
 # dmesg | tail -1
 test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_suite=test_tail_calls
 # dmesg
 test_bpf: #0 Tail call leaf jited:0 21 PASS
 [...]
 test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_suite=test_skb_segment
 # dmesg
 test_bpf: #0 gso_with_rx_frags PASS
 test_bpf: #1 gso_linear_no_head_frag PASS
 test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_id=1
 # dmesg
 test_bpf: test_bpf: set 'test_bpf' as the default test_suite.
 test_bpf: #1 TXA jited:0 54 51 50 PASS
 test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_suite=test_bpf test_name=TXA
 # dmesg
 test_bpf: #1 TXA jited:0 54 50 51 PASS
 test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_suite=test_tail_calls test_range=6,7
 # dmesg
 test_bpf: #6 Tail call error path, NULL target jited:0 41 PASS
 test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
 test_bpf: test_tail_calls: Summary: 2 PASSED, 0 FAILED, [0/2 JIT'ed]

 # rmmod test_bpf
 # dmesg -c
 # modprobe test_bpf test_suite=test_skb_segment test_id=1
 # dmesg
 test_bpf: #1 gso_linear_no_head_frag PASS
 test_bpf: test_skb_segment: Summary: 1 PASSED, 0 FAILED

By the way, the above segment fault has been fixed in the latest bpf-next
tree which contains the mips64 JIT rework.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Link: https://lore.kernel.org/bpf/1635384321-28128-1-git-send-email-yangtiezhu@loongson.cn
2021-10-28 11:41:16 +02:00
Tong Tiangen
252c765bd7 riscv, bpf: Add BPF exception tables
When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the riscv JIT does not currently recognize
this flag it falls back to the interpreter.

Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.

A more generic solution would add a "handler" field to the table entry,
like on x86 and s390. The same issue in ARM64 is fixed in 8008342853
("bpf, arm64: Add BPF exception tables").

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20211027111822.3801679-1-tongtiangen@huawei.com
2021-10-28 01:02:44 +02:00
Andrii Nakryiko
03e6a7a940 Merge branch 'selftests/bpf: parallel mode improvement'
Yucong Sun says:

====================

Several patches to improve parallel execution mode, updating vmtest.sh
and fixed two previously dropped patches according to feedback.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-10-27 11:59:43 -07:00
Yucong Sun
e1ef62a4dd selftests/bpf: Adding a namespace reset for tc_redirect
This patch delete ns_src/ns_dst/ns_redir namespaces before recreating
them, making the test more robust.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-5-fallentree@fb.com
2021-10-27 11:59:02 -07:00
Yucong Sun
9e7240fb2d selftests/bpf: Fix attach_probe in parallel mode
This patch makes attach_probe uses its own method as attach point,
avoiding conflict with other tests like bpf_cookie.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-4-fallentree@fb.com
2021-10-27 11:59:02 -07:00
Yucong Sun
547208a386 selfetests/bpf: Update vmtest.sh defaults
Increase memory to 4G, 8 SMP core with host cpu passthrough. This
make it run faster in parallel mode and more likely to succeed.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-2-fallentree@fb.com
2021-10-27 11:58:44 -07:00
Alexei Starovoitov
f9d532fc5d Merge branch 'bpf: use 32bit safe version of u64_stats'
Eric Dumazet says:

====================

From: Eric Dumazet <edumazet@google.com>

Two first patches fix bugs added in 5.1 and 5.5

Third patch replaces the u64 fields in struct bpf_prog_stats
with u64_stats_t ones to avoid possible sampling errors,
in case of load/store stearing.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-27 11:13:53 -07:00
Eric Dumazet
61a0abaee2 bpf: Use u64_stats_t in struct bpf_prog_stats
Commit 316580b69d ("u64_stats: provide u64_stats_t type")
fixed possible load/store tearing on 64bit arches.

For instance the following C code

stats->nsecs += sched_clock() - start;

Could be rightfully implemented like this by a compiler,
confusing concurrent readers a lot:

stats->nsecs += sched_clock();
// arbitrary delay
stats->nsecs -= start;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
2021-10-27 11:13:52 -07:00
Eric Dumazet
d979617aa8 bpf: Fixes possible race in update_prog_stats() for 32bit arches
It seems update_prog_stats() suffers from same issue fixed
in the prior patch:

As it can run while interrupts are enabled, it could
be re-entered and the u64_stats syncp could be mangled.

Fixes: fec56f5890 ("bpf: Introduce BPF trampoline")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-3-eric.dumazet@gmail.com
2021-10-27 11:13:52 -07:00
Eric Dumazet
f941eadd8d bpf: Avoid races in __bpf_prog_run() for 32bit arches
__bpf_prog_run() can run from non IRQ contexts, meaning
it could be re entered if interrupted.

This calls for the irq safe variant of u64_stats_update_{begin|end},
or risk a deadlock.

This patch is a nop on 64bit arches, fortunately.

syzbot report:

WARNING: inconsistent lock state
5.12.0-rc3-syzkaller #0 Not tainted
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
udevd/4013 [HC0[0]:SC0[0]:HE1:SE1] takes:
ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: sk_filter include/linux/filter.h:867 [inline]
ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: do_one_broadcast net/netlink/af_netlink.c:1468 [inline]
ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510
  lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483
  do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline]
  do_write_seqcount_begin include/linux/seqlock.h:545 [inline]
  u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline]
  bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
  bpf_prog_run_clear_cb+0x1bc/0x270 include/linux/filter.h:755
  run_filter+0xa0/0x17c net/packet/af_packet.c:2031
  packet_rcv+0xc0/0x3e0 net/packet/af_packet.c:2104
  dev_queue_xmit_nit+0x2bc/0x39c net/core/dev.c:2387
  xmit_one net/core/dev.c:3588 [inline]
  dev_hard_start_xmit+0x94/0x518 net/core/dev.c:3609
  sch_direct_xmit+0x11c/0x1f0 net/sched/sch_generic.c:313
  qdisc_restart net/sched/sch_generic.c:376 [inline]
  __qdisc_run+0x194/0x7f8 net/sched/sch_generic.c:384
  qdisc_run include/net/pkt_sched.h:136 [inline]
  qdisc_run include/net/pkt_sched.h:128 [inline]
  __dev_xmit_skb net/core/dev.c:3795 [inline]
  __dev_queue_xmit+0x65c/0xf84 net/core/dev.c:4150
  dev_queue_xmit+0x14/0x18 net/core/dev.c:4215
  neigh_resolve_output net/core/neighbour.c:1491 [inline]
  neigh_resolve_output+0x170/0x228 net/core/neighbour.c:1471
  neigh_output include/net/neighbour.h:510 [inline]
  ip6_finish_output2+0x2e4/0x9fc net/ipv6/ip6_output.c:117
  __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
  __ip6_finish_output+0x164/0x3f8 net/ipv6/ip6_output.c:161
  ip6_finish_output+0x2c/0xb0 net/ipv6/ip6_output.c:192
  NF_HOOK_COND include/linux/netfilter.h:290 [inline]
  ip6_output+0x74/0x294 net/ipv6/ip6_output.c:215
  dst_output include/net/dst.h:448 [inline]
  NF_HOOK include/linux/netfilter.h:301 [inline]
  NF_HOOK include/linux/netfilter.h:295 [inline]
  mld_sendpack+0x2a8/0x7e4 net/ipv6/mcast.c:1679
  mld_send_cr net/ipv6/mcast.c:1975 [inline]
  mld_ifc_timer_expire+0x1e8/0x494 net/ipv6/mcast.c:2474
  call_timer_fn+0xd0/0x570 kernel/time/timer.c:1431
  expire_timers kernel/time/timer.c:1476 [inline]
  __run_timers kernel/time/timer.c:1745 [inline]
  run_timer_softirq+0x2e4/0x384 kernel/time/timer.c:1758
  __do_softirq+0x204/0x7ac kernel/softirq.c:345
  do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
  invoke_softirq kernel/softirq.c:228 [inline]
  __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
  irq_exit+0x10/0x3c kernel/softirq.c:446
  __handle_domain_irq+0xb4/0x120 kernel/irq/irqdesc.c:692
  handle_domain_irq include/linux/irqdesc.h:176 [inline]
  gic_handle_irq+0x84/0xac drivers/irqchip/irq-gic.c:370
  __irq_svc+0x5c/0x94 arch/arm/kernel/entry-armv.S:205
  debug_smp_processor_id+0x0/0x24 lib/smp_processor_id.c:53
  rcu_read_lock_held_common kernel/rcu/update.c:108 [inline]
  rcu_read_lock_sched_held+0x24/0x7c kernel/rcu/update.c:123
  trace_lock_acquire+0x24c/0x278 include/trace/events/lock.h:13
  lock_acquire+0x3c/0x74 kernel/locking/lockdep.c:5481
  rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
  rcu_read_lock include/linux/rcupdate.h:656 [inline]
  avc_has_perm_noaudit+0x6c/0x260 security/selinux/avc.c:1150
  selinux_inode_permission+0x140/0x220 security/selinux/hooks.c:3141
  security_inode_permission+0x44/0x60 security/security.c:1268
  inode_permission.part.0+0x5c/0x13c fs/namei.c:521
  inode_permission fs/namei.c:494 [inline]
  may_lookup fs/namei.c:1652 [inline]
  link_path_walk.part.0+0xd4/0x38c fs/namei.c:2208
  link_path_walk fs/namei.c:2189 [inline]
  path_lookupat+0x3c/0x1b8 fs/namei.c:2419
  filename_lookup+0xa8/0x1a4 fs/namei.c:2453
  user_path_at_empty+0x74/0x90 fs/namei.c:2733
  do_readlinkat+0x5c/0x12c fs/stat.c:417
  __do_sys_readlink fs/stat.c:450 [inline]
  sys_readlink+0x24/0x28 fs/stat.c:447
  ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64
  0x7eaa4974
irq event stamp: 298277
hardirqs last  enabled at (298277): [<802000d0>] no_work_pending+0x4/0x34
hardirqs last disabled at (298276): [<8020c9b8>] do_work_pending+0x9c/0x648 arch/arm/kernel/signal.c:676
softirqs last  enabled at (298216): [<8020167c>] __do_softirq+0x584/0x7ac kernel/softirq.c:372
softirqs last disabled at (298201): [<8024dff4>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
softirqs last disabled at (298201): [<8024dff4>] invoke_softirq kernel/softirq.c:228 [inline]
softirqs last disabled at (298201): [<8024dff4>] __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&pstats->syncp)->seq);
  <Interrupt>
    lock(&(&pstats->syncp)->seq);

 *** DEADLOCK ***

1 lock held by udevd/4013:
 #0: 82b09c5c (rcu_read_lock){....}-{1:2}, at: sk_filter_trim_cap+0x54/0x434 net/core/filter.c:139

stack backtrace:
CPU: 1 PID: 4013 Comm: udevd Not tainted 5.12.0-rc3-syzkaller #0
Hardware name: ARM-Versatile Express
Backtrace:
[<81802550>] (dump_backtrace) from [<818027c4>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
 r7:00000080 r6:600d0093 r5:00000000 r4:82b58344
[<818027ac>] (show_stack) from [<81809e98>] (__dump_stack lib/dump_stack.c:79 [inline])
[<818027ac>] (show_stack) from [<81809e98>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
[<81809de0>] (dump_stack) from [<81804a00>] (print_usage_bug.part.0+0x228/0x230 kernel/locking/lockdep.c:3806)
 r7:86bcb768 r6:81a0326c r5:830f96a8 r4:86bcb0c0
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (print_usage_bug kernel/locking/lockdep.c:3776 [inline])
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (valid_state kernel/locking/lockdep.c:3818 [inline])
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock_irq kernel/locking/lockdep.c:4021 [inline])
[<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock.part.0+0xc34/0x136c kernel/locking/lockdep.c:4478)
 r10:83278fe8 r9:82c6d748 r8:00000000 r7:82c6d2d4 r6:00000004 r5:86bcb768
 r4:00000006
[<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_lock kernel/locking/lockdep.c:4442 [inline])
[<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_usage kernel/locking/lockdep.c:4391 [inline])
[<802ba584>] (mark_lock.part.0) from [<802bc644>] (__lock_acquire+0x9bc/0x3318 kernel/locking/lockdep.c:4854)
 r10:86bcb768 r9:86bcb0c0 r8:00000001 r7:00040000 r6:0000075a r5:830f96a8
 r4:00000000
[<802bbc88>] (__lock_acquire) from [<802bfb90>] (lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510)
 r10:00000000 r9:600d0013 r8:00000000 r7:00000000 r6:828a2680 r5:828a2680
 r4:861e5bc8
[<802bfaa0>] (lock_acquire.part.0) from [<802bff28>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483)
 r10:8146137c r9:00000000 r8:00000001 r7:00000000 r6:00000000 r5:00000000
 r4:ff7c9dec
[<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin include/linux/seqlock.h:545 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (__bpf_prog_run_save_cb include/linux/filter.h:727 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (bpf_prog_run_save_cb include/linux/filter.h:741 [inline])
[<802bfebc>] (lock_acquire) from [<81381eb4>] (sk_filter_trim_cap+0x26c/0x434 net/core/filter.c:149)
 r10:a4095dd0 r9:ff7c9dd0 r8:e44be000 r7:8146137c r6:00000001 r5:8611ba80
 r4:00000000
[<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (sk_filter include/linux/filter.h:867 [inline])
[<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (do_one_broadcast net/netlink/af_netlink.c:1468 [inline])
[<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520)
 r10:00000001 r9:833d6b1c r8:00000000 r7:8572f864 r6:8611ba80 r5:8698d800
 r4:8572f800
[<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_broadcast net/netlink/af_netlink.c:1544 [inline])
[<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_sendmsg+0x3d0/0x478 net/netlink/af_netlink.c:1925)
 r10:00000000 r9:00000002 r8:8698d800 r7:000000b7 r6:8611b900 r5:861e5f50
 r4:86aa3000
[<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg_nosec net/socket.c:654 [inline])
[<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg+0x3c/0x4c net/socket.c:674)
 r10:00000000 r9:861e5dd4 r8:00000000 r7:86570000 r6:00000000 r5:86570000
 r4:861e5f50
[<81321f18>] (sock_sendmsg) from [<813234d0>] (____sys_sendmsg+0x230/0x29c net/socket.c:2350)
 r5:00000040 r4:861e5f50
[<813232a0>] (____sys_sendmsg) from [<8132549c>] (___sys_sendmsg+0xac/0xe4 net/socket.c:2404)
 r10:00000128 r9:861e4000 r8:00000000 r7:00000000 r6:86570000 r5:861e5f50
 r4:00000000
[<813253f0>] (___sys_sendmsg) from [<81325684>] (__sys_sendmsg net/socket.c:2433 [inline])
[<813253f0>] (___sys_sendmsg) from [<81325684>] (__do_sys_sendmsg net/socket.c:2442 [inline])
[<813253f0>] (___sys_sendmsg) from [<81325684>] (sys_sendmsg+0x58/0xa0 net/socket.c:2440)
 r8:80200224 r7:00000128 r6:00000000 r5:7eaa541c r4:86570000
[<8132562c>] (sys_sendmsg) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
Exception stack(0x861e5fa8 to 0x861e5ff0)
5fa0:                   00000000 00000000 0000000c 7eaa541c 00000000 00000000
5fc0: 00000000 00000000 76fbf840 00000128 00000000 0000008f 7eaa541c 000563f8
5fe0: 00056110 7eaa53e0 00036cec 76c9bf44
 r6:76fbf840 r5:00000000 r4:00000000

Fixes: 492ecee892 ("bpf: enable program stats")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-2-eric.dumazet@gmail.com
2021-10-27 11:13:52 -07:00
Joe Burton
689624f037 libbpf: Deprecate bpf_objects_list
Add a flag to `enum libbpf_strict_mode' to disable the global
`bpf_objects_list', preventing race conditions when concurrent threads
call bpf_object__open() or bpf_object__close().

bpf_object__next() will return NULL if this option is set.

Callers may achieve the same workflow by tracking bpf_objects in
application code.

  [0] Closes: https://github.com/libbpf/libbpf/issues/293

Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com
2021-10-27 11:00:12 -07:00
Song Liu
20d1b54a52 selftests/bpf: Guess function end for test_get_branch_snapshot
Function in modules could appear in /proc/kallsyms in random order.

ffffffffa02608a0 t bpf_testmod_loop_test
ffffffffa02600c0 t __traceiter_bpf_testmod_test_writable_bare
ffffffffa0263b60 d __tracepoint_bpf_testmod_test_write_bare
ffffffffa02608c0 T bpf_testmod_test_read
ffffffffa0260d08 t __SCT__tp_func_bpf_testmod_test_writable_bare
ffffffffa0263300 d __SCK__tp_func_bpf_testmod_test_read
ffffffffa0260680 T bpf_testmod_test_write
ffffffffa0260860 t bpf_testmod_test_mod_kfunc

Therefore, we cannot reliably use kallsyms_find_next() to find the end of
a function. Replace it with a simple guess (start + 128). This is good
enough for this test.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211022234814.318457-1-songliubraving@fb.com
2021-10-25 21:43:05 -07:00
Song Liu
b4e8707276 selftests/bpf: Skip all serial_test_get_branch_snapshot in vm
Skipping the second half of the test is not enough to silent the warning
in dmesg. Skip the whole test before we can either properly silent the
warning in kernel, or fix LBR snapshot for VM.

Fixes: 025bd7c753 ("selftests/bpf: Add test for bpf_get_branch_snapshot")
Fixes: aa67fdb464 ("selftests/bpf: Skip the second half of get_branch_snapshot in vm")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026000733.477714-1-songliubraving@fb.com
2021-10-25 20:41:00 -07:00
Andrii Nakryiko
e02daf4ce5 Merge branch 'core_reloc fixes for s390'
Ilya Leoshkevich says:

====================

v2: https://lore.kernel.org/bpf/20211025131214.731972-1-iii@linux.ibm.com/
v2 -> v3: Split the fix from the cleanup (Daniel).

v1: https://lore.kernel.org/bpf/20211021234653.643302-1-iii@linux.ibm.com/
v1 -> v2: Drop bpf_core_calc_field_relo() restructuring, split the
          __BYTE_ORDER__ change (Andrii).

Hi,

this series fixes test failures in core_reloc on s390.

Patch 1 fixes an endianness bug with __BYTE_ORDER vs __BYTE_ORDER__.
Patches 2-5 make the rest of the code consistent in that respect.
Patch 6 fixes an endianness issue in test_core_reloc_mods.

Best regards,
Ilya
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-10-25 20:39:42 -07:00
Ilya Leoshkevich
2e2c6d3fb3 selftests/bpf: Fix test_core_reloc_mods on big-endian machines
This is the same as commit d164dd9a5c ("selftests/bpf: Fix
test_core_autosize on big-endian machines"), but for
test_core_reloc_mods.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-7-iii@linux.ibm.com
2021-10-25 20:39:42 -07:00
Ilya Leoshkevich
3e7ed9cebb selftests/seccomp: Use __BYTE_ORDER__
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-6-iii@linux.ibm.com
2021-10-25 20:39:42 -07:00
Ilya Leoshkevich
14e6cac771 samples: seccomp: Use __BYTE_ORDER__
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-5-iii@linux.ibm.com
2021-10-25 20:39:42 -07:00
Ilya Leoshkevich
06fca841fb selftests/bpf: Use __BYTE_ORDER__
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-4-iii@linux.ibm.com
2021-10-25 20:39:42 -07:00
Ilya Leoshkevich
3930198dc9 libbpf: Use __BYTE_ORDER__
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-3-iii@linux.ibm.com
2021-10-25 20:39:41 -07:00
Ilya Leoshkevich
45f2bebc80 libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()
__BYTE_ORDER is supposed to be defined by a libc, and __BYTE_ORDER__ -
by a compiler. bpf_core_read.h checks __BYTE_ORDER == __LITTLE_ENDIAN,
which is true if neither are defined, leading to incorrect behavior on
big-endian hosts if libc headers are not included, which is often the
case.

Fixes: ee26dade0e ("libbpf: Add support for relocatable bitfields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-2-iii@linux.ibm.com
2021-10-25 20:39:41 -07:00
Alexei Starovoitov
124c6003bf Merge branch 'libbpf: add bpf_program__insns() accessor'
Andrii Nakryiko says:

====================

Add libbpf APIs to access BPF program instructions. Both before and after
libbpf processing (before and after bpf_object__load()). This allows to
inspect what's going on with BPF program assembly instructions as libbpf
performs its processing magic.

But in more practical terms, this allows to do a no-brainer BPF program
cloning, which is something you need when working with fentry/fexit BPF
programs to be able to attach the same BPF program code to multiple kernel
functions. Currently, kernel needs multiple copies of BPF programs, each
loaded with its own target BTF ID. retsnoop is one such example that
previously had to rely on bpf_program__set_prep() API to hijack program
instructions ([0] for before and after).

Speaking of bpf_program__set_prep() API and the whole concept of
multiple-instance BPF programs in libbpf, all that is scheduled for
deprecation in v0.7. It doesn't work well, it's cumbersome, and it will become
more broken as libbpf adds more functionality. So deprecate and remove it in
libbpf 1.0. It doesn't seem to be used by anyone anyways (except for that
retsnoop hack, which is now much cleaner with new APIs as can be seen in [0]).

  [0] https://github.com/anakryiko/retsnoop/pull/1
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-25 18:37:22 -07:00
Andrii Nakryiko
c4813e969a libbpf: Deprecate ambiguously-named bpf_program__size() API
The name of the API doesn't convey clearly that this size is in number
of bytes (there needed to be a separate comment to make this clear in
libbpf.h). Further, measuring the size of BPF program in bytes is not
exactly the best fit, because BPF programs always consist of 8-byte
instructions. As such, bpf_program__insn_cnt() is a better alternative
in pretty much any imaginable case.

So schedule bpf_program__size() deprecation starting from v0.7 and it
will be removed in libbpf 1.0.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-5-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko
e21d585cb3 libbpf: Deprecate multi-instance bpf_program APIs
Schedule deprecation of a set of APIs that are related to multi-instance
bpf_programs:
  - bpf_program__set_prep() ([0]);
  - bpf_program__{set,unset}_instance() ([1]);
  - bpf_program__nth_fd().

These APIs are obscure, very niche, and don't seem to be used much in
practice. bpf_program__set_prep() is pretty useless for anything but the
simplest BPF programs, as it doesn't allow to adjust BPF program load
attributes, among other things. In short, it already bitrotted and will
bitrot some more if not removed.

With bpf_program__insns() API, which gives access to post-processed BPF
program instructions of any given entry-point BPF program, it's now
possible to do whatever necessary adjustments were possible with
set_prep() API before, but also more. Given any such use case is
automatically an advanced use case, requiring users to stick to
low-level bpf_prog_load() APIs and managing their own prog FDs is
reasonable.

  [0] Closes: https://github.com/libbpf/libbpf/issues/299
  [1] Closes: https://github.com/libbpf/libbpf/issues/300

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-4-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko
65a7fa2e4e libbpf: Add ability to fetch bpf_program's underlying instructions
Add APIs providing read-only access to bpf_program BPF instructions ([0]).
This is useful for diagnostics purposes, but it also allows a cleaner
support for cloning BPF programs after libbpf did all the FD resolution
and CO-RE relocations, subprog instructions appending, etc. Currently,
cloning BPF program is possible only through hijacking a half-broken
bpf_program__set_prep() API, which doesn't really work well for anything
but most primitive programs. For instance, set_prep() API doesn't allow
adjusting BPF program load parameters which are necessary for loading
fentry/fexit BPF programs (the case where BPF program cloning is
a necessity if doing some sort of mass-attachment functionality).

Given bpf_program__set_prep() API is set to be deprecated, having
a cleaner alternative is a must. libbpf internally already keeps track
of linear array of struct bpf_insn, so it's not hard to expose it. The
only gotcha is that libbpf previously freed instructions array during
bpf_object load time, which would make this API much less useful overall,
because in between bpf_object__open() and bpf_object__load() a lot of
changes to instructions are done by libbpf.

So this patch makes libbpf hold onto prog->insns array even after BPF
program loading. I think this is a small price for added functionality
and improved introspection of BPF program code.

See retsnoop PR ([1]) for how it can be used in practice and code
savings compared to relying on bpf_program__set_prep().

  [0] Closes: https://github.com/libbpf/libbpf/issues/298
  [1] https://github.com/anakryiko/retsnoop/pull/1

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-3-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko
de5d0dcef6 libbpf: Fix off-by-one bug in bpf_core_apply_relo()
Fix instruction index validity check which has off-by-one error.

Fixes: 3ee4f53355 ("libbpf: Split bpf_core_apply_relo() into bpf_program independent helper.")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-2-andrii@kernel.org
2021-10-25 18:37:21 -07:00
Andrii Nakryiko
9327acd0f9 Merge branch 'bpftool: Switch to libbpf's hashmap for referencing BPF objects'
Quentin Monnet says:

====================

When listing BPF objects, bpftool can print a number of properties about
items holding references to these objects. For example, it can show pinned
paths for BPF programs, maps, and links; or programs and maps using a given
BTF object; or the names and PIDs of processes referencing BPF objects. To
collect this information, bpftool uses hash maps (to be clear: the data
structures, inside bpftool - we are not talking of BPF maps). It uses the
implementation available from the kernel, and picks it up from
tools/include/linux/hashtable.h.

This patchset converts bpftool's hash maps to a distinct implementation
instead, the one coming with libbpf. The main motivation for this change is
that it should ease the path towards a potential out-of-tree mirror for
bpftool, like the one libbpf already has. Although it's not perfect to
depend on libbpf's internal components, bpftool is intimately tied with the
library anyway, and this looks better than depending too much on (non-UAPI)
kernel headers.

The first two patches contain preparatory work on the Makefile and on the
initialisation of the hash maps for collecting pinned paths for objects.
Then the transition is split into several steps, one for each kind of
properties for which the collection is backed by hash maps.

v2:
  - Move hashmap cleanup for pinned paths for links from do_detach() to
    do_show().
  - Handle errors on hashmap__append() (in three of the patches).
  - Rename bpftool_hash_fn() and bpftool_equal_fn() as hash_fn_for_key_id()
    and equal_fn_for_key_id(), respectively.
  - Add curly braces for hashmap__for_each_key_entry() { } in
    show_btf_plain() and show_btf_json(), where the flow was difficult to
    read.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-10-25 17:31:39 -07:00
Quentin Monnet
d6699f8e0f bpftool: Switch to libbpf's hashmap for PIDs/names references
In order to show PIDs and names for processes holding references to BPF
programs, maps, links, or BTF objects, bpftool creates hash maps to
store all relevant information. This commit is part of a set that
transitions from the kernel's hash map implementation to the one coming
with libbpf.

The motivation is to make bpftool less dependent of kernel headers, to
ease the path to a potential out-of-tree mirror, like libbpf has.

This is the third and final step of the transition, in which we convert
the hash maps used for storing the information about the processes
holding references to BPF objects (programs, maps, links, BTF), and at
last we drop the inclusion of tools/include/linux/hashtable.h.

Note: Checkpatch complains about the use of __weak declarations, and the
missing empty lines after the bunch of empty function declarations when
compiling without the BPF skeletons (none of these were introduced in
this patch). We want to keep things as they are, and the reports should
be safe to ignore.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-6-quentin@isovalent.com
2021-10-25 17:31:39 -07:00
Quentin Monnet
2828d0d75b bpftool: Switch to libbpf's hashmap for programs/maps in BTF listing
In order to show BPF programs and maps using BTF objects when the latter
are being listed, bpftool creates hash maps to store all relevant items.
This commit is part of a set that transitions from the kernel's hash map
implementation to the one coming with libbpf.

The motivation is to make bpftool less dependent of kernel headers, to
ease the path to a potential out-of-tree mirror, like libbpf has.

This commit focuses on the two hash maps used by bpftool when listing
BTF objects to store references to programs and maps, and convert them
to the libbpf's implementation.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-5-quentin@isovalent.com
2021-10-25 17:31:39 -07:00
Quentin Monnet
8f184732b6 bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects
In order to show pinned paths for BPF programs, maps, or links when
listing them with the "-f" option, bpftool creates hash maps to store
all relevant paths under the bpffs. So far, it would rely on the
kernel implementation (from tools/include/linux/hashtable.h).

We can make bpftool rely on libbpf's implementation instead. The
motivation is to make bpftool less dependent of kernel headers, to ease
the path to a potential out-of-tree mirror, like libbpf has.

This commit is the first step of the conversion: the hash maps for
pinned paths for programs, maps, and links are converted to libbpf's
hashmap.{c,h}. Other hash maps used for the PIDs of process holding
references to BPF objects are left unchanged for now. On the build side,
this requires adding a dependency to a second header internal to libbpf,
and making it a dependency for the bootstrap bpftool version as well.
The rest of the changes are a rather straightforward conversion.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-4-quentin@isovalent.com
2021-10-25 17:31:38 -07:00
Quentin Monnet
46241271d1 bpftool: Do not expose and init hash maps for pinned path in main.c
BPF programs, maps, and links, can all be listed with their pinned paths
by bpftool, when the "-f" option is provided. To do so, bpftool builds
hash maps containing all pinned paths for each kind of objects.

These three hash maps are always initialised in main.c, and exposed
through main.h. There appear to be no particular reason to do so: we can
just as well make them static to the files that need them (prog.c,
map.c, and link.c respectively), and initialise them only when we want
to show objects and the "-f" switch is provided.

This may prevent unnecessary memory allocations if the implementation of
the hash maps was to change in the future.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-3-quentin@isovalent.com
2021-10-25 17:31:38 -07:00
Quentin Monnet
8b6c46241c bpftool: Remove Makefile dep. on $(LIBBPF) for $(LIBBPF_INTERNAL_HDRS)
The dependency is only useful to make sure that the $(LIBBPF_HDRS_DIR)
directory is created before we try to install locally the required
libbpf internal header. Let's create this directory properly instead.

This is in preparation of making $(LIBBPF_INTERNAL_HDRS) a dependency to
the bootstrap bpftool version, in which case we want no dependency on
$(LIBBPF).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211023205154.6710-2-quentin@isovalent.com
2021-10-25 17:31:38 -07:00
Alexei Starovoitov
57c8d362ce Merge branch 'Parallelize verif_scale selftests'
Andrii Nakryiko says:

====================

Reduce amount of waiting time when running test_progs in parallel mode (-j) by
splitting bpf_verif_scale selftests into multiple tests. Previously it was
structured as a test with multiple subtests, but subtests are not easily
parallelizable with test_progs' infra. Also in practice each scale subtest is
really an independent test with nothing shared across all substest.

This patch set changes how test_progs test discovery works. Now it is possible
to define multiple tests within a single source code file. One of the patches
also marks tc_redirect selftests as serial, because it's extremely harmful to
the test system when run in parallel mode.
====================

Acked-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-25 14:45:52 -07:00
Andrii Nakryiko
3762a39ce8 selftests/bpf: Split out bpf_verif_scale selftests into multiple tests
Instead of using subtests in bpf_verif_scale selftest, turn each scale
sub-test into its own test. Each subtest is compltely independent and
just reuses a bit of common test running logic, so the conversion is
trivial. For convenience, keep all of BPF verifier scale tests in one
file.

This conversion shaves off a significant amount of time when running
test_progs in parallel mode. E.g., just running scale tests (-t verif_scale):

BEFORE
======
Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED

real    0m22.894s
user    0m0.012s
sys     0m22.797s

AFTER
=====
Summary: 24/0 PASSED, 0 SKIPPED, 0 FAILED

real    0m12.044s
user    0m0.024s
sys     0m27.869s

Ten second saving right there. test_progs -j is not yet ready to be
turned on by default, unfortunately, and some tests fail almost every
time, but this is a good improvement nevertheless. Ignoring few
failures, here is sequential vs parallel run times when running all
tests now:

SEQUENTIAL
==========
Summary: 206/953 PASSED, 4 SKIPPED, 0 FAILED

real    1m5.625s
user    0m4.211s
sys     0m31.650s

PARALLEL
========
Summary: 204/952 PASSED, 4 SKIPPED, 2 FAILED

real    0m35.550s
user    0m4.998s
sys     0m39.890s

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-5-andrii@kernel.org
2021-10-25 14:45:46 -07:00
Andrii Nakryiko
2c0f51ac32 selftests/bpf: Mark tc_redirect selftest as serial
It seems to cause a lot of harm to kprobe/tracepoint selftests. Yucong
mentioned before that it does manipulate sysfs, which might be the
reason. So let's mark it as serial, though ideally it would be less
intrusive on the system at test.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-4-andrii@kernel.org
2021-10-25 14:45:46 -07:00
Andrii Nakryiko
8ea688e7f4 selftests/bpf: Support multiple tests per file
Revamp how test discovery works for test_progs and allow multiple test
entries per file. Any global void function with no arguments and
serial_test_ or test_ prefix is considered a test.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-3-andrii@kernel.org
2021-10-25 14:45:46 -07:00
Andrii Nakryiko
6972dc3b87 selftests/bpf: Normalize selftest entry points
Ensure that all test entry points are global void functions with no
input arguments. Mark few subtest entry points as static.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211022223228.99920-2-andrii@kernel.org
2021-10-25 14:45:45 -07:00
Andrii Nakryiko
c825f5fee1 libbpf: Fix BTF header parsing checks
Original code assumed fixed and correct BTF header length. That's not
always the case, though, so fix this bug with a proper additional check.
And use actual header length instead of sizeof(struct btf_header) in
sanity checks.

Fixes: 8a138aed4a ("bpf: btf: Add BTF support to libbpf")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211023003157.726961-2-andrii@kernel.org
2021-10-22 17:33:31 -07:00
Andrii Nakryiko
5245dafe3d libbpf: Fix overflow in BTF sanity checks
btf_header's str_off+str_len or type_off+type_len can overflow as they
are u32s. This will lead to bypassing the sanity checks during BTF
parsing, resulting in crashes afterwards. Fix by using 64-bit signed
integers for comparison.

Fixes: d812362450 ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211023003157.726961-1-andrii@kernel.org
2021-10-22 17:33:31 -07:00
Alexei Starovoitov
1c50884370 Merge branch 'bpf: add support for BTF_KIND_DECL_TAG typedef'
Yonghong Song says:

====================

Latest upstream llvm-project added support for btf_decl_tag attributes
for typedef declarations ([1], [2]). Similar to other btf_decl_tag cases,
func/func_param/global_var/struct/union/field, btf_decl_tag with typedef
declaration can carry information from kernel source to clang compiler
and then to dwarf/BTF, for bpf verification or other use cases.

This patch set added kernel support for BTF_KIND_DECL_TAG to typedef
declaration (Patch 1). Additional selftests are added to cover
unit testing, dedup, or bpf program usage of btf_decl_tag with typedef.
(Patches 2, 3 and 4). The btf documentation is updated to include
BTF_KIND_DECL_TAG typedef (Patch 5).

  [1] https://reviews.llvm.org/D110127
  [2] https://reviews.llvm.org/D112259
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-10-22 17:04:44 -07:00
Yonghong Song
5a8671349d docs/bpf: Update documentation for BTF_KIND_DECL_TAG typedef support
Add BTF_KIND_DECL_TAG typedef support in btf.rst.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211021195649.4020514-1-yhs@fb.com
2021-10-22 17:04:44 -07:00
Yonghong Song
8c18ea2d2c selftests/bpf: Add BTF_KIND_DECL_TAG typedef example in tag.c
Change value type in progs/tag.c to a typedef with a btf_decl_tag.
With `bpftool btf dump file tag.o`, we have
  ...
  [14] TYPEDEF 'value_t' type_id=17
  [15] DECL_TAG 'tag1' type_id=14 component_idx=-1
  [16] DECL_TAG 'tag2' type_id=14 component_idx=-1
  [17] STRUCT '(anon)' size=8 vlen=2
        'a' type_id=2 bits_offset=0
        'b' type_id=2 bits_offset=32
  ...

The btf_tag selftest also succeeded:
  $ ./test_progs -t tag
    #21 btf_tag:OK
    Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211021195643.4020315-1-yhs@fb.com
2021-10-22 17:04:44 -07:00
Yonghong Song
557c8c4804 selftests/bpf: Test deduplication for BTF_KIND_DECL_TAG typedef
Add unit tests for deduplication of BTF_KIND_DECL_TAG to typedef types.
Also changed a few comments from "tag" to "decl_tag" to match
BTF_KIND_DECL_TAG enum value name.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211021195638.4019770-1-yhs@fb.com
2021-10-22 17:04:44 -07:00
Yonghong Song
9d19a12b02 selftests/bpf: Add BTF_KIND_DECL_TAG typedef unit tests
Test good and bad variants of typedef BTF_KIND_DECL_TAG encoding.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211021195633.4019472-1-yhs@fb.com
2021-10-22 17:04:43 -07:00
Yonghong Song
bd16dee66a bpf: Add BTF_KIND_DECL_TAG typedef support
The llvm patches ([1], [2]) added support to attach btf_decl_tag
attributes to typedef declarations. This patch added
support in kernel.

  [1] https://reviews.llvm.org/D110127
  [2] https://reviews.llvm.org/D112259

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211021195628.4018847-1-yhs@fb.com
2021-10-22 17:04:43 -07:00
Andrii Nakryiko
a33f607f68 Merge branch 'libbpf: use func name when pinning programs with LIBBPF_STRICT_SEC_NAME'
Stanislav Fomichev says:

====================

Commit 15669e1dcd ("selftests/bpf: Normalize all the rest SEC() uses")
broke flow dissector tests. With the strict section names, bpftool isn't
able to pin all programs of the objects (all section names are the
same now). To bring it back to life let's do the following:

- teach libbpf to pin by func name with LIBBPF_STRICT_SEC_NAME
- enable strict mode in bpftool (breaking cli change)
- fix custom flow_dissector loader to use strict mode
- fix flow_dissector tests to use new pin names (func vs sec)

v5:
- get rid of error when retrying with '/' (Quentin Monnet)

v4:
- fix comment spelling (Quentin Monnet)
- retry progtype without / (Quentin Monnet)

v3:
- clarify program pinning in LIBBPF_STRICT_SEC_NAME,
  for real this time (Andrii Nakryiko)
- fix possible segfault in __bpf_program__pin_name (Andrii Nakryiko)

v2:
- add github issue (Andrii Nakryiko)
- remove sec_name from bpf_program.pin_name comment (Andrii Nakryiko)
- add cover letter (Andrii Nakryiko)
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-10-22 16:53:38 -07:00
Stanislav Fomichev
d1321207b1 selftests/bpf: Fix flow dissector tests
- update custom loader to search by name, not section name
- update bpftool commands to use proper pin path

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211021214814.1236114-4-sdf@google.com
2021-10-22 16:53:38 -07:00