Commit graph

1043026 commits

Johan Almbladh
d4ff9ee2dc bpf/tests: Add JMP tests with small offsets
This patch adds a set of tests for JMP to verify that the JITed jump
offset is calculated correctly. We pretend that the verifier has inserted
any zero extensions to make the jump-over operations JIT to one
instruction each, in order to control the exact JITed jump offset.
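
As a minimal sketch (not one of the patch's generated tests), a test_bpf.c
entry exercising a small jump offset has this shape:

  {
          "JMP_JEQ_K: jump over one insn",
          .u.insns_int = {
                  BPF_ALU64_IMM(BPF_MOV, R0, 1),
                  BPF_JMP_IMM(BPF_JEQ, R0, 1, 1), /* taken: skip next insn */
                  BPF_ALU64_IMM(BPF_MOV, R0, 2),  /* jumped over */
                  BPF_EXIT_INSN(),
          },
          INTERNAL,
          { },
          { { 0, 1 } },                           /* expect R0 == 1 */
  },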

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-10-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
27cc6dac6e bpf/tests: Add test case flag for verifier zero-extension
This patch adds a new flag to indicate that the verifier did insert
zero-extensions, even though the verifier is not being run for any
of the tests.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-9-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
2e80761194 bpf/tests: Add exhaustive test of LD_IMM64 immediate magnitudes
This patch adds a test for the 64-bit immediate load, a two-instruction
operation, to verify correctness for all possible magnitudes of the
immediate operand. Mainly intended for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-8-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
a7d2e752e5 bpf/tests: Add staggered JMP and JMP32 tests
This patch adds a new type of jump test where the program jumps forwards
and backwards with increasing offset. It mainly tests JITs where a
relative jump may generate different JITed code depending on the offset
size, as is the case on MIPS.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-7-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
a5a36544de bpf/tests: Add exhaustive tests of JMP operand magnitudes
This patch adds a set of tests for conditional JMP and JMP32 operations to
verify correctness for all possible magnitudes of the immediate and
register operands. Mainly intended for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-6-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
9298e63eaf bpf/tests: Add exhaustive tests of ALU operand magnitudes
This patch adds a set of tests for ALU64 and ALU32 arithmetic and bitwise
logical operations to verify correctness for all possible magnitudes of
the register and immediate operands. Mainly intended for JIT testing.

The patch introduces a pattern generator that can be used to drive
extensive tests of different kinds of operations. It is parameterized
to allow tuning of the operand combinations to test.
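
As a rough sketch of the idea only (emit_op() below is a placeholder, not
the patch's actual API), such a generator sweeps operand magnitudes by
placing bit patterns at every position:

  /* one value per magnitude: a single bit, and a run of ones, at each position */
  static void fill_magnitudes(void (*emit_op)(u64 val))
  {
          int shift;

          for (shift = 0; shift < 64; shift++) {
                  emit_op(1ULL << shift);
                  emit_op((1ULL << shift) - 1);
          }
  }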

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-5-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
68c956fe74 bpf/tests: Add exhaustive tests of ALU shift values
This patch adds a set of tests for ALU64 and ALU32 shift operations to
verify correctness for all possible shift values. Mainly
intended for JIT testing.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-4-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
4bc354138d bpf/tests: Reduce memory footprint of test suite
The test suite used to call any fill_helper callbacks to generate eBPF
program data for all test cases at once. This caused ballooning memory
requirements as more extensive test cases were added. Now each
fill_helper is called just before the test is run, and the allocated
memory is released afterwards, before the next test case is processed.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-3-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:28 +02:00
Johan Almbladh
c2a228d69c bpf/tests: Allow different number of runs per test case
This patch allows a test case to specify the number of runs to use. For
compatibility with existing test case definitions, the default value 0
is interpreted as MAX_TESTRUNS.

A reduced number of runs is useful for complex test programs where 1000
runs may take a very long time. Instead of reducing what is tested, one
can instead reduce the number of times the test is run.

Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210914091842.4186267-2-johan.almbladh@anyfinetworks.com
2021-09-28 09:26:27 +02:00
Toke Høiland-Jørgensen
c3e8c44a90 libbpf: Ignore STT_SECTION symbols in 'maps' section
When parsing legacy map definitions, libbpf would error out when
encountering an STT_SECTION symbol. This becomes a problem because some
versions of binutils will produce SECTION symbols for every section when
processing an ELF file, so BPF files run through 'strip' will end up with
such symbols, making libbpf refuse to load them.

There's not really any reason why erroring out is strictly necessary, so
change libbpf to just ignore SECTION symbols when parsing the ELF.
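
The fix amounts to roughly the following in libbpf's symbol-iteration loop
(a sketch, not the exact hunk):

  GElf_Sym sym;

  if (!gelf_getsym(symbols, i, &sym))
          continue;
  /* some binutils versions emit one symbol per section; not a map */
  if (GELF_ST_TYPE(sym.st_info) == STT_SECTION)
          continue;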

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210927205810.715656-1-toke@redhat.com
2021-09-27 21:29:37 -07:00
Daniel Borkmann
4c9f093720 Merge branch 'bpf-xsk-rx-batch'
Magnus Karlsson says:

====================
This patch set introduces a batched interface for Rx buffer allocation
in AF_XDP buffer pool. Instead of using xsk_buff_alloc(*pool), drivers
can now use xsk_buff_alloc_batch(*pool, **xdp_buff_array,
max). Instead of returning a pointer to an xdp_buff, it returns the
number of xdp_buffs it managed to allocate up to the maximum value of
the max parameter in the function call. Pointers to the allocated
xdp_buffs are put in the xdp_buff_array supplied in the call. This
could be a SW ring that already exists in the driver or a new
structure that the driver has allocated.

  u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool,
                           struct xdp_buff **xdp,
                           u32 max);

When using this interface, the driver should also use the new
interface below to set the relevant fields in the struct xdp_buff. The
reason for this is that xsk_buff_alloc_batch() does not fill in the
data and data_meta fields for you as is the case with
xsk_buff_alloc(). So it is no longer sufficient to just set data_end
(effectively the size) in the driver. This is done for performance
reasons, as explained in detail in the commit message.

  void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);
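
A rough usage sketch (hypothetical driver variables; xsk_buff_xdp_get_dma()
is the existing DMA helper):

  /* allocation path: refill the HW Rx ring */
  n = xsk_buff_alloc_batch(pool, bufs, BATCH);
  for (i = 0; i < n; i++) {
          dma = xsk_buff_xdp_get_dma(bufs[i]);
          /* ... write dma into the HW Rx descriptor i ... */
  }

  /* Rx irq path, once the packet length len is known: */
  xsk_buff_set_size(bufs[idx], len);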

Patch 6 also optimizes the buffer allocation in the aligned case. In
this case, we can skip the reinitialization of most fields in the
xdp_buff_xsk struct at allocation time. As the number of elements in
the heads array is equal to the number of possible buffers in the
umem, we can initialize them once and for all at bind time and then
just point to the correct one in the xdp_buff_array that is returned
to the driver. No reason to have a stack of free head entries. In the
unaligned case, the buffers can reside anywhere in the umem, so this
optimization is not possible as we still have to fill in the right
information in the xdp_buff every single time one is allocated.

I have updated i40e and ice to use this new batched interface.

These are the throughput results on my 2.1 GHz Cascade Lake system:

Aligned mode:
ice: +11% / -9 cycles/pkt
i40e: +12% / -9 cycles/pkt

Unaligned mode:
ice: +1.5% / -1 cycle/pkt
i40e: +1% / -1 cycle/pkt

For the aligned case, batching provides around 40% of the performance
improvement and the aligned optimization the rest, around 60%. Would
have expected a ~4% boost for unaligned with this data, but I only get
around 1%. Do not know why. Note that memory consumption in aligned
mode is also reduced by this patch set.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-09-28 00:19:13 +02:00
Magnus Karlsson
e34087fc00 selftests: xsk: Add frame_headroom test
Add a test for the frame_headroom feature that can be set on the
umem. The logic added validates that all offsets in all tests and
packets are valid, not just the ones that have a specifically
configured frame_headroom.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-14-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
e4e9baf06a selftests: xsk: Change interleaving of packets in unaligned mode
Change the interleaving of packets in unaligned mode. With the current
buffer addresses in the packet stream, the last buffer in the umem
could not be used as a large packet could potentially write over the
end of the umem. The kernel correctly threw this buffer address away
and refused to use it. This is perfectly fine for all regular packet
streams, but the ones used for unaligned mode have every other packet
being at some different offset. As we will add checks for correct
offsets in the next patch, this needs to be fixed. Just start these
page-boundary straddling buffers one page earlier so that the last
one is not on the last page of the umem, making all buffers valid.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-13-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
96a40678ce selftests: xsk: Add single packet test
Add a test where a single packet is sent and received. This might
sound like a silly test, but since many of the interfaces in xsk are
batched, it is important to be able to validate that we did not break
something as fundamental as just receiving single packets, instead of
batches of packets at high speed.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-12-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
1bf3649688 selftests: xsk: Introduce pacing of traffic
Introduce pacing of traffic so that the Tx thread can never send more
packets than the receiver has processed plus the number of packets it
can have in its umem. So at any point in time, the number of in-flight
packets (not yet processed by the Rx thread) is less than or equal to the
number of packets that can be held in the Rx thread's umem.

The batch size is also increased to improve running time.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-11-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
89013b8a29 selftests: xsk: Fix socket creation retry
The socket creation retry unnecessarily registered the umem once for
every retry. No reason to do this. It wastes memory and it might lead
to too many pages being locked at some point and the failure of a
test.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-10-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
872a1184db selftests: xsk: Put the same buffer only once in the fill ring
Fix a problem where the fill ring was populated with too many
entries. If the number of buffers in the umem was smaller than the fill
ring size, the code used to loop over from the beginning of the umem
and start putting the same buffers in again. This is racy, as a later
packet can be received and overwrite an earlier one before the Rx
thread manages to validate it.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-9-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
5b13205612 selftests: xsk: Fix missing initialization
Fix missing initialization of the member rx_pkt_nb in the packet
stream. This leads to some tests declaring success too early as the
test thought all packets had already been received.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-8-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
94033cd8e7 xsk: Optimize for aligned case
Optimize for the aligned case by precomputing the parameter values of
the xdp_buff_xsk and xdp_buff structures in the heads array. We can do
this as the heads array size is equal to the number of chunks in the
umem for the aligned case. Then every entry in this array will reflect
a certain chunk/frame and can therefore be prepopulated with the
correct values and we can drop the use of the free_heads stack. Note
that it is not possible to allocate more buffers than what has been
allocated in the aligned case since each chunk can only contain a
single buffer.

We can unfortunately not do this in the unaligned case as one chunk
might contain multiple buffers. In this case, we keep the old scheme
of populating a heads entry every time it is used and using
the free_heads stack.

Also move xp_release() and xp_get_handle() to xsk_buff_pool.h. They
were for some reason in xsk.c even though they are buffer pool
operations.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-7-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
6aab0bb0c5 i40e: Use the xsk batched rx allocation interface
Use the new xsk batched rx allocation interface for the zero-copy data
path. As the array of struct xdp_buff pointers kept by the driver is
really a ring that wraps, the allocation routine is modified to detect
a wrap and in that case call the allocation function twice. The
allocation function cannot deal with wrapped rings, only arrays. As we
now know exactly how many buffers we get and that there is no
wrapping, the allocation function can be simplified even more as all
if-statements in the allocation loop can be removed, improving
performance.
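
A sketch of the wrap handling described above (hypothetical variable names,
not the driver's exact code):

  /* the pool allocator works on flat arrays, so a ring wrap takes two calls */
  u32 first = min(count, ring_size - ntu);  /* entries until the wrap point */
  u32 nb = xsk_buff_alloc_batch(pool, &bufs[ntu], first);

  if (nb == first && count > first)         /* wrapped: continue at index 0 */
          nb += xsk_buff_alloc_batch(pool, &bufs[0], count - first);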

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-6-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
db804cfc21 ice: Use the xsk batched rx allocation interface
Use the new xsk batched rx allocation interface for the zero-copy data
path. As the array of struct xdp_buff pointers kept by the driver is
really a ring that wraps, the allocation routine is modified to detect
a wrap and in that case call the allocation function twice. The
allocation function cannot deal with wrapped rings, only arrays. As we
now know exactly how many buffers we get and that there is no
wrapping, the allocation function can be simplified even more as all
if-statements in the allocation loop can be removed, improving
performance.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-5-magnus.karlsson@gmail.com
2021-09-28 00:18:35 +02:00
Magnus Karlsson
57f7f8b6bc ice: Use xdp_buf instead of rx_buf for xsk zero-copy
In order to use the new xsk batched buffer allocation interface, a
pointer to an array of struct xdp_buff pointers needs to be provided so
that the function can put the result of the allocation there. In the
ice driver, we already have a ring that stores pointers to
xdp_buffs. This is only used for the xsk zero-copy driver and is a
union with the structure that is used for the regular non zero-copy
path. Unfortunately, that structure is larger than the xdp_buffs
pointers, which means that there will be a stride (of 20 bytes) between
each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch
interface will not work since it assumes a regular array of xdp_buff
pointers (each 8 bytes with 0 bytes in-between them on a 64-bit
system).

To fix this, remove the xdp_buff pointer from the rx_buf union and
move it one step higher to the union above which only has pointers to
arrays in it. This solves the problem and we can directly feed the SW
ring of xdp_buff pointers straight into the allocation function in the
next patch when that interface is used. This will improve performance.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com
2021-09-28 00:18:34 +02:00
Magnus Karlsson
47e4075df3 xsk: Batched buffer allocation for the pool
Add a new driver interface xsk_buff_alloc_batch() offering batched
buffer allocations to improve performance. The new interface takes
three arguments: the buffer pool to allocate from, a pointer to an
array of struct xdp_buff pointers which will contain pointers to the
allocated xdp_buffs, and an unsigned integer specifying the max number
of buffers to allocate. The return value is the actual number of
buffers that the allocator managed to allocate and it will be in the
range 0 <= N <= max, where max is the third parameter to the function.

u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
                         u32 max);

A second driver interface is also introduced that needs to be used in
conjunction with xsk_buff_alloc_batch(). It is a helper that sets the
size of struct xdp_buff and is used by the NIC Rx irq routine when
receiving a packet. This helper sets the three struct members data,
data_meta, and data_end. In the xsk_buff_alloc() case, the first two
are set in the allocation routine and data_end is set when a packet
is received in the receive irq function. This unfortunately leads to
worse performance since the xdp_buff is touched twice with a long time
period in between leading to an extra cache miss. Instead, we fill out
the xdp_buff with all 3 fields at one single point in time in the
driver, when the size of the packet is known. Hence this helper. Note
that the driver has to use this helper (or set all three fields
itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as
before and does not require this.

void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);
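
The helper is small; based on the description above it roughly amounts to:

  static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
  {
          xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
          xdp->data_meta = xdp->data;
          xdp->data_end = xdp->data + size;
  }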

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com
2021-09-28 00:18:34 +02:00
Magnus Karlsson
10a5e009b9 xsk: Get rid of unused entry in struct xdp_buff_xsk
Get rid of the unused entry "unaligned" in struct xdp_buff_xsk.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210922075613.12186-2-magnus.karlsson@gmail.com
2021-09-28 00:18:34 +02:00
Alexei Starovoitov
e7d5184b24 Merge branch 'bpf: Support <8-byte scalar spill and refill'
Martin KaFai says:

====================

The verifier currently does not save the reg state when
spilling <8byte bounded scalar to the stack.  The bpf program
will be incorrectly rejected when this scalar is refilled to
the reg and then used to offset into a packet header.
A later patch has a simplified bpf prog from a real use case
to demonstrate this case.  The current workaround is
to reparse the packet such that this offset scalar
is close to where the packet data will be accessed, to
avoid the spill.  Thus, the header is parsed twice.

The llvm patch [1] will align the <8bytes spill to
the 8-byte stack address.  This set is to make the necessary
changes in verifier to support <8byte scalar spill and refill.

[1] https://reviews.llvm.org/D109073

v2:
- Changed the xdpwall selftest in patch 3 to trigger a u32
  spill at a non 8-byte aligned stack address.  The v1 has
  simplified the real example too much such that it only
  triggers a u32 spill but does not spill at a non
  8-byte aligned stack address.
- Changed README.rst in patch 3 to explain the llvm dependency
  for the xdpwall test.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-09-26 13:07:28 -07:00
Martin KaFai Lau
ef979017b8 bpf: selftest: Add verifier tests for <8-byte scalar spill and refill
This patch adds a few verifier tests for <8-byte spill and refill.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210922004953.627183-1-kafai@fb.com
2021-09-26 13:07:28 -07:00
Martin KaFai Lau
54ea6079b7 bpf: selftest: A bpf prog that has a 32bit scalar spill
It is a simplified example that can trigger a 32bit scalar spill.
The const scalar is refilled and added to a skb->data later.
Since the reg state of the 32bit scalar spill is not saved now,
adding the refilled reg to skb->data and then comparing it with
skb->data_end cannot verify the skb->data access.

With the earlier verifier patch and the llvm patch [1], the verifier
can correctly verify the bpf prog.

Here is the snippet of the verifier log that leads to the verifier's
conclusion that the packet data is unsafe to read.  The log is from a
kernel without the previous verifier patch that saves the <8-byte scalar
spill.
67: R0=inv1 R1=inv17 R2=invP2 R3=inv1 R4=pkt(id=0,off=68,r=102,imm=0) R5=inv102 R6=pkt(id=0,off=62,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0
67: (63) *(u32 *)(r10 -12) = r5
68: R0=inv1 R1=inv17 R2=invP2 R3=inv1 R4=pkt(id=0,off=68,r=102,imm=0) R5=inv102 R6=pkt(id=0,off=62,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmm????
...
101: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R6_w=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
101: (61) r1 = *(u32 *)(r10 -12)
102: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
102: (bc) w1 = w1
103: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=0,off=0,r=102,imm=0) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
103: (0f) r7 += r1
last_idx 103 first_idx 67
regs=2 stack=0 before 102: (bc) w1 = w1
regs=2 stack=0 before 101: (61) r1 = *(u32 *)(r10 -12)
104: R0_w=map_value_or_null(id=2,off=0,ks=16,vs=1,imm=0) R1_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6_w=pkt(id=0,off=70,r=102,imm=0) R7_w=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=inv17 R10=fp0 fp-16=mmmmmmmm
...
127: R0_w=inv1 R1=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9_w=invP17 R10=fp0 fp-16=mmmmmmmm
127: (bf) r1 = r7
128: R0_w=inv1 R1_w=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9_w=invP17 R10=fp0 fp-16=mmmmmmmm
128: (07) r1 += 8
129: R0_w=inv1 R1_w=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9_w=invP17 R10=fp0 fp-16=mmmmmmmm
129: (b4) w0 = 1
130: R0=inv1 R1=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=invP17 R10=fp0 fp-16=mmmmmmmm
130: (2d) if r1 > r8 goto pc-66
 R0=inv1 R1=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=invP17 R10=fp0 fp-16=mmmmmmmm
131: R0=inv1 R1=pkt(id=3,off=8,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=70,r=102,imm=0) R7=pkt(id=3,off=0,r=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8=pkt_end(id=0,off=0,imm=0) R9=invP17 R10=fp0 fp-16=mmmmmmmm
131: (69) r6 = *(u16 *)(r7 +0)
invalid access to packet, off=0 size=2, R7(id=3,off=0,r=0)
R7 offset is outside of the packet

[1]: https://reviews.llvm.org/D109073

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210922004947.626286-1-kafai@fb.com
2021-09-26 13:07:28 -07:00
Martin KaFai Lau
354e8f1970 bpf: Support <8-byte scalar spill and refill
The verifier currently does not save the reg state when
spilling <8byte bounded scalar to the stack.  The bpf program
will be incorrectly rejected when this scalar is refilled to
the reg and then used to offset into a packet header.
A later patch has a simplified bpf prog from a real use case
to demonstrate this case.  The current workaround is
to reparse the packet such that this offset scalar
is close to where the packet data will be accessed, to
avoid the spill.  Thus, the header is parsed twice.
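
A minimal sketch of the rejected pattern (hypothetical program;
parse_hdr_len() is a placeholder for the bounded-scalar computation):

  void *data = (void *)(long)skb->data;
  void *data_end = (void *)(long)skb->data_end;
  __u32 off = parse_hdr_len(skb);  /* bounded scalar, e.g. off <= 64 */

  /* llvm spills 'off' (4 bytes) to the stack and refills it later;
   * without this patch the refilled reg loses its bounds */
  data += off;
  if (data + 2 > data_end)
          return TC_ACT_SHOT;
  return *(__u16 *)data;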

The llvm patch [1] will align the <8bytes spill to
the 8-byte stack address.  This can simplify the verifier
support by avoiding to store multiple reg states for
each 8 byte stack slot.

This patch changes the verifier to save the reg state when
spilling a <8-byte scalar to the stack.  This reg state saving
is limited to spills aligned to the 8-byte stack address.
The current refill logic has already called coerce_reg_to_size(),
so coerce_reg_to_size() is not called on state->stack[spi].spilled_ptr
during spill.

When refilling in check_stack_read_fixed_off(),  it checks
the refill size is the same as the number of bytes marked with
STACK_SPILL before restoring the reg state.  When restoring
the reg state to state->regs[dst_regno], it needs
to avoid the state->regs[dst_regno].subreg_def being
overwritten, because it has been marked by the check_reg_arg()
earlier [check_mem_access() is called after check_reg_arg() in
do_check()].  Reordering check_mem_access() and check_reg_arg()
will need a lot of changes in test_verifier's tests because
of the difference in verifier's error message.  Thus, the
patch here is to save the state->regs[dst_regno].subreg_def
first in check_stack_read_fixed_off().

There are cases that the verifier needs to scrub the spilled slot
from STACK_SPILL to STACK_MISC.  After this patch the spill is no longer
always 8 bytes, so the verifier can no longer assume the other 7 bytes are
always marked as STACK_SPILL.  In particular, the scrub needs to avoid marking
an uninitialized byte from STACK_INVALID to STACK_MISC.  Otherwise, the
verifier will incorrectly accept bpf program reading uninitialized bytes
from the stack.  A new helper scrub_spilled_slot() is created for this
purpose.

[1]: https://reviews.llvm.org/D109073

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210922004941.625398-1-kafai@fb.com
2021-09-26 13:07:27 -07:00
Martin KaFai Lau
27113c59b6 bpf: Check the other end of slot_type for STACK_SPILL
Every 8 bytes of the stack is tracked by a bpf_stack_state.
Within each bpf_stack_state, there is a 'u8 slot_type[8]' to track
the type of each byte.  Verifier tests slot_type[0] == STACK_SPILL
to decide if the spilled reg state is saved.  Verifier currently only
saves the reg state if the whole 8 bytes are spilled to the stack,
so checking the slot_type[7] is the same as checking slot_type[0].

The later patch will allow the verifier to save the bounded scalar
reg also for a <8-byte spill.  There is an llvm patch [1] to ensure
the <8-byte spill will be 8-byte aligned, so checking
slot_type[7] instead of slot_type[0] is required.

While at it, this patch refactors the slot_type[0] == STACK_SPILL
test into a new function is_spilled_reg() and changes the
slot_type[0] check to a slot_type[7] check in there as well.

[1] https://reviews.llvm.org/D109073
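
The refactored check boils down to roughly the following (BPF_REG_SIZE is 8):

  static bool is_spilled_reg(const struct bpf_stack_state *stack)
  {
          return stack->slot_type[BPF_REG_SIZE - 1] == STACK_SPILL;
  }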

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210922004934.624194-1-kafai@fb.com
2021-09-26 13:07:27 -07:00
Yonghong Song
091037fb77 selftests/bpf: Fix btf_dump __int128 test failure with clang build kernel
With a clang-built kernel (adding LLVM=1 to the kernel and selftests/bpf
build command lines), I hit the following test failure:

  $ ./test_progs -t btf_dump
  ...
  btf_dump_data:PASS:ensure expected/actual match 0 nsec
  btf_dump_data:FAIL:find type id unexpected find type id: actual -2 < expected 0
  btf_dump_data:FAIL:find type id unexpected find type id: actual -2 < expected 0
  test_btf_dump_int_data:FAIL:dump __int128 unexpected error: -2 (errno 2)
  #15/9 btf_dump/btf_dump: int_data:FAIL

Further analysis showed that a gcc-built kernel has the type "__int128"
in dwarf/BTF while it doesn't exist in a clang-built kernel. Searching
the kernel code found the following:
  arch/s390/include/asm/types.h:  unsigned __int128 pair;
  crypto/ecc.c:   unsigned __int128 m = (unsigned __int128)left * right;
  include/linux/math64.h: return (u64)(((unsigned __int128)a * mul) >> shift);
  include/linux/math64.h: return (u64)(((unsigned __int128)a * mul) >> shift);
  lib/ubsan.h:typedef __int128 s_max;
  lib/ubsan.h:typedef unsigned __int128 u_max;

In my case, CONFIG_UBSAN is not enabled. Even though we only have "unsigned
__int128" in the code, somehow gcc still puts "__int128" in dwarf while
clang doesn't. Hence the current test works fine for gcc but not for clang.

Enabling CONFIG_UBSAN is an option to get the __int128 type into dwarf
reliably for both gcc and clang, but not everybody enables CONFIG_UBSAN
in their kernel build. So the best choice is to use the "unsigned __int128"
type, which is available in both clang- and gcc-built kernels. But the clang
and gcc dwarf-encoded names for "unsigned __int128" are different:

  [$ ~] cat t.c
  unsigned __int128 a;
  [$ ~] gcc -g -c t.c && llvm-dwarfdump t.o | grep __int128
                  DW_AT_type      (0x00000031 "__int128 unsigned")
                  DW_AT_name      ("__int128 unsigned")
  [$ ~] clang -g -c t.c && llvm-dwarfdump t.o | grep __int128
                  DW_AT_type      (0x00000033 "unsigned __int128")
                  DW_AT_name      ("unsigned __int128")

The test change in this patch checks the type name before
running the actual test.
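
The idea, sketched with libbpf's btf__find_by_name() (the test's actual
plumbing differs):

  /* use whichever spelling this kernel's BTF carries */
  if (btf__find_by_name(btf, "unsigned __int128") < 0 &&
      btf__find_by_name(btf, "__int128 unsigned") < 0)
          return; /* no 128-bit type in BTF: skip the __int128 dump test */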

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20210924025856.2192476-1-yhs@fb.com
2021-09-24 15:35:03 -07:00
Alexei Starovoitov
c86216bc96 bpf: Document BPF licensing.
Document and clarify BPF licensing.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Joe Stringer <joe@cilium.io>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210917230034.51080-1-alexei.starovoitov@gmail.com
2021-09-22 23:13:13 +02:00
Jiri Benc
17b52c226a seltests: bpf: test_tunnel: Use ip neigh
The 'arp' command is deprecated and is another dependency of the selftest.
Just use 'ip neigh'; the test depends on iproute2 already.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/40f24b9d3f0f53b5c44471b452f9a11f4d13b7af.1632236133.git.jbenc@redhat.com
2021-09-21 20:40:16 -07:00
Alexei Starovoitov
a3d697ff8d Merge branch 'libbpf: add legacy uprobe support'
Andrii Nakryiko says:

====================

Implement libbpf support for attaching uprobes/uretprobes using legacy
tracefs interfaces. This is a logical complement to recently landed legacy
kprobe support ([0]). This patch refactors existing legacy kprobe code to be more
uniform with uprobe code as well, making the logic easier to compare and
follow.

This patch set also fixes two bugs recently found by Coverity in legacy kprobe
handling code, and thus subsumes two previously submitted patches ([1]):
original patch #1 is kept as is, while original patch #2 was dropped because
patch #3 of the current series refactors and fixes affected code.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20210912064844.3181742-1-rafaeldtinoco@gmail.com/
  [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=549977&state=*

v1->v2:
  - drop 'legacy = true' debug left-over and explain legacy check (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-09-21 19:40:09 -07:00
Andrii Nakryiko
cc10623c68 libbpf: Add legacy uprobe attaching support
Similarly to recently added legacy kprobe attach interface support
through tracefs, support attaching uprobes using the legacy interface if
host kernel doesn't support newer FD-based interface.

For uprobes, the event name consists of a "libbpf_" prefix, PID, sanitized
binary path, and offset within that binary. Structurally, the code is
aligned with the kprobe logic refactoring in the previous patch. struct
bpf_link_perf is re-used and all the same legacy_probe_name and
legacy_is_retprobe fields are used to ensure proper cleanup on
bpf_link__destroy().
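
Under the hood, the legacy path appends a probe definition to tracefs;
roughly (a sketch with hypothetical variables):

  /* "p:" adds a uprobe, "r:" a uretprobe; removal uses "-:<event>" */
  snprintf(cmd, sizeof(cmd), "%c:%s %s:0x%zx",
           retprobe ? 'r' : 'p', probe_name, binary_path, offset);
  /* ... append cmd to /sys/kernel/debug/tracing/uprobe_events ... */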

Users should be aware, though, that on old kernels which don't support
FD-based interface for kprobe/uprobe attachment, if the application
crashes before bpf_link__destroy() is called, uprobe legacy
events will be left in tracefs. This is the same limitation as with
legacy kprobe interfaces.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-5-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko
46ed5fc33d libbpf: Refactor and simplify legacy kprobe code
Refactor legacy kprobe handling code to follow the same logic as uprobe
legacy logic added in the next patch:
  - add append_to_file() helper that makes it simpler to work with the
    tracefs file-based interface for creating and deleting probes (see
    the sketch after this list);
  - move out probe/event name generation outside of the code that
    adds/removes it, which simplifies bookkeeping significantly;
  - change the probe name format to start with "libbpf_" prefix and
    include offset within kernel function;
  - switch 'unsigned long' to 'size_t' for specifying kprobe offsets,
    which is consistent with how uprobes define that, simplifies
    printf()-ing internally, and also avoids unnecessary complications on
    architectures where sizeof(long) != sizeof(void *).
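
A sketch of what append_to_file() can look like, inferred from the
description above rather than copied from the patch:

  #include <errno.h>
  #include <fcntl.h>
  #include <stdarg.h>
  #include <stdio.h>
  #include <unistd.h>

  static int append_to_file(const char *file, const char *fmt, ...)
  {
          int fd, n, err = 0;
          va_list ap;

          fd = open(file, O_WRONLY | O_APPEND | O_CLOEXEC, 0);
          if (fd < 0)
                  return -errno;

          va_start(ap, fmt);
          n = vdprintf(fd, fmt, ap);
          va_end(ap);

          if (n < 0)
                  err = -errno;

          close(fd);
          return err;
  }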

This patch also implicitly fixes the problem with invalid open() error
handling present in poke_kprobe_events(), a function which this patch
removes.

Fixes: ca304b40c2 ("libbpf: Introduce legacy kprobe events support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-4-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko
d3b0e3b03c selftests/bpf: Adopt attach_probe selftest to work on old kernels
Make sure to not use ref_ctr_off feature when running on old kernels
that don't support this feature. This allows testing libbpf's legacy
kprobe and uprobe logic on old kernels.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-3-andrii@kernel.org
2021-09-21 19:40:09 -07:00
Andrii Nakryiko
303a257223 libbpf: Fix memory leak in legacy kprobe attach logic
In some error scenarios the legacy_probe string won't be free()'d. Fix this.
This was reported by Coverity static analysis.

Fixes: ca304b40c2 ("libbpf: Introduce legacy kprobe events support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210921210036.1545557-2-andrii@kernel.org
2021-09-21 19:40:08 -07:00
Gokul Sivakumar
cf8980a362 samples: bpf: Convert ARP table network order fields into readable format
The ARP table that is dumped when the xdp_router_ipv4 process is launched
has the IP address & MAC address in non-readable network byte order format,
and the alignment is off when printing the table.

Address HwAddress
160000e0                1600005e0001
ff96a8c0                ffffffffffff
faffffef                faff7f5e0001
196a8c0		9607871293ea
fb0000e0                fb00005e0001
0               0
196a8c0		9607871293ea
ffff11ac                ffffffffffff
faffffef                faff7f5e0001
fb0000e0                fb00005e0001
160000e0                1600005e0001
160000e0                1600005e0001
faffffef                faff7f5e0001
fb0000e0                fb00005e0001
40011ac         40011ac4202

Fix this by converting the "Address" field from network byte order hex into
dotted decimal notation IPv4 format and the "HwAddress" field from network
byte order hex into colon-separated hex format. Also fix the alignment of
the fields in the ARP table.

Address         HwAddress
224.0.0.22      01:00:5e:00:00:16
192.168.150.255 ff:ff:ff:ff:ff:ff
239.255.255.250 01:00:5e:7f:ff:fa
192.168.150.1   ea:93:12:87:07:96
224.0.0.251     01:00:5e:00:00:fb
0.0.0.0         00:00:00:00:00:00
192.168.150.1   ea:93:12:87:07:96
172.17.255.255  ff:ff:ff:ff:ff:ff
239.255.255.250 01:00:5e:7f:ff:fa
224.0.0.251     01:00:5e:00:00:fb
224.0.0.22      01:00:5e:00:00:16
224.0.0.22      01:00:5e:00:00:16
239.255.255.250 01:00:5e:7f:ff:fa
224.0.0.251     01:00:5e:00:00:fb
172.17.0.4      02:42:ac:11:00:04
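
A sketch of the conversion for one entry (hypothetical argument layout;
inet_ntop() and printf() do the actual formatting):

  #include <arpa/inet.h>
  #include <stdio.h>

  /* addr is the map value in network byte order */
  static void print_arp_entry(unsigned int addr, const unsigned char mac[6])
  {
          char ip[INET_ADDRSTRLEN];

          inet_ntop(AF_INET, &addr, ip, sizeof(ip));
          printf("%-15s %02x:%02x:%02x:%02x:%02x:%02x\n", ip,
                 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
  }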

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210919080305.173588-2-gokulkumar792@gmail.com
2021-09-21 13:27:54 -07:00
Gokul Sivakumar
f5c4e4191b samples: bpf: Convert route table network order fields into readable format
The route table that is dumped when the xdp_router_ipv4 process is launched
has the "Gateway" field in non-readable network byte order format, and the
alignment is off when printing the table.

Destination             Gateway         Genmask         Metric          Iface
  0.0.0.0               196a8c0         0               0               enp7s0
  0.0.0.0               196a8c0         0               0               wlp6s0
169.254.0.0             196a8c0         16              0               enp7s0
172.17.0.0                0             16              0               docker0
192.168.150.0             0             24              0               enp7s0
192.168.150.0             0             24              0               wlp6s0

Fix this by converting the "Gateway" field from network byte order Hex into
dotted decimal notation IPv4 format and "Genmask" from CIDR notation into
dotted decimal notation IPv4 format. Also fix the alignment of the fields
in the route table.

Destination     Gateway         Genmask         Metric Iface
0.0.0.0         192.168.150.1   0.0.0.0         0      enp7s0
0.0.0.0         192.168.150.1   0.0.0.0         0      wlp6s0
169.254.0.0     192.168.150.1   255.255.0.0     0      enp7s0
172.17.0.0      0.0.0.0         255.255.0.0     0      docker0
192.168.150.0   0.0.0.0         255.255.255.0   0      enp7s0
192.168.150.0   0.0.0.0         255.255.255.0   0      wlp6s0

Signed-off-by: Gokul Sivakumar <gokulkumar792@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210919080305.173588-1-gokulkumar792@gmail.com
2021-09-21 13:27:54 -07:00
Grant Seltzer
97c140d94e libbpf: Add doc comments in libbpf.h
This adds comments above functions in libbpf.h which document
their uses. These comments are of a format that doxygen and sphinx
can pick up and render. These are rendered by libbpf.readthedocs.org

These doc comments are for:
- bpf_object__find_map_by_name()
- bpf_map__fd()
- bpf_map__is_internal()
- libbpf_get_error()
- libbpf_num_possible_cpus()
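
The comments follow the doxygen style, roughly of this shape (a sketch
modeled on the format, not quoted from the patch):

  /**
   * @brief **bpf_map__fd()** gets the file descriptor of the passed
   * BPF map
   * @param map the BPF map instance
   * @return the file descriptor; or -EINVAL in case of an error
   */
  LIBBPF_API int bpf_map__fd(const struct bpf_map *map);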

Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210918031457.36204-1-grantseltzer@gmail.com
2021-09-20 17:23:31 -07:00
Alexei Starovoitov
e57f52b42d Merge branch 'bpf: implement variadic printk helper'
Dave Marchevsky says:

====================

This series introduces a new helper, bpf_trace_vprintk, which functions
like bpf_trace_printk but supports > 3 arguments via a pseudo-vararg u64
array. The bpf_printk libbpf convenience macro is modified to use
bpf_trace_vprintk when > 3 varargs are passed, otherwise the previous
behavior - using bpf_trace_printk - is retained.

Helper functions and macros added during the implementation of
bpf_seq_printf and bpf_snprintf do most of the heavy lifting for
bpf_trace_vprintk. There's no novel format string wrangling here.

The use case here is straightforward: giving BPF program writers a more
powerful printk will ease development of BPF programs, particularly
during debugging and testing, where printk tends to be used.

This feature was proposed by Andrii in libbpf mirror's issue tracker
[1].

[1] https://github.com/libbpf/libbpf/issues/315

v5 -> v6: Rebase to pick up newly-added helper

v4 -> v5:

* patch 8: added test for "%pS" format string w/ NULL fmt arg [Daniel]
* patch 8: dmesg -> /sys/kernel/debug/tracing/trace_pipe in commit message [Andrii]
* patch 9: squash into patch 8, remove the added test in favor of just bpf_printk'ing in patch 8's test [Andrii]
    * migrate comment to /* */
* header comments improved
    * uapi/linux/bpf.h: u64 -> long return type [Daniel]
    * uapi/linux/bpf.h: function description explains benefit of bpf_trace_vprintk over bpf_trace_printk [Daniel]
    * uapi/linux/bpf.h: added patch explaining that data_len should be a multiple of 8 in bpf_seq_printf, bpf_snprintf descriptions [Daniel]
    * tools/lib/bpf/bpf_helpers.h: move comment to new bpf_printk [Andrii]
* rebase

v3 -> v4:
* Add patch 2, which migrates reference_tracking prog_test away from
  bpf_program__load. Could be placed a bit later in the series, but
  wanted to keep the actual vprintk-related patches contiguous
* Add patch 9, which adds a program w/ 0 fmt arg bpf_printk to vprintk
  test
* bpf_printk convenience macro isn't multiline anymore, so simplify [Andrii]
* Add some comments to ___bpf_pick_printk to make it more obvious when
  implementation switches from printk to vprintk [Andrii]
* BPF_PRINTK_FMT_TYPE -> BPF_PRINTK_FMT_MOD for 'static const' fmt string
  in printk wrapper macro [Andrii]
    * checkpatch.pl doesn't like this, says "Macros with complex values
      should be enclosed in parentheses". Strange that it didn't have similar
      complaints about v3's BPF_PRINTK_FMT_TYPE. Regardless, IMO the complaint
      is not highlighting a real issue in the case of this macro.
* Fix alignment of __bpf_vprintk and __bpf_pick_printk [Andrii]
* rebase

v2 -> v3:
* Clean up patch 3's commit message [Alexei]
* Add patch 4, which modifies __bpf_printk to use 'static const char' to
  store fmt string with fallback for older kernels [Andrii]
* rebase

v1 -> v2:

* Naming conversation seems to have gone in favor of keeping
  bpf_trace_vprintk, names are unchanged

* Patch 3 now modifies bpf_printk convenience macro to choose between
  __bpf_printk and __bpf_vprintk 'implementation' macros based on arg
  count. __bpf_vprintk is a renaming of bpf_vprintk convenience macro
  from v1, __bpf_printk is the existing bpf_printk implementation.

  This patch could use some scrutiny as I think current implementation
  may regress developer experience in a specific case, turning a
  compile-time error into a load-time error. Unclear to me how
  common the case is, or whether the macro magic I chose is ideal.

* char ___fmt[] to static const char ___fmt[] change was not done,
  wanted to leave __bpf_printk 'implementation' macro unchanged for v2
  to ease discussion of above point

* Removed __always_inline from __set_printk_clr_event [Andrii]
* Simplified bpf_trace_printk docstring to refer to other functions
  instead of copy/pasting and avoid specifying 12 vararg limit [Andrii]
* Migrated trace_printk selftest to use ASSERT_ instead of CHECK
  * Adds new patch 5, previous patch 5 is now 6
* Migrated trace_vprintk selftest to use ASSERT_ instead of CHECK,
  open_and_load instead of separate open, load [Andrii]
* Patch 2's commit message now correctly mentions trace_pipe instead of
  dmesg [Andrii]
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-09-17 14:02:06 -07:00
Dave Marchevsky
a42effb0b2 bpf: Clarify data_len param in bpf_snprintf and bpf_seq_printf comments
Since the data_len in these two functions is a byte len of the preceding
u64 *data array, it must always be a multiple of 8. If this isn't the
case both helpers error out, so let's make the requirement explicit so
users don't need to infer it.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-10-davemarchevsky@fb.com
2021-09-17 14:02:06 -07:00
Dave Marchevsky
7606729fe2 selftests/bpf: Add trace_vprintk test prog
This commit adds a test prog for vprintk which confirms that:
  * bpf_trace_vprintk is writing to /sys/kernel/debug/tracing/trace_pipe
  * __bpf_vprintk macro works as expected
  * >3 args are printed
  * bpf_printk w/ 0 format args compiles
  * bpf_trace_vprintk call w/ a fmt specifier but NULL fmt data fails

Approach and code are borrowed from trace_printk test.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-9-davemarchevsky@fb.com
2021-09-17 14:02:06 -07:00
Dave Marchevsky
d313d45a22 selftests/bpf: Migrate prog_tests/trace_printk CHECKs to ASSERTs
Guidance for new tests is to use ASSERT macros instead of CHECK. Since
trace_vprintk test will borrow heavily from trace_printk's, migrate its
CHECKs so it remains obvious that the two are closely related.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-8-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
4190c299a4 bpftool: Only probe trace_vprintk feature in 'full' mode
Since commit 368cb0e7cd ("bpftool: Make probes which emit dmesg
warnings optional"), some helpers aren't probed by bpftool unless
`full` arg is added to `bpftool feature probe`.

bpf_trace_vprintk can emit dmesg warnings when probed, so include it.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-7-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
6c66b0e7c9 libbpf: Use static const fmt string in __bpf_printk
The __bpf_printk convenience macro was using a 'char' fmt string holder
as it predates support for globals in libbpf. Move to more efficient
'static const char', but provide a fallback to the old way via
BPF_NO_GLOBAL_DATA so users on old kernels can still use the macro.
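
The resulting macro looks roughly like this (sketch):

  #ifdef BPF_NO_GLOBAL_DATA
  #define BPF_PRINTK_FMT_MOD
  #else
  #define BPF_PRINTK_FMT_MOD static const
  #endif

  #define __bpf_printk(fmt, ...)                         \
  ({                                                     \
          BPF_PRINTK_FMT_MOD char ____fmt[] = fmt;       \
          bpf_trace_printk(____fmt, sizeof(____fmt),     \
                           ##__VA_ARGS__);               \
  })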

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-6-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
c2758baa97 libbpf: Modify bpf_printk to choose helper based on arg count
Instead of being a thin wrapper which calls into bpf_trace_printk,
libbpf's bpf_printk convenience macro now chooses between
bpf_trace_printk and bpf_trace_vprintk. If the arg count (excluding
format string) is >3, use bpf_trace_vprintk, otherwise use the older
helper.

The motivation behind this added complexity - instead of migrating
entirely to bpf_trace_vprintk - is to maintain good developer experience
for users compiling against new libbpf but running on older kernels.
Users who are passing <=3 args to bpf_printk will see no change in their
bytecode.

__bpf_vprintk functions similarly to BPF_SEQ_PRINTF and BPF_SNPRINTF
macros elsewhere in the file - it allows use of bpf_trace_vprintk
without manual conversion of varargs to u64 array. Previous
implementation of bpf_printk macro is moved to __bpf_printk for use by
the new implementation.
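
An abridged sketch of the dispatch (the real macro lists one entry per
possible argument count, up to 12):

  /* ___bpf_nth(_, args..., list) picks an entry from the trailing list
   * based on how many args were passed; >3 args select __bpf_vprintk */
  #define ___bpf_pick_printk(...)                                \
          ___bpf_nth(_, ##__VA_ARGS__,                           \
                     __bpf_vprintk, /* ... entries for 12..4 */  \
                     __bpf_printk /*3*/, __bpf_printk /*2*/,     \
                     __bpf_printk /*1*/, __bpf_printk /*0*/)

  #define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)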

This does change behavior of bpf_printk calls with >3 args in the "new
libbpf, old kernels" scenario. Before this patch, attempting to use 4
args to bpf_printk results in a compile-time error. After this patch,
using bpf_printk with 4 args results in a trace_vprintk helper call
being emitted and a load-time failure on older kernels.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-5-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
10aceb629e bpf: Add bpf_trace_vprintk helper
This helper is meant to be "bpf_trace_printk, but with proper vararg
support". Follow bpf_snprintf's example and take a u64 pseudo-vararg
array. Write to /sys/kernel/debug/tracing/trace_pipe using the same
mechanism as bpf_trace_printk. The functionality of this helper was
requested in the libbpf issue tracker [0].

[0] Closes: https://github.com/libbpf/libbpf/issues/315
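
Calling the helper directly looks like this (the __bpf_vprintk convenience
macro in libbpf does this array packing for you):

  __u64 args[] = { 1, 2, 3, 4 };
  static const char fmt[] = "%d %d %d %d\n";

  bpf_trace_vprintk(fmt, sizeof(fmt), args, sizeof(args));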

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-4-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
84b4c52960 selftests/bpf: Stop using bpf_program__load
bpf_program__load is not supposed to be used directly. Replace it with
bpf_object__ APIs for the reference_tracking prog_test, which is the
last offender in bpf selftests.

Some additional complexity is added for this test, namely the use of one
bpf_object to iterate through progs, while a second bpf_object is
created and opened/closed to test actual loading of progs. This is
because the test was doing bpf_program__load then __unload to test
loading of individual progs and same semantics with
bpf_object__load/__unload result in failure to load an __unload-ed obj.
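
A rough sketch of the bpf_object__ flow the test moves to (hypothetical
object file name):

  struct bpf_object *obj;
  struct bpf_program *prog;

  obj = bpf_object__open_file("test_sk_lookup.o", NULL); /* hypothetical */
  if (libbpf_get_error(obj))
          return;

  bpf_object__for_each_program(prog, obj)
          bpf_program__set_autoload(prog, false); /* pick progs to load */

  /* ... re-enable the prog under test, then: */
  bpf_object__load(obj);
  bpf_object__close(obj);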

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-3-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00
Dave Marchevsky
335ff4990c bpf: Merge printk and seq_printf VARARG max macros
MAX_SNPRINTF_VARARGS and MAX_SEQ_PRINTF_VARARGS are used by bpf helpers
bpf_snprintf and bpf_seq_printf to limit their varargs. Both call into
bpf_bprintf_prepare for print formatting logic and have convenience
macros in libbpf (BPF_SNPRINTF, BPF_SEQ_PRINTF) which use the same
helper macros to convert varargs to a byte array.

Changing shared functionality to support more varargs for either bpf
helper would affect the other as well, so let's combine the _VARARGS
macros to make this more obvious.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210917182911.2426606-2-davemarchevsky@fb.com
2021-09-17 14:02:05 -07:00