Commit graph

1214691 commits

Author SHA1 Message Date
Puranjay Mohan
daabb2b098 bpf/tests: add tests for cpuv4 instructions
The BPF JITs now support cpuv4 instructions. Add tests for these new
instructions to the test suite:

1. Sign extended Load
2. Sign extended Mov
3. Unconditional byte swap
4. Unconditional jump with 32-bit offset
5. Signed division and modulo

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20230907230550.1417590-9-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:57 -07:00
Puranjay Mohan
59ff6d63b7 selftest, bpf: enable cpu v4 tests for arm32
Now that all the cpuv4 instructions are supported by the arm32 JIT,
enable the selftests for arm32.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20230907230550.1417590-8-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Puranjay Mohan
71086041c2 arm32, bpf: add support for 64 bit division instruction
ARM32 doesn't have instructions to do 64-bit/64-bit divisions. So, to
implement the following instructions:
BPF_ALU64 | BPF_DIV
BPF_ALU64 | BPF_MOD
BPF_ALU64 | BPF_SDIV
BPF_ALU64 | BPF_SMOD

We implement the above instructions by doing function calls to div64_u64()
and div64_u64_rem() for unsigned division/mod and calls to div64_s64()
for signed division/mod.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-7-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Puranjay Mohan
5097faa559 arm32, bpf: add support for 32-bit signed division
The cpuv4 added a new BPF_SDIV instruction that does signed division.
The encoding is similar to BPF_DIV but BPF_SDIV sets offset=1.

ARM32 already supports 32-bit BPF_DIV which can be easily extended to
support BPF_SDIV as ARM32 has the SDIV instruction. When the CPU is not
ARM-v7, we implement that SDIV/SMOD with the function call similar to
the implementation of DIV/MOD.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-6-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Puranjay Mohan
1cfb7eaebe arm32, bpf: add support for unconditional bswap instruction
The cpuv4 added a new unconditional bswap instruction with following
behaviour:

BPF_ALU64 | BPF_TO_LE | BPF_END with imm = 16/32/64 means:
dst = bswap16(dst)
dst = bswap32(dst)
dst = bswap64(dst)

As we already support converting to big-endian from little-endian we can
use the same for unconditional bswap. just treat the unconditional scenario
the same as big-endian conversion.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-5-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Puranjay Mohan
fc832653fa arm32, bpf: add support for sign-extension mov instruction
The cpuv4 added a new BPF_MOVSX instruction that sign extends the src
before moving it to the destination.

BPF_ALU | BPF_MOVSX sign extends 8-bit and 16-bit operands into 32-bit
operands, and zeroes the remaining upper 32 bits.

BPF_ALU64 | BPF_MOVSX sign extends 8-bit, 16-bit, and 32-bit  operands
into 64-bit operands.

The offset field of the instruction is used to tell the number of bit to
use for sign-extension. BPF_MOV and BPF_MOVSX have the same code but the
former sets offset to 0 and the later one sets the offset to 8, 16 or 32

The behaviour of this instruction is dst = (s8,s16,s32)src

On ARM32 the implementation uses LSH and ARSH to extend the 8/16 bits to
a 32-bit register and then it is sign extended to the upper 32-bit
register using ARSH. For 32-bit we just move it to the destination
register and use ARSH to extend it to the upper 32-bit register.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-4-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Puranjay Mohan
f9e6981b1f arm32, bpf: add support for sign-extension load instruction
The cpuv4 added the support of an instruction that is similar to load
but also sign-extends the result after the load.

BPF_MEMSX | <size> | BPF_LDX means dst = *(signed size *) (src + offset)
here <size> can be one of BPF_B, BPF_H, BPF_W.

ARM32 has instructions to load a byte or a half word with sign
extension into a 32bit register. As the JIT uses two 32 bit registers
to simulate a 64-bit BPF register, an extra instruction is emitted to
sign-extent the result up to the second register.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-3-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Puranjay Mohan
471f3d4ee4 arm32, bpf: add support for 32-bit offset jmp instruction
The cpuv4 adds unconditional jump with 32-bit offset where the immediate
field of the instruction is to be used to calculate the jump offset.

BPF_JA | BPF_K | BPF_JMP32 => gotol +imm => PC += imm.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230907230550.1417590-2-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 17:16:56 -07:00
Larysa Zaremba
9b2b86332a bpf: Allow to use kfunc XDP hints and frags together
There is no fundamental reason, why multi-buffer XDP and XDP kfunc RX hints
cannot coexist in a single program.

Allow those features to be used together by modifying the flags condition
for dev-bound-only programs, segments are still prohibited for fully
offloaded programs, hence additional check.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/CAKH8qBuzgtJj=OKMdsxEkyML36VsAuZpcrsXcyqjdKXSJCBq=Q@mail.gmail.com/
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230915083914.65538-1-larysa.zaremba@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:39:46 -07:00
Martin KaFai Lau
45ee73a072 Merge branch 'bpf: expose information about netdev xdp-metadata kfunc support'
Stanislav Fomichev says:

====================
Extend netdev netlink family to expose the bitmask with the
kfuncs that the device implements. The source of truth is the
device's xdp_metadata_ops. There is some amount of auto-generated
netlink boilerplate; the change itself is super minimal.

v2:
- add netdev->xdp_metadata_ops NULL check when dumping to netlink (Martin)

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:26:58 -07:00
Stanislav Fomichev
0c6c9b105e tools: ynl: extend netdev sample to dump xdp-rx-metadata-features
The tool can be used to verify that everything works end to end.

Unrelated updates:
- include tools/include/uapi to pick the latest kernel uapi headers
- print "xdp-features" and "xdp-rx-metadata-features" so it's clear
  which bitmask is being dumped

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-4-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:26:58 -07:00
Stanislav Fomichev
a9c2a60854 bpf: expose information about supported xdp metadata kfunc
Add new xdp-rx-metadata-features member to netdev netlink
which exports a bitmask of supported kfuncs. Most of the patch
is autogenerated (headers), the only relevant part is netdev.yaml
and the changes in netdev-genl.c to marshal into netlink.

Example output on veth:

$ ip link add veth0 type veth peer name veth1 # ifndex == 12
$ ./tools/net/ynl/samples/netdev 12

Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
   veth1[12]    xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:26:58 -07:00
Stanislav Fomichev
fc45c5b642 bpf: make it easier to add new metadata kfunc
No functional changes.

Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc,
move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx.
This way, any time new kfunc is added, we don't have to touch
bpf_dev_bound_resolve_kfunc.

Also document XDP_METADATA_KFUNC_xxx arguments since we now have
more than two and it might be confusing what is what.

Cc: netdev@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-15 11:26:58 -07:00
Tirthendu Sarkar
d609f3d228 xsk: add multi-buffer support for sockets sharing umem
Userspace applications indicate their multi-buffer capability to xsk
using XSK_USE_SG socket bind flag. For sockets using shared umem the
bind flag may contain XSK_USE_SG only for the first socket. For any
subsequent socket the only option supported is XDP_SHARED_UMEM.

Add option XDP_UMEM_SG_FLAG in umem config flags to store the
multi-buffer handling capability when indicated by XSK_USE_SG option in
bing flag by the first socket. Use this to derive multi-buffer capability
for subsequent sockets in xsk core.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Fixes: 81470b5c3c ("xsk: introduce XSK_USE_SG bind flag for xsk socket")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15 11:00:22 -07:00
Song Liu
5c04433daf bpf: Charge modmem for struct_ops trampoline
Current code charges modmem for regular trampoline, but not for struct_ops
trampoline. Add bpf_jit_[charge|uncharge]_modmem() to struct_ops so the
trampoline is charged in both cases.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230914222542.2986059-1-song@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-14 15:30:45 -07:00
Artem Savkov
971f7c3214 selftests/bpf: Skip module_fentry_shadow test when bpf_testmod is not available
This test relies on bpf_testmod, so skip it if the module is not available.

Fixes: aa3d65de4b ("bpf/selftests: Test fentry attachment to shadowed functions")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230914124928.340701-1-asavkov@redhat.com
2023-09-14 11:16:13 -07:00
Alexei Starovoitov
8fa193412b Merge branch 'seltests-xsk-various-improvements-to-xskxceiver'
Magnus Karlsson says:

====================
seltests/xsk: various improvements to xskxceiver

This patch set implements several improvements to the xsk selftests
test suite that I thought were useful while debugging the xsk
multi-buffer code and tests. The largest new feature is the ability to
be able to execute a single test instead of the whole test suite. This
required some surgery on the current code, details below.

Anatomy of the path set:

1: Print useful info on a per packet basis with the option -v

2: Add a timeout in the transmission loop too. We only used to have
   one for the Rx thread, but Tx can lock up too waiting for
   completions.

3: Add an option (-m) to only run the tests (or a single test with a
   later patch) in a single mode: skb, drv, or zc (zero-copy).

4-5: Preparatory patches to be able to specify a test to run. Need to
     define the test names in a single structure and their entry
     points, so we can use this when wanting to run a specific test.

6: Adds a command line option (-l) that lists all the tests.

7: Adds a command line option (-t) that runs a specific test instead
   of the whole test suite. Can be combined with -m to specify a
   single mode too.

8: Use ksft_print_msg() uniformly throughout the tests. It was a mix
   of printf() and ksft_print_msg() before.

9: In some places, we failed the whole test suite instead of a single
   test in certain circumstances. Fix this so only the test in
   question is failed and the rest of the test suite continues.

10: Display the available command line options with -h

v3 -> v4:
* Fixed another spelling error in patch #9 [Maciej]
* Only allow the actual strings for the -m command [Maciej]
* Move some code from patch #7 to #3 [Maciej]

v2 -> v3:
* Drop the support for environment variables. Probably not useful. [Maciej]
* Fixed spelling mistake in patch #9 [Maciej]
* Fail gracefully if unsupported mode is chosen [Maciej]
* Simplified test run loop [Maciej]

v1 -> v2:

* Introduce XSKTEST_MODE env variable to be able to set the mode to
  use [Przemyslaw]
* Introduce XSKTEST_ETH env variable to be able to set the ethernet
  interface to use by introducing a new patch (#11) [Magnus]
* Fixed spelling error in patch #5 [Przemyslaw, Maciej]
* Fixed confusing documentation in patch #10  [Przemyslaw]
* The -l option can now be used without being root [Magnus, Maciej]
* Fixed documentation error in patch #7 [Maciej]
* Added error handling to the -t option [Maciej]
* -h now displayed as an option [Maciej]

Thanks: Magnus
====================

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-1-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:48:05 -07:00
Magnus Karlsson
4a5f0ba55f selftests/xsk: display command line options with -h
Add the -h option to display all available command line options
available for test_xsk.sh and xskxceiver.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-11-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:56 -07:00
Magnus Karlsson
5fc494d5ab selftests/xsk: fail single test instead of all tests
In a number of places at en error, exit_with_error() is called that
terminates the whole test suite. This is not always desirable as it
would be more logical to only fail that test and then go along with
the other ones. So change this in a number of places in which I
thought it would be more logical to just fail the test in
question. Examples of this are in code that is only used by a single
test.

Also delete a pointless if-statement in receive_pkts() that has an
exit_with_error() in it. It can never occur since the return value is
an unsigned and the test is for less than zero.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-10-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:56 -07:00
Magnus Karlsson
7c3fcf088b selftests/xsk: use ksft_print_msg uniformly
Use ksft_print_msg() instead of printf() and fprintf() in all places
as the ksefltests framework is being used. There is only one exception
and that is for the list-of-tests print out option, since no tests are
run in that case.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-9-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:56 -07:00
Magnus Karlsson
146e30554a selftests/xsk: add option to run single test
Add a command line option to be able to run a single test. This option
(-t) takes a number from the list of tests available with the "-l"
option. Here are two examples:

Run test number 2, the "receive single packet" test in all available modes:

./test_xsk.sh -t 2

Run test number 21, the metadata copy test in skb mode only

./test_xsh.sh -t 21 -m skb

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-8-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:56 -07:00
Magnus Karlsson
c53dab7d39 selftests/xsk: add option that lists all tests
Add a command line option (-l) that lists all the tests. The number
before the test will be used in the next commit for specifying a
single test to run. Here is an example of the output:

Tests:
0: SEND_RECEIVE
1: SEND_RECEIVE_2K_FRAME
2: SEND_RECEIVE_SINGLE_PKT
3: POLL_RX
4: POLL_TX
5: POLL_RXQ_FULL
6: POLL_TXQ_FULL
7: SEND_RECEIVE_UNALIGNED
:
:

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-7-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:55 -07:00
Magnus Karlsson
f20fbcd077 selftests/xsk: declare test names in struct
Declare the test names statically in a struct so that we can refer to
them when adding the support to execute a single test in the next
commit. Before this patch, the names of them were not declared in a
single place which made it not possible to refer to them.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-6-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:55 -07:00
Magnus Karlsson
13c341c450 selftests/xsk: move all tests to separate functions
Prepare for the capability to be able to run a single test by moving
all the tests to their own functions. This function can then be called
to execute that test in the next commit.

Also, the tests named RUN_TO_COMPLETION_* were not named well, so
change them to SEND_RECEIVE_* as it is just a basic send and receive
test of 4K packets.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-5-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:55 -07:00
Magnus Karlsson
3956bc34b6 selftests/xsk: add option to only run tests in a single mode
Add an option -m on the command line that allows the user to run the
tests in a single mode instead of all of them. Valid modes are skb,
drv, and zc (zero-copy). An example:

To run test suite in drv mode only:

./test_xsk.sh -m drv

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-4-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:55 -07:00
Magnus Karlsson
64370d7c8a selftests/xsk: add timeout for Tx thread
Add a timeout for the transmission thread. If packets are not
completed properly, for some reason, the test harness would previously
get stuck forever in a while loop. But with this patch, this timeout
will trigger, flag the test as a failure, and continue with the next
test.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:55 -07:00
Magnus Karlsson
2d2712caf4 selftests/xsk: print per packet info in verbose mode
Print info about every packet in verbose mode, both for Tx and
Rx. This is useful to have when a test fails or to validate that a
test is really doing what it was designed to do. Info on what is
supposed to be received and sent is also printed for the custom packet
streams since they differ from the base line. Here is an example:

Tx addr: 37e0 len: 64 options: 0 pkt_nb: 8
Tx addr: 4000 len: 64 options: 0 pkt_nb: 9
Rx: addr: 100 len: 64 options: 0 pkt_nb: 0 valid: 1
Rx: addr: 1100 len: 64 options: 0 pkt_nb: 1 valid: 1
Rx: addr: 2100 len: 64 options: 0 pkt_nb: 4 valid: 1
Rx: addr: 3100 len: 64 options: 0 pkt_nb: 8 valid: 1
Rx: addr: 4100 len: 64 options: 0 pkt_nb: 9 valid: 1

One pointless verbose print statement is also deleted and another one
is made clearer.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230914084900.492-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-14 09:47:55 -07:00
Quan Tian
558c50cc3b docs/bpf: update out-of-date doc in BPF flow dissector
Commit a5e2151ff9 ("net/ipv6: SKB symmetric hash should incorporate
transport ports") removed the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
in __skb_get_hash_symmetric(), making the doc out-of-date.

Signed-off-by: Quan Tian <qtian@vmware.com>
Link: https://lore.kernel.org/r/20230911152353.8280-1-qtian@vmware.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-13 14:22:57 -07:00
Alexei Starovoitov
5bbb9e1f08 Merge branch 'bpf-x64-fix-tailcall-infinite-loop'
Leon Hwang says:

====================
bpf, x64: Fix tailcall infinite loop

This patch series fixes a tailcall infinite loop on x64.

From commit ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), the tailcall on x64 works better than before.

From commit e411901c0b ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcall is able to run in BPF subprograms on x64.

From commit 5b92a28aae ("bpf: Support attaching tracing BPF program
to other BPF programs"), BPF program is able to trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.

As a result, a tailcall infinite loop comes up. And the loop would halt
the machine.

As we know, in tail call context, the tail_call_cnt propagates by stack
and rax register between BPF subprograms. So do in trampolines.

How did I discover the bug?

From commit 7f6e4312e1 ("bpf: Limit caller's stack depth 256 for
subprogs with tailcalls"), the total stack size limits to around 8KiB.
Then, I write some bpf progs to validate the stack consuming, that are
tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf.

At that time, accidently, I made a tailcall loop. And then the loop halted
my VM. Without the loop, the bpf progs would consume over 8KiB stack size.
But the _stack-overflow_ did not halt my VM.

With bpf_printk(), I confirmed that the tailcall count limit did not work
expectedly. Next, read the code and fix it.

Thank Ilya Leoshkevich, this bug on s390x has been fixed.

Hopefully, this bug on arm64 will be fixed in near future.
====================

Link: https://lore.kernel.org/r/20230912150442.2009-1-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12 13:06:12 -07:00
Leon Hwang
e13b5f2f3b selftests/bpf: Add testcases for tailcall infinite loop fixing
Add 4 test cases to confirm the tailcall infinite loop bug has been fixed.

Like tailcall_bpf2bpf cases, do fentry/fexit on the bpf2bpf, and then
check the final count result.

tools/testing/selftests/bpf/test_progs -t tailcalls
226/13  tailcalls/tailcall_bpf2bpf_fentry:OK
226/14  tailcalls/tailcall_bpf2bpf_fexit:OK
226/15  tailcalls/tailcall_bpf2bpf_fentry_fexit:OK
226/16  tailcalls/tailcall_bpf2bpf_fentry_entry:OK
226     tailcalls:OK
Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230912150442.2009-4-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12 13:06:12 -07:00
Leon Hwang
2b5dcb31a1 bpf, x64: Fix tailcall infinite loop
From commit ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), the tailcall on x64 works better than before.

From commit e411901c0b ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcall is able to run in BPF subprograms on x64.

From commit 5b92a28aae ("bpf: Support attaching tracing BPF program
to other BPF programs"), BPF program is able to trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.

As a result, a tailcall infinite loop comes up. And the loop would halt
the machine.

As we know, in tail call context, the tail_call_cnt propagates by stack
and rax register between BPF subprograms. So do in trampolines.

Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230912150442.2009-3-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12 13:06:12 -07:00
Leon Hwang
2bee9770f3 bpf, x64: Comment tail_call_cnt initialisation
Without understanding emit_prologue(), it is really hard to figure out
where does tail_call_cnt come from, even though searching tail_call_cnt
in the whole kernel repo.

By adding these comments, it is a little bit easier to understand
tail_call_cnt initialisation.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230912150442.2009-2-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-12 13:06:12 -07:00
Leon Hwang
96daa98742 selftests/bpf: Correct map_fd to data_fd in tailcalls
Get and check data_fd. It should not check map_fd again.

Meanwhile, correct some 'return' to 'goto out'.

Thank the suggestion from Maciej in "bpf, x64: Fix tailcall infinite
loop"[0] discussions.

[0] https://lore.kernel.org/bpf/e496aef8-1f80-0f8e-dcdd-25a8c300319a@gmail.com/T/#m7d3b601066ba66400d436b7e7579b2df4a101033

Fixes: 79d49ba048 ("bpf, testing: Add various tail call test cases")
Fixes: 3b03791111 ("selftests/bpf: Add tailcall_bpf2bpf tests")
Fixes: 5e0b0a4c52 ("selftests/bpf: Test tail call counting with bpf2bpf and data on stack")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230906154256.95461-1-hffilwlqm@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-11 15:28:24 -07:00
Denys Zagorui
ebc8484d0e bpftool: Fix -Wcast-qual warning
This cast was made by purpose for older libbpf where the
bpf_object_skeleton field is void * instead of const void *
to eliminate a warning (as i understand
-Wincompatible-pointer-types-discards-qualifiers) but this
cast introduces another warning (-Wcast-qual) for libbpf
where data field is const void *

It makes sense for bpftool to be in sync with libbpf from
kernel sources

Signed-off-by: Denys Zagorui <dzagorui@cisco.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230907090210.968612-1-dzagorui@cisco.com
2023-09-08 17:04:24 -07:00
Andrii Nakryiko
dbbe15859b Merge branch 'selftests/bpf: Optimize kallsyms cache'
Rong Tao says:

====================
We need to optimize the kallsyms cache, including optimizations for the
number of symbols limit, and, some test cases add new kernel symbols
(such as testmods) and we need to refresh kallsyms (reload or refresh).
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-09-08 17:04:14 -07:00
Rong Tao
a28b1ba259 selftests/bpf: trace_helpers.c: Add a global ksyms initialization mutex
As Jirka said [0], we just need to make sure that global ksyms
initialization won't race.

[0] https://lore.kernel.org/lkml/ZPCbAs3ItjRd8XVh@krava/

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_5D0A837E219E2CFDCB0495DAD7D5D1204407@qq.com
2023-09-08 16:22:41 -07:00
Rong Tao
c698eaebdf selftests/bpf: trace_helpers.c: Optimize kallsyms cache
Static ksyms often have problems because the number of symbols exceeds the
MAX_SYMS limit. Like changing the MAX_SYMS from 300000 to 400000 in
commit e76a014334a6("selftests/bpf: Bump and validate MAX_SYMS") solves
the problem somewhat, but it's not the perfect way.

This commit uses dynamic memory allocation, which completely solves the
problem caused by the limitation of the number of kallsyms. At the same
time, add APIs:

    load_kallsyms_local()
    ksym_search_local()
    ksym_get_addr_local()
    free_kallsyms_local()

There are used to solve the problem of selftests/bpf updating kallsyms
after attach new symbols during testmod testing.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/tencent_C9BDA68F9221F21BE4081566A55D66A9700A@qq.com
2023-09-08 16:22:41 -07:00
Alexei Starovoitov
9bc869253d Merge branch 'bpf-task_group_seq_get_next-misc-cleanups'
Oleg Nesterov says:

====================
bpf: task_group_seq_get_next: misc cleanups

Yonghong,

I am resending 1-5 of 6 as you suggested with your acks included.

The next (final) patch will change this code to use __next_thread when

	https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/

is merged.

Oleg.
====================

Link: https://lore.kernel.org/r/20230905154612.GA24872@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Alexei Starovoitov
35897c3c52 Merge branch 'bpf-enable-irq-after-irq_work_raise-completes'
Hou Tao says:

====================
bpf: Enable IRQ after irq_work_raise() completes

From: Hou Tao <houtao1@huawei.com>

Hi,

The patchset aims to fix the problem that bpf_mem_alloc() may return
NULL unexpectedly when multiple bpf_mem_alloc() are invoked concurrently
under process context and there is still free memory available. The
problem was found when doing stress test for qp-trie but the same
problem also exists for bpf_obj_new() as demonstrated in patch #3.

As pointed out by Alexei, the patchset can only fix ENOMEM problem for
normal process context and can not fix the problem for irq-disabled
context or RT-enabled kernel.

Patch #1 fixes the race between unit_alloc() and unit_alloc(). Patch #2
fixes the race between unit_alloc() and unit_free(). And patch #3 adds
a selftest for the problem. The major change compared with v1 is using
local_irq_{save,restore)() pair to disable and enable preemption
instead of preempt_{disable,enable}_notrace pair. The main reason is to
prevent potential overhead from __preempt_schedule_notrace(). I also
run htab_mem benchmark and hash_map_perf on a 8-CPUs KVM VM to compare
the performance between local_irq_{save,restore} and
preempt_{disable,enable}_notrace(), but the results are similar as shown
below:

(1) use preempt_{disable,enable}_notrace()

[root@hello bpf]# ./map_perf_test 4 8
0:hash_map_perf kmalloc 652179 events per sec
1:hash_map_perf kmalloc 651880 events per sec
2:hash_map_perf kmalloc 651382 events per sec
3:hash_map_perf kmalloc 650791 events per sec
5:hash_map_perf kmalloc 650140 events per sec
6:hash_map_perf kmalloc 652773 events per sec
7:hash_map_perf kmalloc 652751 events per sec
4:hash_map_perf kmalloc 648199 events per sec

[root@hello bpf]# ./benchs/run_bench_htab_mem.sh
normal bpf ma
=============
overwrite            per-prod-op: 110.82 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.73MiB
batch_add_batch_del  per-prod-op: 89.79 ± 0.75k/s, avg mem: 1.68 ± 0.38MiB, peak mem: 2.73MiB
add_del_on_diff_cpu  per-prod-op: 17.83 ± 0.07k/s, avg mem: 25.68 ± 2.92MiB, peak mem: 35.10MiB

(2) use local_irq_{save,restore}

[root@hello bpf]# ./map_perf_test 4 8
0:hash_map_perf kmalloc 656299 events per sec
1:hash_map_perf kmalloc 656397 events per sec
2:hash_map_perf kmalloc 656046 events per sec
3:hash_map_perf kmalloc 655723 events per sec
5:hash_map_perf kmalloc 655221 events per sec
4:hash_map_perf kmalloc 654617 events per sec
6:hash_map_perf kmalloc 650269 events per sec
7:hash_map_perf kmalloc 653665 events per sec

[root@hello bpf]# ./benchs/run_bench_htab_mem.sh
normal bpf ma
=============
overwrite            per-prod-op: 116.10 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.74MiB
batch_add_batch_del  per-prod-op: 88.76 ± 0.61k/s, avg mem: 1.94 ± 0.33MiB, peak mem: 2.74MiB
add_del_on_diff_cpu  per-prod-op: 18.12 ± 0.08k/s, avg mem: 25.10 ± 2.70MiB, peak mem: 34.78MiB

As ususal comments are always welcome.

Change Log:
v2:
  * Use local_irq_save to disable preemption instead of using
    preempt_{disable,enable}_notrace pair to prevent potential overhead

v1: https://lore.kernel.org/bpf/20230822133807.3198625-1-houtao@huaweicloud.com/
====================

Link: https://lore.kernel.org/r/20230901111954.1804721-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Oleg Nesterov
780aa8dfcb bpf: task_group_seq_get_next: simplify the "next tid" logic
Kill saved_tid. It looks ugly to update *tid and then restore the
previous value if __task_pid_nr_ns() returns 0. Change this code
to update *tid and common->pid_visiting once before return.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230905154656.GA24950@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Hou Tao
29c11aa808 selftests/bpf: Test preemption between bpf_obj_new() and bpf_obj_drop()
The test case creates 4 threads and then pins these 4 threads in CPU 0.
These 4 threads will run different bpf program through
bpf_prog_test_run_opts() and these bpf program will use bpf_obj_new()
and bpf_obj_drop() to allocate and free local kptrs concurrently.

Under preemptible kernel, bpf_obj_new() and bpf_obj_drop() may preempt
each other, bpf_obj_new() may return NULL and the test will fail before
applying these fixes as shown below:

  test_preempted_bpf_ma_op:PASS:open_and_load 0 nsec
  test_preempted_bpf_ma_op:PASS:attach 0 nsec
  test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
  test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
  test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
  test_preempted_bpf_ma_op:PASS:no test prog 0 nsec
  test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
  test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
  test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
  test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec
  test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
  test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
  test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
  test_preempted_bpf_ma_op:PASS:run prog err 0 nsec
  test_preempted_bpf_ma_op:FAIL:ENOMEM unexpected ENOMEM: got TRUE
  #168     preempted_bpf_ma_op:FAIL
  Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230901111954.1804721-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Oleg Nesterov
0ee9808b0a bpf: task_group_seq_get_next: kill next_task
It only adds the unnecessary confusion and compicates the "retry" code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230905154654.GA24945@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Hou Tao
62cf51cb0e bpf: Enable IRQ after irq_work_raise() completes in unit_free{_rcu}()
Both unit_free() and unit_free_rcu() invoke irq_work_raise() to free
freed objects back to slab and the invocation may also be preempted by
unit_alloc() and unit_alloc() may return NULL unexpectedly as shown in
the following case:

task A         task B

unit_free()
  // high_watermark = 48
  // free_cnt = 49 after free
  irq_work_raise()
    // mark irq work as IRQ_WORK_PENDING
    irq_work_claim()

               // task B preempts task A
               unit_alloc()
                 // free_cnt = 48 after alloc

               // does unit_alloc() 32-times
	       ......
	       // free_cnt = 16

	       unit_alloc()
	         // free_cnt = 15 after alloc
                 // irq work is already PENDING,
                 // so just return
                 irq_work_raise()

	       // does unit_alloc() 15-times
               ......
	       // free_cnt = 0

               unit_alloc()
                 // free_cnt = 0 before alloc
                 return NULL

Fix it by enabling IRQ after irq_work_raise() completes.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230901111954.1804721-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Oleg Nesterov
87abbf7a54 bpf: task_group_seq_get_next: fix the skip_if_dup_files check
Unless I am notally confused it is wrong. We are going to return or
skip next_task so we need to check next_task-files, not task->files.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230905154651.GA24940@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Oleg Nesterov
4981921350 bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct
get_pid_task() makes no sense, the code does put_task_struct() soon after.
Use find_task_by_pid_ns() instead of find_pid_ns + get_pid_task and kill
put_task_struct(), this allows to do get_task_struct() only once before
return.

While at it, kill the unnecessary "if (!pid)" check in the "if (!*tid)"
block, this matches the next usage of find_pid_ns() + get_pid_task() in
this function.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230905154649.GA24935@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Oleg Nesterov
1a00ef57d9 bpf: task_group_seq_get_next: cleanup the usage of next_thread()
1. find_pid_ns() + get_pid_task() under rcu_read_lock() guarantees that we
   can safely iterate the task->thread_group list. Even if this task exits
   right after get_pid_task() (or goto retry) and pid_alive() returns 0.

   Kill the unnecessary pid_alive() check.

2. next_thread() simply can't return NULL, kill the bogus "if (!next_task)"
   check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230905154646.GA24928@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:19 -07:00
Alexei Starovoitov
1e4a6d975e Merge branch 'bpf-add-support-for-local-percpu-kptr'
Yonghong Song says:

====================
bpf: Add support for local percpu kptr

Patch set [1] implemented cgroup local storage BPF_MAP_TYPE_CGRP_STORAGE
similar to sk/task/inode local storage and old BPF_MAP_TYPE_CGROUP_STORAGE
map is marked as deprecated since old BPF_MAP_TYPE_CGROUP_STORAGE map can
only work with current cgroup.

Similarly, the existing BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map
is a percpu version of BPF_MAP_TYPE_CGROUP_STORAGE and only works
with current cgroup. But there is no replacement which can work
with arbitrary cgroup.

This patch set solved this problem but adding support for local
percpu kptr. The map value can have a percpu kptr field which holds
a bpf prog allocated percpu data. The below is an example,

  struct percpu_val_t {
    ... fields ...
  }

  struct map_value_t {
    struct percpu_val_t __percpu_kptr *percpu_data_ptr;
  }

In the above, 'map_value_t' is the map value type for a
BPF_MAP_TYPE_CGRP_STORAGE map. User can access 'percpu_data_ptr'
and then read/write percpu data. This covers BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
and more. So BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type
is marked as deprecated.

In additional, local percpu kptr supports the same map type
as other kptrs including hash, lru_hash, array, sk/inode/task/cgrp
local storage. Currently, percpu data structure does not support
non-scalars or special fields (e.g., bpf_spin_lock, bpf_rb_root, etc.).
They can be supported in the future if there exist use cases.

Please for individual patches for details.

  [1] https://lore.kernel.org/all/20221026042835.672317-1-yhs@fb.com/

Changelog:
  v2 -> v3:
    - fix libbpf_str test failure.
  v1 -> v2:
    - does not support special fields in percpu data structure.
    - rename __percpu attr to __percpu_kptr attr.
    - rename BPF_KPTR_PERCPU_REF to BPF_KPTR_PERCPU.
    - better code to handle bpf_{this,per}_cpu_ptr() helpers.
    - add more negative tests.
    - fix a bpftool related test failure.
====================

Link: https://lore.kernel.org/r/20230827152729.1995219-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:18 -07:00
Hou Tao
566f6de3ce bpf: Enable IRQ after irq_work_raise() completes in unit_alloc()
When doing stress test for qp-trie, bpf_mem_alloc() returned NULL
unexpectedly because all qp-trie operations were initiated from
bpf syscalls and there was still available free memory. bpf_obj_new()
has the same problem as shown by the following selftest.

The failure is due to the preemption. irq_work_raise() will invoke
irq_work_claim() first to mark the irq work as pending and then inovke
__irq_work_queue_local() to raise an IPI. So when the current task
which is invoking irq_work_raise() is preempted by other task,
unit_alloc() may return NULL for preemption task as shown below:

task A         task B

unit_alloc()
  // low_watermark = 32
  // free_cnt = 31 after alloc
  irq_work_raise()
    // mark irq work as IRQ_WORK_PENDING
    irq_work_claim()

	       // task B preempts task A
	       unit_alloc()
	         // free_cnt = 30 after alloc
	         // irq work is already PENDING,
	         // so just return
	         irq_work_raise()
	       // does unit_alloc() 30-times
	       ......
	       unit_alloc()
	         // free_cnt = 0 before alloc
	         return NULL

Fix it by enabling IRQ after irq_work_raise() completes. An alternative
fix is using preempt_{disable|enable}_notrace() pair, but it may have
extra overhead. Another feasible fix is to only disable preemption or
IRQ before invoking irq_work_queue() and enable preemption or IRQ after
the invocation completes, but it can't handle the case when
c->low_watermark is 1.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230901111954.1804721-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:18 -07:00
Yonghong Song
9bc95a95ab bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr'
can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
Also make changes in selftests/bpf/test_bpftool_synctypes.py
and selftest libbpf_str to fix otherwise test errors.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:18 -07:00
Yonghong Song
1bd7931728 selftests/bpf: Add some negative tests
Add a few negative tests for common mistakes with using percpu kptr
including:
  - store to percpu kptr.
  - type mistach in bpf_kptr_xchg arguments.
  - sleepable prog with untrusted arg for bpf_this_cpu_ptr().
  - bpf_percpu_obj_new && bpf_obj_drop, and bpf_obj_new && bpf_percpu_obj_drop
  - struct with ptr for bpf_percpu_obj_new
  - struct with special field (e.g., bpf_spin_lock) for bpf_percpu_obj_new

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230827152832.2002421-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-08 08:42:18 -07:00