linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-27 12:57:53 +00:00

Author	SHA1	Message	Date
Puranjay Mohan	daabb2b098	bpf/tests: add tests for cpuv4 instructions The BPF JITs now support cpuv4 instructions. Add tests for these new instructions to the test suite: 1. Sign extended Load 2. Sign extended Mov 3. Unconditional byte swap 4. Unconditional jump with 32-bit offset 5. Signed division and modulo Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20230907230550.1417590-9-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:57 -07:00
Puranjay Mohan	59ff6d63b7	selftest, bpf: enable cpu v4 tests for arm32 Now that all the cpuv4 instructions are supported by the arm32 JIT, enable the selftests for arm32. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20230907230550.1417590-8-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Puranjay Mohan	71086041c2	arm32, bpf: add support for 64 bit division instruction ARM32 doesn't have instructions to do 64-bit/64-bit divisions. So, to implement the following instructions: BPF_ALU64 \| BPF_DIV BPF_ALU64 \| BPF_MOD BPF_ALU64 \| BPF_SDIV BPF_ALU64 \| BPF_SMOD We implement the above instructions by doing function calls to div64_u64() and div64_u64_rem() for unsigned division/mod and calls to div64_s64() for signed division/mod. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-7-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Puranjay Mohan	5097faa559	arm32, bpf: add support for 32-bit signed division The cpuv4 added a new BPF_SDIV instruction that does signed division. The encoding is similar to BPF_DIV but BPF_SDIV sets offset=1. ARM32 already supports 32-bit BPF_DIV which can be easily extended to support BPF_SDIV as ARM32 has the SDIV instruction. When the CPU is not ARM-v7, we implement that SDIV/SMOD with the function call similar to the implementation of DIV/MOD. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-6-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Puranjay Mohan	1cfb7eaebe	arm32, bpf: add support for unconditional bswap instruction The cpuv4 added a new unconditional bswap instruction with following behaviour: BPF_ALU64 \| BPF_TO_LE \| BPF_END with imm = 16/32/64 means: dst = bswap16(dst) dst = bswap32(dst) dst = bswap64(dst) As we already support converting to big-endian from little-endian we can use the same for unconditional bswap. just treat the unconditional scenario the same as big-endian conversion. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-5-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Puranjay Mohan	fc832653fa	arm32, bpf: add support for sign-extension mov instruction The cpuv4 added a new BPF_MOVSX instruction that sign extends the src before moving it to the destination. BPF_ALU \| BPF_MOVSX sign extends 8-bit and 16-bit operands into 32-bit operands, and zeroes the remaining upper 32 bits. BPF_ALU64 \| BPF_MOVSX sign extends 8-bit, 16-bit, and 32-bit operands into 64-bit operands. The offset field of the instruction is used to tell the number of bit to use for sign-extension. BPF_MOV and BPF_MOVSX have the same code but the former sets offset to 0 and the later one sets the offset to 8, 16 or 32 The behaviour of this instruction is dst = (s8,s16,s32)src On ARM32 the implementation uses LSH and ARSH to extend the 8/16 bits to a 32-bit register and then it is sign extended to the upper 32-bit register using ARSH. For 32-bit we just move it to the destination register and use ARSH to extend it to the upper 32-bit register. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-4-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Puranjay Mohan	f9e6981b1f	arm32, bpf: add support for sign-extension load instruction The cpuv4 added the support of an instruction that is similar to load but also sign-extends the result after the load. BPF_MEMSX \| <size> \| BPF_LDX means dst = (signed size ) (src + offset) here <size> can be one of BPF_B, BPF_H, BPF_W. ARM32 has instructions to load a byte or a half word with sign extension into a 32bit register. As the JIT uses two 32 bit registers to simulate a 64-bit BPF register, an extra instruction is emitted to sign-extent the result up to the second register. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-3-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Puranjay Mohan	471f3d4ee4	arm32, bpf: add support for 32-bit offset jmp instruction The cpuv4 adds unconditional jump with 32-bit offset where the immediate field of the instruction is to be used to calculate the jump offset. BPF_JA \| BPF_K \| BPF_JMP32 => gotol +imm => PC += imm. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-2-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 17:16:56 -07:00
Larysa Zaremba	9b2b86332a	bpf: Allow to use kfunc XDP hints and frags together There is no fundamental reason, why multi-buffer XDP and XDP kfunc RX hints cannot coexist in a single program. Allow those features to be used together by modifying the flags condition for dev-bound-only programs, segments are still prohibited for fully offloaded programs, hence additional check. Suggested-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/CAKH8qBuzgtJj=OKMdsxEkyML36VsAuZpcrsXcyqjdKXSJCBq=Q@mail.gmail.com/ Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230915083914.65538-1-larysa.zaremba@intel.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:39:46 -07:00
Martin KaFai Lau	45ee73a072	Merge branch 'bpf: expose information about netdev xdp-metadata kfunc support' Stanislav Fomichev says: ==================== Extend netdev netlink family to expose the bitmask with the kfuncs that the device implements. The source of truth is the device's xdp_metadata_ops. There is some amount of auto-generated netlink boilerplate; the change itself is super minimal. v2: - add netdev->xdp_metadata_ops NULL check when dumping to netlink (Martin) Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:26:58 -07:00
Stanislav Fomichev	0c6c9b105e	tools: ynl: extend netdev sample to dump xdp-rx-metadata-features The tool can be used to verify that everything works end to end. Unrelated updates: - include tools/include/uapi to pick the latest kernel uapi headers - print "xdp-features" and "xdp-rx-metadata-features" so it's clear which bitmask is being dumped Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-4-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:26:58 -07:00
Stanislav Fomichev	a9c2a60854	bpf: expose information about supported xdp metadata kfunc Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:26:58 -07:00
Stanislav Fomichev	fc45c5b642	bpf: make it easier to add new metadata kfunc No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc. Also document XDP_METADATA_KFUNC_xxx arguments since we now have more than two and it might be confusing what is what. Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-15 11:26:58 -07:00
Tirthendu Sarkar	d609f3d228	xsk: add multi-buffer support for sockets sharing umem Userspace applications indicate their multi-buffer capability to xsk using XSK_USE_SG socket bind flag. For sockets using shared umem the bind flag may contain XSK_USE_SG only for the first socket. For any subsequent socket the only option supported is XDP_SHARED_UMEM. Add option XDP_UMEM_SG_FLAG in umem config flags to store the multi-buffer handling capability when indicated by XSK_USE_SG option in bing flag by the first socket. Use this to derive multi-buffer capability for subsequent sockets in xsk core. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Fixes: `81470b5c3c` ("xsk: introduce XSK_USE_SG bind flag for xsk socket") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-15 11:00:22 -07:00
Song Liu	5c04433daf	bpf: Charge modmem for struct_ops trampoline Current code charges modmem for regular trampoline, but not for struct_ops trampoline. Add bpf_jit_[charge\|uncharge]_modmem() to struct_ops so the trampoline is charged in both cases. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230914222542.2986059-1-song@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-14 15:30:45 -07:00
Artem Savkov	971f7c3214	selftests/bpf: Skip module_fentry_shadow test when bpf_testmod is not available This test relies on bpf_testmod, so skip it if the module is not available. Fixes: `aa3d65de4b` ("bpf/selftests: Test fentry attachment to shadowed functions") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230914124928.340701-1-asavkov@redhat.com	2023-09-14 11:16:13 -07:00
Alexei Starovoitov	8fa193412b	Merge branch 'seltests-xsk-various-improvements-to-xskxceiver' Magnus Karlsson says: ==================== seltests/xsk: various improvements to xskxceiver This patch set implements several improvements to the xsk selftests test suite that I thought were useful while debugging the xsk multi-buffer code and tests. The largest new feature is the ability to be able to execute a single test instead of the whole test suite. This required some surgery on the current code, details below. Anatomy of the path set: 1: Print useful info on a per packet basis with the option -v 2: Add a timeout in the transmission loop too. We only used to have one for the Rx thread, but Tx can lock up too waiting for completions. 3: Add an option (-m) to only run the tests (or a single test with a later patch) in a single mode: skb, drv, or zc (zero-copy). 4-5: Preparatory patches to be able to specify a test to run. Need to define the test names in a single structure and their entry points, so we can use this when wanting to run a specific test. 6: Adds a command line option (-l) that lists all the tests. 7: Adds a command line option (-t) that runs a specific test instead of the whole test suite. Can be combined with -m to specify a single mode too. 8: Use ksft_print_msg() uniformly throughout the tests. It was a mix of printf() and ksft_print_msg() before. 9: In some places, we failed the whole test suite instead of a single test in certain circumstances. Fix this so only the test in question is failed and the rest of the test suite continues. 10: Display the available command line options with -h v3 -> v4: * Fixed another spelling error in patch #9 [Maciej] * Only allow the actual strings for the -m command [Maciej] * Move some code from patch #7 to #3 [Maciej] v2 -> v3: * Drop the support for environment variables. Probably not useful. [Maciej] * Fixed spelling mistake in patch #9 [Maciej] * Fail gracefully if unsupported mode is chosen [Maciej] * Simplified test run loop [Maciej] v1 -> v2: * Introduce XSKTEST_MODE env variable to be able to set the mode to use [Przemyslaw] * Introduce XSKTEST_ETH env variable to be able to set the ethernet interface to use by introducing a new patch (#11) [Magnus] * Fixed spelling error in patch #5 [Przemyslaw, Maciej] * Fixed confusing documentation in patch #10 [Przemyslaw] * The -l option can now be used without being root [Magnus, Maciej] * Fixed documentation error in patch #7 [Maciej] * Added error handling to the -t option [Maciej] * -h now displayed as an option [Maciej] Thanks: Magnus ==================== Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-1-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:48:05 -07:00
Magnus Karlsson	4a5f0ba55f	selftests/xsk: display command line options with -h Add the -h option to display all available command line options available for test_xsk.sh and xskxceiver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-11-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:56 -07:00
Magnus Karlsson	5fc494d5ab	selftests/xsk: fail single test instead of all tests In a number of places at en error, exit_with_error() is called that terminates the whole test suite. This is not always desirable as it would be more logical to only fail that test and then go along with the other ones. So change this in a number of places in which I thought it would be more logical to just fail the test in question. Examples of this are in code that is only used by a single test. Also delete a pointless if-statement in receive_pkts() that has an exit_with_error() in it. It can never occur since the return value is an unsigned and the test is for less than zero. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-10-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:56 -07:00
Magnus Karlsson	7c3fcf088b	selftests/xsk: use ksft_print_msg uniformly Use ksft_print_msg() instead of printf() and fprintf() in all places as the ksefltests framework is being used. There is only one exception and that is for the list-of-tests print out option, since no tests are run in that case. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-9-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:56 -07:00
Magnus Karlsson	146e30554a	selftests/xsk: add option to run single test Add a command line option to be able to run a single test. This option (-t) takes a number from the list of tests available with the "-l" option. Here are two examples: Run test number 2, the "receive single packet" test in all available modes: ./test_xsk.sh -t 2 Run test number 21, the metadata copy test in skb mode only ./test_xsh.sh -t 21 -m skb Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-8-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:56 -07:00
Magnus Karlsson	c53dab7d39	selftests/xsk: add option that lists all tests Add a command line option (-l) that lists all the tests. The number before the test will be used in the next commit for specifying a single test to run. Here is an example of the output: Tests: 0: SEND_RECEIVE 1: SEND_RECEIVE_2K_FRAME 2: SEND_RECEIVE_SINGLE_PKT 3: POLL_RX 4: POLL_TX 5: POLL_RXQ_FULL 6: POLL_TXQ_FULL 7: SEND_RECEIVE_UNALIGNED : : Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-7-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:55 -07:00
Magnus Karlsson	f20fbcd077	selftests/xsk: declare test names in struct Declare the test names statically in a struct so that we can refer to them when adding the support to execute a single test in the next commit. Before this patch, the names of them were not declared in a single place which made it not possible to refer to them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-6-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:55 -07:00
Magnus Karlsson	13c341c450	selftests/xsk: move all tests to separate functions Prepare for the capability to be able to run a single test by moving all the tests to their own functions. This function can then be called to execute that test in the next commit. Also, the tests named RUN_TO_COMPLETION_* were not named well, so change them to SEND_RECEIVE_* as it is just a basic send and receive test of 4K packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-5-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:55 -07:00
Magnus Karlsson	3956bc34b6	selftests/xsk: add option to only run tests in a single mode Add an option -m on the command line that allows the user to run the tests in a single mode instead of all of them. Valid modes are skb, drv, and zc (zero-copy). An example: To run test suite in drv mode only: ./test_xsk.sh -m drv Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-4-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:55 -07:00
Magnus Karlsson	64370d7c8a	selftests/xsk: add timeout for Tx thread Add a timeout for the transmission thread. If packets are not completed properly, for some reason, the test harness would previously get stuck forever in a while loop. But with this patch, this timeout will trigger, flag the test as a failure, and continue with the next test. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-3-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:55 -07:00
Magnus Karlsson	2d2712caf4	selftests/xsk: print per packet info in verbose mode Print info about every packet in verbose mode, both for Tx and Rx. This is useful to have when a test fails or to validate that a test is really doing what it was designed to do. Info on what is supposed to be received and sent is also printed for the custom packet streams since they differ from the base line. Here is an example: Tx addr: 37e0 len: 64 options: 0 pkt_nb: 8 Tx addr: 4000 len: 64 options: 0 pkt_nb: 9 Rx: addr: 100 len: 64 options: 0 pkt_nb: 0 valid: 1 Rx: addr: 1100 len: 64 options: 0 pkt_nb: 1 valid: 1 Rx: addr: 2100 len: 64 options: 0 pkt_nb: 4 valid: 1 Rx: addr: 3100 len: 64 options: 0 pkt_nb: 8 valid: 1 Rx: addr: 4100 len: 64 options: 0 pkt_nb: 9 valid: 1 One pointless verbose print statement is also deleted and another one is made clearer. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230914084900.492-2-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-14 09:47:55 -07:00
Quan Tian	558c50cc3b	docs/bpf: update out-of-date doc in BPF flow dissector Commit `a5e2151ff9` ("net/ipv6: SKB symmetric hash should incorporate transport ports") removed the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in __skb_get_hash_symmetric(), making the doc out-of-date. Signed-off-by: Quan Tian <qtian@vmware.com> Link: https://lore.kernel.org/r/20230911152353.8280-1-qtian@vmware.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-13 14:22:57 -07:00
Alexei Starovoitov	5bbb9e1f08	Merge branch 'bpf-x64-fix-tailcall-infinite-loop' Leon Hwang says: ==================== bpf, x64: Fix tailcall infinite loop This patch series fixes a tailcall infinite loop on x64. From commit `ebf7d1f508` ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), the tailcall on x64 works better than before. From commit `e411901c0b` ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcall is able to run in BPF subprograms on x64. From commit `5b92a28aae` ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF program is able to trace other BPF programs. How about combining them all together? 1. FENTRY/FEXIT on a BPF subprogram. 2. A tailcall runs in the BPF subprogram. 3. The tailcall calls the subprogram's caller. As a result, a tailcall infinite loop comes up. And the loop would halt the machine. As we know, in tail call context, the tail_call_cnt propagates by stack and rax register between BPF subprograms. So do in trampolines. How did I discover the bug? From commit `7f6e4312e1` ("bpf: Limit caller's stack depth 256 for subprogs with tailcalls"), the total stack size limits to around 8KiB. Then, I write some bpf progs to validate the stack consuming, that are tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf. At that time, accidently, I made a tailcall loop. And then the loop halted my VM. Without the loop, the bpf progs would consume over 8KiB stack size. But the _stack-overflow_ did not halt my VM. With bpf_printk(), I confirmed that the tailcall count limit did not work expectedly. Next, read the code and fix it. Thank Ilya Leoshkevich, this bug on s390x has been fixed. Hopefully, this bug on arm64 will be fixed in near future. ==================== Link: https://lore.kernel.org/r/20230912150442.2009-1-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-12 13:06:12 -07:00
Leon Hwang	e13b5f2f3b	selftests/bpf: Add testcases for tailcall infinite loop fixing Add 4 test cases to confirm the tailcall infinite loop bug has been fixed. Like tailcall_bpf2bpf cases, do fentry/fexit on the bpf2bpf, and then check the final count result. tools/testing/selftests/bpf/test_progs -t tailcalls 226/13 tailcalls/tailcall_bpf2bpf_fentry:OK 226/14 tailcalls/tailcall_bpf2bpf_fexit:OK 226/15 tailcalls/tailcall_bpf2bpf_fentry_fexit:OK 226/16 tailcalls/tailcall_bpf2bpf_fentry_entry:OK 226 tailcalls:OK Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20230912150442.2009-4-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-12 13:06:12 -07:00
Leon Hwang	2b5dcb31a1	bpf, x64: Fix tailcall infinite loop From commit `ebf7d1f508` ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), the tailcall on x64 works better than before. From commit `e411901c0b` ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcall is able to run in BPF subprograms on x64. From commit `5b92a28aae` ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF program is able to trace other BPF programs. How about combining them all together? 1. FENTRY/FEXIT on a BPF subprogram. 2. A tailcall runs in the BPF subprogram. 3. The tailcall calls the subprogram's caller. As a result, a tailcall infinite loop comes up. And the loop would halt the machine. As we know, in tail call context, the tail_call_cnt propagates by stack and rax register between BPF subprograms. So do in trampolines. Fixes: `ebf7d1f508` ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Fixes: `e411901c0b` ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20230912150442.2009-3-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-12 13:06:12 -07:00
Leon Hwang	2bee9770f3	bpf, x64: Comment tail_call_cnt initialisation Without understanding emit_prologue(), it is really hard to figure out where does tail_call_cnt come from, even though searching tail_call_cnt in the whole kernel repo. By adding these comments, it is a little bit easier to understand tail_call_cnt initialisation. Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20230912150442.2009-2-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-12 13:06:12 -07:00
Leon Hwang	96daa98742	selftests/bpf: Correct map_fd to data_fd in tailcalls Get and check data_fd. It should not check map_fd again. Meanwhile, correct some 'return' to 'goto out'. Thank the suggestion from Maciej in "bpf, x64: Fix tailcall infinite loop"[0] discussions. [0] https://lore.kernel.org/bpf/e496aef8-1f80-0f8e-dcdd-25a8c300319a@gmail.com/T/#m7d3b601066ba66400d436b7e7579b2df4a101033 Fixes: `79d49ba048` ("bpf, testing: Add various tail call test cases") Fixes: `3b03791111` ("selftests/bpf: Add tailcall_bpf2bpf tests") Fixes: `5e0b0a4c52` ("selftests/bpf: Test tail call counting with bpf2bpf and data on stack") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230906154256.95461-1-hffilwlqm@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-09-11 15:28:24 -07:00
Denys Zagorui	ebc8484d0e	bpftool: Fix -Wcast-qual warning This cast was made by purpose for older libbpf where the bpf_object_skeleton field is void * instead of const void * to eliminate a warning (as i understand -Wincompatible-pointer-types-discards-qualifiers) but this cast introduces another warning (-Wcast-qual) for libbpf where data field is const void * It makes sense for bpftool to be in sync with libbpf from kernel sources Signed-off-by: Denys Zagorui <dzagorui@cisco.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20230907090210.968612-1-dzagorui@cisco.com	2023-09-08 17:04:24 -07:00
Andrii Nakryiko	dbbe15859b	Merge branch 'selftests/bpf: Optimize kallsyms cache' Rong Tao says: ==================== We need to optimize the kallsyms cache, including optimizations for the number of symbols limit, and, some test cases add new kernel symbols (such as testmods) and we need to refresh kallsyms (reload or refresh). ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2023-09-08 17:04:14 -07:00
Rong Tao	a28b1ba259	selftests/bpf: trace_helpers.c: Add a global ksyms initialization mutex As Jirka said [0], we just need to make sure that global ksyms initialization won't race. [0] https://lore.kernel.org/lkml/ZPCbAs3ItjRd8XVh@krava/ Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/tencent_5D0A837E219E2CFDCB0495DAD7D5D1204407@qq.com	2023-09-08 16:22:41 -07:00
Rong Tao	c698eaebdf	selftests/bpf: trace_helpers.c: Optimize kallsyms cache Static ksyms often have problems because the number of symbols exceeds the MAX_SYMS limit. Like changing the MAX_SYMS from 300000 to 400000 in commit e76a014334a6("selftests/bpf: Bump and validate MAX_SYMS") solves the problem somewhat, but it's not the perfect way. This commit uses dynamic memory allocation, which completely solves the problem caused by the limitation of the number of kallsyms. At the same time, add APIs: load_kallsyms_local() ksym_search_local() ksym_get_addr_local() free_kallsyms_local() There are used to solve the problem of selftests/bpf updating kallsyms after attach new symbols during testmod testing. Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/tencent_C9BDA68F9221F21BE4081566A55D66A9700A@qq.com	2023-09-08 16:22:41 -07:00
Alexei Starovoitov	9bc869253d	Merge branch 'bpf-task_group_seq_get_next-misc-cleanups' Oleg Nesterov says: ==================== bpf: task_group_seq_get_next: misc cleanups Yonghong, I am resending 1-5 of 6 as you suggested with your acks included. The next (final) patch will change this code to use __next_thread when https://lore.kernel.org/all/20230824143142.GA31222@redhat.com/ is merged. Oleg. ==================== Link: https://lore.kernel.org/r/20230905154612.GA24872@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Alexei Starovoitov	35897c3c52	Merge branch 'bpf-enable-irq-after-irq_work_raise-completes' Hou Tao says: ==================== bpf: Enable IRQ after irq_work_raise() completes From: Hou Tao <houtao1@huawei.com> Hi, The patchset aims to fix the problem that bpf_mem_alloc() may return NULL unexpectedly when multiple bpf_mem_alloc() are invoked concurrently under process context and there is still free memory available. The problem was found when doing stress test for qp-trie but the same problem also exists for bpf_obj_new() as demonstrated in patch #3. As pointed out by Alexei, the patchset can only fix ENOMEM problem for normal process context and can not fix the problem for irq-disabled context or RT-enabled kernel. Patch #1 fixes the race between unit_alloc() and unit_alloc(). Patch #2 fixes the race between unit_alloc() and unit_free(). And patch #3 adds a selftest for the problem. The major change compared with v1 is using local_irq_{save,restore)() pair to disable and enable preemption instead of preempt_{disable,enable}_notrace pair. The main reason is to prevent potential overhead from __preempt_schedule_notrace(). I also run htab_mem benchmark and hash_map_perf on a 8-CPUs KVM VM to compare the performance between local_irq_{save,restore} and preempt_{disable,enable}_notrace(), but the results are similar as shown below: (1) use preempt_{disable,enable}_notrace() [root@hello bpf]# ./map_perf_test 4 8 0:hash_map_perf kmalloc 652179 events per sec 1:hash_map_perf kmalloc 651880 events per sec 2:hash_map_perf kmalloc 651382 events per sec 3:hash_map_perf kmalloc 650791 events per sec 5:hash_map_perf kmalloc 650140 events per sec 6:hash_map_perf kmalloc 652773 events per sec 7:hash_map_perf kmalloc 652751 events per sec 4:hash_map_perf kmalloc 648199 events per sec [root@hello bpf]# ./benchs/run_bench_htab_mem.sh normal bpf ma ============= overwrite per-prod-op: 110.82 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.73MiB batch_add_batch_del per-prod-op: 89.79 ± 0.75k/s, avg mem: 1.68 ± 0.38MiB, peak mem: 2.73MiB add_del_on_diff_cpu per-prod-op: 17.83 ± 0.07k/s, avg mem: 25.68 ± 2.92MiB, peak mem: 35.10MiB (2) use local_irq_{save,restore} [root@hello bpf]# ./map_perf_test 4 8 0:hash_map_perf kmalloc 656299 events per sec 1:hash_map_perf kmalloc 656397 events per sec 2:hash_map_perf kmalloc 656046 events per sec 3:hash_map_perf kmalloc 655723 events per sec 5:hash_map_perf kmalloc 655221 events per sec 4:hash_map_perf kmalloc 654617 events per sec 6:hash_map_perf kmalloc 650269 events per sec 7:hash_map_perf kmalloc 653665 events per sec [root@hello bpf]# ./benchs/run_bench_htab_mem.sh normal bpf ma ============= overwrite per-prod-op: 116.10 ± 0.02k/s, avg mem: 2.00 ± 0.00MiB, peak mem: 2.74MiB batch_add_batch_del per-prod-op: 88.76 ± 0.61k/s, avg mem: 1.94 ± 0.33MiB, peak mem: 2.74MiB add_del_on_diff_cpu per-prod-op: 18.12 ± 0.08k/s, avg mem: 25.10 ± 2.70MiB, peak mem: 34.78MiB As ususal comments are always welcome. Change Log: v2: * Use local_irq_save to disable preemption instead of using preempt_{disable,enable}_notrace pair to prevent potential overhead v1: https://lore.kernel.org/bpf/20230822133807.3198625-1-houtao@huaweicloud.com/ ==================== Link: https://lore.kernel.org/r/20230901111954.1804721-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Oleg Nesterov	780aa8dfcb	bpf: task_group_seq_get_next: simplify the "next tid" logic Kill saved_tid. It looks ugly to update tid and then restore the previous value if __task_pid_nr_ns() returns 0. Change this code to update tid and common->pid_visiting once before return. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230905154656.GA24950@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Hou Tao	29c11aa808	selftests/bpf: Test preemption between bpf_obj_new() and bpf_obj_drop() The test case creates 4 threads and then pins these 4 threads in CPU 0. These 4 threads will run different bpf program through bpf_prog_test_run_opts() and these bpf program will use bpf_obj_new() and bpf_obj_drop() to allocate and free local kptrs concurrently. Under preemptible kernel, bpf_obj_new() and bpf_obj_drop() may preempt each other, bpf_obj_new() may return NULL and the test will fail before applying these fixes as shown below: test_preempted_bpf_ma_op:PASS:open_and_load 0 nsec test_preempted_bpf_ma_op:PASS:attach 0 nsec test_preempted_bpf_ma_op:PASS:no test prog 0 nsec test_preempted_bpf_ma_op:PASS:no test prog 0 nsec test_preempted_bpf_ma_op:PASS:no test prog 0 nsec test_preempted_bpf_ma_op:PASS:no test prog 0 nsec test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec test_preempted_bpf_ma_op:PASS:pthread_create 0 nsec test_preempted_bpf_ma_op:PASS:run prog err 0 nsec test_preempted_bpf_ma_op:PASS:run prog err 0 nsec test_preempted_bpf_ma_op:PASS:run prog err 0 nsec test_preempted_bpf_ma_op:PASS:run prog err 0 nsec test_preempted_bpf_ma_op:FAIL:ENOMEM unexpected ENOMEM: got TRUE #168 preempted_bpf_ma_op:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230901111954.1804721-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Oleg Nesterov	0ee9808b0a	bpf: task_group_seq_get_next: kill next_task It only adds the unnecessary confusion and compicates the "retry" code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230905154654.GA24945@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Hou Tao	62cf51cb0e	bpf: Enable IRQ after irq_work_raise() completes in unit_free{_rcu}() Both unit_free() and unit_free_rcu() invoke irq_work_raise() to free freed objects back to slab and the invocation may also be preempted by unit_alloc() and unit_alloc() may return NULL unexpectedly as shown in the following case: task A task B unit_free() // high_watermark = 48 // free_cnt = 49 after free irq_work_raise() // mark irq work as IRQ_WORK_PENDING irq_work_claim() // task B preempts task A unit_alloc() // free_cnt = 48 after alloc // does unit_alloc() 32-times ...... // free_cnt = 16 unit_alloc() // free_cnt = 15 after alloc // irq work is already PENDING, // so just return irq_work_raise() // does unit_alloc() 15-times ...... // free_cnt = 0 unit_alloc() // free_cnt = 0 before alloc return NULL Fix it by enabling IRQ after irq_work_raise() completes. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230901111954.1804721-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Oleg Nesterov	87abbf7a54	bpf: task_group_seq_get_next: fix the skip_if_dup_files check Unless I am notally confused it is wrong. We are going to return or skip next_task so we need to check next_task-files, not task->files. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230905154651.GA24940@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Oleg Nesterov	4981921350	bpf: task_group_seq_get_next: cleanup the usage of get/put_task_struct get_pid_task() makes no sense, the code does put_task_struct() soon after. Use find_task_by_pid_ns() instead of find_pid_ns + get_pid_task and kill put_task_struct(), this allows to do get_task_struct() only once before return. While at it, kill the unnecessary "if (!pid)" check in the "if (!*tid)" block, this matches the next usage of find_pid_ns() + get_pid_task() in this function. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230905154649.GA24935@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Oleg Nesterov	1a00ef57d9	bpf: task_group_seq_get_next: cleanup the usage of next_thread() 1. find_pid_ns() + get_pid_task() under rcu_read_lock() guarantees that we can safely iterate the task->thread_group list. Even if this task exits right after get_pid_task() (or goto retry) and pid_alive() returns 0. Kill the unnecessary pid_alive() check. 2. next_thread() simply can't return NULL, kill the bogus "if (!next_task)" check. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230905154646.GA24928@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:19 -07:00
Alexei Starovoitov	1e4a6d975e	Merge branch 'bpf-add-support-for-local-percpu-kptr' Yonghong Song says: ==================== bpf: Add support for local percpu kptr Patch set [1] implemented cgroup local storage BPF_MAP_TYPE_CGRP_STORAGE similar to sk/task/inode local storage and old BPF_MAP_TYPE_CGROUP_STORAGE map is marked as deprecated since old BPF_MAP_TYPE_CGROUP_STORAGE map can only work with current cgroup. Similarly, the existing BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map is a percpu version of BPF_MAP_TYPE_CGROUP_STORAGE and only works with current cgroup. But there is no replacement which can work with arbitrary cgroup. This patch set solved this problem but adding support for local percpu kptr. The map value can have a percpu kptr field which holds a bpf prog allocated percpu data. The below is an example, struct percpu_val_t { ... fields ... } struct map_value_t { struct percpu_val_t __percpu_kptr *percpu_data_ptr; } In the above, 'map_value_t' is the map value type for a BPF_MAP_TYPE_CGRP_STORAGE map. User can access 'percpu_data_ptr' and then read/write percpu data. This covers BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE and more. So BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type is marked as deprecated. In additional, local percpu kptr supports the same map type as other kptrs including hash, lru_hash, array, sk/inode/task/cgrp local storage. Currently, percpu data structure does not support non-scalars or special fields (e.g., bpf_spin_lock, bpf_rb_root, etc.). They can be supported in the future if there exist use cases. Please for individual patches for details. [1] https://lore.kernel.org/all/20221026042835.672317-1-yhs@fb.com/ Changelog: v2 -> v3: - fix libbpf_str test failure. v1 -> v2: - does not support special fields in percpu data structure. - rename __percpu attr to __percpu_kptr attr. - rename BPF_KPTR_PERCPU_REF to BPF_KPTR_PERCPU. - better code to handle bpf_{this,per}_cpu_ptr() helpers. - add more negative tests. - fix a bpftool related test failure. ==================== Link: https://lore.kernel.org/r/20230827152729.1995219-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:18 -07:00
Hou Tao	566f6de3ce	bpf: Enable IRQ after irq_work_raise() completes in unit_alloc() When doing stress test for qp-trie, bpf_mem_alloc() returned NULL unexpectedly because all qp-trie operations were initiated from bpf syscalls and there was still available free memory. bpf_obj_new() has the same problem as shown by the following selftest. The failure is due to the preemption. irq_work_raise() will invoke irq_work_claim() first to mark the irq work as pending and then inovke __irq_work_queue_local() to raise an IPI. So when the current task which is invoking irq_work_raise() is preempted by other task, unit_alloc() may return NULL for preemption task as shown below: task A task B unit_alloc() // low_watermark = 32 // free_cnt = 31 after alloc irq_work_raise() // mark irq work as IRQ_WORK_PENDING irq_work_claim() // task B preempts task A unit_alloc() // free_cnt = 30 after alloc // irq work is already PENDING, // so just return irq_work_raise() // does unit_alloc() 30-times ...... unit_alloc() // free_cnt = 0 before alloc return NULL Fix it by enabling IRQ after irq_work_raise() completes. An alternative fix is using preempt_{disable\|enable}_notrace() pair, but it may have extra overhead. Another feasible fix is to only disable preemption or IRQ before invoking irq_work_queue() and enable preemption or IRQ after the invocation completes, but it can't handle the case when c->low_watermark is 1. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230901111954.1804721-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:18 -07:00
Yonghong Song	9bc95a95ab	bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr' can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated. Also make changes in selftests/bpf/test_bpftool_synctypes.py and selftest libbpf_str to fix otherwise test errors. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:18 -07:00
Yonghong Song	1bd7931728	selftests/bpf: Add some negative tests Add a few negative tests for common mistakes with using percpu kptr including: - store to percpu kptr. - type mistach in bpf_kptr_xchg arguments. - sleepable prog with untrusted arg for bpf_this_cpu_ptr(). - bpf_percpu_obj_new && bpf_obj_drop, and bpf_obj_new && bpf_percpu_obj_drop - struct with ptr for bpf_percpu_obj_new - struct with special field (e.g., bpf_spin_lock) for bpf_percpu_obj_new Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230827152832.2002421-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-09-08 08:42:18 -07:00

1 2 3 4 5 ...

1214691 commits