Commit graph

1234882 commits

Author SHA1 Message Date
Andrii Nakryiko
f5fdb51fb9 bpf: fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
It's quite confusing in practice when it's possible to successfully
create a BPF token from BPF FS that didn't have any of delegate_xxx
mount options set up. While it's not wrong, it's actually more
meaningful to reject BPF_TOKEN_CREATE with specific error code (-ENOENT)
to let user-space know that no token delegation is setup up.

So, instead of creating empty BPF token that will be always ignored
because it doesn't have any of the allow_xxx bits set, reject it with
-ENOENT. If we ever need empty BPF token to be possible, we can support
that with extra flag passed into BPF_TOKEN_CREATE.

Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231213190842.3844987-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 15:47:04 -08:00
Daniel Xu
f04f2ce601 bpf: selftests: Add verifier tests for CO-RE bitfield writes
Add some tests that exercise BPF_CORE_WRITE_BITFIELD() macro. Since some
non-trivial bit fiddling is going on, make sure various edge cases (such
as adjacent bitfields and bitfields at the edge of structs) are
exercised.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/72698a1080fa565f541d5654705255984ea2a029.1702325874.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-13 15:42:19 -08:00
Daniel Xu
7d19c00e9a bpf: selftests: test_loader: Support __btf_path() annotation
This commit adds support for per-prog btf_custom_path. This is necessary
for testing CO-RE relocations on non-vmlinux types using test_loader
infrastructure.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/660ea7f2fdbdd5103bc1af87c9fc931f05327926.1702325874.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-13 15:42:19 -08:00
Daniel Xu
2f70803532 libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
=== Motivation ===

Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
writing wrapper to make the verifier happy.

Two alternatives to this approach are:

1. Use the upcoming `preserve_static_offset` [0] attribute to disable
   CO-RE on specific structs.
2. Use broader byte-sized writes to write to bitfields.

(1) is a bit hard to use. It requires specific and not-very-obvious
annotations to bpftool generated vmlinux.h. It's also not generally
available in released LLVM versions yet.

(2) makes the code quite hard to read and write. And especially if
BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
to have an inverse helper for writing.

=== Implementation details ===

Since the logic is a bit non-obvious, I thought it would be helpful
to explain exactly what's going on.

To start, it helps by explaining what LSHIFT_U64 (lshift) and RSHIFT_U64
(rshift) is designed to mean. Consider the core of the
BPF_CORE_READ_BITFIELD() algorithm:

        val <<= __CORE_RELO(s, field, LSHIFT_U64);
        val = val >> __CORE_RELO(s, field, RSHIFT_U64);

Basically what happens is we lshift to clear the non-relevant (blank)
higher order bits. Then we rshift to bring the relevant bits (bitfield)
down to LSB position (while also clearing blank lower order bits). To
illustrate:

        Start:    ........XXX......
        Lshift:   XXX......00000000
        Rshift:   00000000000000XXX

where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit.

After the two operations, the bitfield is ready to be interpreted as a
regular integer.

Next, we want to build an alternative (but more helpful) mental model
on lshift and rshift. That is, to consider:

* rshift as the total number of blank bits in the u64
* lshift as number of blank bits left of the bitfield in the u64

Take a moment to consider why that is true by consulting the above
diagram.

With this insight, we can now define the following relationship:

              bitfield
                 _
                | |
        0.....00XXX0...00
        |      |   |    |
        |______|   |    |
         lshift    |    |
                   |____|
              (rshift - lshift)

That is, we know the number of higher order blank bits is just lshift.
And the number of lower order blank bits is (rshift - lshift).

Finally, we can examine the core of the write side algorithm:

        mask = (~0ULL << rshift) >> lshift;              // 1
        val = (val & ~mask) | ((nval << rpad) & mask);   // 2

1. Compute a mask where the set bits are the bitfield bits. The first
   left shift zeros out exactly the number of blank bits, leaving a
   bitfield sized set of 1s. The subsequent right shift inserts the
   correct amount of higher order blank bits.

2. On the left of the `|`, mask out the bitfield bits. This creates
   0s where the new bitfield bits will go. On the right of the `|`,
   bring nval into the correct bit position and mask out any bits
   that fall outside of the bitfield. Finally, by bor'ing the two
   halves, we get the final set of bits to write back.

[0]: https://reviews.llvm.org/D133361
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com>
Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/4d3dd215a4fd57d980733886f9c11a45e1a9adf3.1702325874.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-13 15:42:19 -08:00
Jie Jiang
750e785796 bpf: Support uid and gid when mounting bpffs
Parse uid and gid in bpf_parse_param() so that they can be passed in as
the `data` parameter when mount() bpffs. This will be useful when we
want to control which user/group has the control to the mounted bpffs,
otherwise a separate chown() call will be needed.

Signed-off-by: Jie Jiang <jiejiang@chromium.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mike Frysinger <vapier@chromium.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231212093923.497838-1-jiejiang@chromium.org
2023-12-13 15:37:42 -08:00
Andrii Nakryiko
62d9a969f4 selftests/bpf: fix compiler warnings in RELEASE=1 mode
When compiling BPF selftests with RELEASE=1, we get two new
warnings, which are treated as errors. Fix them.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20231212225343.1723081-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 10:27:16 -08:00
YiFei Zhu
e1ba7f64b1 selftests/bpf: Relax time_tai test for equal timestamps in tai_forward
We're observing test flakiness on an arm64 platform which might not
have timestamps as precise as x86. The test log looks like:

  test_time_tai:PASS:tai_open 0 nsec
  test_time_tai:PASS:test_run 0 nsec
  test_time_tai:PASS:tai_ts1 0 nsec
  test_time_tai:PASS:tai_ts2 0 nsec
  test_time_tai:FAIL:tai_forward unexpected tai_forward: actual 1702348135471494160 <= expected 1702348135471494160
  test_time_tai:PASS:tai_gettime 0 nsec
  test_time_tai:PASS:tai_future_ts1 0 nsec
  test_time_tai:PASS:tai_future_ts2 0 nsec
  test_time_tai:PASS:tai_range_ts1 0 nsec
  test_time_tai:PASS:tai_range_ts2 0 nsec
  #199     time_tai:FAIL

This patch changes ASSERT_GT to ASSERT_GE in the tai_forward assertion
so that equal timestamps are permitted.

Fixes: 64e15820b9 ("selftests/bpf: Add BPF-helper test for CLOCK_TAI access")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231212182911.3784108-1-zhuyifei@google.com
2023-12-12 15:53:34 -08:00
Andrei Matei
745e031130 bpf: Comment on check_mem_size_reg
This patch adds a comment to check_mem_size_reg -- a function whose
meaning is not very transparent. The function implicitly deals with two
registers connected by convention, which is not obvious.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231210225149.67639-1-andreimatei1@gmail.com
2023-12-12 15:35:32 -08:00
Yang Li
56c26d5ad8 bpf: Remove unused backtrack_state helper functions
The function are defined in the verifier.c file, but not called
elsewhere, so delete the unused function.

kernel/bpf/verifier.c:3448:20: warning: unused function 'bt_set_slot'
kernel/bpf/verifier.c:3453:20: warning: unused function 'bt_clear_slot'
kernel/bpf/verifier.c:3488:20: warning: unused function 'bt_is_slot_set'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231212005436.103829-1-yang.lee@linux.alibaba.com

Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7714
2023-12-12 09:52:07 -08:00
Manu Bretelle
f77d795618 selftests/bpf: Fixes tests for filesystem kfuncs
`fs_kfuncs.c`'s `test_xattr` would fail the test even when the
filesystem did not support xattr, for instance when /tmp is mounted as
tmpfs.

This change checks errno when setxattr fail. If the failure is due to
the operation being unsupported, we will skip the test (just like we
would if verity was not enabled on the FS.

Before the change, fs_kfuncs test would fail in test_axattr:

 $ vmtest -k $(make -s image_name) './tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
 => bzImage
 ===> Booting
 [    0.000000] rcu:        RCU restricting CPUs from NR_CPUS=128 to
 nr_cpu_
 ===> Setting up VM
 ===> Running command
 [    4.157491] bpf_testmod: loading out-of-tree module taints kernel.
 [    4.161515] bpf_testmod: module verification failed: signature and/or
 required key missing - tainting kernel
 test_xattr:PASS:create_file 0 nsec
 test_xattr:FAIL:setxattr unexpected error: -1 (errno 95)
 #90/1    fs_kfuncs/xattr:FAIL
 #90/2    fs_kfuncs/fsverity:SKIP
 #90      fs_kfuncs:FAIL

 All error logs:
 test_xattr:PASS:create_file 0 nsec
 test_xattr:FAIL:setxattr unexpected error: -1 (errno 95)
 #90/1    fs_kfuncs/xattr:FAIL
 #90      fs_kfuncs:FAIL

 Summary: 0/0 PASSED, 1 SKIPPED, 1 FAILED

Test plan:

  $ touch tmpfs_file && truncate -s 1G tmpfs_file && mkfs.ext4 tmpfs_file
  # /tmp mounted as tmpfs
  $ vmtest -k $(make -s image_name) './tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
  => bzImage
  ===> Booting
  ===> Setting up VM
  ===> Running command
  WARNING! Selftests relying on bpf_testmod.ko will be skipped.
  Can't find bpf_testmod.ko kernel module: -2
  #90/1    fs_kfuncs/xattr:SKIP
  #90/2    fs_kfuncs/fsverity:SKIP
  #90      fs_kfuncs:SKIP
  Summary: 1/0 PASSED, 2 SKIPPED, 0 FAILED
  # /tmp mounted as ext4 with xattr enabled but not verity
  $ vmtest -k $(make -s image_name) 'mount -o loop tmpfs_file /tmp && \
    /tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
  => bzImage
  ===> Booting
  ===> Setting up VM
  ===> Running command
  [    4.067071] loop0: detected capacity change from 0 to 2097152
  [    4.191882] EXT4-fs (loop0): mounted filesystem
  407ffa36-4553-4c8c-8c78-134443630f69 r/w with ordered data mode. Quota
  mode: none.
  WARNING! Selftests relying on bpf_testmod.ko will be skipped.
  Can't find bpf_testmod.ko kernel module: -2
  #90/1    fs_kfuncs/xattr:OK
  #90/2    fs_kfuncs/fsverity:SKIP
  #90      fs_kfuncs:OK (SKIP: 1/2)
  Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED
  $ tune2fs -O verity tmpfs_file
  # /tmp as ext4 with both xattr and verity enabled
  $ vmtest -k $(make -s image_name) 'mount -o loop tmpfs_file /tmp && \
    ./tools/testing/selftests/bpf/test_progs -a fs_kfuncs'
  => bzImage
  ===> Booting
  ===> Setting up VM
  ===> Running command
  [    4.291434] loop0: detected capacity change from 0 to 2097152
  [    4.460828] EXT4-fs (loop0): recovery complete
  [    4.468631] EXT4-fs (loop0): mounted filesystem
  7b4a7b7f-c442-4b06-9ede-254e63cceb52 r/w with ordered data mode. Quota
  mode: none.
  [    4.988074] fs-verity: sha256 using implementation "sha256-generic"
  WARNING! Selftests relying on bpf_testmod.ko will be skipped.
  Can't find bpf_testmod.ko kernel module: -2
  #90/1    fs_kfuncs/xattr:OK
  #90/2    fs_kfuncs/fsverity:OK
  #90      fs_kfuncs:OK
  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Fixes: 341f06fddd ("selftests/bpf: Add tests for filesystem kfuncs")
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20231211180733.763025-1-chantr4@gmail.com
2023-12-12 09:51:47 -08:00
Andrii Nakryiko
406a6fa44b bpf: use bitfields for simple per-subprog bool flags
We have a bunch of bool flags for each subprog. Instead of wasting bytes
for them, use bitfields instead.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231204233931.49758-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11 19:23:37 -08:00
Andrii Nakryiko
1a1ad782dc bpf: tidy up exception callback management a bit
Use the fact that we are passing subprog index around and have
a corresponding struct bpf_subprog_info in bpf_verifier_env for each
subprogram. We don't need to separately pass around a flag whether
subprog is exception callback or not, each relevant verifier function
can determine this using provided subprog index if we maintain
bpf_subprog_info properly.

Also move out exception callback-specific logic from
btf_prepare_func_args(), keeping it generic. We can enforce all these
restriction right before exception callback verification pass. We add
out parameter, arg_cnt, for now, but this will be unnecessary with
subsequent refactoring and will be removed.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231204233931.49758-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11 19:23:32 -08:00
Andrii Nakryiko
22b769bb4f bpf: emit more dynptr information in verifier log
Emit dynptr type for CONST_PTR_TO_DYNPTR register. Also emit id,
ref_obj_id, and dynptr_id fields for STACK_DYNPTR stack slots.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231204233931.49758-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11 19:21:22 -08:00
Andrii Nakryiko
1e68485d82 bpf: log PTR_TO_MEM memory size in verifier log
Emit valid memory size addressable through PTR_TO_MEM register.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231204233931.49758-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11 19:21:22 -08:00
Andrii Nakryiko
e72c1ccfd4 selftests/bpf: validate eliminated global subprog is not freplaceable
Add selftest that establishes dead code-eliminated valid global subprog
(global_dead) and makes sure that it's not possible to freplace it, as
it's effectively not there. This test will fail with unexpected success
before 2afae08c9d ("bpf: Validate global subprogs lazily").

v2->v3:
  - add missing err assignment (Alan);
  - undo unnecessary signature changes in verifier_global_subprogs.c (Eduard);
v1->v2:
  - don't rely on assembly output in verifier log, which changes between
    compiler versions (CI).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20231211174131.2324306-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-11 19:16:32 -08:00
Aleksander Lobakin
2ebe81c814 net, xdp: Allow metadata > 32
32 bytes may be not enough for some custom metadata. Relax the restriction,
allow metadata larger than 32 bytes and make __skb_metadata_differs() work
with bigger lengths.

Now size of metadata is only limited by the fact it is stored as u8 in
skb_shared_info, so maximum possible value is 255. Size still has to be
aligned to 4, so the actual upper limit becomes 252. Most driver
implementations will offer less, none can offer more.

Other important conditions, such as having enough space for xdp_frame
building, are already checked in bpf_xdp_adjust_meta().

Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/eb87653c-8ff8-447d-a7a1-25961f60518a@kernel.org
Link: https://lore.kernel.org/bpf/20231206205919.404415-3-larysa.zaremba@intel.com
2023-12-11 16:09:36 +01:00
Larysa Zaremba
15c79c6507 selftests/bpf: Increase invalid metadata size
Changed check expects passed data meta to be deemed invalid. After loosening
the requirement, the size of 36 bytes becomes valid. Therefore, increase
tested meta size to 256, so we do not get an unexpected success.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231206205919.404415-2-larysa.zaremba@intel.com
2023-12-11 16:09:24 +01:00
Alexei Starovoitov
5bcbdf72df Merge branch 'add-new-bpf_cpumask_weight-kfunc'
David Vernet says:

====================
Add new bpf_cpumask_weight() kfunc

It can be useful to query how many bits are set in a cpumask. For
example, if you want to perform special logic for the last remaining
core that's set in a mask. This logic is already exposed through the
main kernel's cpumask header as cpumask_weight(), so it would be useful
to add a new bpf_cpumask_weight() kfunc which wraps it and does the
same.

This patch series was built and tested on top of commit 2146f7fe6e
("Merge branch 'allocate-bpf-trampoline-on-bpf_prog_pack'").
====================

Link: https://lore.kernel.org/r/20231207210843.168466-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 21:37:33 -08:00
David Vernet
88f6047191 selftests/bpf: Add test for bpf_cpumask_weight() kfunc
The new bpf_cpumask_weight() kfunc can be used to count the number of
bits that are set in a struct cpumask* kptr. Let's add a selftest to
verify its behavior.

Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231207210843.168466-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 21:37:33 -08:00
David Vernet
a6de18f310 bpf: Add bpf_cpumask_weight() kfunc
It can be useful to query how many bits are set in a cpumask. For
example, if you want to perform special logic for the last remaining
core that's set in a mask. Let's therefore add a new
bpf_cpumask_weight() kfunc which checks how many bits are set in a mask.

Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231207210843.168466-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 21:37:33 -08:00
Tiezhu Yang
5181dc08f7 test_bpf: Rename second ALU64_SMOD_X to ALU64_SMOD_K
Currently, there are two test cases with same name
"ALU64_SMOD_X: -7 % 2 = -1", the first one is right,
the second one should be ALU64_SMOD_K because its
code is BPF_ALU64 | BPF_MOD | BPF_K.

Before:
test_bpf: #170 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS
test_bpf: #171 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS

After:
test_bpf: #170 ALU64_SMOD_X: -7 % 2 = -1 jited:1 4 PASS
test_bpf: #171 ALU64_SMOD_K: -7 % 2 = -1 jited:1 4 PASS

Fixes: daabb2b098 ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231207040851.19730-1-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 21:27:54 -08:00
Andrii Nakryiko
7d8ed51bcb selftests/bpf: validate fake register spill/fill precision backtracking logic
Add two tests validating that verifier's precision backtracking logic
handles BPF_ST_MEM instructions that produce fake register spill into
register slot. This is happening when non-zero constant is written
directly to a slot, e.g., *(u64 *)(r10 -8) = 123.

Add both full 64-bit register spill, as well as 32-bit "sub-spill".

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231209010958.66758-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:43:20 -08:00
Andrii Nakryiko
482d548d40 bpf: handle fake register spill to stack with BPF_ST_MEM instruction
When verifier validates BPF_ST_MEM instruction that stores known
constant to stack (e.g., *(u64 *)(r10 - 8) = 123), it effectively spills
a fake register with a constant (but initially imprecise) value to
a stack slot. Because read-side logic treats it as a proper register
fill from stack slot, we need to mark such stack slot initialization as
INSN_F_STACK_ACCESS instruction to stop precision backtracking from
missing it.

Fixes: 41f6f64e69 ("bpf: support non-r10 register spill/fill to/from stack in precision tracking")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231209010958.66758-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:43:20 -08:00
Alexei Starovoitov
8477fe1de9 Merge branch 'bpf-fixes-for-maybe_wait_bpf_programs'
Hou Tao says:

====================
The patch set aims to fix the problems found when inspecting the code
related with maybe_wait_bpf_programs().

Patch #1 removes unnecessary invocation of maybe_wait_bpf_programs().
Patch #2 calls maybe_wait_bpf_programs() only once for batched update.
Patch #3 adds the missed waiting when doing batched lookup_deletion on
htab of maps. Patch #4 does wait only if the update or deletion
operation succeeds. Patch #5 fixes the value of batch.count when memory
allocation fails.
====================

Link: https://lore.kernel.org/r/20231208102355.2628918-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:13:23 -08:00
Hou Tao
06e5c999f1 bpf: Set uattr->batch.count as zero before batched update or deletion
generic_map_{delete,update}_batch() doesn't set uattr->batch.count as
zero before it tries to allocate memory for key. If the memory
allocation fails, the value of uattr->batch.count will be incorrect.

Fix it by setting uattr->batch.count as zero beore batched update or
deletion.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231208102355.2628918-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:12:34 -08:00
Hou Tao
67ad2c73ff bpf: Only call maybe_wait_bpf_programs() when map operation succeeds
There is no need to call maybe_wait_bpf_programs() if update or deletion
operation fails. So only call maybe_wait_bpf_programs() if update or
deletion operation succeeds.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231208102355.2628918-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:12:34 -08:00
Hou Tao
012772581d bpf: Add missed maybe_wait_bpf_programs() for htab of maps
When doing batched lookup and deletion operations on htab of maps,
maybe_wait_bpf_programs() is needed to ensure all programs don't use the
inner map after the bpf syscall returns.

Instead of adding the wait in __htab_map_lookup_and_delete_batch(),
adding the wait in bpf_map_do_batch() and also removing the calling of
maybe_wait_bpf_programs() from generic_map_{delete,update}_batch().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231208102355.2628918-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:12:34 -08:00
Hou Tao
37ba5b59d6 bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch()
Just like commit 9087c6ff8d ("bpf: Call maybe_wait_bpf_programs() only
once from generic_map_delete_batch()"), there is also no need to call
maybe_wait_bpf_programs() for each update in batched update, so only
call it once in generic_map_update_batch().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231208102355.2628918-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:12:33 -08:00
Hou Tao
c26f2a8901 bpf: Remove unnecessary wait from bpf_map_copy_value()
Both map_lookup_elem() and generic_map_lookup_batch() use
bpf_map_copy_value() to lookup and copy the value, and there is no
update operation in bpf_map_copy_value(), so just remove the invocation
of maybe_wait_bpf_programs() from it.

Fixes: 15c14a3dca ("bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231208102355.2628918-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-09 18:12:33 -08:00
Sergei Trofimovich
32fa058398 libbpf: Add pr_warn() for EINVAL cases in linker_sanity_check_elf
Before the change on `i686-linux` `systemd` build failed as:

    $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22)

After the change it fails as:

    $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    libbpf: ELF section #9 has inconsistent alignment addr=8 != d=4 in src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o
    Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22)

Now it's slightly easier to figure out what is wrong with an ELF file.

Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231208215100.435876-1-slyich@gmail.com
2023-12-08 17:11:18 -08:00
Martin KaFai Lau
09115c33e6 Merge branch 'bpf: Expand bpf_cgrp_storage to support cgroup1 non-attach case'
Yafang Shao says:

====================
In the current cgroup1 environment, associating operations between a cgroup
and applications in a BPF program requires storing a mapping of cgroup_id
to application either in a hash map or maintaining it in userspace.
However, by enabling bpf_cgrp_storage for cgroup1, it becomes possible to
conveniently store application-specific information in cgroup-local storage
and utilize it within BPF programs. Furthermore, enabling this feature for
cgroup1 involves minor modifications for the non-attach case, streamlining
the process.

However, when it comes to enabling this functionality for the cgroup1
attach case, it presents challenges. Therefore, the decision is to focus on
enabling it solely for the cgroup1 non-attach case at present. If
attempting to attach to a cgroup1 fd, the operation will simply fail with
the error code -EBADF.

Changes:
- RFC -> v1:
  - Collect acked-by
  - Avoid unnecessary is_cgroup1 check (Yonghong)
  - Keep the code patterns consistent (Yonghong)
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-08 17:08:19 -08:00
Yafang Shao
a2c6380b17 selftests/bpf: Add selftests for cgroup1 local storage
Expanding the test coverage from cgroup2 to include cgroup1. The result
as follows,

Already existing test cases for cgroup2:
  #48/1    cgrp_local_storage/tp_btf:OK
  #48/2    cgrp_local_storage/attach_cgroup:OK
  #48/3    cgrp_local_storage/recursion:OK
  #48/4    cgrp_local_storage/negative:OK
  #48/5    cgrp_local_storage/cgroup_iter_sleepable:OK
  #48/6    cgrp_local_storage/yes_rcu_lock:OK
  #48/7    cgrp_local_storage/no_rcu_lock:OK

Expanded test cases for cgroup1:
  #48/8    cgrp_local_storage/cgrp1_tp_btf:OK
  #48/9    cgrp_local_storage/cgrp1_recursion:OK
  #48/10   cgrp_local_storage/cgrp1_negative:OK
  #48/11   cgrp_local_storage/cgrp1_iter_sleepable:OK
  #48/12   cgrp_local_storage/cgrp1_yes_rcu_lock:OK
  #48/13   cgrp_local_storage/cgrp1_no_rcu_lock:OK

Summary:
  #48      cgrp_local_storage:OK
  Summary: 1/13 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231206115326.4295-4-laoar.shao@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-08 17:08:18 -08:00
Yafang Shao
f4199271da selftests/bpf: Add a new cgroup helper open_classid()
This new helper allows us to obtain the fd of a net_cls cgroup, which will
be utilized in the subsequent patch.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231206115326.4295-3-laoar.shao@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-08 17:08:18 -08:00
Yafang Shao
73d9eb340d bpf: Enable bpf_cgrp_storage for cgroup1 non-attach case
In the current cgroup1 environment, associating operations between cgroups
and applications in a BPF program requires storing a mapping of cgroup_id
to application either in a hash map or maintaining it in userspace.
However, by enabling bpf_cgrp_storage for cgroup1, it becomes possible to
conveniently store application-specific information in cgroup-local storage
and utilize it within BPF programs. Furthermore, enabling this feature for
cgroup1 involves minor modifications for the non-attach case, streamlining
the process.

However, when it comes to enabling this functionality for the cgroup1
attach case, it presents challenges. Therefore, the decision is to focus on
enabling it solely for the cgroup1 non-attach case at present. If
attempting to attach to a cgroup1 fd, the operation will simply fail with
the error code -EBADF.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231206115326.4295-2-laoar.shao@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-12-08 17:08:18 -08:00
Andrii Nakryiko
1720c42b90 selftests/bpf: fix timer/test_bad_ret subtest on test_progs-cpuv4 flavor
Because test_bad_ret main program is not written in assembly, we don't
control instruction indices in timer_cb_ret_bad() subprog. This bites us
in timer/test_bad_ret subtest, where we see difference between cpuv4 and
other flavors.

For now, make __msg() expectations not rely on instruction indices by
anchoring them around bpf_get_prandom_u32 call. Once we have regex/glob
support for __msg(), this can be expressed a bit more nicely, but for
now just mitigating the problem with available means.

Fixes: e02dea158d ("selftests/bpf: validate async callback return value check correctness")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231208233028.3412690-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-08 16:50:50 -08:00
Andrii Nakryiko
4af20ab9ed Merge branch 'bpf-fix-accesses-to-uninit-stack-slots'
Andrei Matei says:

====================
bpf: fix accesses to uninit stack slots

Fix two related issues issues around verifying stack accesses:
1. accesses to uninitialized stack memory was allowed inconsistently
2. the maximum stack depth needed for a program was not always
maintained correctly

The two issues are fixed together in one commit because the code for one
affects the other.

V4 to V5:
- target bpf-next (Alexei)

V3 to V4:
- minor fixup to comment in patch 1 (Eduard)
- C89-style in patch 3 (Andrii)

V2 to V3:
- address review comments from Andrii and Eduard
- drop new verifier tests in favor of editing existing tests to check
  for stack depth
- append a patch with a bit of cleanup coming out of the previous review
====================

Link: https://lore.kernel.org/r/20231208032519.260451-1-andreimatei1@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-08 14:19:01 -08:00
Andrei Matei
2929bfac00 bpf: Minor cleanup around stack bounds
Push the rounding up of stack offsets into the function responsible for
growing the stack, rather than relying on all the callers to do it.
Uncertainty about whether the callers did it or not tripped up people in
a previous review.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231208032519.260451-4-andreimatei1@gmail.com
2023-12-08 14:19:00 -08:00
Andrei Matei
6b4a64bafd bpf: Fix accesses to uninit stack slots
Privileged programs are supposed to be able to read uninitialized stack
memory (ever since 6715df8d5) but, before this patch, these accesses
were permitted inconsistently. In particular, accesses were permitted
above state->allocated_stack, but not below it. In other words, if the
stack was already "large enough", the access was permitted, but
otherwise the access was rejected instead of being allowed to "grow the
stack". This undesired rejection was happening in two places:
- in check_stack_slot_within_bounds()
- in check_stack_range_initialized()
This patch arranges for these accesses to be permitted. A bunch of tests
that were relying on the old rejection had to change; all of them were
changed to add also run unprivileged, in which case the old behavior
persists. One tests couldn't be updated - global_func16 - because it
can't run unprivileged for other reasons.

This patch also fixes the tracking of the stack size for variable-offset
reads. This second fix is bundled in the same commit as the first one
because they're inter-related. Before this patch, writes to the stack
using registers containing a variable offset (as opposed to registers
with fixed, known values) were not properly contributing to the
function's needed stack size. As a result, it was possible for a program
to verify, but then to attempt to read out-of-bounds data at runtime
because a too small stack had been allocated for it.

Each function tracks the size of the stack it needs in
bpf_subprog_info.stack_depth, which is maintained by
update_stack_depth(). For regular memory accesses, check_mem_access()
was calling update_state_depth() but it was passing in only the fixed
part of the offset register, ignoring the variable offset. This was
incorrect; the minimum possible value of that register should be used
instead.

This tracking is now fixed by centralizing the tracking of stack size in
grow_stack_state(), and by lifting the calls to grow_stack_state() to
check_stack_access_within_bounds() as suggested by Andrii. The code is
now simpler and more convincingly tracks the correct maximum stack size.
check_stack_range_initialized() can now rely on enough stack having been
allocated for the access; this helps with the fix for the first issue.

A few tests were changed to also check the stack depth computation. The
one that fails without this patch is verifier_var_off:stack_write_priv_vs_unpriv.

Fixes: 01f810ace9 ("bpf: Allow variable-offset stack access")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231208032519.260451-3-andreimatei1@gmail.com

Closes: https://lore.kernel.org/bpf/CABWLsev9g8UP_c3a=1qbuZUi20tGoUXoU07FPf-5FLvhOKOY+Q@mail.gmail.com/
2023-12-08 14:19:00 -08:00
Andrei Matei
92e1567ee3 bpf: Add some comments to stack representation
Add comments to the datastructure tracking the stack state, as the
mapping between each stack slot and where its state is stored is not
entirely obvious.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231208032519.260451-2-andreimatei1@gmail.com
2023-12-08 14:19:00 -08:00
David Vernet
8b7b0e5fe4 bpf: Load vmlinux btf for any struct_ops map
In libbpf, when determining whether we need to load vmlinux btf, we're
currently (among other things) checking whether there is any struct_ops
program present in the object. This works for most realistic struct_ops
maps, as a struct_ops map is of course typically composed of one or more
struct_ops programs. However, that technically need not be the case. A
struct_ops interface could be defined which allows a map to be specified
which one or more non-prog fields, and which provides default behavior
if no struct_ops progs is actually provided otherwise. For sched_ext,
for example, you technically only need to specify the name of the
scheduler in the struct_ops map, with the core scheduler logic providing
default behavior if no prog is actually specified.

If we were to define and try to load such a struct_ops map, we would
crash in libbpf when initializing it as obj->btf_vmlinux will be NULL:

Reading symbols from minimal...
(gdb) r
Starting program: minimal_example
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000055555558308c in btf__type_cnt (btf=0x0) at btf.c:612
612             return btf->start_id + btf->nr_types;
(gdb) bt
    type_name=0x5555555d99e3 "sched_ext_ops", kind=4) at btf.c:914
    kind=4) at btf.c:942
    type=0x7fffffffe558, type_id=0x7fffffffe548, ...
    data_member=0x7fffffffe568) at libbpf.c:948
    kern_btf=0x0) at libbpf.c:1017
    at libbpf.c:8059

So as to account for such bare-bones struct_ops maps, let's update
obj_needs_vmlinux_btf() to also iterate over an obj's maps and check
whether any of them are struct_ops maps.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231208061704.400463-1-void@manifault.com
2023-12-08 09:38:59 -08:00
Andrii Nakryiko
483af466e4 Merge branch 'bpf-fix-verification-of-indirect-var-off-stack-access'
Andrei Matei says:

====================
bpf: fix verification of indirect var-off stack access

V4 to V5:
  - split the test into a separate patch

V3 to V4:
  - include a test per Eduard's request
  - target bpf-next per Alexei's request (patches didn't change)

V2 to V3:
  - simplify checks for max_off (don't call
    check_stack_slot_within_bounds for it)
  - append a commit to protect against overflow in the addition of the
    register and the offset

V1 to V2:
  - fix max_off calculation for access size = 0
====================

Link: https://lore.kernel.org/r/20231207041150.229139-1-andreimatei1@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-12-07 13:58:14 -08:00
Andrei Matei
1d38a9ee81 bpf: Guard stack limits against 32bit overflow
This patch promotes the arithmetic around checking stack bounds to be
done in the 64-bit domain, instead of the current 32bit. The arithmetic
implies adding together a 64-bit register with a int offset. The
register was checked to be below 1<<29 when it was variable, but not
when it was fixed. The offset either comes from an instruction (in which
case it is 16 bit), from another register (in which case the caller
checked it to be below 1<<29 [1]), or from the size of an argument to a
kfunc (in which case it can be a u32 [2]). Between the register being
inconsistently checked to be below 1<<29, and the offset being up to an
u32, it appears that we were open to overflowing the `int`s which were
currently used for arithmetic.

[1] 815fb87b75/kernel/bpf/verifier.c (L7494-L7498)
[2] 815fb87b75/kernel/bpf/verifier.c (L11904)

Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231207041150.229139-4-andreimatei1@gmail.com
2023-12-07 13:58:10 -08:00
Andrei Matei
e28bd359bc bpf: Add verifier regression test for previous patch
Add a regression test for var-off zero-sized reads.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231207041150.229139-3-andreimatei1@gmail.com
2023-12-07 13:58:02 -08:00
Andrei Matei
a833a17aea bpf: Fix verification of indirect var-off stack access
This patch fixes a bug around the verification of possibly-zero-sized
stack accesses. When the access was done through a var-offset stack
pointer, check_stack_access_within_bounds was incorrectly computing the
maximum-offset of a zero-sized read to be the same as the register's min
offset. Instead, we have to take in account the register's maximum
possible value. The patch also simplifies how the max offset is checked;
the check is now simpler than for min offset.

The bug was allowing accesses to erroneously pass the
check_stack_access_within_bounds() checks, only to later crash in
check_stack_range_initialized() when all the possibly-affected stack
slots are iterated (this time with a correct max offset).
check_stack_range_initialized() is relying on
check_stack_access_within_bounds() for its accesses to the
stack-tracking vector to be within bounds; in the case of zero-sized
accesses, we were essentially only verifying that the lowest possible
slot was within bounds. We would crash when the max-offset of the stack
pointer was >= 0 (which shouldn't pass verification, and hopefully is
not something anyone's code attempts to do in practice).

Thanks Hao for reporting!

Fixes: 01f810ace9 ("bpf: Allow variable-offset stack access")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231207041150.229139-2-andreimatei1@gmail.com

Closes: https://lore.kernel.org/bpf/CACkBjsZGEUaRCHsmaX=h-efVogsRfK1FPxmkgb0Os_frnHiNdw@mail.gmail.com/
2023-12-07 13:57:53 -08:00
Alexei Starovoitov
2146f7fe6e Merge branch 'allocate-bpf-trampoline-on-bpf_prog_pack'
Song Liu says:

====================
Allocate bpf trampoline on bpf_prog_pack

This set enables allocating bpf trampoline from bpf_prog_pack on x86. The
majority of this work, however, is the refactoring of trampoline code.
This is needed because we need to handle 4 archs and 2 users (trampoline
and struct_ops).

1/7 through 6/7 refactors trampoline code. A few helpers are added.
7/7 finally let bpf trampoline on x86 use bpf_prog_pack.

Changes in v7:
1. Use kvmalloc for rw_image in x86/arch_prepare_bpf_trampoline. (Alexei)
2. Add comment to explain why we cannot use kvmalloc in
   x86/arch_bpf_trampoline_size. (Alexei)

Changes in v6:
1. Rebase.
2. Add Acked-by and Tested-by from Jiri Olsa and Björn Töpel.

Changes in v5:
1. Adjust size of trampoline ksym. (Jiri)
2. Use "unsigned int size" arg in image management helpers.(Daniel)

Changes in v4:
1. Dropped 1/8 in v3, which is already merged in bpf-next.
2. Add Reviewed-by from Björn Töpel.

Changes in v3:
1. Fix bug in s390. (Thanks to Ilya Leoshkevich).
2. Fix build error in riscv. (kernel test robot).

Changes in v2:
1. Add missing changes in net/bpf/bpf_dummy_struct_ops.c.
2. Reduce one dry run in arch_prepare_bpf_trampoline. (Xu Kuohai)
3. Other small fixes.
====================

Link: https://lore.kernel.org/r/20231206224054.492250-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 17:17:21 -08:00
Song Liu
3ba026fca8 x86, bpf: Use bpf_prog_pack for bpf trampoline
There are three major changes here:

1. Add arch_[alloc|free]_bpf_trampoline based on bpf_prog_pack;
2. Let arch_prepare_bpf_trampoline handle ROX input image, this requires
   arch_prepare_bpf_trampoline allocating a temporary RW buffer;
3. Update __arch_prepare_bpf_trampoline() to handle a RW buffer (rw_image)
   and a ROX buffer (image). This part is similar to the image/rw_image
   logic in bpf_int_jit_compile().

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20231206224054.492250-8-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 17:17:21 -08:00
Song Liu
26ef208c20 bpf: Use arch_bpf_trampoline_size
Instead of blindly allocating PAGE_SIZE for each trampoline, check the size
of the trampoline with arch_bpf_trampoline_size(). This size is saved in
bpf_tramp_image->size, and used for modmem charge/uncharge. The fallback
arch_alloc_bpf_trampoline() still allocates a whole page because we need to
use set_memory_* to protect the memory.

struct_ops trampoline still uses a whole page for multiple trampolines.

With this size check at caller (regular trampoline and struct_ops
trampoline), remove arch_bpf_trampoline_size() from
arch_prepare_bpf_trampoline() in archs.

Also, update bpf_image_ksym_add() to handle symbol of different sizes.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>  # on s390x
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # on riscv
Link: https://lore.kernel.org/r/20231206224054.492250-7-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 17:17:20 -08:00
Song Liu
96d1b7c081 bpf: Add arch_bpf_trampoline_size()
This helper will be used to calculate the size of the trampoline before
allocating the memory.

arch_prepare_bpf_trampoline() for arm64 and riscv64 can use
arch_bpf_trampoline_size() to check the trampoline fits in the image.

OTOH, arch_prepare_bpf_trampoline() for s390 has to call the JIT process
twice, so it cannot use arch_bpf_trampoline_size().

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>  # on s390x
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # on riscv
Link: https://lore.kernel.org/r/20231206224054.492250-6-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 17:17:20 -08:00
Song Liu
38b8b58ae7 bpf, x86: Adjust arch_prepare_bpf_trampoline return value
x86's implementation of arch_prepare_bpf_trampoline() requires
BPF_INSN_SAFETY buffer space between end of program and image_end. OTOH,
the return value does not include BPF_INSN_SAFETY. This doesn't cause any
real issue at the moment. However, "image" of size retval is not enough for
arch_prepare_bpf_trampoline(). This will cause confusion when we introduce
a new helper arch_bpf_trampoline_size(). To avoid future confusion, adjust
the return value to include BPF_INSN_SAFETY.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20231206224054.492250-5-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 17:17:20 -08:00
Song Liu
82583daa2e bpf: Add helpers for trampoline image management
As BPF trampoline of different archs moves from bpf_jit_[alloc|free]_exec()
to bpf_prog_pack_[alloc|free](), we need to use different _alloc, _free for
different archs during the transition. Add the following helpers for this
transition:

void *arch_alloc_bpf_trampoline(unsigned int size);
void arch_free_bpf_trampoline(void *image, unsigned int size);
void arch_protect_bpf_trampoline(void *image, unsigned int size);
void arch_unprotect_bpf_trampoline(void *image, unsigned int size);

The fallback version of these helpers require size <= PAGE_SIZE, but they
are only called with size == PAGE_SIZE. They will be called with size <
PAGE_SIZE when arch_bpf_trampoline_size() helper is introduced later.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>  # on s390x
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20231206224054.492250-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06 17:17:20 -08:00