Commit graph

1252661 commits

Author SHA1 Message Date
Eduard Zingerman
1863acccdf selftests/bpf: Test autocreate behavior for struct_ops maps
Check that bpf_map__set_autocreate() can be used to disable automatic
creation for struct_ops maps.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-8-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
c1b93c07b3 selftests/bpf: Bad_struct_ops test
When loading struct_ops programs kernel requires BTF id of the
struct_ops type and member index for attachment point inside that
type. This makes impossible to use same BPF program in several
struct_ops maps that have different struct_ops type.
Check if libbpf rejects such BPF objects files.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-7-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
c8617e8bcf selftests/bpf: Utility functions to capture libbpf log in test_progs
Several test_progs tests already capture libbpf log in order to check
for some expected output, e.g bpf_tcp_ca.c, kfunc_dynptr_param.c,
log_buf.c and a few others.

This commit provides a, hopefully, simple API to capture libbpf log
w/o necessity to define new print callback in each test:

    /* Creates a global memstream capturing INFO and WARN level output
     * passed to libbpf_print_fn.
     * Returns 0 on success, negative value on failure.
     * On failure the description is printed using PRINT_FAIL and
     * current test case is marked as fail.
     */
    int start_libbpf_log_capture(void)

    /* Destroys global memstream created by start_libbpf_log_capture().
     * Returns a pointer to captured data which has to be freed.
     * Returned buffer is null terminated.
     */
    char *stop_libbpf_log_capture(void)

The intended usage is as follows:

    if (start_libbpf_log_capture())
            return;
    use_libbpf();
    char *log = stop_libbpf_log_capture();
    ASSERT_HAS_SUBSTR(log, "... expected ...", "expected some message");
    free(log);

As a safety measure, free(start_libbpf_log_capture()) is invoked in the
epilogue of the test_progs.c:run_one_test().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-6-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
5bab7a277c selftests/bpf: Test struct_ops map definition with type suffix
Extend struct_ops_module test case to check if it is possible to use
'___' suffixes for struct_ops type specification.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-5-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
8db052615a libbpf: Honor autocreate flag for struct_ops maps
Skip load steps for struct_ops maps not marked for automatic creation.
This should allow to load bpf object in situations like below:

    SEC("struct_ops/foo") int BPF_PROG(foo) { ... }
    SEC("struct_ops/bar") int BPF_PROG(bar) { ... }

    struct test_ops___v1 {
    	int (*foo)(void);
    };

    struct test_ops___v2 {
    	int (*foo)(void);
    	int (*does_not_exist)(void);
    };

    SEC(".struct_ops.link")
    struct test_ops___v1 map_for_old = {
    	.test_1 = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 map_for_new = {
    	.test_1 = (void *)foo,
    	.does_not_exist = (void *)bar
    };

Suppose program is loaded on old kernel that does not have definition
for 'does_not_exist' struct_ops member. After this commit it would be
possible to load such object file after the following tweaks:

    bpf_program__set_autoload(skel->progs.bar, false);
    bpf_map__set_autocreate(skel->maps.map_for_new, false);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-4-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
d9ab2f76ef libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
Enforce the following existing limitation on struct_ops programs based
on kernel BTF id instead of program-local BTF id:

    struct_ops BPF prog can be re-used between multiple .struct_ops &
    .struct_ops.link as long as it's the same struct_ops struct
    definition and the same function pointer field

This allows reusing same BPF program for versioned struct_ops map
definitions, e.g.:

    SEC("struct_ops/test")
    int BPF_PROG(foo) { ... }

    struct some_ops___v1 { int (*test)(void); };
    struct some_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link") struct some_ops___v1 a = { .test = foo }
    SEC(".struct_ops.link") struct some_ops___v2 b = { .test = foo }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-3-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
a2a5172cf1 libbpf: Allow version suffixes (___smth) for struct_ops types
E.g. allow the following struct_ops definitions:

    struct bpf_testmod_ops___v1 { int (*test)(void); };
    struct bpf_testmod_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v1 a = { .test = ... }
    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v2 b = { .test = ... }

Where both bpf_testmod_ops__v1 and bpf_testmod_ops__v2 would be
resolved as 'struct bpf_testmod_ops' from kernel BTF.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-2-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Andrii Nakryiko
0f79bb8987 Merge branch 'bpf-introduce-may_goto-and-cond_break'
Alexei Starovoitov says:

====================
bpf: Introduce may_goto and cond_break

From: Alexei Starovoitov <ast@kernel.org>

v5 -> v6:
- Rename BPF_JMA to BPF_JCOND
- Addressed Andrii's review comments

v4 -> v5:
- rewrote patch 1 to avoid fake may_goto_reg and use 'u32 may_goto_cnt' instead.
  This way may_goto handling is similar to bpf_loop() processing.
- fixed bug in patch 2 that RANGE_WITHIN should not use
  rold->type == NOT_INIT as a safe signal.
- patch 3 fixed negative offset computation in cond_break macro
- using bpf_arena and cond_break recompiled lib/glob.c as bpf prog
  and it works! It will be added as a selftest to arena series.

v3 -> v4:
- fix drained issue reported by John.
  may_goto insn could be implemented with sticky state (once
  reaches 0 it stays 0), but the verifier shouldn't assume that.
  It has to explore both branches.
  Arguably drained iterator state shouldn't be there at all.
  bpf_iter_css_next() is not sticky. Can be fixed, but auditing all
  iterators for stickiness. That's an orthogonal discussion.
- explained JMA name reasons in patch 1
- fixed test_progs-no_alu32 issue and added another test

v2 -> v3: Major change
- drop bpf_can_loop() kfunc and introduce may_goto instruction instead
  kfunc is a function call while may_goto doesn't consume any registers
  and LLVM can produce much better code due to less register pressure.
- instead of counting from zero to BPF_MAX_LOOPS start from it instead
  and break out of the loop when count reaches zero
- use may_goto instruction in cond_break macro
- recognize that 'exact' state comparison doesn't need to be truly exact.
  regsafe() should ignore precision and liveness marks, but range_within
  logic is safe to use while evaluating open coded iterators.
====================

Link: https://lore.kernel.org/r/20240306031929.42666-1-alexei.starovoitov@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:18:14 -08:00
Alexei Starovoitov
0c8bbf990b selftests/bpf: Test may_goto
Add tests for may_goto instruction via cond_break macro.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-5-alexei.starovoitov@gmail.com
2024-03-06 15:18:10 -08:00
Alexei Starovoitov
0637580152 bpf: Add cond_break macro
Use may_goto instruction to implement cond_break macro.
Ideally the macro should be written as:
  asm volatile goto(".byte 0xe5;
                     .byte 0;
                     .short %l[l_break] ...
                     .long 0;
but LLVM doesn't recognize fixup of 2 byte PC relative yet.
Hence use
  asm volatile goto(".byte 0xe5;
                     .byte 0;
                     .long %l[l_break] ...
                     .short 0;
that produces correct asm on little endian.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-4-alexei.starovoitov@gmail.com
2024-03-06 15:18:04 -08:00
Alexei Starovoitov
4f81c16f50 bpf: Recognize that two registers are safe when their ranges match
When open code iterators, bpf_loop or may_goto are used the following two
states are equivalent and safe to prune the search:

cur state: fp-8_w=scalar(id=3,smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))
old state: fp-8_rw=scalar(id=2,smin=umin=smin32=umin32=1,smax=umax=smax32=umax32=11,var_off=(0x0; 0xf))

In other words "exact" state match should ignore liveness and precision
marks, since open coded iterator logic didn't complete their propagation,
reg_old->type == NOT_INIT && reg_cur->type != NOT_INIT is also not safe to
prune while looping, but range_within logic that applies to scalars,
ptr_to_mem, map_value, pkt_ptr is safe to rely on.

Avoid doing such comparison when regular infinite loop detection logic is
used, otherwise bounded loop logic will declare such "infinite loop" as
false positive. Such example is in progs/verifier_loops1.c
not_an_inifinite_loop().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-3-alexei.starovoitov@gmail.com
2024-03-06 15:18:00 -08:00
Andrii Nakryiko
9a9d1d3605 Merge branch 'mm-enforce-ioremap-address-space-and-introduce-sparse-vm_area'
Alexei Starovoitov says:

====================
mm: Enforce ioremap address space and introduce sparse vm_area

From: Alexei Starovoitov <ast@kernel.org>

v3 -> v4
- dropped VM_XEN patch for now. It will be in the follow up.
- fixed constant as pointed out by Mike

v2 -> v3
- added Christoph's reviewed-by to patch 1
- cap commit log lines to 75 chars
- factored out common checks in patch 3 into helper
- made vm_area_unmap_pages() return void

There are various users of kernel virtual address space:
vmalloc, vmap, ioremap, xen.

- vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag
and these areas are treated differently by KASAN.

- the areas created by vmap() function should be tagged with VM_MAP
(as majority of the users do).

- ioremap areas are tagged with VM_IOREMAP and vm area start is aligned
to size of the area unlike vmalloc/vmap.

- there is also xen usage that is marked as VM_IOREMAP, but it doesn't
call ioremap_page_range() unlike all other VM_IOREMAP users.

To clean this up a bit, enforce that ioremap_page_range() checks the range
and VM_IOREMAP flag.

In addition BPF would like to reserve regions of kernel virtual address
space and populate it lazily, similar to xen use cases.
For that reason, introduce VM_SPARSE flag and vm_area_[un]map_pages()
helpers to populate this sparse area.

In the end the /proc/vmallocinfo will show
"vmalloc"
"vmap"
"ioremap"
"sparse"
categories for different kinds of address regions.

ioremap, sparse will return zero when dumped through /proc/kcore
====================

Link: https://lore.kernel.org/r/20240305030516.41519-1-alexei.starovoitov@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-06 15:17:31 -08:00
Alexei Starovoitov
011832b97b bpf: Introduce may_goto instruction
Introduce may_goto instruction that from the verifier pov is similar to
open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it
doesn't iterate any objects.
In assembly 'may_goto' is a nop most of the time until bpf runtime has to
terminate the program for whatever reason. In the current implementation
may_goto has a hidden counter, but other mechanisms can be used.
For programs written in C the later patch introduces 'cond_break' macro
that combines 'may_goto' with 'break' statement and has similar semantics:
cond_break is a nop until bpf runtime has to break out of this loop.
It can be used in any normal "for" or "while" loop, like

  for (i = zero; i < cnt; cond_break, i++) {

The verifier recognizes that may_goto is used in the program, reserves
additional 8 bytes of stack, initializes them in subprog prologue, and
replaces may_goto instruction with:
aux_reg = *(u64 *)(fp - 40)
if aux_reg == 0 goto pc+off
aux_reg -= 1
*(u64 *)(fp - 40) = aux_reg

may_goto instruction can be used by LLVM to implement __builtin_memcpy,
__builtin_strcmp.

may_goto is not a full substitute for bpf_for() macro.
bpf_for() doesn't have induction variable that verifiers sees,
so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded.

But when the code is written as:
for (i = 0; i < 100; cond_break, i++)
the verifier see 'i' as precise constant zero,
hence cond_break (aka may_goto) doesn't help to converge the loop.
A static or global variable can be used as a workaround:
static int zero = 0;
for (i = zero; i < 100; cond_break, i++) // works!

may_goto works well with arena pointers that don't need to be bounds
checked on access. Load/store from arena returns imprecise unbounded
scalar and loops with may_goto pass the verifier.

Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn.
JCOND stands for conditional pseudo jump.
Since goto_or_nop insn was proposed, it may use the same opcode.
may_goto vs goto_or_nop can be distinguished by src_reg:
code = BPF_JMP | BPF_JCOND
src_reg = 0 - may_goto
src_reg = 1 - goto_or_nop

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06 15:17:31 -08:00
Alexei Starovoitov
e6f798225a mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
vmap/vmalloc APIs are used to map a set of pages into contiguous kernel
virtual space.

get_vm_area() with appropriate flag is used to request an area of kernel
address range. It's used for vmalloc, vmap, ioremap, xen use cases.
- vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag.
- the areas created by vmap() function should be tagged with VM_MAP.
- ioremap areas are tagged with VM_IOREMAP.

BPF would like to extend the vmap API to implement a lazily-populated
sparse, yet contiguous kernel virtual space. Introduce VM_SPARSE flag
and vm_area_map_pages(area, start_addr, count, pages) API to map a set
of pages within a given area.
It has the same sanity checks as vmap() does.
It also checks that get_vm_area() was created with VM_SPARSE flag
which identifies such areas in /proc/vmallocinfo
and returns zero pages on read through /proc/kcore.

The next commits will introduce bpf_arena which is a sparsely populated
shared memory region between bpf program and user space process. It will
map privately-managed pages into a sparse vm area with the following steps:

  // request virtual memory region during bpf prog verification
  area = get_vm_area(area_size, VM_SPARSE);

  // on demand
  vm_area_map_pages(area, kaddr, kend, pages);
  vm_area_unmap_pages(area, kaddr, kend);

  // after bpf program is detached and unloaded
  free_vm_area(area);

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/bpf/20240305030516.41519-3-alexei.starovoitov@gmail.com
2024-03-06 15:17:22 -08:00
Alexei Starovoitov
3e49a866c9 mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.
There are various users of get_vm_area() + ioremap_page_range() APIs.
Enforce that get_vm_area() was requested as VM_IOREMAP type and range
passed to ioremap_page_range() matches created vm_area to avoid
accidentally ioremap-ing into wrong address range.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20240305030516.41519-2-alexei.starovoitov@gmail.com
2024-03-06 10:19:04 -08:00
Martin KaFai Lau
8f50d5c423 Merge branch 'Allow struct_ops maps with a large number of programs'
Kui-Feng Lee says:

====================
The BPF struct_ops previously only allowed for one page to be used for
the trampolines of all links in a map. However, we have recently run
out of space due to the large number of BPF program links. By
allocating additional pages when we exhaust an existing page, we can
accommodate more links in a single map.

The variable st_map->image has been changed to st_map->image_pages,
and its type has been changed to an array of pointers to buffers of
PAGE_SIZE. Additional pages are allocated when all existing pages are
exhausted.

The test case loads a struct_ops maps having 40 programs. Their
trampolines takes about 6.6k+ bytes over 1.5 pages on x86.
---
Major differences from v3:

 - Refactor buffer allocations to bpf_struct_ops_tramp_buf_alloc() and
   bpf_struct_ops_tramp_buf_free().

Major differences from v2:

 - Move image buffer allocation to bpf_struct_ops_prepare_trampoline().

Major differences from v1:

 - Always free pages if failing to update.

 - Allocate 8 pages at most.

v3: https://lore.kernel.org/all/20240224030302.1500343-1-thinker.li@gmail.com/
v2: https://lore.kernel.org/all/20240221225911.757861-1-thinker.li@gmail.com/
v1: https://lore.kernel.org/all/20240216182828.201727-1-thinker.li@gmail.com/
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04 14:09:24 -08:00
Kui-Feng Lee
93bc28d859 selftests/bpf: Test struct_ops maps with a large number of struct_ops program.
Create and load a struct_ops map with a large number of struct_ops
programs to generate trampolines taking a size over multiple pages. The
map includes 40 programs. Their trampolines takes 6.6k+, more than 1.5
pages, on x86.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240224223418.526631-4-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04 14:09:24 -08:00
Kui-Feng Lee
187e2af05a bpf: struct_ops supports more than one page for trampolines.
The BPF struct_ops previously only allowed one page of trampolines.
Each function pointer of a struct_ops is implemented by a struct_ops
bpf program. Each struct_ops bpf program requires a trampoline.
The following selftest patch shows each page can hold a little more
than 20 trampolines.

While one page is more than enough for the tcp-cc usecase,
the sched_ext use case shows that one page is not always enough and hits
the one page limit. This patch overcomes the one page limit by allocating
another page when needed and it is limited to a total of
MAX_IMAGE_PAGES (8) pages which is more than enough for
reasonable usages.

The variable st_map->image has been changed to st_map->image_pages, and
its type has been changed to an array of pointers to pages.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240224223418.526631-3-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04 14:09:20 -08:00
Kui-Feng Lee
73e4f9e615 bpf, net: validate struct_ops when updating value.
Perform all validations when updating values of struct_ops maps. Doing
validation in st_ops->reg() and st_ops->update() is not necessary anymore.
However, tcp_register_congestion_control() has been called in various
places. It still needs to do validations.

Cc: netdev@vger.kernel.org
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/20240224223418.526631-2-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04 10:03:57 -08:00
Song Yoong Siang
01031fd473 selftests/bpf: xdp_hw_metadata reduce sleep interval
In current ping-pong design, xdp_hw_metadata will wait until the packet
transmission completely done, then only start to receive the next packet.

The current sleep interval is 10ms, which is unnecessary large. Typically,
a NIC does not need such a long time to transmit a packet. Furthermore,
during this 10ms sleep time, the app is unable to receive incoming packets.

Therefore, this commit reduce sleep interval to 10us, so that
xdp_hw_metadata is able to support periodic packets with shorter interval.
10us * 500 = 5ms should be enough for packet transmission and status
retrieval.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240303083225.1184165-2-yoong.siang.song@intel.com
2024-03-04 15:01:59 +01:00
Andrii Nakryiko
8f79870ec8 selftests/bpf: Extend uprobe/uretprobe triggering benchmarks
Settle on three "flavors" of uprobe/uretprobe, installed on different
kinds of instruction: nop, push, and ret. All three are testing
different internal code paths emulating or single-stepping instructions,
so are interesting to compare and benchmark separately.

To ensure `push rbp` instruction we ensure that uprobe_target_push() is
not a leaf function by calling (global __weak) noop function and
returning something afterwards (if we don't do that, compiler will just
do a tail call optimization).

Also, we need to make sure that compiler isn't skipping frame pointer
generation, so let's add `-fno-omit-frame-pointers` to Makefile.

Just to give an idea of where we currently stand in terms of relative
performance of different uprobe/uretprobe cases vs a cheap syscall
(getpgid()) baseline, here are results from my local machine:

$ benchs/run_bench_uprobes.sh
base           :    1.561 ± 0.020M/s
uprobe-nop     :    0.947 ± 0.007M/s
uprobe-push    :    0.951 ± 0.004M/s
uprobe-ret     :    0.443 ± 0.007M/s
uretprobe-nop  :    0.471 ± 0.013M/s
uretprobe-push :    0.483 ± 0.004M/s
uretprobe-ret  :    0.306 ± 0.007M/s

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240301214551.1686095-1-andrii@kernel.org
2024-03-04 14:40:24 +01:00
Chen Shen
25703adf45 libbpf: Correct debug message in btf__load_vmlinux_btf
In the function btf__load_vmlinux_btf, the debug message incorrectly
refers to 'path' instead of 'sysfs_btf_path'.

Signed-off-by: Chen Shen <peterchenshen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240302062218.3587-1-peterchenshen@gmail.com
2024-03-04 14:33:51 +01:00
Dave Thaler
0ef05e258b bpf, docs: Rename legacy conformance group to packet
There could be other legacy conformance groups in the future,
so use a more descriptive name.  The status of the conformance
group in the IANA registry is what designates it as legacy,
not the name of the group.

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/r/20240302012229.16452-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2024-03-04 14:31:06 +01:00
Dave Thaler
4e73e1bc1a bpf, docs: Use IETF format for field definitions in instruction-set.rst
In preparation for publication as an IETF RFC, the WG chairs asked me
to convert the document to use IETF packet format for field layout, so
this patch attempts to make it consistent with other IETF documents.

Some fields that are not byte aligned were previously inconsistent
in how values were defined.  Some were defined as the value of the
byte containing the field (like 0x20 for a field holding the high
four bits of the byte), and others were defined as the value of the
field itself (like 0x2).  This PR makes them be consistent in using
just the values of the field itself, which is IETF convention.

As a result, some of the defines that used BPF_* would no longer
match the value in the spec, and so this patch also drops the BPF_*
prefix to avoid confusion with the defines that are the full-byte
equivalent values.  For consistency, BPF_* is then dropped from
other fields too.  BPF_<foo> is thus the Linux implementation-specific
define for <foo> as it appears in the BPF ISA specification.

The syntax BPF_ADD | BPF_X | BPF_ALU only worked for full-byte
values so the convention {ADD, X, ALU} is proposed for referring
to field values instead.

Also replace the redundant "LSB bits" with "least significant bits".

A preview of what the resulting Internet Draft would look like can
be seen at:
https://htmlpreview.github.io/?https://raw.githubusercontent.com/dthaler/ebp
f-docs-1/format/draft-ietf-bpf-isa.html

v1->v2: Fix sphinx issue as recommended by David Vernet

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240301222337.15931-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-02 21:40:14 -08:00
Jakub Kicinski
4b2765ae41 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI
 g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg
 D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg=
 =mTAj
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-02-29

We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).

The main changes are:

1) Extend the BPF verifier to enable static subprog calls in spin lock
   critical sections, from Kumar Kartikeya Dwivedi.

2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
   in BPF global subprogs, from Andrii Nakryiko.

3) Larger batch of riscv BPF JIT improvements and enabling inlining
   of the bpf_kptr_xchg() for RV64, from Pu Lehui.

4) Allow skeleton users to change the values of the fields in struct_ops
   maps at runtime, from Kui-Feng Lee.

5) Extend the verifier's capabilities of tracking scalars when they
   are spilled to stack, especially when the spill or fill is narrowing,
   from Maxim Mikityanskiy & Eduard Zingerman.

6) Various BPF selftest improvements to fix errors under gcc BPF backend,
   from Jose E. Marchesi.

7) Avoid module loading failure when the module trying to register
   a struct_ops has its BTF section stripped, from Geliang Tang.

8) Annotate all kfuncs in .BTF_ids section which eventually allows
   for automatic kfunc prototype generation from bpftool, from Daniel Xu.

9) Several updates to the instruction-set.rst IETF standardization
   document, from Dave Thaler.

10) Shrink the size of struct bpf_map resp. bpf_array,
    from Alexei Starovoitov.

11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
    from Benjamin Tissoires.

12) Fix bpftool to be more portable to musl libc by using POSIX's
    basename(), from Arnaldo Carvalho de Melo.

13) Add libbpf support to gcc in CORE macro definitions,
    from Cupertino Miranda.

14) Remove a duplicate type check in perf_event_bpf_event,
    from Florian Lehner.

15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
    with notrace correctly, from Yonghong Song.

16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
    array to fix build warnings, from Kees Cook.

17) Fix resolve_btfids cross-compilation to non host-native endianness,
    from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  selftests/bpf: Test if shadow types work correctly.
  bpftool: Add an example for struct_ops map and shadow type.
  bpftool: Generated shadow variables for struct_ops maps.
  libbpf: Convert st_ops->data to shadow type.
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
  bpf, arm64: use bpf_prog_pack for memory management
  arm64: patching: implement text_poke API
  bpf, arm64: support exceptions
  arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
  bpf: add is_async_callback_calling_insn() helper
  bpf: introduce in_sleepable() helper
  bpf: allow more maps in sleepable bpf programs
  selftests/bpf: Test case for lacking CFI stub functions.
  bpf: Check cfi_stubs before registering a struct_ops type.
  bpf: Clarify batch lookup/lookup_and_delete semantics
  bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
  bpf, docs: Fix typos in instruction-set.rst
  selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
  bpf: Shrink size of struct bpf_map/bpf_array.
  ...
====================

Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-02 20:50:59 -08:00
David S. Miller
e960825709 Merge branch 'inet_dump_ifaddr-no-rtnl'
Eric Dumazet says:

====================
inet: no longer use RTNL to protect inet_dump_ifaddr()

This series convert inet so that a dump of addresses (ip -4 addr)
no longer requires RTNL.
====================

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:40 +00:00
Eric Dumazet
cdb2f80f1c inet: use xa_array iterator to implement inet_dump_ifaddr()
1) inet_dump_ifaddr() can can run under RCU protection
   instead of RTNL.

2) properly return 0 at the end of a dump, avoiding an
   an extra recvmsg() system call.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:39 +00:00
Eric Dumazet
590e92cdc8 inet: prepare inet_base_seq() to run without RTNL
In the following patch, inet_base_seq() will no longer be called
with RTNL held.

Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
and inet_base_seq().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:39 +00:00
Eric Dumazet
3ddc2231c8 inet: annotate data-races around ifa->ifa_flags
ifa->ifa_flags can be read locklessly.

Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:39 +00:00
Eric Dumazet
9f6fa3c4e7 inet: annotate data-races around ifa->ifa_preferred_lft
ifa->ifa_preferred_lft can be read locklessly.

Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:39 +00:00
Eric Dumazet
a5fcf74d80 inet: annotate data-races around ifa->ifa_valid_lft
ifa->ifa_valid_lft can be read locklessly.

Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:39 +00:00
Eric Dumazet
3cd3e72ccb inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
ifa->ifa_tstamp can be read locklessly.

Add appropriate READ_ONCE()/WRITE_ONCE() annotations.

Do the same for ifa->ifa_cstamp to prepare upcoming changes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 11:09:39 +00:00
David S. Miller
76f06cbd7b Merge branch 'netdevsim-link'
David Wei says:

====================
netdevsim: link and forward skbs between ports

This patchset adds the ability to link two netdevsim ports together and
forward skbs between them, similar to veth. The goal is to use netdevsim
for testing features e.g. zero copy Rx using io_uring.

This feature was tested locally on QEMU, and a selftest is included.

I ran netdev selftests CI style and all tests but the following passed:
- gro.sh
- l2tp.sh
- ip_local_port_range.sh

gro.sh fails because virtme-ng mounts as read-only and it tries to write
to log.txt. This issue was reported to virtme-ng upstream.

l2tp.sh and ip_local_port_range.sh both fail for me on net-next/main as
well.

---
v13->v14:
- implement ndo_get_iflink()
- fix returning 0 if peer is already linked during linking or not linked
  during unlinking
- bump dropped counter if nsim_ipsec_tx() fails and generally reorder
  nsim_start_xmit()
- fix overflowing lines and indentations

v12->v13:
- wait for socat listening port to be ready before sending data in
  selftest

v11->v12:
- fix leaked netns refs
- fix rtnetlink.sh kci_test_ipsec_offload() selftest

v10->v11:
- add udevadm settle after creating netdevsims in selftest

v9->v10:
- fix not freeing skb when not there is no peer
- prevent possible id clashes in selftest
- cleanup selftest on error paths

v8->v9:
- switch to getting netns using fd rather than id
- prevent linking a netdevsim to itself
- update tests

v7->v8:
- fix not dereferencing RCU ptr using rcu_dereference()
- remove unused variables in selftest

v6->v7:
- change link syntax to netnsid:ifidx
- replace dev_get_by_index() with __dev_get_by_index()
- check for NULL peer when linking
- add a sysfs attribute for unlinking
- only update Tx stats if not dropped
- update selftest

v5->v6:
- reworked to link two netdevsims using sysfs attribute on the bus
  device instead of debugfs due to deadlock possibility if a netdevsim
  is removed during linking
- removed unnecessary patch maintaining a list of probed nsim_devs
- updated selftest

v4->v5:
- reduce nsim_dev_list_lock critical section
- fixed missing mutex unlock during unwind ladder
- rework nsim_dev_peer_write synchronization to take devlink lock as
  well as rtnl_lock
- return err msgs to user during linking if port doesn't exist or
  linking to self
- update tx stats outside of RCU lock

v3->v4:
- maintain a mutex protected list of probed nsim_devs instead of using
  nsim_bus_dev
- fixed synchronization issues by taking rtnl_lock
- track tx_dropped skbs

v2->v3:
- take lock when traversing nsim_bus_dev_list
- take device ref when getting a nsim_bus_dev
- return 0 if nsim_dev_peer_read cannot find the port
- address code formatting
- do not hard code values in selftests
- add Makefile for selftests

v1->v2:
- renamed debugfs file from "link" to "peer"
- replaced strstep() with sscanf() for consistency
- increased char[] buf sz to 22 for copying id + port from user
- added err msg w/ expected fmt when linking as a hint to user
- prevent linking port to itself
- protect peer ptr using RCU

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:11 +00:00
David Wei
8ee60f9c41 netdevsim: fix rtnetlink.sh selftest
I cleared IFF_NOARP flag from netdevsim dev->flags in order to support
skb forwarding. This breaks the rtnetlink.sh selftest
kci_test_ipsec_offload() test because ipsec does not connect to peers it
cannot transmit to.

Fix the issue by adding a neigh entry manually. ipsec_offload test now
successfully pass.

Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
dfb429ea4f netdevsim: add selftest for forwarding skb between connected ports
Connect two netdevsim ports in different namespaces together, then send
packets between them using socat.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
8debcf5832 netdevsim: add ndo_get_iflink() implementation
Add an implementation for ndo_get_iflink() in netdevsim that shows the
ifindex of the linked peer, if any.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
9eb95228a7 netdevsim: forward skbs from one connected port to another
Forward skbs sent from one netdevsim port to its connected netdevsim
port using dev_forward_skb, in a spirit similar to veth.

Add a tx_dropped variable to struct netdevsim, tracking the number of
skbs that could not be forwarded using dev_forward_skb().

The xmit() function accessing the peer ptr is protected by an RCU read
critical section. The rcu_read_lock() is functionally redundant as since
v5.0 all softirqs are implicitly RCU read critical sections; but it is
useful for human readers.

If another CPU is concurrently in nsim_destroy(), then it will first set
the peer ptr to NULL. This does not affect any existing readers that
dereferenced a non-NULL peer. Then, in unregister_netdevice(), there is
a synchronize_rcu() before the netdev is actually unregistered and
freed. This ensures that any readers i.e. xmit() that got a non-NULL
peer will complete before the netdev is freed.

Any readers after the RCU_INIT_POINTER() but before synchronize_rcu()
will dereference NULL, making it safe.

The codepath to nsim_destroy() and nsim_create() takes both the newly
added nsim_dev_list_lock and rtnl_lock. This makes it safe with
concurrent calls to linking two netdevsims together.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David Wei
f532957d76 netdevsim: allow two netdevsim ports to be connected
Add two netdevsim bus attribute to sysfs:
/sys/bus/netdevsim/link_device
/sys/bus/netdevsim/unlink_device

Writing "A M B N" to link_device will link netdevsim M in netnsid A with
netdevsim N in netnsid B.

Writing "A M" to unlink_device will unlink netdevsim M in netnsid A from
its peer, if any.

rtnl_lock is taken to ensure nothing changes during the linking.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:43:10 +00:00
David S. Miller
e2d890afba Merge branch 'selftests-xfail'
Jakub Kicinski says:

====================
selftests: kselftest_harness: support using xfail

When running selftests for our subsystem in our CI we'd like all
tests to pass. Currently some tests use SKIP for cases they
expect to fail, because the kselftest_harness limits the return
codes to pass/fail/skip. XFAIL which would be a great match
here cannot be used.

Remove the no_print handling and use vfork() to run the test in
a different process than the setup. This way we don't need to
pass "failing step" via the exit code. Further clean up the exit
codes so that we can use all KSFT_* values. Rewrite the result
printing to make handling XFAIL/XPASS easier. Support tests
declaring combinations of fixture + variant they expect to fail.

Merge plan is to put it on top of -rc6 and merge into net-next.
That way others should be able to pull the patches without
any networking changes.

v4:
 - rebase on top of Mickael's vfork() changes
v3: https://lore.kernel.org/all/20240220192235.2953484-1-kuba@kernel.org/
 - combine multiple series
 - change to "list of expected failures" rather than SKIP()-like handling
v2: https://lore.kernel.org/all/20240216002619.1999225-1-kuba@kernel.org/
 - fix alignment
follow up RFC: https://lore.kernel.org/all/20240216004122.2004689-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20240213154416.422739-1-kuba@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:30 +00:00
Jakub Kicinski
c05bf0e933 selftests: ip_local_port_range: use XFAIL instead of SKIP
SCTP does not support IP_LOCAL_PORT_RANGE and we know it,
so use XFAIL instead of SKIP.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
2709473c93 selftests: kselftest_harness: support using xfail
Currently some tests report skip for things they expect to fail
e.g. when given combination of parameters is known to be unsupported.
This is confusing because in an ideal test environment and fully
featured kernel no tests should be skipped.

Selftest summary line already includes xfail and xpass counters,
e.g.:

  Totals: pass:725 fail:0 xfail:0 xpass:0 skip:0 error:0

but there's no way to use it from within the harness.

Add a new per-fixture+variant combination list of test cases
we expect to fail.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
378193eff3 selftests: kselftest_harness: let PASS / FAIL provide diagnostic
Switch to printing KTAP line for PASS / FAIL with ksft_test_result_code(),
this gives us the ability to report diagnostic messages.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
42ab727eb9 selftests: kselftest_harness: separate diagnostic message with # in ksft_test_result_code()
According to the spec we should always print a # if we add
a diagnostic message. Having the caller pass in the new line
as part of diagnostic message makes handling this a bit
counter-intuitive, so append the new line in the helper.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
732e203528 selftests: kselftest_harness: print test name for SKIP
Jakub points out that for parsers it's rather useful to always
have the test name on the result line. Currently if we SKIP
(or soon XFAIL or XPASS), we will print:

ok 17 # SKIP SCTP doesn't support IP_BIND_ADDRESS_NO_PORT

     ^
     no test name

Always print the test name.
KTAP format seems to allow or even call for it, per:
https://docs.kernel.org/dev-tools/ktap.html

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/all/87jzn6lnou.fsf@cloudflare.com/
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:29 +00:00
Jakub Kicinski
fa1a53d836 selftests: kselftest: add ksft_test_result_code(), handling all exit codes
For generic test harness code it's more useful to deal with exit
codes directly, rather than having to switch on them and call
the right ksft_test_result_*() helper. Add such function to kselftest.h.

Note that "directive" and "diagnostic" are what ktap docs call
those parts of the message.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
796a344fa4 selftests: kselftest_harness: use exit code to store skip
We always use skip in combination with exit_code being 0
(KSFT_PASS). This are basic KSFT / KTAP semantics.
Store the right KSFT_* code in exit_code directly.

This makes it easier to support tests reporting other
extended KSFT_* codes like XFAIL / XPASS.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
69fe8ec4f6 selftests: kselftest_harness: save full exit code in metadata
Instead of tracking passed = 0/1 rename the field to exit_code
and invert the values so that they match the KSFT_* exit codes.
This will allow us to fold SKIP / XFAIL into the same value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
38c957f070 selftests: kselftest_harness: generate test name once
Since we added variant support generating full test case
name takes 4 string arguments. We're about to need it
in another two places. Stop the duplication and print
once into a temporary buffer.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Jakub Kicinski
a724707976 selftests: kselftest_harness: use KSFT_* exit codes
Now that we no longer need low exit codes to communicate
assertion steps - use normal KSFT exit codes.

Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00
Mickaël Salaün
0710a1a73f selftests/harness: Merge TEST_F_FORK() into TEST_F()
Replace Landlock-specific TEST_F_FORK() with an improved TEST_F() which
brings four related changes:

Run TEST_F()'s tests in a grandchild process to make it possible to
drop privileges and delegate teardown to the parent.

Compared to TEST_F_FORK(), simplify handling of the test grandchild
process thanks to vfork(2), and makes it generic (e.g. no explicit
conversion between exit code and _metadata).

Compared to TEST_F_FORK(), run teardown even when tests failed with an
assert thanks to commit 63e6b2a423 ("selftests/harness: Run TEARDOWN
for ASSERT failures").

Simplify the test harness code by removing the no_print and step fields
which are not used.  I added this feature just after I made
kselftest_harness.h more broadly available but this step counter
remained even though it wasn't needed after all. See commit 369130b631
("selftests: Enhance kselftest_harness.h to print which assert failed").

Replace spaces with tabs in one line of __TEST_F_IMPL().

Cc: Günther Noack <gnoack@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 10:30:28 +00:00