Commit graph

1253601 commits

Author SHA1 Message Date
Ido Schimmel
d8a21070b6 nexthop: Fix out-of-bounds access during attribute validation
Passing a maximum attribute type to nlmsg_parse() that is larger than
the size of the passed policy will result in an out-of-bounds access [1]
when the attribute type is used as an index into the policy array.

Fix by setting the maximum attribute type according to the policy size,
as is already done for RTM_NEWNEXTHOP messages. Add a test case that
triggers the bug.

No regressions in fib nexthops tests:

 # ./fib_nexthops.sh
 [...]
 Tests passed: 236
 Tests failed:   0

[1]
BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x1e53/0x2940
Read of size 1 at addr ffffffff99ab4d20 by task ip/610

CPU: 3 PID: 610 Comm: ip Not tainted 6.8.0-rc7-custom-gd435d6e3e161 #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x8f/0xe0
 print_report+0xcf/0x670
 kasan_report+0xd8/0x110
 __nla_validate_parse+0x1e53/0x2940
 __nla_parse+0x40/0x50
 rtm_del_nexthop+0x1bd/0x400
 rtnetlink_rcv_msg+0x3cc/0xf20
 netlink_rcv_skb+0x170/0x440
 netlink_unicast+0x540/0x820
 netlink_sendmsg+0x8d3/0xdb0
 ____sys_sendmsg+0x31f/0xa60
 ___sys_sendmsg+0x13a/0x1e0
 __sys_sendmsg+0x11c/0x1f0
 do_syscall_64+0xc5/0x1d0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
[...]

The buggy address belongs to the variable:
 rtm_nh_policy_del+0x20/0x40

Fixes: 2118f9390d ("net: nexthop: Adjust netlink policy parsing for a new attribute")
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89i+UNcG0PJMW5X7gOMunF38ryMh=L1aeZUKH3kL4UdUqag@mail.gmail.com/
Reported-by: syzbot+65bb09a7208ce3d4a633@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/00000000000088981b06133bc07b@google.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240311162307.545385-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 20:35:20 -07:00
Ido Schimmel
262a68aa46 nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
The attribute is parsed in __nh_valid_dump_req() which is called by the
dump handlers of RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET although it is
only used by the former and rejected by the policy of the latter.

Move the parsing to nh_valid_dump_req() which is only called by the dump
handler of RTM_GETNEXTHOP.

This is a preparation for a subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240311162307.545385-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 20:35:20 -07:00
Ido Schimmel
dc5e0141ff nexthop: Only parse NHA_OP_FLAGS for get messages that require it
The attribute is parsed into 'op_flags' in nh_valid_get_del_req() which
is called from the handlers of three message types: RTM_DELNEXTHOP,
RTM_GETNEXTHOPBUCKET and RTM_GETNEXTHOP. The attribute is only used by
the latter and rejected by the policies of the other two.

Pass 'op_flags' as NULL from the handlers of the other two and only
parse the attribute when the argument is not NULL.

This is a preparation for a subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240311162307.545385-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 20:35:20 -07:00
Jakub Kicinski
5f20e6ab1f for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmXvm7IACgkQ6rmadz2v
 bTqdMA//VMHNHVLb4oROoXyQD9fw2mCmIUEKzP88RXfqcxsfEX7HF+k8B5ZTk0ro
 CHXTAnc79+Qqg0j24bkQKxup/fKBQVw9D+Ia4b3ytlm1I2MtyU/16xNEzVhAPU2D
 iKk6mVBsEdCbt/GjpWORy/VVnZlZpC7BOpZLxsbbxgXOndnCegyjXzSnLGJGxdvi
 zkrQTn2SrFzLi6aNpVLqrv6Nks6HJusfCKsIrtlbkQ85dulasHOtwK9s6GF60nte
 aaho+MPx3L+lWEgapsm8rR779pHaYIB/GbZUgEPxE/xUJ/V8BzDgFNLMzEiIBRMN
 a0zZam11BkBzCfcO9gkvDRByaei/dZz2jdqfU4GlHklFj1WFfz8Q7fRLEPINksvj
 WXLgJADGY5mtGbjG21FScThxzj+Ruqwx0a13ddlyI/W+P3y5yzSWsLwJG5F9p0oU
 6nlkJ4U8yg+9E1ie5ae0TibqvRJzXPjfOERZGwYDSVvfQGzv1z+DGSOPMmgNcWYM
 dIaO+A/+NS3zdbk8+1PP2SBbhHPk6kWyCUByWc7wMzCPTiwriFGY/DD2sN+Fsufo
 zorzfikUQOlTfzzD5jbmT49U8hUQUf6QIWsu7BijSiHaaC7am4S8QB2O6ibJMqdv
 yNiwvuX+ThgVIY3QKrLLqL0KPGeKMR5mtfq6rrwSpfp/b4g27FE=
 =eFgA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2024-03-11

We've added 59 non-merge commits during the last 9 day(s) which contain
a total of 88 files changed, 4181 insertions(+), 590 deletions(-).

The main changes are:

1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce
   VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena,
   from Alexei.

2) Introduce bpf_arena which is sparse shared memory region between bpf
   program and user space where structures inside the arena can have
   pointers to other areas of the arena, and pointers work seamlessly for
   both user-space programs and bpf programs, from Alexei and Andrii.

3) Introduce may_goto instruction that is a contract between the verifier
   and the program. The verifier allows the program to loop assuming it's
   behaving well, but reserves the right to terminate it, from Alexei.

4) Use IETF format for field definitions in the BPF standard
   document, from Dave.

5) Extend struct_ops libbpf APIs to allow specify version suffixes for
   stuct_ops map types, share the same BPF program between several map
   definitions, and other improvements, from Eduard.

6) Enable struct_ops support for more than one page in trampolines,
   from Kui-Feng.

7) Support kCFI + BPF on riscv64, from Puranjay.

8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay.

9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke.
====================

Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 18:06:04 -07:00
Andrii Nakryiko
66c8473135 bpf: move sleepable flag from bpf_prog_aux to bpf_prog
prog->aux->sleepable is checked very frequently as part of (some) BPF
program run hot paths. So this extra aux indirection seems wasteful and
on busy systems might cause unnecessary memory cache misses.

Let's move sleepable flag into prog itself to eliminate unnecessary
pointer dereference.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Message-ID: <20240309004739.2961431-1-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-11 16:41:25 -07:00
Puranjay Mohan
d6170e4aaf bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
On some architectures like ARM64, PMD_SIZE can be really large in some
configurations. Like with CONFIG_ARM64_64K_PAGES=y the PMD_SIZE is
512MB.

Use 2MB * num_possible_nodes() as the size for allocations done through
the prog pack allocator. On most architectures, PMD_SIZE will be equal
to 2MB in case of 4KB pages and will be greater than 2MB for bigger page
sizes.

Fixes: ea2babac63 ("bpf: Simplify bpf_prog_pack_[size|mask]")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Closes: https://lore.kernel.org/all/7e216c88-77ee-47b8-becc-a0f780868d3c@sirena.org.uk/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403092219.dhgcuz2G-lkp@intel.com/
Suggested-by: Song Liu <song@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Message-ID: <20240311122722.86232-1-puranjay12@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-11 16:33:30 -07:00
Jiri Olsa
379b97bbf0 selftests/bpf: Add kprobe multi triggering benchmarks
Adding kprobe multi triggering benchmarks. It's useful now to bench
new fprobe implementation and might be useful later as well.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240311211023.590321-1-jolsa@kernel.org
2024-03-11 16:06:48 -07:00
Kory Maincent
f095fefacd ptp: Move from simple ida to xarray
Move from simple ida to xarray for storing and loading the ptp_clock
pointer. This prepares support for future hardware timestamp selection by
being able to link the ptp clock index to its pointer.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240311144730.1239594-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 16:03:15 -07:00
Breno Leitao
195f88c577 vxlan: Remove generic .ndo_get_stats64
Commit 3e2f544dd8 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20240311112437.3813987-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 16:01:06 -07:00
Breno Leitao
e28c5efc31 vxlan: Do not alloc tstats manually
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the vxlan driver and leverage the network
core allocation instead.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20240311112437.3813987-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 16:01:06 -07:00
William Tu
eaf657f7ad devlink: Add comments to use netlink gen tool
Add the comment to remind people not to manually modify
the net/devlink/netlink_gen.c, but to use tools/net/ynl/ynl-regen.sh
to generate it.

Signed-off-by: William Tu <witu@nvidia.com>
Suggested-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240310145503.32721-1-witu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 16:00:16 -07:00
Juntong Deng
76839e2f1f net/packet: Add getsockopt support for PACKET_COPY_THRESH
Currently getsockopt does not support PACKET_COPY_THRESH,
and we are unable to get the value of PACKET_COPY_THRESH
socket option through getsockopt.

This patch adds getsockopt support for PACKET_COPY_THRESH.

In addition, this patch converts access to copy_thresh to
READ_ONCE/WRITE_ONCE.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/AM6PR03MB58487A9704FD150CF76F542899272@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:52:05 -07:00
Juntong Deng
8b6d307f43 net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID,
and we are unable to get the value of NETLINK_LISTEN_ALL_NSID
socket option through getsockopt.

This patch adds getsockopt support for NETLINK_LISTEN_ALL_NSID.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB58482322B7B335308DA56FE599272@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:48:34 -07:00
Andrii Nakryiko
08701e306e Merge branch 'bpf-introduce-bpf-arena'
Alexei Starovoitov says:

====================
bpf: Introduce BPF arena.

From: Alexei Starovoitov <ast@kernel.org>

v2->v3:
- contains bpf bits only, but cc-ing past audience for continuity
- since prerequisite patches landed, this series focus on the main
  functionality of bpf_arena.
- adopted Andrii's approach to support arena in libbpf.
- simplified LLVM support. Instead of two instructions it's now only one.
- switched to cond_break (instead of open coded iters) in selftests
- implemented several follow-ups that will be sent after this set
  . remember first IP and bpf insn that faulted in arena.
    report to user space via bpftool
  . copy paste and tweak glob_match() aka mini-regex as a selftests/bpf
- see patch 1 for detailed description of bpf_arena

v1->v2:
- Improved commit log with reasons for using vmap_pages_range() in arena.
  Thanks to Johannes
- Added support for __arena global variables in bpf programs
- Fixed race conditions spotted by Barret
- Fixed wrap32 issue spotted by Barret
- Fixed bpf_map_mmap_sz() the way Andrii suggested

The work on bpf_arena was inspired by Barret's work:
https://github.com/google/ghost-userspace/blob/main/lib/queue.bpf.h
that implements queues, lists and AVL trees completely as bpf programs
using giant bpf array map and integer indices instead of pointers.
bpf_arena is a sparse array that allows to use normal C pointers to
build such data structures. Last few patches implement page_frag
allocator, link list and hash table as bpf programs.

v1:
bpf programs have multiple options to communicate with user space:
- Various ring buffers (perf, ftrace, bpf): The data is streamed
  unidirectionally from bpf to user space.
- Hash map: The bpf program populates elements, and user space consumes
  them via bpf syscall.
- mmap()-ed array map: Libbpf creates an array map that is directly
  accessed by the bpf program and mmap-ed to user space. It's the fastest
  way. Its disadvantage is that memory for the whole array is reserved at
  the start.
====================

Link: https://lore.kernel.org/r/20240308010812.89848-1-alexei.starovoitov@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
8df839ae23 selftests/bpf: Add bpf_arena_htab test.
bpf_arena_htab.h - hash table implemented as bpf program

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-15-alexei.starovoitov@gmail.com
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
9f2c156f90 selftests/bpf: Add bpf_arena_list test.
bpf_arena_alloc.h - implements page_frag allocator as a bpf program.
bpf_arena_list.h - doubly linked link list as a bpf program.

Compiled as a bpf program and as native C code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-14-alexei.starovoitov@gmail.com
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
80a4129fcf selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
Add unit tests for bpf_arena_alloc/free_pages() functionality
and bpf_arena_common.h with a set of common helpers and macros that
is used in this test and the following patches.

Also modify test_loader that didn't support running bpf_prog_type_syscall
programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-13-alexei.starovoitov@gmail.com
2024-03-11 15:43:43 -07:00
Alexei Starovoitov
204c628730 bpf: Add helper macro bpf_addr_space_cast()
Introduce helper macro bpf_addr_space_cast() that emits:
rX = rX
instruction with off = BPF_ADDR_SPACE_CAST
and encodes dest and src address_space-s into imm32.

It's useful with older LLVM that doesn't emit this insn automatically.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-12-alexei.starovoitov@gmail.com
2024-03-11 15:43:42 -07:00
Andrii Nakryiko
2e7ba4f8fd libbpf: Recognize __arena global variables.
LLVM automatically places __arena variables into ".arena.1" ELF section.
In order to use such global variables bpf program must include definition
of arena map in ".maps" section, like:
struct {
       __uint(type, BPF_MAP_TYPE_ARENA);
       __uint(map_flags, BPF_F_MMAPABLE);
       __uint(max_entries, 1000);         /* number of pages */
       __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
} arena SEC(".maps");

libbpf recognizes both uses of arena and creates single `struct bpf_map *`
instance in libbpf APIs.
".arena.1" ELF section data is used as initial data image, which is exposed
through skeleton and bpf_map__initial_value() to the user, if they need to tune
it before the load phase. During load phase, this initial image is copied over
into mmap()'ed region corresponding to arena, and discarded.

Few small checks here and there had to be added to make sure this
approach works with bpf_map__initial_value(), mostly due to hard-coded
assumption that map->mmaped is set up with mmap() syscall and should be
munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum
arena size, so this smaller data size has to be tracked separately.
Given it is enforced that there is only one arena for entire bpf_object
instance, we just keep it in a separate field. This can be generalized
if necessary later.

All global variables from ".arena.1" section are accessible from user space
via skel->arena->name_of_var.

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
after step 5 user spaces see the values it wrote at step 2 at the same addresses

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = malloc(sizeof SEC ".arena.1")
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages

At the end look and feel of global data vs __arena global data is the same from
bpf prog pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-11-alexei.starovoitov@gmail.com
2024-03-11 15:43:35 -07:00
Alexei Starovoitov
eed512e8ac bpftool: Recognize arena map type
Teach bpftool to recognize arena map type.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-10-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
79ff13e991 libbpf: Add support for bpf_arena.
mmap() bpf_arena right after creation, since the kernel needs to
remember the address returned from mmap. This is user_vm_start.
LLVM will generate bpf_arena_cast_user() instructions where
necessary and JIT will add upper 32-bit of user_vm_start
to such pointers.

Fix up bpf_map_mmap_sz() to compute mmap size as
map->value_size * map->max_entries for arrays and
PAGE_SIZE * map->max_entries for arena.

Don't set BTF at arena creation time, since it doesn't support it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-9-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
4d2b56081c libbpf: Add __arg_arena to bpf_helpers.h
Add __arg_arena to bpf_helpers.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-8-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
2edc3de6fb bpf: Recognize btf_decl_tag("arg: Arena") as PTR_TO_ARENA.
In global bpf functions recognize btf_decl_tag("arg:arena") as PTR_TO_ARENA.

Note, when the verifier sees:

__weak void foo(struct bar *p)

it recognizes 'p' as PTR_TO_MEM and 'struct bar' has to be a struct with scalars.
Hence the only way to use arena pointers in global functions is to tag them with "arg:arena".

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-7-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
6082b6c328 bpf: Recognize addr_space_cast instruction in the verifier.
rY = addr_space_cast(rX, 0, 1) tells the verifier that rY->type = PTR_TO_ARENA.
Any further operations on PTR_TO_ARENA register have to be in 32-bit domain.

The verifier will mark load/store through PTR_TO_ARENA with PROBE_MEM32.
JIT will generate them as kern_vm_start + 32bit_addr memory accesses.

rY = addr_space_cast(rX, 1, 0) tells the verifier that rY->type = unknown scalar.
If arena->map_flags has BPF_F_NO_USER_CONV set then convert cast_user to mov32 as well.
Otherwise JIT will convert it to:
  rY = (u32)rX;
  if (rY)
     rY |= arena->user_vm_start & ~(u64)~0U;

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-6-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
142fd4d2dc bpf: Add x86-64 JIT support for bpf_addr_space_cast instruction.
LLVM generates bpf_addr_space_cast instruction while translating
pointers between native (zero) address space and
__attribute__((address_space(N))).
The addr_space=1 is reserved as bpf_arena address space.

rY = addr_space_cast(rX, 0, 1) is processed by the verifier and
converted to normal 32-bit move: wX = wY

rY = addr_space_cast(rX, 1, 0) has to be converted by JIT:

aux_reg = upper_32_bits of arena->user_vm_start
aux_reg <<= 32
wX = wY // clear upper 32 bits of dst register
if (wX) // if not zero add upper bits of user_vm_start
  wX |= aux_reg

JIT can do it more efficiently:

mov dst_reg32, src_reg32  // 32-bit move
shl dst_reg, 32
or dst_reg, user_vm_start
rol dst_reg, 32
xor r11, r11
test dst_reg32, dst_reg32 // check if lower 32-bit are zero
cmove r11, dst_reg	  // if so, set dst_reg to zero
			  // Intel swapped src/dst register encoding in CMOVcc

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-5-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
2fe99eb0cc bpf: Add x86-64 JIT support for PROBE_MEM32 pseudo instructions.
Add support for [LDX | STX | ST], PROBE_MEM32, [B | H | W | DW] instructions.
They are similar to PROBE_MEM instructions with the following differences:
- PROBE_MEM has to check that the address is in the kernel range with
  src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE check
- PROBE_MEM doesn't support store
- PROBE_MEM32 relies on the verifier to clear upper 32-bit in the register
- PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in %r12 in the prologue)
  Due to bpf_arena constructions such %r12 + %reg + off16 access is guaranteed
  to be within arena virtual range, so no address check at run-time.
- PROBE_MEM32 allows STX and ST. If they fault the store is a nop.
  When LDX faults the destination register is zeroed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-4-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
667a86ad9b bpf: Disasm support for addr_space_cast instruction.
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
instruction when pointers in non-zero address space are used by the bpf
program. Recognize this insn in uapi and in bpf disassembler.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-3-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
317460317a bpf: Introduce bpf_arena.
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   The user space may mmap it, but bpf program will not convert pointers
   to user base at run-time to improve bpf program speed.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.

bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
bpf program is more important than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff https://github.com/llvm/llvm-project/pull/84410 enables LLVM BPF
backend generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1) which is reserved for bpf_arena pointers and
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
rX = (u32)rY;
if (rY)
  rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-2-alexei.starovoitov@gmail.com
2024-03-11 15:37:23 -07:00
Niklas Söderlund
a290d4cb89 ravb: Correct buffer size to map for R-Car Rx
When creating a helper to allocate and align an skb one location where
the skb data size was updated was missed. This can lead to a warning
being printed when the memory is being unmapped as it now always unmap
the maximum frame size, instead of the size after it have been
aligned.

This was correctly done for RZ/G2L but missed for R-Car.

Fixes: cfbad64706 ("ravb: Create helper to allocate skb and align it")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20240308224237.496924-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:16:57 -07:00
Breno Leitao
7598531c3a net: amt: Remove generic .ndo_get_stats64
Commit 3e2f544dd8 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20240308162606.1597287-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:16:13 -07:00
Breno Leitao
2892956e93 net: amt: Move stats allocation to core
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core instead
of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Move amt driver to leverage the core allocation.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20240308162606.1597287-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:16:13 -07:00
Jakub Kicinski
ba980f8dff netlink: specs: support generating code for genl socket priv
The family struct is auto-generated for new families, support
use of the sock_priv_* mechanism added in commit a731132424
("genetlink: introduce per-sock family private storage").

For example if the family wants to use struct sk_buff as its
private struct (unrealistic but just for illustration), it would
add to its spec:

  kernel-family:
    headers: [ "linux/skbuff.h" ]
    sock-priv: struct sk_buff

ynl-gen-c will declare the appropriate priv size and hook
in function prototypes to be implemented by the family.

Link: https://lore.kernel.org/r/20240308190319.2523704-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:15:42 -07:00
Jakub Kicinski
a0d942960d tools: ynl: remove trailing semicolon
Commit e8a6c515ff ("tools: ynl: allow user to pass enum string
instead of scalar value") added a semicolon at the end of a line.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240308192555.2550253-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:15:27 -07:00
Justin Iurman
fcac05daa7 net: ipv6: exthdrs: get rid of ipv6_skb_net()
Get rid of ipv6_skb_net() which is only used in ipv6_hop_ioam().

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://lore.kernel.org/r/20240308185343.39272-1-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:15:08 -07:00
Jakub Kicinski
d9c822ffef Merge branch 'selftests-mptcp-various-improvements'
Matthieu Baerts says:

====================
selftests: mptcp: various improvements

In this series from Geliang, there are various improvements in MPTCP
selftests: sharing code, doing actions the same way, colours, etc.

Patch 1 prints all error messages to stdout: what was done in almost all
other MPTCP selftests. This can be now easily changed later if needed.

Patch 2 makes sure the test counter is continuous in mptcp_connect.sh.

Patch 3 aligns the messages that are printed in mptcp_connect.sh.

Patch 4 prints each test results in mptcp_sockopt.sh, similar to what we
have in the TAP output.

Patch 5 moves the different test counters to a single one in
mptcp_lib.sh, to uniform how it is used.

Patch 6 moves how titles are printed from mptcp_join.sh to the lib, to
be reused in patch 7 by all other MPTCP selftests.

Patch 8 uses the '+=' operator to append strings instead of repeating
twice the variable name: that's shorter, easier to read.

Patch 9 adds colours for the [ OK ], [SKIP], [FAIL] and INFO keywords in
all MPTCP selftests.

Patch 10 to 12 are some preparation patches for patch 13: patch 10
modifies how some 'test_fail' helpers, patch 11 moves a helper from
userspace_pm.sh to the lib, and patch 12 changes where titles are
printed in userspace_pm.sh. Patch 13 moves some duplicated helpers from
mptcp_join.sh and userspace_pm.sh to mptcp_lib.sh.

Patch 14 moves duplicated read-only variables from mptcp_join.sh and
userspace_pm.sh to mptcp_lib.sh as well.

Patch 15 uses explicit variables instead of hard-coded numbers for the
exit status.
====================

Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-0-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:29 -07:00
Geliang Tang
8f7a69a8e7 selftests: mptcp: use KSFT_SKIP/KSFT_PASS/KSFT_FAIL
This patch uses the public var KSFT_SKIP in mptcp_lib.sh instead of
ksft_skip, and drop 'ksft_skip=4' in mptcp_join.sh.

Use KSFT_PASS and KSFT_FAIL macros instead of 0 and 1 after 'exit '
and 'ret=' in all scripts:

        exit 0 -> exit ${KSFT_PASS}
        exit 1 -> exit ${KSFT_FAIL}
         ret=0 ->  ret=${KSFT_PASS}
         ret=1 ->  ret=${KSFT_FAIL}

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-15-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
23a0485d1c selftests: mptcp: declare event macros in mptcp_lib
MPTCP event macros (SUB_ESTABLISHED, LISTENER_CREATED, LISTENER_CLOSED),
and the protocol family macros (AF_INET, AF_INET6) are defined in both
mptcp_join.sh and userspace_pm.sh. In order not to duplicate code, this
patch declares them all in mptcp_lib.sh with MPTCP_LIB_ prefixs.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-14-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
7f0782ca1c selftests: mptcp: add mptcp_lib_verify_listener_events
To avoid duplicated code in different MPTCP selftests, we can add and use
helpers defined in mptcp_lib.sh.

The helper verify_listener_events() is defined both in mptcp_join.sh and
userspace_pm.sh, export it into mptcp_lib.sh and rename it with mptcp_lib_
prefix. Use this new helper in both scripts.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-13-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
8ebb441965 selftests: mptcp: print_test out of verify_listener_events
verify_listener_events() helper will be exported into mptcp_lib.sh as a
public function, but print_test() is invoked in it, which is a private
function in userspace_pm.sh only. So this patch moves print_test() out of
verify_listener_events().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-12-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:27 -07:00
Geliang Tang
663260e146 selftests: mptcp: extract mptcp_lib_check_expected
Extract the main part of check_expected() in userspace_pm.sh to a new
function mptcp_lib_check_expected() in mptcp_lib.sh. It will be used
in both mptcp_john.sh and userspace_pm.sh. check_expected_one() is
moved into mptcp_lib.sh too as mptcp_lib_check_expected_one().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-11-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
339c225e2e selftests: mptcp: call test_fail without argument
This patch modifies test_fail() to call mptcp_lib_pr_fail() only if there
are arguments (if [ ${#} -gt 0 ]) in userspace_pm.sh, add arguments
"unexpected type: ${type}" when calling test_fail() from test_remove().
Then mptcp_lib_pr_fail() can be used in check_expected_one() instead of
test_fail().

The same in mptcp_join.sh, calling fail_test() without argument, and adapt
this helper not to call print_fail() in this case.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-10-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
747ba8783a selftests: mptcp: print test results with colors
To unify the output formats of all test scripts, this patch adds
four more helpers:

	mptcp_lib_pr_ok()
	mptcp_lib_pr_skip()
	mptcp_lib_pr_fail()
	mptcp_lib_pr_info()

to print out [ OK ], [SKIP], [FAIL] and 'INFO: ' with colors. Use them
in all scripts to print the "ok/skip/fail/info' using the same 'format'.

Having colors helps to quickly identify issues when looking at a long
list of output logs and results.

Note that now all print the same keywords, which was not the case
before, but it is good to uniform that.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-9-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
e7c42bf4d3 selftests: mptcp: use += operator to append strings
This patch uses addition assignment operator (+=) to append strings
instead of duplicating the variable name in mptcp_connect.sh and
mptcp_join.sh.

This can make the statements shorter.

Note: in mptcp_connect.sh, add a local variable extra in do_transfer to
save the various extra warning logs, using += to append it. And add a
new variable tc_info to save various tc info, also using += to append it.
This can make the code more readable and prepare for the next commit.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-8-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
aa7694766f selftests: mptcp: print test results with counters
This patch adds a new helper mptcp_lib_print_title(), a wrapper of
mptcp_lib_inc_test_counter() and mptcp_lib_pr_title_counter(), to
print out test counter in each test result and increase the counter.
Use this helper to print out test counters for every tests in diag.sh,
mptcp_connect.sh, mptcp_sockopt.sh, pm_netlink.sh, simult_flows.sh,
and userspace_pm.sh.

diag.sh:

01 no msk on netns creation                          [  ok  ]
02 listen match for dport 10000                      [  ok  ]
03 listen match for sport 10000                      [  ok  ]
04 listen match for saddr and sport                  [  ok  ]
05 all listen sockets                                [  ok  ]

mptcp_connect.sh:

01 New MPTCP socket can be blocked via sysctl                       [ OK ]
02 Validating network environment with pings                        [ OK ]
INFO: Using loss of 0.85% delay 31 ms reorder .. with delay 7ms on ns3eth4
03 ns1 MPTCP -> ns1 (10.0.1.1:10000  ) MPTCP     (duration    69ms) [ OK ]
04 ns1 MPTCP -> ns1 (10.0.1.1:10001  ) TCP       (duration    20ms) [ OK ]
05 ns1 TCP   -> ns1 (10.0.1.1:10002  ) MPTCP     (duration    16ms) [ OK ]

mptcp_sockopt.sh:

01 Transfer v4                                       [ OK ]
02 Mark v4                                           [ OK ]
03 Transfer v6                                       [ OK ]
04 Mark v6                                           [ OK ]
05 SOL_MPTCP sockopt v4                              [ OK ]

pm_netlink.sh:

01 defaults addr list                                [ OK ]
02 simple add/get addr                               [ OK ]
03 dump addrs                                        [ OK ]
04 simple del addr                                   [ OK ]
05 dump addrs after del                              [ OK ]

simult_flows.sh:

01 balanced bwidth                                     7391 max 8456 [ OK ]
02 balanced bwidth - reverse direction                 7403 max 8456 [ OK ]
03 balanced bwidth with unbalanced delay               7429 max 8456 [ OK ]
04 balanced bwidth with unbalanced delay - reverse ... 7485 max 8456 [ OK ]
05 unbalanced bwidth                                   7549 max 8456 [ OK ]

userspace_pm.sh:

01 Created network namespaces ns1, ns2                               [ OK ]
INFO: Make connections
02 Established IPv4 MPTCP Connection ns2 => ns1                      [ OK ]
03 Established IPv6 MPTCP Connection ns2 => ns1                      [ OK ]
INFO: Announce tests
04 ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token                     [ OK ]
05 ADD_ADDR id:67 10.0.2.2 (ns2) => ns1, reuse port                  [ OK ]

Having test counters helps to quickly identify issues when looking at a
long list of output logs and results.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-7-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
3382bb0970 selftests: mptcp: add print_title in mptcp_lib
This patch adds a new variable MPTCP_LIB_TEST_FORMAT as the test title
printing format. Also add a helper mptcp_lib_print_title() to use this
format to print the test title with test counters. They are used in
mptcp_join.sh first.

Each MPTCP selftest is having subtests, and it helps to give them a
number to quickly identify them. This can be managed by mptcp_lib.sh,
reusing what has been done here. The following commit will use these
new helpers in the other tests.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-6-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
9e6a39ecb9 selftests: mptcp: export TEST_COUNTER variable
Variable TEST_COUNT are used in mptcp_connect.sh and mptcp_join.sh as
test counters, which are initialized to 0, while variable test_cnt are used
in diag.sh and simult_flows.sh, which are initialized to 1. To maintain
consistency, this patch renames them all as MPTCP_LIB_TEST_COUNTER,
initializes it to 1, and exports it into mptcp_lib.sh.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-5-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:26 -07:00
Geliang Tang
fd959262c1 selftests: mptcp: sockopt: print every test result
Only total test results are printed out in mptcp_sockopt.sh:

PASS: all packets had packet mark set
PASS: SOL_MPTCP getsockopt has expected information
PASS: TCP_INQ cmsg/ioctl -t tcp
PASS: TCP_INQ cmsg/ioctl -6 -t tcp
PASS: TCP_INQ cmsg/ioctl -r tcp
PASS: TCP_INQ cmsg/ioctl -6 -r tcp
PASS: TCP_INQ cmsg/ioctl -r tcp -t tcp

They mismatch with the test results:

ok 1 - mptcp_sockopt: mark ipv4
ok 2 - mptcp_sockopt: transfer ipv4
ok 3 - mptcp_sockopt: mark ipv6
ok 4 - mptcp_sockopt: transfer ipv6
ok 5 - mptcp_sockopt: sockopt v4
ok 6 - mptcp_sockopt: sockopt v6
ok 7 - mptcp_sockopt: TCP_INQ: -t tcp
ok 8 - mptcp_sockopt: TCP_INQ: -6 -t tcp
ok 9 - mptcp_sockopt: TCP_INQ: -r tcp
ok 10 - mptcp_sockopt: TCP_INQ: -6 -r tcp
ok 11 - mptcp_sockopt: TCP_INQ: -r tcp -t tcp

'mptcp_sockopt.sh' now display more detailed results + why (what you had
in a former patch from v6, merged here). It no longer displays 'PASS:',
because it is duplicated info now that the detailed are displayed:

Transfer v4                                       [ OK ]
Mark v4                                           [ OK ]
Transfer v6                                       [ OK ]
Mark v6                                           [ OK ]
SOL_MPTCP sockopt v4                              [ OK ]
SOL_MPTCP sockopt v6                              [ OK ]
TCP_INQ cmsg/ioctl -t tcp                         [ OK ]
TCP_INQ cmsg/ioctl -6 -t tcp                      [ OK ]
TCP_INQ cmsg/ioctl -r tcp                         [ OK ]
TCP_INQ cmsg/ioctl -6 -r tcp                      [ OK ]
TCP_INQ cmsg/ioctl -r tcp -t tcp                  [ OK ]

Also fix the TAP output:

ok 1 - mptcp_sockopt: transfer ipv4
ok 2 - mptcp_sockopt: mark ipv4
ok 3 - mptcp_sockopt: transfer ipv6
ok 4 - mptcp_sockopt: mark ipv6
ok 5 - mptcp_sockopt: sockopt v4
ok 6 - mptcp_sockopt: sockopt v6
ok 7 - mptcp_sockopt: TCP_INQ: -t tcp
ok 8 - mptcp_sockopt: TCP_INQ: -6 -t tcp
ok 9 - mptcp_sockopt: TCP_INQ: -r tcp
ok 10 - mptcp_sockopt: TCP_INQ: -6 -r tcp
ok 11 - mptcp_sockopt: TCP_INQ: -r tcp -t tcp

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-4-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Geliang Tang
c9161a0f8f selftests: mptcp: connect: fix misaligned output
The first [ OK ] in the output of mptcp_connect.sh misaligns with the
others:

New MPTCP socket can be blocked via sysctl              [ OK ]
INFO: validating network environment with pings
INFO: Using loss of 0.85% delay 16 ms reorder 95% 70% with delay 4ms on
ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP   (duration   184ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP     (duration    50ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP   (duration    55ms) [ OK ]

This patch aligns them by using 69 chars to display the first two lines,
and 50 chars for the other. Since 19 chars are used to display duration
time. Also print out a [ OK ] at the end of the 2nd line for consistency.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-3-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Geliang Tang
01ed983810 selftests: mptcp: connect: add dedicated port counter
This patch adds a new dedicated counter 'port' instead of TEST_COUNT
to increase port numbers in mptcp_connect.sh.

This can avoid outputting discontinuous test counters.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-2-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00
Geliang Tang
6215df11b9 selftests: mptcp: print all error messages to stdout
Some error messages are printed to stderr while the others are printed
to 'stdout'. As part of the unification, this patch drop "1>&2" to let
all errors messages are printed to 'stdout'.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240308-upstream-net-next-20240308-selftests-mptcp-unification-v1-1-4f42c347b653@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11 15:07:25 -07:00