Commit graph

1107121 commits

Author SHA1 Message Date
Andrii Nakryiko
ce6dc74a0a selftests/bpf: add test of __weak unknown virtual __kconfig extern
Exercise libbpf's logic for unknown __weak virtual __kconfig externs.
USDT selftests are already excercising non-weak known virtual extern
already (LINUX_HAS_BPF_COOKIE), so no need to add explicit tests for it.

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19 09:33:18 -07:00
Andrii Nakryiko
55d00c37eb libbpf: generalize virtual __kconfig externs and use it for USDT
Libbpf supports single virtual __kconfig extern currently: LINUX_KERNEL_VERSION.
LINUX_KERNEL_VERSION isn't coming from /proc/kconfig.gz and is intead
customly filled out by libbpf.

This patch generalizes this approach to support more such virtual
__kconfig externs. One such extern added in this patch is
LINUX_HAS_BPF_COOKIE which is used for BPF-side USDT supporting code in
usdt.bpf.h instead of using CO-RE-based enum detection approach for
detecting bpf_get_attach_cookie() BPF helper. This allows to remove
otherwise not needed CO-RE dependency and keeps user-space and BPF-side
parts of libbpf's USDT support strictly in sync in terms of their
feature detection.

We'll use similar approach for syscall wrapper detection for
BPF_KSYSCALL() BPF-side macro in follow up patch.

Generally, currently libbpf reserves CONFIG_ prefix for Kconfig values
and LINUX_ for virtual libbpf-backed externs. In the future we might
extend the set of prefixes that are supported. This can be done without
any breaking changes, as currently any __kconfig extern with
unrecognized name is rejected.

For LINUX_xxx externs we support the normal "weak rule": if libbpf
doesn't recognize given LINUX_xxx extern but such extern is marked as
__weak, it is not rejected and defaults to zero.  This follows
CONFIG_xxx handling logic and will allow BPF applications to
opportunistically use newer libbpf virtual externs without breaking on
older libbpf versions unnecessarily.

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19 09:33:17 -07:00
Jon Doron
9ff5efdeb0 libbpf: perfbuf: Add API to get the ring buffer
Add support for writing a custom event reader, by exposing the ring
buffer.

With the new API perf_buffer__buffer() you will get access to the
raw mmaped()'ed per-cpu underlying memory of the ring buffer.

This region contains both the perf buffer data and header
(struct perf_event_mmap_page), which manages the ring buffer
state (head/tail positions, when accessing the head/tail position
it's important to take into consideration SMP).
With this type of low level access one can implement different types of
consumers here are few simple examples where this API helps with:

1. perf_event_read_simple is allocating using malloc, perhaps you want
   to handle the wrap-around in some other way.
2. Since perf buf is per-cpu then the order of the events is not
   guarnteed, for example:
   Given 3 events where each event has a timestamp t0 < t1 < t2,
   and the events are spread on more than 1 CPU, then we can end
   up with the following state in the ring buf:
   CPU[0] => [t0, t2]
   CPU[1] => [t1]
   When you consume the events from CPU[0], you could know there is
   a t1 missing, (assuming there are no drops, and your event data
   contains a sequential index).
   So now one can simply do the following, for CPU[0], you can store
   the address of t0 and t2 in an array (without moving the tail, so
   there data is not perished) then move on the CPU[1] and set the
   address of t1 in the same array.
   So you end up with something like:
   void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
   and move the tails as you process in order.
3. Assuming there are multiple CPUs and we want to start draining the
   messages from them, then we can "pick" with which one to start with
   according to the remaining free space in the ring buffer.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
2022-07-15 12:53:22 -07:00
Andrii Nakryiko
8eab0a09a2 Merge branch 'Use lightweigt version of bpftool'
Pu Lehui says:

====================

Currently, samples/bpf, tools/runqslower and bpf/iterators use bpftool
for vmlinux.h, skeleton, and static linking only. We can use lightweight
bootstrap version of bpftool to handle these, and it will be faster.

v2:
- make libbpf and bootstrap bpftool independent. and make it simple.

v1: https://lore.kernel.org/bpf/20220712030813.865410-1-pulehui@huawei.com
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2022-07-15 12:03:25 -07:00
Pu Lehui
3848636b4a bpf: iterators: Build and use lightweight bootstrap version of bpftool
kernel/bpf/preload/iterators use bpftool for vmlinux.h, skeleton, and
static linking only. So we can use lightweight bootstrap version of
bpftool to handle these, and it will be faster.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220714024612.944071-4-pulehui@huawei.com
2022-07-15 12:01:30 -07:00
Pu Lehui
3a2a58c447 tools: runqslower: Build and use lightweight bootstrap version of bpftool
tools/runqslower use bpftool for vmlinux.h, skeleton, and static linking
only. So we can use lightweight bootstrap version of bpftool to handle
these, and it will be faster.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220714024612.944071-3-pulehui@huawei.com
2022-07-15 12:01:30 -07:00
Pu Lehui
2e4966288c samples: bpf: Fix cross-compiling error by using bootstrap bpftool
Currently, when cross compiling bpf samples, the host side cannot
use arch-specific bpftool to generate vmlinux.h or skeleton. Since
samples/bpf use bpftool for vmlinux.h, skeleton, and static linking
only, we can use lightweight bootstrap version of bpftool to handle
these, and it's always host-native.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220714024612.944071-2-pulehui@huawei.com
2022-07-15 12:01:30 -07:00
Ben Dooks
a2a5580fcb bpf: Fix check against plain integer v 'NULL'
When checking with sparse, btf_show_type_value() is causing a
warning about checking integer vs NULL when the macro is passed
a pointer, due to the 'value != 0' check. Stop sparse complaining
about any type-casting by adding a cast to the typeof(value).

This fixes the following sparse warnings:

kernel/bpf/btf.c:2579:17: warning: Using plain integer as NULL pointer
kernel/bpf/btf.c:2581:17: warning: Using plain integer as NULL pointer
kernel/bpf/btf.c:3407:17: warning: Using plain integer as NULL pointer
kernel/bpf/btf.c:3758:9: warning: Using plain integer as NULL pointer

Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220714100322.260467-1-ben.dooks@sifive.com
2022-07-15 09:55:20 -07:00
Alexei Starovoitov
9c7c48d6a1 bpf: Fix subprog names in stack traces.
The commit 7337224fc1 ("bpf: Improve the info.func_info and info.func_info_rec_size behavior")
accidently made bpf_prog_ksym_set_name() conservative for bpf subprograms.
Fixed it so instead of "bpf_prog_tag_F" the stack traces print "bpf_prog_tag_full_subprog_name".

Fixes: 7337224fc1 ("bpf: Improve the info.func_info and info.func_info_rec_size behavior")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220714211637.17150-1-alexei.starovoitov@gmail.com
2022-07-14 23:43:01 -07:00
Jiri Olsa
7fb27a56b9 selftests/bpf: Do not attach kprobe_multi bench to bpf_dispatcher_xdp_func
Alexei reported crash by running test_progs -j on system
with 32 cpus.

It turned out the kprobe_multi bench test that attaches all
ftrace-able functions will race with bpf_dispatcher_update,
that calls bpf_arch_text_poke on bpf_dispatcher_xdp_func,
which is ftrace-able function.

Ftrace is not aware of this update so this will cause
ftrace_bug with:

  WARNING: CPU: 6 PID: 1985 at
  arch/x86/kernel/ftrace.c:94 ftrace_verify_code+0x27/0x50
  ...
  ftrace_replace_code+0xa3/0x170
  ftrace_modify_all_code+0xbd/0x150
  ftrace_startup_enable+0x3f/0x50
  ftrace_startup+0x98/0xf0
  register_ftrace_function+0x20/0x60
  register_fprobe_ips+0xbb/0xd0
  bpf_kprobe_multi_link_attach+0x179/0x430
  __sys_bpf+0x18a1/0x2440
  ...
  ------------[ ftrace bug ]------------
  ftrace failed to modify
  [<ffffffff818d9380>] bpf_dispatcher_xdp_func+0x0/0x10
   actual:   ffffffe9:7b:ffffff9c:77:1e
  Setting ftrace call site to call ftrace function

It looks like we need some way to hide some functions
from ftrace, but meanwhile we workaround this by skipping
bpf_dispatcher_xdp_func from kprobe_multi bench test.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220714082316.479181-1-jolsa@kernel.org
2022-07-14 22:34:46 -07:00
Ben Dooks
96a233e600 bpf: Add endian modifiers to fix endian warnings
A couple of the syscalls which load values (bpf_skb_load_helper_16() and
bpf_skb_load_helper_32()) are using u16/u32 types which are triggering
warnings as they are then converted from big-endian to CPU-endian. Fix
these by making the types __be instead.

Fixes the following sparse warnings:

  net/core/filter.c:246:32: warning: cast to restricted __be16
  net/core/filter.c:246:32: warning: cast to restricted __be16
  net/core/filter.c:246:32: warning: cast to restricted __be16
  net/core/filter.c:246:32: warning: cast to restricted __be16
  net/core/filter.c:273:32: warning: cast to restricted __be32
  net/core/filter.c:273:32: warning: cast to restricted __be32
  net/core/filter.c:273:32: warning: cast to restricted __be32
  net/core/filter.c:273:32: warning: cast to restricted __be32
  net/core/filter.c:273:32: warning: cast to restricted __be32
  net/core/filter.c:273:32: warning: cast to restricted __be32

Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220714105101.297304-1-ben.dooks@sifive.com
2022-07-14 23:00:48 +02:00
Yafang Shao
5002615a37 bpf: Warn on non-preallocated case for BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE is also tracing type, which may
cause unexpected memory allocation if we set BPF_F_NO_PREALLOC. Let's
also warn on it similar as we do in case of BPF_PROG_TYPE_RAW_TRACEPOINT.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220713160936.57488-1-laoar.shao@gmail.com
2022-07-14 22:51:35 +02:00
Maciej Fijalkowski
ca2e1a6270 xsk: Mark napi_id on sendmsg()
When application runs in busy poll mode and does not receive a single
packet but only sends them, it is currently impossible to get into
napi_busy_loop() as napi_id is only marked on Rx side in xsk_rcv_check().
In there, napi_id is being taken from xdp_rxq_info carried by xdp_buff.
From Tx perspective, we do not have access to it. What we have handy is
the xsk pool.

Xsk pool works on a pool of internal xdp_buff wrappers called xdp_buff_xsk.
AF_XDP ZC enabled drivers call xp_set_rxq_info() so each of xdp_buff_xsk
has a valid pointer to xdp_rxq_info of underlying queue. Therefore, on Tx
side, napi_id can be pulled from xs->pool->heads[0].xdp.rxq->napi_id. Hide
this pointer chase under helper function, xsk_pool_get_napi_id().

Do this only for sockets working in ZC mode as otherwise rxq pointers would
not be initialized.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220707130842.49408-1-maciej.fijalkowski@intel.com
2022-07-14 22:45:34 +02:00
Nathan Chancellor
33f32e5072 bpf, arm64: Mark dummy_tramp as global
When building with clang + CONFIG_CFI_CLANG=y, the following error
occurs at link time:

  ld.lld: error: undefined symbol: dummy_tramp

dummy_tramp is declared globally in C but its definition in inline
assembly does not use .global, which prevents clang from properly
resolving the references to it when creating the CFI jump tables.

Mark dummy_tramp as global so that the reference can be properly
resolved.

Fixes: b2ad54e153 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1661
Link: https://lore.kernel.org/bpf/20220713173503.3889486-1-nathan@kernel.org
2022-07-14 16:57:26 +02:00
Linkui Xiao
94bf6aad5d selftests/bpf: Return true/false (not 1/0) from bool functions
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions.  This fixes the following warnings from coccicheck:

tools/testing/selftests/bpf/progs/test_xdp_noinline.c:407:9-10: WARNING:
return of 0/1 in function 'decap_v4' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:389:9-10: WARNING:
return of 0/1 in function 'decap_v6' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:290:9-10: WARNING:
return of 0/1 in function 'encap_v6' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:264:9-10: WARNING:
return of 0/1 in function 'parse_tcp' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:242:9-10: WARNING:
return of 0/1 in function 'parse_udp' with return type bool

Generated by: scripts/coccinelle/misc/boolreturn.cocci

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220714015647.25074-1-xiaolinkui@kylinos.cn
2022-07-13 23:09:16 -07:00
Anquan Wu
bf3f003785 libbpf: Fix the name of a reused map
BPF map name is limited to BPF_OBJ_NAME_LEN.
A map name is defined as being longer than BPF_OBJ_NAME_LEN,
it will be truncated to BPF_OBJ_NAME_LEN when a userspace program
calls libbpf to create the map. A pinned map also generates a path
in the /sys. If the previous program wanted to reuse the map,
it can not get bpf_map by name, because the name of the map is only
partially the same as the name which get from pinned path.

The syscall information below show that map name "process_pinned_map"
is truncated to "process_pinned_".

    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/process_pinned_map",
    bpf_fd=0, file_flags=0}, 144) = -1 ENOENT (No such file or directory)

    bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4,
    value_size=4,max_entries=1024, map_flags=0, inner_map_fd=0,
    map_name="process_pinned_",map_ifindex=0, btf_fd=3, btf_key_type_id=6,
    btf_value_type_id=10,btf_vmlinux_value_type_id=0}, 72) = 4

This patch check that if the name of pinned map are the same as the
actual name for the first (BPF_OBJ_NAME_LEN - 1),
bpf map still uses the name which is included in bpf object.

Fixes: 26736eb9a4 ("tools: libbpf: allow map reuse")
Signed-off-by: Anquan Wu <leiqi96@hotmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/OSZP286MB1725CEA1C95C5CB8E7CCC53FB8869@OSZP286MB1725.JPNP286.PROD.OUTLOOK.COM
2022-07-13 22:18:37 -07:00
Linkui Xiao
b1fc28b338 samples: bpf: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE
The ARRAY_SIZE macro is more compact and more formal in linux source.

Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220712072302.13761-1-xiaolinkui@kylinos.cn
2022-07-13 21:43:52 -07:00
Joanne Koong
8ab4cdcf03 bpf: Tidy up verifier check_func_arg()
This patch does two things:

1. For matching against the arg type, the match should be against the
base type of the arg type, since the arg type can have different
bpf_type_flags set on it.

2. Uses switch casing to improve readability + efficiency.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/r/20220712210603.123791-1-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-13 14:45:58 -07:00
Hengqi Chen
8ed2f5a6f3 libbpf: Error out when binary_path is NULL for uprobe and USDT
binary_path is a required non-null parameter for bpf_program__attach_usdt
and bpf_program__attach_uprobe_opts. Check it against NULL to prevent
coredump on strchr.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220712025745.2703995-1-hengqi.chen@gmail.com
2022-07-13 12:02:59 -07:00
Yafang Shao
ace2bee839 bpf: Make non-preallocated allocation low priority
GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set the
memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
easily break the memcg limit by force charge. So it is very dangerous to
use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
too much memory. There's a plan to completely remove __GFP_ATOMIC in the
mm side[1], so let's use GFP_NOWAIT instead.

We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
too memory expensive for some cases. That means removing __GFP_HIGH
doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
it-avoiding issues caused by too much memory. So let's remove it.

This fix can also apply to other run-time allocations, for example, the
allocation in lpm trie, local storage and devmap. So let fix it
consistently over the bpf code

It also fixes a typo in the comment.

[1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/

Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20220709154457.57379-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12 17:44:27 -07:00
Song Liu
1d5f82d9dd bpf, x86: fix freeing of not-finalized bpf_prog_pack
syzbot reported a few issues with bpf_prog_pack [1], [2]. This only happens
with multiple subprogs. In jit_subprogs(), we first call bpf_int_jit_compile()
on each sub program. And then, we call it on each sub program again. jit_data
is not freed in the first call of bpf_int_jit_compile(). Similarly we don't
call bpf_jit_binary_pack_finalize() in the first call of bpf_int_jit_compile().

If bpf_int_jit_compile() failed for one sub program, we will call
bpf_jit_binary_pack_finalize() for this sub program. However, we don't have a
chance to call it for other sub programs. Then we will hit "goto out_free" in
jit_subprogs(), and call bpf_jit_free on some subprograms that haven't got
bpf_jit_binary_pack_finalize() yet.

At this point, bpf_jit_binary_pack_free() is called and the whole 2MB page is
freed erroneously.

Fix this with a custom bpf_jit_free() for x86_64, which calls
bpf_jit_binary_pack_finalize() if necessary. Also, with custom
bpf_jit_free(), bpf_prog_aux->use_bpf_prog_pack is not needed any more,
remove it.

Fixes: 1022a5498f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
[1] https://syzkaller.appspot.com/bug?extid=2f649ec6d2eea1495a8f
[2] https://syzkaller.appspot.com/bug?extid=87f65c75f4a72db05445
Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20220706002612.4013790-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12 17:33:22 -07:00
Roman Gushchin
4201d9ab3e bpf: reparent bpf maps on memcg offlining
The memory consumed by a bpf map is always accounted to the memory
cgroup of the process which created the map. The map can outlive
the memory cgroup if it's used by processes in other cgroups or
is pinned on bpffs. In this case the map pins the original cgroup
in the dying state.

For other types of objects (slab objects, non-slab kernel allocations,
percpu objects and recently LRU pages) there is a reparenting process
implemented: on cgroup offlining charged objects are getting
reassigned to the parent cgroup. Because all charges and statistics
are fully recursive it's a fairly cheap operation.

For efficiency and consistency with other types of objects, let's do
the same for bpf maps. Fortunately thanks to the objcg API, the
required changes are minimal.

Please, note that individual allocations (slabs, percpu and large
kmallocs) already have the reparenting mechanism. This commit adds
it to the saved map->memcg pointer by replacing it to map->objcg.
Because dying cgroups are not visible for a user and all charges are
recursive, this commit doesn't bring any behavior changes for a user.

v2:
  added a missing const qualifier

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20220711162827.184743-1-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12 15:59:12 -07:00
Alexei Starovoitov
d673532daf Merge branch 'bpf: add a ksym BPF iterator'
Alan Maguire says:

====================

a ksym BPF iterator would be useful as it would allow more flexible
interactions with kernel symbols than are currently supported; it could
for example create more efficient map representations for lookup,
speed up symbol resolution etc.

The idea was initially discussed here [1].

Changes since v5 [2]:
- no need to add kallsym_iter to bpf_iter.h as it has existed in kernels
  for a long time so will by in vmlinux.h for older kernels too, unlike
 struct bpf_iter__ksym (Yonghong, patch 2)

Changes since v4 [3]:

- add BPF_ITER_RESCHED to improve responsiveness (Hao, patch 1)
- remove pr_warn to be consistent with other iterators (Andrii, patch 1)
- add definitions to bpf_iter.h to ensure iter tests build on older
  kernels (Andrii, patch 2)

Changes since v3 [4]:

- use late_initcall() to register iter; means we are both consistent
  with other iters and can encapsulate all iter-specific code in
  kallsyms.c in CONFIG_BPF_SYSCALL (Alexei, Yonghong, patch 1).

Changes since v2 [5]:

- set iter->show_value on initialization based on current creds
  and use it in selftest to determine if we show values
  (Yonghong, patches 1/2)
- inline iter registration into kallsyms_init (Yonghong, patch 1)

Changes since RFC [6]:

- change name of iterator (and associated structures/fields) to "ksym"
  (Andrii, patches 1, 2)
- remove dependency on CONFIG_PROC_FS; it was used for other BPF
  iterators, and I assumed it was needed because of seq ops but I
  don't think it is required on digging futher (Andrii, patch 1)

[1] https://lore.kernel.org/all/YjRPZj6Z8vuLeEZo@krava/
[2] https://lore.kernel.org/bpf/1657490998-31468-1-git-send-email-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/1657113391-5624-1-git-send-email-alan.maguire@oracle.com/
[4] https://lore.kernel.org/bpf/1656942916-13491-1-git-send-email-alan.maguire@oracle.com
[5] https://lore.kernel.org/bpf/1656667620-18718-1-git-send-email-alan.maguire@oracle.com/
[6] https://lore.kernel.org/all/1656089118-577-1-git-send-email-alan.maguire@oracle.com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12 15:27:19 -07:00
Alan Maguire
a9d2fae89f selftests/bpf: add a ksym iter subtest
add subtest verifying BPF ksym iter behaviour.  The BPF ksym
iter program shows an example of dumping a format different to
/proc/kallsyms.  It adds KIND and MAX_SIZE fields which represent the
kind of symbol (core kernel, module, ftrace, bpf, or kprobe) and
the maximum size the symbol can be.  The latter is calculated from
the difference between current symbol value and the next symbol
value.

The key benefit for this iterator will likely be supporting in-kernel
data-gathering rather than dumping symbol details to userspace and
parsing the results.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/1657629105-7812-3-git-send-email-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12 15:27:19 -07:00
Alan Maguire
647cafa223 bpf: add a ksym BPF iterator
add a "ksym" iterator which provides access to a "struct kallsym_iter"
for each symbol.  Intent is to support more flexible symbol parsing
as discussed in [1].

[1] https://lore.kernel.org/all/YjRPZj6Z8vuLeEZo@krava/

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/1657629105-7812-2-git-send-email-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12 15:27:19 -07:00
Matthieu Baerts
f16214c102 bpf: Fix 'dubious one-bit signed bitfield' warnings
Our CI[1] reported these warnings when using Sparse:

  $ touch net/mptcp/bpf.c
  $ make C=1 net/mptcp/bpf.o
  net/mptcp/bpf.c: note: in included file:
  include/linux/bpf_verifier.h:348:26: error: dubious one-bit signed bitfield
  include/linux/bpf_verifier.h:349:29: error: dubious one-bit signed bitfield

Set them as 'unsigned' to avoid warnings.

[1] https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2643588487

Fixes: 1ade237119 ("bpf: Inline calls to bpf_loop when callback is known")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220711081200.2081262-1-matthieu.baerts@tessares.net
2022-07-11 21:18:55 -07:00
Jesper Dangaard Brouer
49705c4ab3 samples/bpf: Fix xdp_redirect_map egress devmap prog
LLVM compiler optimized out the memcpy in xdp_redirect_map_egress,
which caused the Ethernet source MAC-addr to always be zero
when enabling the devmap egress prog via cmdline --load-egress.

Issue observed with LLVM version 14.0.0
 - Shipped with Fedora 36 on target: x86_64-redhat-linux-gnu.

In verbose mode print the source MAC-addr in case xdp_devmap_attached
mode is used.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/165754826292.575614.5636444052787717159.stgit@firesoul
2022-07-11 21:14:35 -07:00
Xu Kuohai
efc9909fdc bpf, arm64: Add bpf trampoline for arm64
This is arm64 version of commit fec56f5890 ("bpf: Introduce BPF
trampoline"). A bpf trampoline converts native calling convention to bpf
calling convention and is used to implement various bpf features, such
as fentry, fexit, fmod_ret and struct_ops.

This patch does essentially the same thing that bpf trampoline does on x86.

Tested on Raspberry Pi 4B and qemu:

 #18 /1     bpf_tcp_ca/dctcp:OK
 #18 /2     bpf_tcp_ca/cubic:OK
 #18 /3     bpf_tcp_ca/invalid_license:OK
 #18 /4     bpf_tcp_ca/dctcp_fallback:OK
 #18 /5     bpf_tcp_ca/rel_setsockopt:OK
 #18        bpf_tcp_ca:OK
 #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
 #51 /2     dummy_st_ops/dummy_init_ret_value:OK
 #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
 #51 /4     dummy_st_ops/dummy_multiple_args:OK
 #51        dummy_st_ops:OK
 #57 /1     fexit_bpf2bpf/target_no_callees:OK
 #57 /2     fexit_bpf2bpf/target_yes_callees:OK
 #57 /3     fexit_bpf2bpf/func_replace:OK
 #57 /4     fexit_bpf2bpf/func_replace_verify:OK
 #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
 #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
 #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
 #57 /8     fexit_bpf2bpf/func_replace_multi:OK
 #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
 #57        fexit_bpf2bpf:OK
 #237       xdp_bpf2bpf:OK

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20220711150823.2128542-5-xukuohai@huawei.com
2022-07-11 21:08:08 +02:00
Xu Kuohai
b2ad54e153 bpf, arm64: Implement bpf_arch_text_poke() for arm64
Implement bpf_arch_text_poke() for arm64, so bpf prog or bpf trampoline
can be patched with it.

When the target address is NULL, the original instruction is patched to
a NOP.

When the target address and the source address are within the branch
range, the original instruction is patched to a bl instruction to the
target address directly.

To support attaching bpf trampoline to both regular kernel function and
bpf prog, we follow the ftrace patchsite way for bpf prog. That is, two
instructions are inserted at the beginning of bpf prog, the first one
saves the return address to x9, and the second is a nop which will be
patched to a bl instruction when a bpf trampoline is attached.

However, when a bpf trampoline is attached to bpf prog, the distance
between target address and source address may exceed 128MB, the maximum
branch range, because bpf trampoline and bpf prog are allocated
separately with vmalloc. So long jump should be handled.

When a bpf prog is constructed, a plt pointing to empty trampoline
dummy_tramp is placed at the end:

        bpf_prog:
                mov x9, lr
                nop // patchsite
                ...
                ret

        plt:
                ldr x10, target
                br x10
        target:
                .quad dummy_tramp // plt target

This is also the state when no trampoline is attached.

When a short-jump bpf trampoline is attached, the patchsite is patched to
a bl instruction to the trampoline directly:

        bpf_prog:
                mov x9, lr
                bl <short-jump bpf trampoline address> // patchsite
                ...
                ret

        plt:
                ldr x10, target
                br x10
        target:
                .quad dummy_tramp // plt target

When a long-jump bpf trampoline is attached, the plt target is filled with
the trampoline address and the patchsite is patched to a bl instruction to
the plt:

        bpf_prog:
                mov x9, lr
                bl plt // patchsite
                ...
                ret

        plt:
                ldr x10, target
                br x10
        target:
                .quad <long-jump bpf trampoline address>

dummy_tramp is used to prevent another CPU from jumping to an unknown
location during the patching process, making the patching process easier.

The patching process is as follows:

1. when neither the old address or the new address is a long jump, the
   patchsite is replaced with a bl to the new address, or nop if the new
   address is NULL;

2. when the old address is not long jump but the new one is, the
   branch target address is written to plt first, then the patchsite
   is replaced with a bl instruction to the plt;

3. when the old address is long jump but the new one is not, the address
   of dummy_tramp is written to plt first, then the patchsite is replaced
   with a bl to the new address, or a nop if the new address is NULL;

4. when both the old address and the new address are long jump, the
   new address is written to plt and the patchsite is not changed.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
2022-07-11 21:08:01 +02:00
Xu Kuohai
f1e8a24ed2 arm64: Add LDR (literal) instruction
Add LDR (literal) instruction to load data from address relative to PC.
This instruction will be used to implement long jump from bpf prog to
bpf trampoline in the follow-up patch.

The instruction encoding:

    3       2   2     2                                     0        0
    0       7   6     4                                     5        0
+-----+-------+---+-----+-------------------------------------+--------+
| 0 x | 0 1 1 | 0 | 0 0 |                imm19                |   Rt   |
+-----+-------+---+-----+-------------------------------------+--------+

for 32-bit, variant x == 0; for 64-bit, x == 1.

branch_imm_common() is used to check the distance between pc and target
address, since it's reused by this patch and LDR (literal) is not a branch
instruction, rename it to label_imm_common().

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
2022-07-11 21:04:58 +02:00
Xu Kuohai
535a57a7ff bpf: Remove is_valid_bpf_tramp_flags()
Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags()
to check the input flags. This check is architecture independent.
So, to be consistent with x86, arm64 should also do this check
before generating bpf trampoline.

However, the BPF_TRAMP_F_XXX flags are not used by user code and the
flags argument is almost constant at compile time, so this run time
check is a bit redundant.

Remove is_valid_bpf_tramp_flags() and add some comments to the usage of
BPF_TRAMP_F_XXX flags, as suggested by Alexei.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com
2022-07-11 21:04:58 +02:00
Liu Jian
9974d37ea7 skmsg: Fix invalid last sg check in sk_msg_recvmsg()
In sk_psock_skb_ingress_enqueue function, if the linear area + nr_frags +
frag_list of the SKB has NR_MSG_FRAG_IDS blocks in total, skb_to_sgvec
will return NR_MSG_FRAG_IDS, then msg->sg.end will be set to
NR_MSG_FRAG_IDS, and in addition, (NR_MSG_FRAG_IDS - 1) is set to the last
SG of msg. Recv the msg in sk_msg_recvmsg, when i is (NR_MSG_FRAG_IDS - 1),
the sk_msg_iter_var_next(i) will change i to 0 (not NR_MSG_FRAG_IDS), the
judgment condition "msg_rx->sg.start==msg_rx->sg.end" and
"i != msg_rx->sg.end" can not work.

As a result, the processed msg cannot be deleted from ingress_msg list.
But the length of all the sge of the msg has changed to 0. Then the next
recvmsg syscall will process the msg repeatedly, because the length of sge
is 0, the -EFAULT error is always returned.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220628123616.186950-1-liujian56@huawei.com
2022-07-11 18:22:07 +02:00
Jilin Yuan
edb2c3476d fddi/skfp: fix repeated words in comments
Delete the redundant word 'test'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 14:12:54 +01:00
Jilin Yuan
1377a5b2d4 ethernet/via: fix repeated words in comments
Delete the redundant word 'driver'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 14:12:54 +01:00
sewookseo
e22aa14866 net: Find dst with sk's xfrm policy not ctl_sk
If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.

Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Sehee Lee <seheele@google.com>
Signed-off-by: Sewook Seo <sewookseo@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 13:39:56 +01:00
Jakub Kicinski
0076cad301 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-07-09

We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).

The main changes are:

1) Add new way for performing BTF type queries to BPF, from Daniel Müller.

2) Add inlining of calls to bpf_loop() helper when its function callback is
   statically known, from Eduard Zingerman.

3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.

4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
   hooks, from Stanislav Fomichev.

5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.

6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.

7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
   selftests, from Magnus Karlsson & Maciej Fijalkowski.

8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.

9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.

10) Sockmap optimizations around throughput of UDP transmissions which have been
    improved by 61%, from Cong Wang.

11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.

12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.

13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.

14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
    macro, from James Hilliard.

15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.

16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
  selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
  bpf: Correctly propagate errors up from bpf_core_composites_match
  libbpf: Disable SEC pragma macro on GCC
  bpf: Check attach_func_proto more carefully in check_return_code
  selftests/bpf: Add test involving restrict type qualifier
  bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
  MAINTAINERS: Add entry for AF_XDP selftests files
  selftests, xsk: Rename AF_XDP testing app
  bpf, docs: Remove deprecated xsk libbpf APIs description
  selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
  libbpf, riscv: Use a0 for RC register
  libbpf: Remove unnecessary usdt_rel_ip assignments
  selftests/bpf: Fix few more compiler warnings
  selftests/bpf: Fix bogus uninitialized variable warning
  bpftool: Remove zlib feature test from Makefile
  libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
  libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
  libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
  selftests/bpf: Add type match test against kernel's task_struct
  selftests/bpf: Add nested type to type based tests
  ...
====================

Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09 12:24:16 -07:00
Linus Walleij
877d4e3ced ixp4xx_eth: Set MAC address from device tree
If there is a MAC address specified in the device tree, then
use it. This is already perfectly legal to specify in accordance
with the generic ethernet-controller.yaml schema.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:34:18 +01:00
Linus Walleij
b3ba206ce8 ixp4xx_eth: Fall back to random MAC address
If the firmware does not provide a MAC address to the driver,
fall back to generating a random MAC address.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:34:17 +01:00
Eric Dumazet
44ac441a51 af_unix: fix unix_sysctl_register() error path
We want to kfree(table) if @table has been kmalloced,
ie for non initial network namespace.

Fixes: 849d5aa3a1 ("af_unix: Do not call kmemdup() for init_net's sysctl table.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:27:33 +01:00
David S. Miller
be587adbf8 Merge branch 'mptcp-selftest-improvements-and-header-tweak'
Mat Martineau says:

====================
mptcp: Self test improvements and a header tweak

Patch 1 moves a definition to a header so it can be used in a struct
declaration.

Patch 2 adjusts a time threshold for a selftest that runs much slower on
debug kernels (and even more on slow CI infrastructure), to reduce
spurious failures.

Patches 3 & 4 improve userspace PM test coverage.

Patches 5 & 6 clean up output from a test script and selftest helper
tool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
65ebc6676d selftests: mptcp: update pm_nl_ctl usage header
The usage header of pm_nl_ctl command doesn't match with the context. So
this patch adds the missing userspace PM keywords 'ann', 'rem', 'csf',
'dsf', 'events' and 'listen' in it.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
507719cd7c selftests: mptcp: avoid Terminated messages in userspace_pm
There're some 'Terminated' messages in the output of userspace pm tests
script after killing './pm_nl_ctl events' processes:

Created network namespaces ns1, ns2         			[OK]
./userspace_pm.sh: line 166: 13735 Terminated              ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1
./userspace_pm.sh: line 172: 13737 Terminated              ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1
Established IPv4 MPTCP Connection ns2 => ns1    		[OK]
./userspace_pm.sh: line 166: 13753 Terminated              ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1
./userspace_pm.sh: line 172: 13755 Terminated              ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1
Established IPv6 MPTCP Connection ns2 => ns1    		[OK]
ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token    		[OK]

This patch adds a helper kill_wait(), in it using 'wait $pid 2>/dev/null'
commands after 'kill $pid' to avoid printing out these Terminated messages.
Use this helper instead of using 'kill $pid'.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
5e986ec468 selftests: mptcp: userspace pm subflow tests
This patch adds userspace pm subflow tests support for mptcp_join.sh
script. Add userspace pm create subflow and destroy test cases in
userspace_tests().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Geliang Tang
97040cf980 selftests: mptcp: userspace pm address tests
This patch adds userspace pm tests support for mptcp_join.sh script. Add
userspace pm add_addr and rm_addr test cases in userspace_tests().

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:24 +01:00
Paolo Abeni
d0d9c8f2df selftests: mptcp: tweak simult_flows for debug kernels
The mentioned test measures the transfer run-time to verify
that the user-space program is able to use the full aggregate B/W.

Even on (virtual) link-speed-bound tests, debug kernel can slow
down the transfer enough to cause sporadic test failures.

Instead of unconditionally raising the maximum allowed run-time,
tweak when the running kernel is a debug one, and use some simple/
rough heuristic to guess such scenarios.

Note: this intentionally avoids looking for /boot/config-<version> as
the latter file is not always available in our reference CI
environments.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:23 +01:00
Geliang Tang
f7657ff4a7 mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h
Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to
include/net/mptcp.h.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:23 +01:00
Yang Yingliang
9f7cb73ef6 bcm63xx_enet: change the driver variables to static
bcm63xx_enetsw_driver and bcm63xx_enet_driver are only used in
bcm63xx_enet.c now, change them to static.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220707135801.1483941-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:31:42 -07:00
Russell King (Oracle)
6d1ce9c038 net: phylink: fix SGMII inband autoneg enable
When we are operating in SGMII inband mode, it implies that there is a
PHY connected, and the ethtool advertisement for autoneg applies to
the PHY, not the SGMII link. When in 1000base-X mode, then this applies
to the 802.3z link and needs to be applied to the PCS.

Fix this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1o9Ng2-005Qbe-3H@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:19:19 -07:00
Antoine Tenart
40ad0a52ef Documentation: add a description for net.core.high_order_alloc_disable
A description is missing for the net.core.high_order_alloc_disable
option in admin-guide/sysctl/net.rst ; add it. The above sysctl option
was introduced by commit ce27ec6064 ("net: add high_order_alloc_disable
sysctl/static key").

Thanks to Eric for running again the benchmark cited in the above
commit, showing this knob is now mostly of historical importance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220707080245.180525-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:15:59 -07:00
Justin Stitt
5b47d23646 net: rxrpc: fix clang -Wformat warning
When building with Clang we encounter this warning:
| net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short'
| but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat]
| _leave(" = %d [set %hx]", ret, y);

y is a u32 but the format specifier is `%hx`. Going from unsigned int to
short int results in a loss of data. This is surely not intended
behavior. If it is intended, the warning should be suppressed through
other means.

This patch should get us closer to the goal of enabling the -Wformat
flag for Clang builds.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 20:15:11 -07:00