linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-01 22:54:01 +00:00

Author	SHA1	Message	Date
Martin KaFai Lau	781e775e29	tools/bpf: Sync kernel btf.h header The kernel uapi btf.h is synced to the tools directory. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-20 10:54:38 -08:00
Martin KaFai Lau	2667a2626f	bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO to support the function debug info. BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off) and it is followed by >= 0 'struct bpf_param' objects to describe the function arguments. The BTF_KIND_FUNC must have a valid name and it must refer back to a BTF_KIND_FUNC_PROTO. The above is the conclusion after the discussion between Edward Cree, Alexei, Daniel, Yonghong and Martin. By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO, a complete function signature can be obtained. It will be used in the later patches to learn the function signature of a running bpf program. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-20 10:54:38 -08:00
Martin KaFai Lau	b47a0bd23e	bpf: btf: Break up btf_type_is_void() This patch breaks up btf_type_is_void() into btf_type_is_void() and btf_type_is_fwd(). It also adds btf_type_nosize() to better describe it is testing a type has nosize info. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-20 10:54:38 -08:00
Daniel Borkmann	bbe5d311be	Merge branch 'bpf-zero-hash-seed' Lorenz Bauer says: ==================== Allow forcing the seed of a hash table to zero, for deterministic execution during benchmarking and testing. Changes from v2: * Change ordering of BPF_F_ZERO_SEED in linux/bpf.h Comments adressed from v1: * Add comment to discourage production use to linux/bpf.h * Require CAP_SYS_ADMIN ==================== Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-20 00:53:41 +01:00
Lorenz Bauer	bf5d68c730	tools: add selftest for BPF_F_ZERO_SEED Check that iterating two separate hash maps produces the same order of keys if BPF_F_ZERO_SEED is used. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-20 00:53:40 +01:00
Lorenz Bauer	608114e441	tools: sync linux/bpf.h Synchronize changes to linux/bpf.h from * "bpf: allow zero-initializing hash map seed" * "bpf: move BPF_F_QUERY_EFFECTIVE after map flags" Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-20 00:53:40 +01:00
Lorenz Bauer	2f1833607a	bpf: move BPF_F_QUERY_EFFECTIVE after map flags BPF_F_QUERY_EFFECTIVE is in the middle of the flags valid for BPF_MAP_CREATE. Move it to its own section to reduce confusion. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-20 00:53:39 +01:00
Lorenz Bauer	96b3b6c909	bpf: allow zero-initializing hash map seed Add a new flag BPF_F_ZERO_SEED, which forces a hash map to initialize the seed to zero. This is useful when doing performance analysis both on individual BPF programs, as well as the kernel's hash table implementation. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-20 00:53:39 +01:00
Stanislav Fomichev	23499442c3	bpf: libbpf: retry map creation without the name Since commit `88cda1c9da` ("bpf: libbpf: Provide basic API support to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name for maps. Pre v4.14 kernels don't know about map names and return an error about unexpected non-zero data. Retry sys_bpf without a map name to cover older kernels. v2 changes: * check for errno == EINVAL as suggested by Daniel Borkmann Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-20 00:49:32 +01:00
Colin Ian King	592ee43faf	bpf: fix null pointer dereference on pointer offload Pointer offload is being null checked however the following statement dereferences the potentially null pointer offload when assigning offload->dev_state. Fix this by only assigning it if offload is not null. Detected by CoverityScan, CID#1475437 ("Dereference after null check") Fixes: `00db12c3d1` ("bpf: call verifier_prep from its callback in struct bpf_offload_dev") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 20:48:27 -08:00
Stanislav Fomichev	29a9c10e41	bpftool: make libbfd optional Make it possible to build bpftool without libbfd. libbfd and libopcodes are typically provided in dev/dbg packages (binutils-dev in debian) which we usually don't have installed on the fleet machines and we'd like a way to have bpftool version that works without installing any additional packages. This excludes support for disassembling jit-ted code and prints an error if the user tries to use these features. Tested by: cat > FEATURES_DUMP.bpftool <<EOF feature-libbfd=0 feature-disassembler-four-args=1 feature-reallocarray=0 feature-libelf=1 feature-libelf-mmap=1 feature-bpf=1 EOF FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make ldd bpftool \| grep libbfd Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 20:45:01 -08:00
Alexei Starovoitov	ae9435f696	Merge branch 'socket-lookup-cg_sock' Andrey Ignatov says: ==================== This patch set makes bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. Patch 1 is a fix for bpf_sk_lookup_udp that was already merged to bpf (stable) tree. Here it's prerequisite for patch 3. Patch 2 is the main patch in the set, it makes the helpers available for BPF_PROG_TYPE_CGROUP_SOCK_ADDR and provides more details about use-case. Patch 3 adds selftest for new functionality. v1->v2: - remove "Split bpf_sk_lookup" patch since it was already split by: commit `c8123ead13` ("bpf: Extend the sk_lookup() helper to XDP hookpoint."); - avoid unnecessary bpf_sock_addr_sk_lookup function. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:54:30 -08:00
Andrey Ignatov	9108e3a023	selftest/bpf: Use bpf_sk_lookup_{tcp, udp} in test_sock_addr Use bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers from test_sock_addr programs to make sure they're available and can lookup and release socket properly for IPv4/IPv4, TCP/UDP. Reading from a few fields of returned struct bpf_sock is also tested. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:54:29 -08:00
Andrey Ignatov	6c49e65e0d	bpf: Support socket lookup in CGROUP_SOCK_ADDR progs Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. Such programs operate on sockets and have access to socket and struct sockaddr passed by user to system calls such as sys_bind, sys_connect, sys_sendmsg. It's useful to be able to lookup other sockets from these programs. E.g. sys_connect may lookup IP:port endpoint and if there is a server socket bound to that endpoint ("server" can be defined by saddr & sport being zero), redirect client connection to it by rewriting IP:port in sockaddr passed to sys_connect. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:54:29 -08:00
Andrey Ignatov	cac6cc2f5a	bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp Lookup functions in sk_lookup have different expectations about byte order of provided arguments. Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup expect dport to be in network byte order and do ntohs(dport) internally. At the same time __inet6_lookup expects dport to be in host byte order and correspondingly name the argument hnum. sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and __inet6_lookup with regard to dport. But in __udp6_lib_lookup case it uses host instead of expected network byte order. It makes result returned by bpf_sk_lookup_udp for IPv6 incorrect. The patch fixes byte order of dport passed to __udp6_lib_lookup. Originally sk_lookup properly handled UDPv6, but not TCPv6. `5ef0ae84f0` fixes TCPv6 but breaks UDPv6. Fixes: `5ef0ae84f0` ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup") Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:54:29 -08:00
Nathan Chancellor	ac8acec991	bpf: Remove unused variable in nsim_bpf Clang warns: drivers/net/netdevsim/bpf.c:557:30: error: unused variable 'state' [-Werror,-Wunused-variable] struct nsim_bpf_bound_prog *state; ^ 1 error generated. The declaration should have been removed in commit `b07ade27e9` ("bpf: pass translate() as a callback and remove its ndo_bpf subcommand"). Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:51:13 -08:00
Martin KaFai Lau	a83d6e76a6	bpf: libbpf: Fix bpf_program__next() API This patch restores the behavior in commit `eac7d84519` ("tools: libbpf: don't return '.text' as a program for multi-function programs") such that bpf_program__next() does not return pseudo programs in ".text". Fixes: `0c19a9fbc9` ("libbpf: cleanup after partial failure in bpf_object__pin") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:46:54 -08:00
Joe Stringer	5c86d2125b	selftests/bpf: Fix uninitialized duration warning Daniel Borkmann reports: test_progs.c: In function ‘main’: test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this function [-Wmaybe-uninitialized] printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\ ^~~~~~ test_progs.c:1706:8: note: ‘duration’ was declared here __u32 duration; ^~~~~~~~ Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-16 17:40:10 -08:00
Alexei Starovoitov	407be8d03e	Merge branch 'narrow-loads' Andrey Ignatov says: ==================== This patch set adds support for narrow loads with offset > 0 to BPF verifier. Patch 1 provides more details and is the main patch in the set. Patches 2 and 3 add new test cases to test_verifier and test_sock_addr selftests. v1->v2: - fix -Wdeclaration-after-statement warning. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 22:30:00 -08:00
Andrey Ignatov	e7605475f5	selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr Add more test cases for context bpf_sock_addr to test narrow loads with offset > 0 for ctx->user_ip4 field (__u32): * off=1, size=1; * off=2, size=1; * off=3, size=1; * off=2, size=2. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 22:29:59 -08:00
Andrey Ignatov	6c2afb674d	selftests/bpf: Test narrow loads with off > 0 in test_verifier Test the following narrow loads in test_verifier for context __sk_buff: * off=1, size=1 - ok; * off=2, size=1 - ok; * off=3, size=1 - ok; * off=0, size=2 - ok; * off=1, size=2 - fail; * off=0, size=2 - ok; * off=3, size=2 - fail. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 22:29:59 -08:00
Andrey Ignatov	46f53a65d2	bpf: Allow narrow loads with offset > 0 Currently BPF verifier allows narrow loads for a context field only with offset zero. E.g. if there is a __u32 field then only the following loads are permitted: * off=0, size=1 (narrow); * off=0, size=2 (narrow); * off=0, size=4 (full). On the other hand LLVM can generate a load with offset different than zero that make sense from program logic point of view, but verifier doesn't accept it. E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code: #define DST_IP4 0xC0A801FEU /* 192.168.1.254 / ... if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) && where ctx is struct bpf_sock_addr. Some versions of LLVM can produce the following byte code for it: 8: 71 12 07 00 00 00 00 00 r2 = (u8 )(r1 + 7) 9: 67 02 00 00 18 00 00 00 r2 <<= 24 10: 18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll 12: 5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 <LBB0_6> where `(u8 )(r1 + 7)` means narrow load for ctx->user_ip4 with size=1 and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently rejected by verifier. Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok() what means any is_valid_access implementation, that uses the function, works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or sock_addr_is_valid_access() for bpf_sock_addr. The patch makes such loads supported. Offset can be in [0; size_default) but has to be multiple of load size. E.g. for __u32 field the following loads are supported now: off=0, size=1 (narrow); * off=1, size=1 (narrow); * off=2, size=1 (narrow); * off=3, size=1 (narrow); * off=0, size=2 (narrow); * off=2, size=2 (narrow); * off=0, size=4 (full). Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 22:29:59 -08:00
Alexei Starovoitov	f2cbf95826	Merge branch 'bpftool-flow-dissector' Stanislav Fomichev says: ==================== v5 changes: * FILE -> PATH for load/loadall (can be either file or directory now) * simpler implementation for __bpf_program__pin_name * removed p_err for REQ_ARGS checks * parse_atach_detach_args -> parse_attach_detach_args * for -> while in bpf_object__pin_{programs,maps} recovery v4 changes: * addressed another round of comments/style issues from Jakub Kicinski & Quentin Monnet (thanks!) * implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and used them in bpf_program__pin * added new pin_name to bpf_program so bpf_program__pin works with sections that contain '/' * moved loadall command implementation into a separate patch * added patch that implements pinmaps to pin maps when doing load/loadall v3 changes: * (maybe) better cleanup for partial failure in bpf_object__pin * added special case in bpf_program__pin for programs with single instances v2 changes: * addressed comments/style issues from Jakub Kicinski & Quentin Monnet * removed logic that populates jump table * added cleanup for partial failure in bpf_object__pin This patch series adds support for loading and attaching flow dissector programs from the bpftool: * first patch fixes flow dissector section name in the selftests (so libbpf auto-detection works) * second patch adds proper cleanup to bpf_object__pin, parts of which are now being used to attach all flow dissector progs/maps * third patch adds special case in bpf_program__pin for programs with single instances (we don't create <prog>/0 pin anymore, just <prog>) * forth patch adds pin_name to the bpf_program struct which is now used as a pin name in bpf_program__pin et al * fifth patch adds loadall command that pins all programs, not just the first one * sixth patch adds pinmaps argument to load/loadall to let users pin all maps of the obj file * seventh patch adds actual flow_dissector support to the bpftool and an example ==================== Acked-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:58:12 -08:00
Stanislav Fomichev	092f089273	bpftool: support loading flow dissector This commit adds support for loading/attaching/detaching flow dissector program. When `bpftool loadall` is called with a flow_dissector prog (i.e. when the 'type flow_dissector' argument is passed), we load and pin all programs. User is responsible to construct the jump table for the tail calls. The last argument of `bpftool attach` is made optional for this use case. Example: bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \ /sys/fs/bpf/flow type flow_dissector \ pinmaps /sys/fs/bpf/flow bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 0 0 0 0 \ value pinned /sys/fs/bpf/flow/IP bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 1 0 0 0 \ value pinned /sys/fs/bpf/flow/IPV6 bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 2 0 0 0 \ value pinned /sys/fs/bpf/flow/IPV6OP bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 3 0 0 0 \ value pinned /sys/fs/bpf/flow/IPV6FR bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 4 0 0 0 \ value pinned /sys/fs/bpf/flow/MPLS bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 5 0 0 0 \ value pinned /sys/fs/bpf/flow/VLAN bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector Tested by using the above lines to load the prog in the test_flow_dissector.sh selftest. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:11 -08:00
Stanislav Fomichev	3767a94b32	bpftool: add pinmaps argument to the load/loadall This new additional argument lets users pin all maps from the object at specified path. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:11 -08:00
Stanislav Fomichev	77380998d9	bpftool: add loadall command This patch adds new loadall command which slightly differs from the existing load. load command loads all programs from the obj file, but pins only the first programs. loadall pins all programs from the obj file under specified directory. The intended usecase is flow_dissector, where we want to load a bunch of progs, pin them all and after that construct a jump table. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:11 -08:00
Stanislav Fomichev	33a2c75c55	libbpf: add internal pin_name pin_name is the same as section_name where '/' is replaced by '_'. bpf_object__pin_programs is converted to use pin_name to avoid the situation where section_name would require creating another subdirectory for a pin (as, for example, when calling bpf_object__pin_programs for programs in sections like "cgroup/connect6"). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:11 -08:00
Stanislav Fomichev	fd734c5cca	libbpf: bpf_program__pin: add special case for instances.nr == 1 When bpf_program has only one instance, don't create a subdirectory with per-instance pin files (<prog>/0). Instead, just create a single pin file for that single instance. This simplifies object pinning by not creating unnecessary subdirectories. This can potentially break existing users that depend on the case where '/0' is always created. However, I couldn't find any serious usage of bpf_program__pin inside the kernel tree and I suppose there should be none outside. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:10 -08:00
Stanislav Fomichev	0c19a9fbc9	libbpf: cleanup after partial failure in bpf_object__pin bpftool will use bpf_object__pin in the next commits to pin all programs and maps from the file; in case of a partial failure, we need to get back to the clean state (undo previous program/map pins). As part of a cleanup, I've added and exported separate routines to pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs) of an object. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:10 -08:00
Stanislav Fomichev	108d50a976	selftests/bpf: rename flow dissector section to flow_dissector Makes it compatible with the logic that derives program type from section name in libbpf_prog_type_by_name. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:56:10 -08:00
Alexei Starovoitov	0157edc859	Merge branch 'device-ops-as-cb' Quentin Monnet says: ==================== For passing device functions for offloaded eBPF programs, there used to be no place where to store the pointer without making the non-offloaded programs pay a memory price. As a consequence, three functions were called with ndo_bpf() through specific commands. Now that we have struct bpf_offload_dev, and since none of those operations rely on RTNL, we can turn these three commands into hooks inside the struct bpf_prog_offload_ops, and pass them as part of bpf_offload_dev_create(). This patch set changes the offload architecture to do so, and brings the relevant changes to the nfp and netdevsim drivers. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:55 -08:00
Quentin Monnet	16a8cb5cff	bpf: do not pass netdev to translate() and prepare() offload callbacks The kernel functions to prepare verifier and translate for offloaded program retrieve "offload" from "prog", and "netdev" from "offload". Then both "prog" and "netdev" are passed to the callbacks. Simplify this by letting the drivers retrieve the net device themselves from the offload object attached to prog - if they need it at all. There is currently no need to pass the netdev as an argument to those functions. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	a40a26322a	bpf: pass prog instead of env to bpf_prog_offload_verifier_prep() Function bpf_prog_offload_verifier_prep(), called from the kernel BPF verifier to run a driver-specific callback for preparing for the verification step for offloaded programs, takes a pointer to a struct bpf_verifier_env object. However, no driver callback needs the whole structure at this time: the two drivers supporting this, nfp and netdevsim, only need a pointer to the struct bpf_prog instance held by env. Update the callback accordingly, on kernel side and in these two drivers. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	eb9119471e	bpf: pass destroy() as a callback and remove its ndo_bpf subcommand As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to program destruction to the struct and remove the subcommand that was used to call them through the NDO. Remove function __bpf_offload_ndo(), which is no longer used. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	b07ade27e9	bpf: pass translate() as a callback and remove its ndo_bpf subcommand As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to code translation to the struct and remove the subcommand that was used to call them through the NDO. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	00db12c3d1	bpf: call verifier_prep from its callback in struct bpf_offload_dev In a way similar to the change previously brought to the verify_insn hook and to the finalize callback, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to prepare driver verifiers. Since the dev_ops pointer in struct bpf_prog_offload is no longer used by any callback, we can now remove it from struct bpf_prog_offload. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:54 -08:00
Quentin Monnet	6dc18fa6f4	bpf: call finalize() from its callback in struct bpf_offload_dev In a way similar to the change previously brought to the verify_insn hook, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to perform final verification steps for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Quentin Monnet	341b3e7b7b	bpf: call verify_insn from its callback in struct bpf_offload_dev We intend to remove the dev_ops in struct bpf_prog_offload, and to only keep the ops in struct bpf_offload_dev instead, which is accessible from more locations for passing function pointers. But dev_ops is used for calling the verify_insn hook. Switch to the newly added ops in struct bpf_prog_offload instead. To avoid table lookups for each eBPF instruction to verify, we remember the offdev attached to a netdev and modify bpf_offload_find_netdev() to avoid performing more than once a lookup for a given offload object. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Quentin Monnet	1385d755cf	bpf: pass a struct with offload callbacks to bpf_offload_dev_create() For passing device functions for offloaded eBPF programs, there used to be no place where to store the pointer without making the non-offloaded programs pay a memory price. As a consequence, three functions were called with ndo_bpf() through specific commands. Now that we have struct bpf_offload_dev, and since none of those operations rely on RTNL, we can turn these three commands into hooks inside the struct bpf_prog_offload_ops, and pass them as part of bpf_offload_dev_create(). This commit effectively passes a pointer to the struct to bpf_offload_dev_create(). We temporarily have two struct bpf_prog_offload_ops instances, one under offdev->ops and one under offload->dev_ops. The next patches will make the transition towards the former, so that offload->dev_ops can be removed, and callbacks relying on ndo_bpf() added to offdev->ops as well. While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and similarly for netdevsim). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Quentin Monnet	1da6f57338	nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c We are about to add several new callbacks to the struct, all of them defined in offload.c. Move the struct bpf_prog_offload_ops object in that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no longer be static. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-11-10 15:39:53 -08:00
Nitin Hande	c8123ead13	bpf: Extend the sk_lookup() helper to XDP hookpoint. This patch proposes to extend the sk_lookup() BPF API to the XDP hookpoint. The sk_lookup() helper supports a lookup on incoming packet to find the corresponding socket that will receive this packet. Current support for this BPF API is at the tc hookpoint. This patch will extend this API at XDP hookpoint. A XDP program can map the incoming packet to the 5-tuple parameter and invoke the API to find the corresponding socket structure. Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 10:14:54 +01:00
David Ahern	bf598a8f0f	bpftool: Improve handling of ENOENT on map dumps bpftool output is not user friendly when dumping a map with only a few populated entries: $ bpftool map 1: devmap name tx_devmap flags 0x0 key 4B value 4B max_entries 64 memlock 4096B 2: array name tx_idxmap flags 0x0 key 4B value 4B max_entries 64 memlock 4096B $ bpftool map dump id 1 key: 00 00 00 00 value: No such file or directory key: 01 00 00 00 value: No such file or directory key: 02 00 00 00 value: No such file or directory key: 03 00 00 00 value: 03 00 00 00 Handle ENOENT by keeping the line format sane and dumping "<no entry>" for the value $ bpftool map dump id 1 key: 00 00 00 00 value: <no entry> key: 01 00 00 00 value: <no entry> key: 02 00 00 00 value: <no entry> key: 03 00 00 00 value: 03 00 00 00 ... Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 10:03:59 +01:00
Sowmini Varadhan	435f90a338	selftests/bpf: add a test case for sock_ops perf-event notification This patch provides a tcp_bpf based eBPF sample. The test - ncat(1) as the TCP client program to connect() to a port with the intention of triggerring SYN retransmissions: we first install an iptables DROP rule to make sure ncat SYNs are resent (instead of aborting instantly after a TCP RST) - has a bpf kernel module that sends a perf-event notification for each TCP retransmit, and also tracks the number of such notifications sent in the global_map The test passes when the number of event notifications intercepted in user-space matches the value in the global_map. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:40:17 +01:00
Sowmini Varadhan	a5a3a828cd	bpf: add perf event notificaton support for sock_ops This patch allows eBPF programs that use sock_ops to send perf based event notifications using bpf_perf_event_output(). Our main use case for this is the following: We would like to monitor some subset of TCP sockets in user-space, (the monitoring application would define 4-tuples it wants to monitor) using TCP_INFO stats to analyze reported problems. The idea is to use those stats to see where the bottlenecks are likely to be ("is it application-limited?" or "is there evidence of BufferBloat in the path?" etc). Today we can do this by periodically polling for tcp_info, but this could be made more efficient if the kernel would asynchronously notify the application via tcp_info when some "interesting" thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc) are reached. And to make this effective, it is better if we could apply the threshold check before constructing the tcp_info netlink notification, so that we don't waste resources constructing notifications that will be discarded by the filter. This work solves the problem by adding perf event based notification support for sock_ops. The eBPF program can thus be designed to apply any desired filters to the bpf_sock_ops and trigger a perf event notification based on the evaluation from the filter. The user space component can use these perf event notifications to either read any state managed by the eBPF program, or issue a TCP_INFO netlink call if desired. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:37:58 +01:00
Daniel Borkmann	185067a86a	Merge branch 'bpf-max-pkt-offset' Jiong Wang says: ==================== The maximum packet offset accessed by one BPF program is useful information. Because sometimes there could be packet split and it is possible for some reasons (for example performance) we want to reject the BPF program if the maximum packet size would trigger such split. Normally, MTU value is treated as the maximum packet size, but one BPF program does not always access the whole packet, it could only access the head portion of the data. We could let verifier calculate the maximum packet offset ever used and record it inside prog auxiliar information structure as a new field "max_pkt_offset". ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:16:33 +01:00
Jiong Wang	cf599f5031	nfp: bpf: relax prog rejection through max_pkt_offset NFP is refusing to offload programs whenever the MTU is set to a value larger than the max packet bytes that fits in NFP Cluster Target Memory (CTM). However, a eBPF program doesn't always need to access the whole packet data. Verifier has always calculated maximum direct packet access (DPA) offset, and kept it in max_pkt_offset inside prog auxiliar information. This patch relax prog rejection based on max_pkt_offset. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:16:32 +01:00
Jiong Wang	e647815a4d	bpf: let verifier to calculate and record max_pkt_offset In check_packet_access, update max_pkt_offset after the offset has passed __check_packet_access. It should be safe to use u32 for max_pkt_offset as explained in code comment. Also, when there is tail call, the max_pkt_offset of the called program is unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such case. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-09 09:16:31 +01:00
Shannon Nelson	bce6a14996	bpf_load: add map name to load_maps error message To help when debugging bpf/xdp load issues, have the load_map() error message include the number and name of the map that failed. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-07 22:34:54 +01:00
Quentin Monnet	8302b9bd31	tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps The limit for memory locked in the kernel by a process is usually set to 64 kbytes by default. This can be an issue when creating large BPF maps and/or loading many programs. A workaround is to raise this limit for the current process before trying to create a new BPF map. Changing the hard limit requires the CAP_SYS_RESOURCE and can usually only be done by root user (for non-root users, a call to setrlimit fails (and sets errno) and the program simply goes on with its rlimit unchanged). There is no API to get the current amount of memory locked for a user, therefore we cannot raise the limit only when required. One solution, used by bcc, is to try to create the map, and on getting a EPERM error, raising the limit to infinity before giving another try. Another approach, used in iproute2, is to raise the limit in all cases, before trying to create the map. Here we do the same as in iproute2: the rlimit is raised to infinity before trying to load programs or to create maps with bpftool. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-07 22:22:21 +01:00
Quentin Monnet	f96afa767b	selftests/bpf: enable (uncomment) all tests in test_libbpf.sh libbpf is now able to load successfully test_l4lb_noinline.o and samples/bpf/tracex3_kern.o. For the test_l4lb_noinline, uncomment related tests from test_libbpf.c and remove the associated "TODO". For tracex3_kern.o, instead of loading a program from samples/bpf/ that might not have been compiled at this stage, try loading a program from BPF selftests. Since this test case is about loading a program compiled without the "-target bpf" flag, change the Makefile to compile one program accordingly (instead of passing the flag for compiling all programs). Regarding test_xdp_noinline.o: in its current shape the program fails to load because it provides no version section, but the loader needs one. The test was added to make sure that libbpf could load XDP programs even if they do not provide a version number in a dedicated section. But libbpf is already capable of doing that: in our case loading fails because the loader does not know that this is an XDP program (it does not need to, since it does not attach the program). So trying to load test_xdp_noinline.o does not bring much here: just delete this subtest. For the record, the error message obtained with tracex3_kern.o was fixed by commit `e3d91b0ca5` ("tools/libbpf: handle issues with bpf ELF objects containing .eh_frames") I have not been abled to reproduce the "libbpf: incorrect bpf_call opcode" error for test_l4lb_noinline.o, even with the version of libbpf present at the time when test_libbpf.sh and test_libbpf_open.c were created. RFC -> v1: - Compile test_xdp without the "-target bpf" flag, and try to load it instead of ../../samples/bpf/tracex3_kern.o. - Delete test_xdp_noinline.o subtest. Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-07 22:20:56 +01:00

1 2 3 4 5 ...

796762 commits