linux-stable/tools/lib/bpf/libbpf.h

1861 lines
71 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
bpf tools: Introduce 'bpf' library and add bpf feature check This is the first patch of libbpf. The goal of libbpf is to create a standard way for accessing eBPF object files. This patch creates 'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and libbpf.so, 'make install' to put them into proper directories. Most part of Makefile is borrowed from traceevent. Before building, it checks the existence of libelf in Makefile, and deny to build if not found. Instead of throwing an error if libelf not found, the error raises in a phony target "elfdep". This design is to ensure 'make clean' still workable even if libelf is not found. Because libbpf requires 'kern_version' field set for 'union bpf_attr' (bpfdep" is used for that dependency), Kernel BPF API is also checked by intruducing a new feature check 'bpf' into tools/build/feature, which checks the existence and version of linux/bpf.h. When building libbpf, it searches that file from include/uapi/linux in kernel source tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel source tree it reside, installing of newest kernel headers is not required, except we are trying to port these files to an old kernel. To avoid checking that file when perf building, the newly introduced 'bpf' feature check doesn't added into FEATURE_TESTS and FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added into libbpf's specific. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaixu Xia <xiakaixu@huawei.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Zefan Li <lizefan@huawei.com> Bcc: pi3orama@163.com Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 02:13:51 +00:00
/*
* Common eBPF ELF object loading operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#ifndef __LIBBPF_LIBBPF_H
#define __LIBBPF_LIBBPF_H
bpf tools: Introduce 'bpf' library and add bpf feature check This is the first patch of libbpf. The goal of libbpf is to create a standard way for accessing eBPF object files. This patch creates 'Makefile' and 'Build' for it, allows 'make' to build libbpf.a and libbpf.so, 'make install' to put them into proper directories. Most part of Makefile is borrowed from traceevent. Before building, it checks the existence of libelf in Makefile, and deny to build if not found. Instead of throwing an error if libelf not found, the error raises in a phony target "elfdep". This design is to ensure 'make clean' still workable even if libelf is not found. Because libbpf requires 'kern_version' field set for 'union bpf_attr' (bpfdep" is used for that dependency), Kernel BPF API is also checked by intruducing a new feature check 'bpf' into tools/build/feature, which checks the existence and version of linux/bpf.h. When building libbpf, it searches that file from include/uapi/linux in kernel source tree (controlled by FEATURE_CHECK_CFLAGS-bpf). Since it searches kernel source tree it reside, installing of newest kernel headers is not required, except we are trying to port these files to an old kernel. To avoid checking that file when perf building, the newly introduced 'bpf' feature check doesn't added into FEATURE_TESTS and FEATURE_DISPLAY by default in tools/build/Makefile.feature, but added into libbpf's specific. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaixu Xia <xiakaixu@huawei.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Zefan Li <lizefan@huawei.com> Bcc: pi3orama@163.com Link: http://lkml.kernel.org/r/1435716878-189507-4-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-01 02:13:51 +00:00
#include <stdarg.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <sys/types.h> // for size_t
#include <linux/bpf.h>
bpf tools: Improve libbpf error reporting In this patch, a series of libbpf specific error numbers and libbpf_strerror() are introduced to help reporting errors. Functions are updated to pass correct the error number through the CHECK_ERR() macro. All users of bpf_object__open{_buffer}() and bpf_program__title() in perf are modified accordingly. In addition, due to the error codes changing, bpf__strerror_load() is also modified to use them. bpf__strerror_head() is also changed accordingly so it can parse libbpf errors. bpf_loader_strerror() is introduced for that purpose, and will be improved by the following patch. load_program() is improved not to dump log buffer if it is empty. log buffer is also used to deduce whether the error was caused by an invalid program or other problem. v1 -> v2: - Using macro for error code. - Fetch error message based on array index, eliminate for-loop. - Use log buffer to detect the reason of failure. 3 new error code are introduced to replace LIBBPF_ERRNO__LOAD. In v1: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_kversion_nomatch_program.o ls event syntax error: './test_kversion_nomatch_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_big_program.o ls event syntax error: './test_big_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP In v2: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Kernel verifier blocks program loading SKIP # perf record -e ./test_kversion_nomatch_program.o event syntax error: './test_kversion_nomatch_program.o' \___ Incorrect kernel version SKIP (Will be further improved by following patches) # perf record -e ./test_big_program.o event syntax error: './test_big_program.o' \___ Program too big SKIP Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1446817783-86722-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-06 13:49:37 +00:00
#include "libbpf_common.h"
#include "libbpf_legacy.h"
#ifdef __cplusplus
extern "C" {
#endif
LIBBPF_API __u32 libbpf_major_version(void);
LIBBPF_API __u32 libbpf_minor_version(void);
LIBBPF_API const char *libbpf_version_string(void);
bpf tools: Improve libbpf error reporting In this patch, a series of libbpf specific error numbers and libbpf_strerror() are introduced to help reporting errors. Functions are updated to pass correct the error number through the CHECK_ERR() macro. All users of bpf_object__open{_buffer}() and bpf_program__title() in perf are modified accordingly. In addition, due to the error codes changing, bpf__strerror_load() is also modified to use them. bpf__strerror_head() is also changed accordingly so it can parse libbpf errors. bpf_loader_strerror() is introduced for that purpose, and will be improved by the following patch. load_program() is improved not to dump log buffer if it is empty. log buffer is also used to deduce whether the error was caused by an invalid program or other problem. v1 -> v2: - Using macro for error code. - Fetch error message based on array index, eliminate for-loop. - Use log buffer to detect the reason of failure. 3 new error code are introduced to replace LIBBPF_ERRNO__LOAD. In v1: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_kversion_nomatch_program.o ls event syntax error: './test_kversion_nomatch_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_big_program.o ls event syntax error: './test_big_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP In v2: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Kernel verifier blocks program loading SKIP # perf record -e ./test_kversion_nomatch_program.o event syntax error: './test_kversion_nomatch_program.o' \___ Incorrect kernel version SKIP (Will be further improved by following patches) # perf record -e ./test_big_program.o event syntax error: './test_big_program.o' \___ Program too big SKIP Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1446817783-86722-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-06 13:49:37 +00:00
enum libbpf_errno {
__LIBBPF_ERRNO__START = 4000,
/* Something wrong in libelf */
LIBBPF_ERRNO__LIBELF = __LIBBPF_ERRNO__START,
LIBBPF_ERRNO__FORMAT, /* BPF object format invalid */
LIBBPF_ERRNO__KVERSION, /* Incorrect or no 'version' section */
LIBBPF_ERRNO__ENDIAN, /* Endian mismatch */
bpf tools: Improve libbpf error reporting In this patch, a series of libbpf specific error numbers and libbpf_strerror() are introduced to help reporting errors. Functions are updated to pass correct the error number through the CHECK_ERR() macro. All users of bpf_object__open{_buffer}() and bpf_program__title() in perf are modified accordingly. In addition, due to the error codes changing, bpf__strerror_load() is also modified to use them. bpf__strerror_head() is also changed accordingly so it can parse libbpf errors. bpf_loader_strerror() is introduced for that purpose, and will be improved by the following patch. load_program() is improved not to dump log buffer if it is empty. log buffer is also used to deduce whether the error was caused by an invalid program or other problem. v1 -> v2: - Using macro for error code. - Fetch error message based on array index, eliminate for-loop. - Use log buffer to detect the reason of failure. 3 new error code are introduced to replace LIBBPF_ERRNO__LOAD. In v1: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_kversion_nomatch_program.o ls event syntax error: './test_kversion_nomatch_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_big_program.o ls event syntax error: './test_big_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP In v2: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Kernel verifier blocks program loading SKIP # perf record -e ./test_kversion_nomatch_program.o event syntax error: './test_kversion_nomatch_program.o' \___ Incorrect kernel version SKIP (Will be further improved by following patches) # perf record -e ./test_big_program.o event syntax error: './test_big_program.o' \___ Program too big SKIP Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1446817783-86722-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-06 13:49:37 +00:00
LIBBPF_ERRNO__INTERNAL, /* Internal error in libbpf */
LIBBPF_ERRNO__RELOC, /* Relocation failed */
LIBBPF_ERRNO__LOAD, /* Load program failure for unknown reason */
LIBBPF_ERRNO__VERIFY, /* Kernel verifier blocks program loading */
LIBBPF_ERRNO__PROG2BIG, /* Program too big */
LIBBPF_ERRNO__KVER, /* Incorrect kernel version */
LIBBPF_ERRNO__PROGTYPE, /* Kernel doesn't support this program type */
LIBBPF_ERRNO__WRNGPID, /* Wrong pid in netlink message */
LIBBPF_ERRNO__INVSEQ, /* Invalid netlink sequence */
LIBBPF_ERRNO__NLPARSE, /* netlink parsing error */
bpf tools: Improve libbpf error reporting In this patch, a series of libbpf specific error numbers and libbpf_strerror() are introduced to help reporting errors. Functions are updated to pass correct the error number through the CHECK_ERR() macro. All users of bpf_object__open{_buffer}() and bpf_program__title() in perf are modified accordingly. In addition, due to the error codes changing, bpf__strerror_load() is also modified to use them. bpf__strerror_head() is also changed accordingly so it can parse libbpf errors. bpf_loader_strerror() is introduced for that purpose, and will be improved by the following patch. load_program() is improved not to dump log buffer if it is empty. log buffer is also used to deduce whether the error was caused by an invalid program or other problem. v1 -> v2: - Using macro for error code. - Fetch error message based on array index, eliminate for-loop. - Use log buffer to detect the reason of failure. 3 new error code are introduced to replace LIBBPF_ERRNO__LOAD. In v1: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_kversion_nomatch_program.o ls event syntax error: './test_kversion_nomatch_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP # perf record -e ./test_big_program.o ls event syntax error: './test_big_program.o' \___ Failed to load program: Validate your program and check 'license'/'version' sections in your object SKIP In v2: # perf record -e ./test_ill_program.o ls event syntax error: './test_ill_program.o' \___ Kernel verifier blocks program loading SKIP # perf record -e ./test_kversion_nomatch_program.o event syntax error: './test_kversion_nomatch_program.o' \___ Incorrect kernel version SKIP (Will be further improved by following patches) # perf record -e ./test_big_program.o event syntax error: './test_big_program.o' \___ Program too big SKIP Signed-off-by: Wang Nan <wangnan0@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1446817783-86722-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-06 13:49:37 +00:00
__LIBBPF_ERRNO__END,
};
LIBBPF_API int libbpf_strerror(int err, char *buf, size_t size);
/**
* @brief **libbpf_bpf_attach_type_str()** converts the provided attach type
* value into a textual representation.
* @param t The attach type.
* @return Pointer to a static string identifying the attach type. NULL is
* returned for unknown **bpf_attach_type** values.
*/
LIBBPF_API const char *libbpf_bpf_attach_type_str(enum bpf_attach_type t);
/**
* @brief **libbpf_bpf_link_type_str()** converts the provided link type value
* into a textual representation.
* @param t The link type.
* @return Pointer to a static string identifying the link type. NULL is
* returned for unknown **bpf_link_type** values.
*/
LIBBPF_API const char *libbpf_bpf_link_type_str(enum bpf_link_type t);
/**
* @brief **libbpf_bpf_map_type_str()** converts the provided map type value
* into a textual representation.
* @param t The map type.
* @return Pointer to a static string identifying the map type. NULL is
* returned for unknown **bpf_map_type** values.
*/
LIBBPF_API const char *libbpf_bpf_map_type_str(enum bpf_map_type t);
/**
* @brief **libbpf_bpf_prog_type_str()** converts the provided program type
* value into a textual representation.
* @param t The program type.
* @return Pointer to a static string identifying the program type. NULL is
* returned for unknown **bpf_prog_type** values.
*/
LIBBPF_API const char *libbpf_bpf_prog_type_str(enum bpf_prog_type t);
enum libbpf_print_level {
LIBBPF_WARN,
LIBBPF_INFO,
LIBBPF_DEBUG,
};
typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
const char *, va_list ap);
/**
* @brief **libbpf_set_print()** sets user-provided log callback function to
* be used for libbpf warnings and informational messages.
* @param fn The log print function. If NULL, libbpf won't print anything.
* @return Pointer to old print function.
*
* This function is thread-safe.
*/
LIBBPF_API libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn);
/* Hide internal to user */
struct bpf_object;
struct bpf_object_open_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* object name override, if provided:
* - for object open from file, this will override setting object
* name from file path's base name;
* - for object open from memory buffer, this will specify an object
* name and will override default "<addr>-<buf-size>" name;
*/
const char *object_name;
/* parse map definitions non-strictly, allowing extra attributes/data */
bool relaxed_maps;
/* maps that set the 'pinning' attribute in their definition will have
* their pin_path attribute set to a file in this directory, and be
* auto-pinned to that path on load; defaults to "/sys/fs/bpf".
*/
const char *pin_root_path;
__u32 :32; /* stub out now removed attach_prog_fd */
/* Additional kernel config content that augments and overrides
* system Kconfig for CONFIG_xxx externs.
libbpf: Support libbpf-provided extern variables Add support for extern variables, provided to BPF program by libbpf. Currently the following extern variables are supported: - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte long; - CONFIG_xxx values; a set of values of actual kernel config. Tristate, boolean, strings, and integer values are supported. Set of possible values is determined by declared type of extern variable. Supported types of variables are: - Tristate values. Are represented as `enum libbpf_tristate`. Accepted values are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO, or TRI_MODULE, respectively. - Boolean values. Are represented as bool (_Bool) types. Accepted values are 'y' and 'n' only, turning into true/false values, respectively. - Single-character values. Can be used both as a substritute for bool/tristate, or as a small-range integer: - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm'; - integers in a range [-128, 127] or [0, 255] (depending on signedness of char in target architecture) are recognized and represented with respective values of char type. - Strings. String values are declared as fixed-length char arrays. String of up to that length will be accepted and put in first N bytes of char array, with the rest of bytes zeroed out. If config string value is longer than space alloted, it will be truncated and warning message emitted. Char array is always zero terminated. String literals in config have to be enclosed in double quotes, just like C-style string literals. - Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and unsigned variants. Libbpf enforces parsed config value to be in the supported range of corresponding integer type. Integers values in config can be: - decimal integers, with optional + and - signs; - hexadecimal integers, prefixed with 0x or 0X; - octal integers, starting with 0. Config file itself is searched in /boot/config-$(uname -r) location with fallback to /proc/config.gz, unless config path is specified explicitly through bpf_object_open_opts' kernel_config_path option. Both gzipped and plain text formats are supported. Libbpf adds explicit dependency on zlib because of this, but this shouldn't be a problem, given libelf already depends on zlib. All detected extern variables, are put into a separate .extern internal map. It, similarly to .rodata map, is marked as read-only from BPF program side, as well as is frozen on load. This allows BPF verifier to track extern values as constants and perform enhanced branch prediction and dead code elimination. This can be relied upon for doing kernel version/feature detection and using potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF program, while still having a single version of BPF program running on old and new kernels. Selftests are validating this explicitly for unexisting BPF helper. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com
2019-12-14 01:47:08 +00:00
*/
const char *kconfig;
/* Path to the custom BTF to be used for BPF CO-RE relocations.
* This custom BTF completely replaces the use of vmlinux BTF
* for the purpose of CO-RE relocations.
* NOTE: any other BPF feature (e.g., fentry/fexit programs,
* struct_ops, etc) will need actual kernel BTF at /sys/kernel/btf/vmlinux.
*/
const char *btf_custom_path;
/* Pointer to a buffer for storing kernel logs for applicable BPF
* commands. Valid kernel_log_size has to be specified as well and are
* passed-through to bpf() syscall. Keep in mind that kernel might
* fail operation with -ENOSPC error if provided buffer is too small
* to contain entire log output.
* See the comment below for kernel_log_level for interaction between
* log_buf and log_level settings.
*
* If specified, this log buffer will be passed for:
* - each BPF progral load (BPF_PROG_LOAD) attempt, unless overriden
* with bpf_program__set_log() on per-program level, to get
* BPF verifier log output.
* - during BPF object's BTF load into kernel (BPF_BTF_LOAD) to get
* BTF sanity checking log.
*
* Each BPF command (BPF_BTF_LOAD or BPF_PROG_LOAD) will overwrite
* previous contents, so if you need more fine-grained control, set
* per-program buffer with bpf_program__set_log_buf() to preserve each
* individual program's verification log. Keep using kernel_log_buf
* for BTF verification log, if necessary.
*/
char *kernel_log_buf;
size_t kernel_log_size;
/*
* Log level can be set independently from log buffer. Log_level=0
* means that libbpf will attempt loading BTF or program without any
* logging requested, but will retry with either its own or custom log
* buffer, if provided, and log_level=1 on any error.
* And vice versa, setting log_level>0 will request BTF or prog
* loading with verbose log from the first attempt (and as such also
* for successfully loaded BTF or program), and the actual log buffer
* could be either libbpf's own auto-allocated log buffer, if
* kernel_log_buffer is NULL, or user-provided custom kernel_log_buf.
* If user didn't provide custom log buffer, libbpf will emit captured
* logs through its print callback.
*/
__u32 kernel_log_level;
libbpf: Wire up BPF token support at BPF object level Add BPF token support to BPF object-level functionality. BPF token is supported by BPF object logic either as an explicitly provided BPF token from outside (through BPF FS path), or implicitly (unless prevented through bpf_object_open_opts). Implicit mode is assumed to be the most common one for user namespaced unprivileged workloads. The assumption is that privileged container manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token delegation options (delegate_{cmds,maps,progs,attachs} mount options). BPF object during loading will attempt to create BPF token from /sys/fs/bpf location, and pass it for all relevant operations (currently, map creation, BTF load, and program load). In this implicit mode, if BPF token creation fails due to whatever reason (BPF FS is not mounted, or kernel doesn't support BPF token, etc), this is not considered an error. BPF object loading sequence will proceed with no BPF token. In explicit BPF token mode, user provides explicitly custom BPF FS mount point path. In such case, BPF object will attempt to create BPF token from provided BPF FS location. If BPF token creation fails, that is considered a critical error and BPF object load fails with an error. Libbpf provides a way to disable implicit BPF token creation, if it causes any troubles (BPF token is designed to be completely optional and shouldn't cause any problems even if provided, but in the world of BPF LSM, custom security logic can be installed that might change outcome depending on the presence of BPF token). To disable libbpf's default BPF token creation behavior user should provide either invalid BPF token FD (negative), or empty bpf_token_path option. BPF token presence can influence libbpf's feature probing, so if BPF object has associated BPF token, feature probing is instructed to use BPF object-specific feature detection cache and token FD. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
2024-01-24 02:21:22 +00:00
/* Path to BPF FS mount point to derive BPF token from.
*
* Created BPF token will be used for all bpf() syscall operations
* that accept BPF token (e.g., map creation, BTF and program loads,
* etc) automatically within instantiated BPF object.
*
* If bpf_token_path is not specified, libbpf will consult
* LIBBPF_BPF_TOKEN_PATH environment variable. If set, it will be
* taken as a value of bpf_token_path option and will force libbpf to
* either create BPF token from provided custom BPF FS path, or will
* disable implicit BPF token creation, if envvar value is an empty
* string. bpf_token_path overrides LIBBPF_BPF_TOKEN_PATH, if both are
* set at the same time.
*
libbpf: Wire up BPF token support at BPF object level Add BPF token support to BPF object-level functionality. BPF token is supported by BPF object logic either as an explicitly provided BPF token from outside (through BPF FS path), or implicitly (unless prevented through bpf_object_open_opts). Implicit mode is assumed to be the most common one for user namespaced unprivileged workloads. The assumption is that privileged container manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token delegation options (delegate_{cmds,maps,progs,attachs} mount options). BPF object during loading will attempt to create BPF token from /sys/fs/bpf location, and pass it for all relevant operations (currently, map creation, BTF load, and program load). In this implicit mode, if BPF token creation fails due to whatever reason (BPF FS is not mounted, or kernel doesn't support BPF token, etc), this is not considered an error. BPF object loading sequence will proceed with no BPF token. In explicit BPF token mode, user provides explicitly custom BPF FS mount point path. In such case, BPF object will attempt to create BPF token from provided BPF FS location. If BPF token creation fails, that is considered a critical error and BPF object load fails with an error. Libbpf provides a way to disable implicit BPF token creation, if it causes any troubles (BPF token is designed to be completely optional and shouldn't cause any problems even if provided, but in the world of BPF LSM, custom security logic can be installed that might change outcome depending on the presence of BPF token). To disable libbpf's default BPF token creation behavior user should provide either invalid BPF token FD (negative), or empty bpf_token_path option. BPF token presence can influence libbpf's feature probing, so if BPF object has associated BPF token, feature probing is instructed to use BPF object-specific feature detection cache and token FD. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
2024-01-24 02:21:22 +00:00
* Setting bpf_token_path option to empty string disables libbpf's
* automatic attempt to create BPF token from default BPF FS mount
* point (/sys/fs/bpf), in case this default behavior is undesirable.
*/
const char *bpf_token_path;
size_t :0;
};
libbpf: Wire up BPF token support at BPF object level Add BPF token support to BPF object-level functionality. BPF token is supported by BPF object logic either as an explicitly provided BPF token from outside (through BPF FS path), or implicitly (unless prevented through bpf_object_open_opts). Implicit mode is assumed to be the most common one for user namespaced unprivileged workloads. The assumption is that privileged container manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token delegation options (delegate_{cmds,maps,progs,attachs} mount options). BPF object during loading will attempt to create BPF token from /sys/fs/bpf location, and pass it for all relevant operations (currently, map creation, BTF load, and program load). In this implicit mode, if BPF token creation fails due to whatever reason (BPF FS is not mounted, or kernel doesn't support BPF token, etc), this is not considered an error. BPF object loading sequence will proceed with no BPF token. In explicit BPF token mode, user provides explicitly custom BPF FS mount point path. In such case, BPF object will attempt to create BPF token from provided BPF FS location. If BPF token creation fails, that is considered a critical error and BPF object load fails with an error. Libbpf provides a way to disable implicit BPF token creation, if it causes any troubles (BPF token is designed to be completely optional and shouldn't cause any problems even if provided, but in the world of BPF LSM, custom security logic can be installed that might change outcome depending on the presence of BPF token). To disable libbpf's default BPF token creation behavior user should provide either invalid BPF token FD (negative), or empty bpf_token_path option. BPF token presence can influence libbpf's feature probing, so if BPF object has associated BPF token, feature probing is instructed to use BPF object-specific feature detection cache and token FD. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
2024-01-24 02:21:22 +00:00
#define bpf_object_open_opts__last_field bpf_token_path
/**
* @brief **bpf_object__open()** creates a bpf_object by opening
* the BPF ELF object file pointed to by the passed path and loading it
* into memory.
* @param path BPF object file path.
* @return pointer to the new bpf_object; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
/**
* @brief **bpf_object__open_file()** creates a bpf_object by opening
* the BPF ELF object file pointed to by the passed path and loading it
* into memory.
* @param path BPF object file path
* @param opts options for how to load the bpf object, this parameter is
* optional and can be set to NULL
* @return pointer to the new bpf_object; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_object *
bpf_object__open_file(const char *path, const struct bpf_object_open_opts *opts);
/**
* @brief **bpf_object__open_mem()** creates a bpf_object by reading
* the BPF objects raw bytes from a memory buffer containing a valid
* BPF ELF object file.
* @param obj_buf pointer to the buffer containing ELF file bytes
* @param obj_buf_sz number of bytes in the buffer
* @param opts options for how to load the bpf object
* @return pointer to the new bpf_object; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_object *
bpf_object__open_mem(const void *obj_buf, size_t obj_buf_sz,
const struct bpf_object_open_opts *opts);
/**
* @brief **bpf_object__load()** loads BPF object into kernel.
* @param obj Pointer to a valid BPF object instance returned by
* **bpf_object__open*()** APIs
* @return 0, on success; negative error code, otherwise, error code is
* stored in errno
*/
LIBBPF_API int bpf_object__load(struct bpf_object *obj);
/**
* @brief **bpf_object__close()** closes a BPF object and releases all
* resources.
* @param obj Pointer to a valid BPF object
*/
LIBBPF_API void bpf_object__close(struct bpf_object *obj);
/**
* @brief **bpf_object__pin_maps()** pins each map contained within
* the BPF object at the passed directory.
* @param obj Pointer to a valid BPF object
* @param path A directory where maps should be pinned.
* @return 0, on success; negative error code, otherwise
*
* If `path` is NULL `bpf_map__pin` (which is being used on each map)
* will use the pin_path attribute of each map. In this case, maps that
* don't have a pin_path set will be ignored.
*/
LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
/**
* @brief **bpf_object__unpin_maps()** unpins each map contained within
* the BPF object found in the passed directory.
* @param obj Pointer to a valid BPF object
* @param path A directory where pinned maps should be searched for.
* @return 0, on success; negative error code, otherwise
*
* If `path` is NULL `bpf_map__unpin` (which is being used on each map)
* will use the pin_path attribute of each map. In this case, maps that
* don't have a pin_path set will be ignored.
*/
LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
const char *path);
LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
const char *path);
LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj,
const char *path);
LIBBPF_API int bpf_object__pin(struct bpf_object *object, const char *path);
LIBBPF_API int bpf_object__unpin(struct bpf_object *object, const char *path);
LIBBPF_API const char *bpf_object__name(const struct bpf_object *obj);
LIBBPF_API unsigned int bpf_object__kversion(const struct bpf_object *obj);
LIBBPF_API int bpf_object__set_kversion(struct bpf_object *obj, __u32 kern_version);
struct btf;
LIBBPF_API struct btf *bpf_object__btf(const struct bpf_object *obj);
LIBBPF_API int bpf_object__btf_fd(const struct bpf_object *obj);
LIBBPF_API struct bpf_program *
bpf_object__find_program_by_name(const struct bpf_object *obj,
const char *name);
LIBBPF_API int
libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
enum bpf_attach_type *expected_attach_type);
LIBBPF_API int libbpf_attach_type_by_name(const char *name,
enum bpf_attach_type *attach_type);
LIBBPF_API int libbpf_find_vmlinux_btf_id(const char *name,
enum bpf_attach_type attach_type);
/* Accessors of bpf_program */
struct bpf_program;
LIBBPF_API struct bpf_program *
bpf_object__next_program(const struct bpf_object *obj, struct bpf_program *prog);
#define bpf_object__for_each_program(pos, obj) \
for ((pos) = bpf_object__next_program((obj), NULL); \
(pos) != NULL; \
(pos) = bpf_object__next_program((obj), (pos)))
LIBBPF_API struct bpf_program *
bpf_object__prev_program(const struct bpf_object *obj, struct bpf_program *prog);
LIBBPF_API void bpf_program__set_ifindex(struct bpf_program *prog,
__u32 ifindex);
LIBBPF_API const char *bpf_program__name(const struct bpf_program *prog);
LIBBPF_API const char *bpf_program__section_name(const struct bpf_program *prog);
libbpf: Support disabling auto-loading BPF programs Currently, bpf_object__load() (and by induction skeleton's load), will always attempt to prepare, relocate, and load into kernel every single BPF program found inside the BPF object file. This is often convenient and the right thing to do and what users expect. But there are plenty of cases (especially with BPF development constantly picking up the pace), where BPF application is intended to work with old kernels, with potentially reduced set of features. But on kernels supporting extra features, it would like to take a full advantage of them, by employing extra BPF program. This could be a choice of using fentry/fexit over kprobe/kretprobe, if kernel is recent enough and is built with BTF. Or BPF program might be providing optimized bpf_iter-based solution that user-space might want to use, whenever available. And so on. With libbpf and BPF CO-RE in particular, it's advantageous to not have to maintain two separate BPF object files to achieve this. So to enable such use cases, this patch adds ability to request not auto-loading chosen BPF programs. In such case, libbpf won't attempt to perform relocations (which might fail due to old kernel), won't try to resolve BTF types for BTF-aware (tp_btf/fentry/fexit/etc) program types, because BTF might not be present, and so on. Skeleton will also automatically skip auto-attachment step for such not loaded BPF programs. Overall, this feature allows to simplify development and deployment of real-world BPF applications with complicated compatibility requirements. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200625232629.3444003-2-andriin@fb.com
2020-06-25 23:26:28 +00:00
LIBBPF_API bool bpf_program__autoload(const struct bpf_program *prog);
LIBBPF_API int bpf_program__set_autoload(struct bpf_program *prog, bool autoload);
LIBBPF_API bool bpf_program__autoattach(const struct bpf_program *prog);
LIBBPF_API void bpf_program__set_autoattach(struct bpf_program *prog, bool autoattach);
libbpf: Add ability to fetch bpf_program's underlying instructions Add APIs providing read-only access to bpf_program BPF instructions ([0]). This is useful for diagnostics purposes, but it also allows a cleaner support for cloning BPF programs after libbpf did all the FD resolution and CO-RE relocations, subprog instructions appending, etc. Currently, cloning BPF program is possible only through hijacking a half-broken bpf_program__set_prep() API, which doesn't really work well for anything but most primitive programs. For instance, set_prep() API doesn't allow adjusting BPF program load parameters which are necessary for loading fentry/fexit BPF programs (the case where BPF program cloning is a necessity if doing some sort of mass-attachment functionality). Given bpf_program__set_prep() API is set to be deprecated, having a cleaner alternative is a must. libbpf internally already keeps track of linear array of struct bpf_insn, so it's not hard to expose it. The only gotcha is that libbpf previously freed instructions array during bpf_object load time, which would make this API much less useful overall, because in between bpf_object__open() and bpf_object__load() a lot of changes to instructions are done by libbpf. So this patch makes libbpf hold onto prog->insns array even after BPF program loading. I think this is a small price for added functionality and improved introspection of BPF program code. See retsnoop PR ([1]) for how it can be used in practice and code savings compared to relying on bpf_program__set_prep(). [0] Closes: https://github.com/libbpf/libbpf/issues/298 [1] https://github.com/anakryiko/retsnoop/pull/1 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211025224531.1088894-3-andrii@kernel.org
2021-10-25 22:45:29 +00:00
struct bpf_insn;
/**
* @brief **bpf_program__insns()** gives read-only access to BPF program's
* underlying BPF instructions.
* @param prog BPF program for which to return instructions
* @return a pointer to an array of BPF instructions that belong to the
* specified BPF program
*
* Returned pointer is always valid and not NULL. Number of `struct bpf_insn`
* pointed to can be fetched using **bpf_program__insn_cnt()** API.
*
* Keep in mind, libbpf can modify and append/delete BPF program's
* instructions as it processes BPF object file and prepares everything for
* uploading into the kernel. So depending on the point in BPF object
* lifetime, **bpf_program__insns()** can return different sets of
* instructions. As an example, during BPF object load phase BPF program
* instructions will be CO-RE-relocated, BPF subprograms instructions will be
* appended, ldimm64 instructions will have FDs embedded, etc. So instructions
* returned before **bpf_object__load()** and after it might be quite
* different.
*/
LIBBPF_API const struct bpf_insn *bpf_program__insns(const struct bpf_program *prog);
/**
* @brief **bpf_program__set_insns()** can set BPF program's underlying
* BPF instructions.
*
* WARNING: This is a very advanced libbpf API and users need to know
* what they are doing. This should be used from prog_prepare_load_fn
* callback only.
*
* @param prog BPF program for which to return instructions
* @param new_insns a pointer to an array of BPF instructions
* @param new_insn_cnt number of `struct bpf_insn`'s that form
* specified BPF program
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API int bpf_program__set_insns(struct bpf_program *prog,
struct bpf_insn *new_insns, size_t new_insn_cnt);
libbpf: Add ability to fetch bpf_program's underlying instructions Add APIs providing read-only access to bpf_program BPF instructions ([0]). This is useful for diagnostics purposes, but it also allows a cleaner support for cloning BPF programs after libbpf did all the FD resolution and CO-RE relocations, subprog instructions appending, etc. Currently, cloning BPF program is possible only through hijacking a half-broken bpf_program__set_prep() API, which doesn't really work well for anything but most primitive programs. For instance, set_prep() API doesn't allow adjusting BPF program load parameters which are necessary for loading fentry/fexit BPF programs (the case where BPF program cloning is a necessity if doing some sort of mass-attachment functionality). Given bpf_program__set_prep() API is set to be deprecated, having a cleaner alternative is a must. libbpf internally already keeps track of linear array of struct bpf_insn, so it's not hard to expose it. The only gotcha is that libbpf previously freed instructions array during bpf_object load time, which would make this API much less useful overall, because in between bpf_object__open() and bpf_object__load() a lot of changes to instructions are done by libbpf. So this patch makes libbpf hold onto prog->insns array even after BPF program loading. I think this is a small price for added functionality and improved introspection of BPF program code. See retsnoop PR ([1]) for how it can be used in practice and code savings compared to relying on bpf_program__set_prep(). [0] Closes: https://github.com/libbpf/libbpf/issues/298 [1] https://github.com/anakryiko/retsnoop/pull/1 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211025224531.1088894-3-andrii@kernel.org
2021-10-25 22:45:29 +00:00
/**
* @brief **bpf_program__insn_cnt()** returns number of `struct bpf_insn`'s
* that form specified BPF program.
* @param prog BPF program for which to return number of BPF instructions
*
* See **bpf_program__insns()** documentation for notes on how libbpf can
* change instructions and their count during different phases of
* **bpf_object** lifetime.
*/
LIBBPF_API size_t bpf_program__insn_cnt(const struct bpf_program *prog);
LIBBPF_API int bpf_program__fd(const struct bpf_program *prog);
/**
* @brief **bpf_program__pin()** pins the BPF program to a file
* in the BPF FS specified by a path. This increments the programs
* reference count, allowing it to stay loaded after the process
* which loaded it has exited.
*
* @param prog BPF program to pin, must already be loaded
* @param path file path in a BPF file system
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API int bpf_program__pin(struct bpf_program *prog, const char *path);
/**
* @brief **bpf_program__unpin()** unpins the BPF program from a file
* in the BPFFS specified by a path. This decrements the programs
* reference count.
*
* The file pinning the BPF program can also be unlinked by a different
* process in which case this function will return an error.
*
* @param prog BPF program to unpin
* @param path file path to the pin in a BPF file system
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API int bpf_program__unpin(struct bpf_program *prog, const char *path);
LIBBPF_API void bpf_program__unload(struct bpf_program *prog);
struct bpf_link;
LIBBPF_API struct bpf_link *bpf_link__open(const char *path);
LIBBPF_API int bpf_link__fd(const struct bpf_link *link);
LIBBPF_API const char *bpf_link__pin_path(const struct bpf_link *link);
/**
* @brief **bpf_link__pin()** pins the BPF link to a file
* in the BPF FS specified by a path. This increments the links
* reference count, allowing it to stay loaded after the process
* which loaded it has exited.
*
* @param link BPF link to pin, must already be loaded
* @param path file path in a BPF file system
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API int bpf_link__pin(struct bpf_link *link, const char *path);
/**
* @brief **bpf_link__unpin()** unpins the BPF link from a file
* in the BPFFS specified by a path. This decrements the links
* reference count.
*
* The file pinning the BPF link can also be unlinked by a different
* process in which case this function will return an error.
*
* @param prog BPF program to unpin
* @param path file path to the pin in a BPF file system
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API int bpf_link__unpin(struct bpf_link *link);
LIBBPF_API int bpf_link__update_program(struct bpf_link *link,
struct bpf_program *prog);
libbpf: Add bpf_link__disconnect() API to preserve underlying BPF resource There are cases in which BPF resource (program, map, etc) has to outlive userspace program that "installed" it in the system in the first place. When BPF program is attached, libbpf returns bpf_link object, which is supposed to be destroyed after no longer necessary through bpf_link__destroy() API. Currently, bpf_link destruction causes both automatic detachment and frees up any resources allocated to for bpf_link in-memory representation. This is inconvenient for the case described above because of coupling of detachment and resource freeing. This patch introduces bpf_link__disconnect() API call, which marks bpf_link as disconnected from its underlying BPF resouces. This means that when bpf_link is destroyed later, all its memory resources will be freed, but BPF resource itself won't be detached. This design allows to follow strict and resource-leak-free design by default, while giving easy and straightforward way for user code to opt for keeping BPF resource attached beyond lifetime of a bpf_link. For some BPF programs (i.e., FS-based tracepoints, kprobes, raw tracepoint, etc), user has to make sure to pin BPF program to prevent kernel to automatically detach it on process exit. This should typically be achived by pinning BPF program (or map in some cases) in BPF FS. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191218225039.2668205-1-andriin@fb.com
2019-12-18 22:50:39 +00:00
LIBBPF_API void bpf_link__disconnect(struct bpf_link *link);
LIBBPF_API int bpf_link__detach(struct bpf_link *link);
LIBBPF_API int bpf_link__destroy(struct bpf_link *link);
/**
* @brief **bpf_program__attach()** is a generic function for attaching
* a BPF program based on auto-detection of program type, attach type,
* and extra paremeters, where applicable.
*
* @param prog BPF program to attach
* @return Reference to the newly created BPF link; or NULL is returned on error,
* error code is stored in errno
*
* This is supported for:
* - kprobe/kretprobe (depends on SEC() definition)
* - uprobe/uretprobe (depends on SEC() definition)
* - tracepoint
* - raw tracepoint
* - tracing programs (typed raw TP/fentry/fexit/fmod_ret)
*/
LIBBPF_API struct bpf_link *
bpf_program__attach(const struct bpf_program *prog);
struct bpf_perf_event_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
/* don't use BPF link when attach BPF program */
bool force_ioctl_attach;
size_t :0;
};
#define bpf_perf_event_opts__last_field force_ioctl_attach
LIBBPF_API struct bpf_link *
bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd);
LIBBPF_API struct bpf_link *
bpf_program__attach_perf_event_opts(const struct bpf_program *prog, int pfd,
const struct bpf_perf_event_opts *opts);
/**
* enum probe_attach_mode - the mode to attach kprobe/uprobe
*
* force libbpf to attach kprobe/uprobe in specific mode, -ENOTSUP will
* be returned if it is not supported by the kernel.
*/
enum probe_attach_mode {
/* attach probe in latest supported mode by kernel */
PROBE_ATTACH_MODE_DEFAULT = 0,
/* attach probe in legacy mode, using debugfs/tracefs */
PROBE_ATTACH_MODE_LEGACY,
/* create perf event with perf_event_open() syscall */
PROBE_ATTACH_MODE_PERF,
/* attach probe with BPF link */
PROBE_ATTACH_MODE_LINK,
};
struct bpf_kprobe_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
/* function's offset to install kprobe to */
size_t offset;
/* kprobe is return probe */
bool retprobe;
/* kprobe attach mode */
enum probe_attach_mode attach_mode;
size_t :0;
};
#define bpf_kprobe_opts__last_field attach_mode
LIBBPF_API struct bpf_link *
bpf_program__attach_kprobe(const struct bpf_program *prog, bool retprobe,
const char *func_name);
LIBBPF_API struct bpf_link *
bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
const char *func_name,
const struct bpf_kprobe_opts *opts);
libbpf: Add bpf_program__attach_kprobe_multi_opts function Adding bpf_program__attach_kprobe_multi_opts function for attaching kprobe program to multiple functions. struct bpf_link * bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, const char *pattern, const struct bpf_kprobe_multi_opts *opts); User can specify functions to attach with 'pattern' argument that allows wildcards (*?' supported) or provide symbols or addresses directly through opts argument. These 3 options are mutually exclusive. When using symbols or addresses, user can also provide cookie value for each symbol/address that can be retrieved later in bpf program with bpf_get_attach_cookie helper. struct bpf_kprobe_multi_opts { size_t sz; const char **syms; const unsigned long *addrs; const __u64 *cookies; size_t cnt; bool retprobe; size_t :0; }; Symbols, addresses and cookies are provided through opts object (syms/addrs/cookies) as array pointers with specified count (cnt). Each cookie value is paired with provided function address or symbol with the same array index. The program can be also attached as return probe if 'retprobe' is set. For quick usage with NULL opts argument, like: bpf_program__attach_kprobe_multi_opts(prog, "ksys_*", NULL) the 'prog' will be attached as kprobe to 'ksys_*' functions. Also adding new program sections for automatic attachment: kprobe.multi/<symbol_pattern> kretprobe.multi/<symbol_pattern> The symbol_pattern is used as 'pattern' argument in bpf_program__attach_kprobe_multi_opts function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-10-jolsa@kernel.org
2022-03-16 12:24:15 +00:00
struct bpf_kprobe_multi_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* array of function symbols to attach */
const char **syms;
/* array of function addresses to attach */
const unsigned long *addrs;
/* array of user-provided values fetchable through bpf_get_attach_cookie */
const __u64 *cookies;
/* number of elements in syms/addrs/cookies arrays */
size_t cnt;
/* create return kprobes */
bool retprobe;
size_t :0;
};
#define bpf_kprobe_multi_opts__last_field retprobe
LIBBPF_API struct bpf_link *
bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
const char *pattern,
const struct bpf_kprobe_multi_opts *opts);
struct bpf_uprobe_multi_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* array of function symbols to attach to */
const char **syms;
/* array of function addresses to attach to */
const unsigned long *offsets;
/* optional, array of associated ref counter offsets */
const unsigned long *ref_ctr_offsets;
/* optional, array of associated BPF cookies */
const __u64 *cookies;
/* number of elements in syms/addrs/cookies arrays */
size_t cnt;
/* create return uprobes */
bool retprobe;
size_t :0;
};
#define bpf_uprobe_multi_opts__last_field retprobe
/**
* @brief **bpf_program__attach_uprobe_multi()** attaches a BPF program
* to multiple uprobes with uprobe_multi link.
*
* User can specify 2 mutually exclusive set of inputs:
*
* 1) use only path/func_pattern/pid arguments
*
* 2) use path/pid with allowed combinations of
* syms/offsets/ref_ctr_offsets/cookies/cnt
*
* - syms and offsets are mutually exclusive
* - ref_ctr_offsets and cookies are optional
*
*
* @param prog BPF program to attach
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary
* @param func_pattern Regular expression to specify functions to attach
* BPF program to
* @param opts Additional options (see **struct bpf_uprobe_multi_opts**)
* @return 0, on success; negative error code, otherwise
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe_multi(const struct bpf_program *prog,
pid_t pid,
const char *binary_path,
const char *func_pattern,
const struct bpf_uprobe_multi_opts *opts);
libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding kretsyscall variants (for return kprobes) to allow users to kprobe syscall functions in kernel. These special sections allow to ignore complexities and differences between kernel versions and host architectures when it comes to syscall wrapper and corresponding __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf itself doesn't rely on /proc/config.gz for detecting this, see BPF_KSYSCALL patch for how it's done internally). Combined with the use of BPF_KSYSCALL() macro, this allows to just specify intended syscall name and expected input arguments and leave dealing with all the variations to libbpf. In addition to SEC("ksyscall+") and SEC("kretsyscall+") add bpf_program__attach_ksyscall() API which allows to specify syscall name at runtime and provide associated BPF cookie value. At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not handle all the calling convention quirks for mmap(), clone() and compat syscalls. It also only attaches to "native" syscall interfaces. If host system supports compat syscalls or defines 32-bit syscalls in 64-bit kernel, such syscall interfaces won't be attached to by libbpf. These limitations may or may not change in the future. Therefore it is recommended to use SEC("kprobe") for these syscalls or if working with compat and 32-bit interfaces is required. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-14 07:07:54 +00:00
struct bpf_ksyscall_opts {
/* size of this struct, for forward/backward compatibility */
libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding kretsyscall variants (for return kprobes) to allow users to kprobe syscall functions in kernel. These special sections allow to ignore complexities and differences between kernel versions and host architectures when it comes to syscall wrapper and corresponding __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf itself doesn't rely on /proc/config.gz for detecting this, see BPF_KSYSCALL patch for how it's done internally). Combined with the use of BPF_KSYSCALL() macro, this allows to just specify intended syscall name and expected input arguments and leave dealing with all the variations to libbpf. In addition to SEC("ksyscall+") and SEC("kretsyscall+") add bpf_program__attach_ksyscall() API which allows to specify syscall name at runtime and provide associated BPF cookie value. At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not handle all the calling convention quirks for mmap(), clone() and compat syscalls. It also only attaches to "native" syscall interfaces. If host system supports compat syscalls or defines 32-bit syscalls in 64-bit kernel, such syscall interfaces won't be attached to by libbpf. These limitations may or may not change in the future. Therefore it is recommended to use SEC("kprobe") for these syscalls or if working with compat and 32-bit interfaces is required. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-14 07:07:54 +00:00
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
/* attach as return probe? */
bool retprobe;
size_t :0;
};
#define bpf_ksyscall_opts__last_field retprobe
/**
* @brief **bpf_program__attach_ksyscall()** attaches a BPF program
* to kernel syscall handler of a specified syscall. Optionally it's possible
* to request to install retprobe that will be triggered at syscall exit. It's
* also possible to associate BPF cookie (though options).
*
* Libbpf automatically will determine correct full kernel function name,
* which depending on system architecture and kernel version/configuration
* could be of the form __<arch>_sys_<syscall> or __se_sys_<syscall>, and will
* attach specified program using kprobe/kretprobe mechanism.
*
* **bpf_program__attach_ksyscall()** is an API counterpart of declarative
* **SEC("ksyscall/<syscall>")** annotation of BPF programs.
*
* At the moment **SEC("ksyscall")** and **bpf_program__attach_ksyscall()** do
* not handle all the calling convention quirks for mmap(), clone() and compat
* syscalls. It also only attaches to "native" syscall interfaces. If host
* system supports compat syscalls or defines 32-bit syscalls in 64-bit
* kernel, such syscall interfaces won't be attached to by libbpf.
*
* These limitations may or may not change in the future. Therefore it is
* recommended to use SEC("kprobe") for these syscalls or if working with
* compat and 32-bit interfaces is required.
*
* @param prog BPF program to attach
* @param syscall_name Symbolic name of the syscall (e.g., "bpf")
* @param opts Additional options (see **struct bpf_ksyscall_opts**)
* @return Reference to the newly created BPF link; or NULL is returned on
* error, error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_ksyscall(const struct bpf_program *prog,
const char *syscall_name,
const struct bpf_ksyscall_opts *opts);
struct bpf_uprobe_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* offset of kernel reference counted USDT semaphore, added in
* a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
*/
size_t ref_ctr_offset;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
/* uprobe is return probe, invoked at function return time */
bool retprobe;
/* Function name to attach to. Could be an unqualified ("abc") or library-qualified
* "abc@LIBXYZ" name. To specify function entry, func_name should be set while
* func_offset argument to bpf_prog__attach_uprobe_opts() should be 0. To trace an
* offset within a function, specify func_name and use func_offset argument to specify
* offset within the function. Shared library functions must specify the shared library
* binary_path.
*/
const char *func_name;
/* uprobe attach mode */
enum probe_attach_mode attach_mode;
size_t :0;
};
#define bpf_uprobe_opts__last_field attach_mode
/**
* @brief **bpf_program__attach_uprobe()** attaches a BPF program
* to the userspace function which is found by binary path and
* offset. You can optionally specify a particular proccess to attach
* to. You can also optionally attach the program to the function
* exit instead of entry.
*
* @param prog BPF program to attach
* @param retprobe Attach to function exit
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary that contains the function symbol
* @param func_offset Offset within the binary of the function symbol
* @return Reference to the newly created BPF link; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe(const struct bpf_program *prog, bool retprobe,
pid_t pid, const char *binary_path,
size_t func_offset);
/**
* @brief **bpf_program__attach_uprobe_opts()** is just like
* bpf_program__attach_uprobe() except with a options struct
* for various configurations.
*
* @param prog BPF program to attach
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary that contains the function symbol
* @param func_offset Offset within the binary of the function symbol
* @param opts Options for altering program attachment
* @return Reference to the newly created BPF link; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
const char *binary_path, size_t func_offset,
const struct bpf_uprobe_opts *opts);
libbpf: Wire up USDT API and bpf_link integration Wire up libbpf USDT support APIs without yet implementing all the nitty-gritty details of USDT discovery, spec parsing, and BPF map initialization. User-visible user-space API is simple and is conceptually very similar to uprobe API. bpf_program__attach_usdt() API allows to programmatically attach given BPF program to a USDT, specified through binary path (executable or shared lib), USDT provider and name. Also, just like in uprobe case, PID filter is specified (0 - self, -1 - any process, or specific PID). Optionally, USDT cookie value can be specified. Such single API invocation will try to discover given USDT in specified binary and will use (potentially many) BPF uprobes to attach this program in correct locations. Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that represents this attachment. It is a virtual BPF link that doesn't have direct kernel object, as it can consist of multiple underlying BPF uprobe links. As such, attachment is not atomic operation and there can be brief moment when some USDT call sites are attached while others are still in the process of attaching. This should be taken into consideration by user. But bpf_program__attach_usdt() guarantees that in the case of success all USDT call sites are successfully attached, or all the successfuly attachments will be detached as soon as some USDT call sites failed to be attached. So, in theory, there could be cases of failed bpf_program__attach_usdt() call which did trigger few USDT program invocations. This is unavoidable due to multi-uprobe nature of USDT and has to be handled by user, if it's important to create an illusion of atomicity. USDT BPF programs themselves are marked in BPF source code as either SEC("usdt"), in which case they won't be auto-attached through skeleton's <skel>__attach() method, or it can have a full definition, which follows the spirit of fully-specified uprobes: SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's attach method will attempt auto-attachment. Similarly, generic bpf_program__attach() will have enought information to go off of for parameterless attachment. USDT BPF programs are actually uprobes, and as such for kernel they are marked as BPF_PROG_TYPE_KPROBE. Another part of this patch is USDT-related feature probing: - BPF cookie support detection from user-space; - detection of kernel support for auto-refcounting of USDT semaphore. The latter is optional. If kernel doesn't support such feature and USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf detects that USDT requires setting semaphores and kernel doesn't support this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't support poking process's memory directly to increment semaphore value, like BCC does on legacy kernels, due to inherent raciness and danger of such process memory manipulation. Libbpf let's kernel take care of this properly or gives up. Logistically, all the extra USDT-related infrastructure of libbpf is put into a separate usdt.c file and abstracted behind struct usdt_manager. Each bpf_object has lazily-initialized usdt_manager pointer, which is only instantiated if USDT programs are attempted to be attached. Closing BPF object frees up usdt_manager resources. usdt_manager keeps track of USDT spec ID assignment and few other small things. Subsequent patches will fill out remaining missing pieces of USDT initialization and setup logic. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org
2022-04-04 23:41:57 +00:00
struct bpf_usdt_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value accessible through usdt_cookie() */
__u64 usdt_cookie;
size_t :0;
};
#define bpf_usdt_opts__last_field usdt_cookie
/**
* @brief **bpf_program__attach_usdt()** is just like
* bpf_program__attach_uprobe_opts() except it covers USDT (User-space
* Statically Defined Tracepoint) attachment, instead of attaching to
* user-space function entry or exit.
*
* @param prog BPF program to attach
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary that contains provided USDT probe
* @param usdt_provider USDT provider name
* @param usdt_name USDT probe name
* @param opts Options for altering program attachment
* @return Reference to the newly created BPF link; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_usdt(const struct bpf_program *prog,
pid_t pid, const char *binary_path,
const char *usdt_provider, const char *usdt_name,
const struct bpf_usdt_opts *opts);
struct bpf_tracepoint_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
};
#define bpf_tracepoint_opts__last_field bpf_cookie
LIBBPF_API struct bpf_link *
bpf_program__attach_tracepoint(const struct bpf_program *prog,
const char *tp_category,
const char *tp_name);
LIBBPF_API struct bpf_link *
bpf_program__attach_tracepoint_opts(const struct bpf_program *prog,
const char *tp_category,
const char *tp_name,
const struct bpf_tracepoint_opts *opts);
LIBBPF_API struct bpf_link *
bpf_program__attach_raw_tracepoint(const struct bpf_program *prog,
const char *tp_name);
struct bpf_trace_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 cookie;
};
#define bpf_trace_opts__last_field cookie
LIBBPF_API struct bpf_link *
bpf_program__attach_trace(const struct bpf_program *prog);
LIBBPF_API struct bpf_link *
bpf_program__attach_trace_opts(const struct bpf_program *prog, const struct bpf_trace_opts *opts);
LIBBPF_API struct bpf_link *
bpf_program__attach_lsm(const struct bpf_program *prog);
LIBBPF_API struct bpf_link *
bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
LIBBPF_API struct bpf_link *
bpf_program__attach_freplace(const struct bpf_program *prog,
int target_fd, const char *attach_func_name);
struct bpf_netfilter_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 pf;
__u32 hooknum;
__s32 priority;
__u32 flags;
};
#define bpf_netfilter_opts__last_field flags
LIBBPF_API struct bpf_link *
bpf_program__attach_netfilter(const struct bpf_program *prog,
const struct bpf_netfilter_opts *opts);
struct bpf_tcx_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_tcx_opts__last_field expected_revision
LIBBPF_API struct bpf_link *
bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
const struct bpf_tcx_opts *opts);
struct bpf_netkit_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_netkit_opts__last_field expected_revision
LIBBPF_API struct bpf_link *
bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex,
const struct bpf_netkit_opts *opts);
bpf: libbpf: Add STRUCT_OPS support This patch adds BPF STRUCT_OPS support to libbpf. The only sec_name convention is SEC(".struct_ops") to identify the struct_ops implemented in BPF, e.g. To implement a tcp_congestion_ops: SEC(".struct_ops") struct tcp_congestion_ops dctcp = { .init = (void *)dctcp_init, /* <-- a bpf_prog */ /* ... some more func prts ... */ .name = "bpf_dctcp", }; Each struct_ops is defined as a global variable under SEC(".struct_ops") as above. libbpf creates a map for each variable and the variable name is the map's name. Multiple struct_ops is supported under SEC(".struct_ops"). In the bpf_object__open phase, libbpf will look for the SEC(".struct_ops") section and find out what is the btf-type the struct_ops is implementing. Note that the btf-type here is referring to a type in the bpf_prog.o's btf. A "struct bpf_map" is added by bpf_object__add_map() as other maps do. It will then collect (through SHT_REL) where are the bpf progs that the func ptrs are referring to. No btf_vmlinux is needed in the open phase. In the bpf_object__load phase, the map-fields, which depend on the btf_vmlinux, are initialized (in bpf_map__init_kern_struct_ops()). It will also set the prog->type, prog->attach_btf_id, and prog->expected_attach_type. Thus, the prog's properties do not rely on its section name. [ Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching process is as simple as: member-name match + btf-kind match + size match. If these matching conditions fail, libbpf will reject. The current targeting support is "struct tcp_congestion_ops" which most of its members are function pointers. The member ordering of the bpf_prog's btf-type can be different from the btf_vmlinux's btf-type. ] Then, all obj->maps are created as usual (in bpf_object__create_maps()). Once the maps are created and prog's properties are all set, the libbpf will proceed to load all the progs. bpf_map__attach_struct_ops() is added to register a struct_ops map to a kernel subsystem. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003514.3856730-1-kafai@fb.com
2020-01-09 00:35:14 +00:00
struct bpf_map;
LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
LIBBPF_API int bpf_link__update_map(struct bpf_link *link, const struct bpf_map *map);
struct bpf_iter_attach_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
union bpf_iter_link_info *link_info;
__u32 link_info_len;
};
#define bpf_iter_attach_opts__last_field link_info_len
LIBBPF_API struct bpf_link *
bpf_program__attach_iter(const struct bpf_program *prog,
const struct bpf_iter_attach_opts *opts);
LIBBPF_API enum bpf_prog_type bpf_program__type(const struct bpf_program *prog);
/**
* @brief **bpf_program__set_type()** sets the program
* type of the passed BPF program.
* @param prog BPF program to set the program type for
* @param type program type to set the BPF map to have
* @return error code; or 0 if no error. An error occurs
* if the object is already loaded.
*
* This must be called before the BPF object is loaded,
* otherwise it has no effect and an error is returned.
*/
LIBBPF_API int bpf_program__set_type(struct bpf_program *prog,
enum bpf_prog_type type);
LIBBPF_API enum bpf_attach_type
bpf_program__expected_attach_type(const struct bpf_program *prog);
/**
* @brief **bpf_program__set_expected_attach_type()** sets the
* attach type of the passed BPF program. This is used for
* auto-detection of attachment when programs are loaded.
* @param prog BPF program to set the attach type for
* @param type attach type to set the BPF map to have
* @return error code; or 0 if no error. An error occurs
* if the object is already loaded.
*
* This must be called before the BPF object is loaded,
* otherwise it has no effect and an error is returned.
*/
LIBBPF_API int
bpf_program__set_expected_attach_type(struct bpf_program *prog,
enum bpf_attach_type type);
LIBBPF_API __u32 bpf_program__flags(const struct bpf_program *prog);
LIBBPF_API int bpf_program__set_flags(struct bpf_program *prog, __u32 flags);
/* Per-program log level and log buffer getters/setters.
* See bpf_object_open_opts comments regarding log_level and log_buf
* interactions.
*/
LIBBPF_API __u32 bpf_program__log_level(const struct bpf_program *prog);
LIBBPF_API int bpf_program__set_log_level(struct bpf_program *prog, __u32 log_level);
LIBBPF_API const char *bpf_program__log_buf(const struct bpf_program *prog, size_t *log_size);
LIBBPF_API int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log_size);
/**
* @brief **bpf_program__set_attach_target()** sets BTF-based attach target
* for supported BPF program types:
* - BTF-aware raw tracepoints (tp_btf);
* - fentry/fexit/fmod_ret;
* - lsm;
* - freplace.
* @param prog BPF program to set the attach type for
* @param type attach type to set the BPF map to have
* @return error code; or 0 if no error occurred.
*/
LIBBPF_API int
bpf_program__set_attach_target(struct bpf_program *prog, int attach_prog_fd,
const char *attach_func_name);
/**
* @brief **bpf_object__find_map_by_name()** returns BPF map of
* the given name, if it exists within the passed BPF object
* @param obj BPF object
* @param name name of the BPF map
* @return BPF map instance, if such map exists within the BPF object;
* or NULL otherwise.
*/
LIBBPF_API struct bpf_map *
bpf_object__find_map_by_name(const struct bpf_object *obj, const char *name);
LIBBPF_API int
bpf_object__find_map_fd_by_name(const struct bpf_object *obj, const char *name);
LIBBPF_API struct bpf_map *
bpf_object__next_map(const struct bpf_object *obj, const struct bpf_map *map);
#define bpf_object__for_each_map(pos, obj) \
for ((pos) = bpf_object__next_map((obj), NULL); \
(pos) != NULL; \
(pos) = bpf_object__next_map((obj), (pos)))
#define bpf_map__for_each bpf_object__for_each_map
LIBBPF_API struct bpf_map *
bpf_object__prev_map(const struct bpf_object *obj, const struct bpf_map *map);
libbpf: Allow to opt-out from creating BPF maps Add bpf_map__set_autocreate() API that allows user to opt-out from libbpf automatically creating BPF map during BPF object load. This is a useful feature when building CO-RE-enabled BPF application that takes advantage of some new-ish BPF map type (e.g., socket-local storage) if kernel supports it, but otherwise uses some alternative way (e.g., extra HASH map). In such case, being able to disable the creation of a map that kernel doesn't support allows to successfully create and load BPF object file with all its other maps and programs. It's still up to user to make sure that no "live" code in any of their BPF programs are referencing such map instance, which can be achieved by guarding such code with CO-RE relocation check or by using .rodata global variables. If user fails to properly guard such code to turn it into "dead code", libbpf will helpfully post-process BPF verifier log and will provide more meaningful error and map name that needs to be guarded properly. As such, instead of: ; value = bpf_map_lookup_elem(&missing_map, &zero); 4: (85) call unknown#2001000000 invalid func unknown#2001000000 ... user will see: ; value = bpf_map_lookup_elem(&missing_map, &zero); 4: <invalid BPF map reference> BPF map 'missing_map' is referenced but wasn't created Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220428041523.4089853-4-andrii@kernel.org
2022-04-28 04:15:22 +00:00
/**
* @brief **bpf_map__set_autocreate()** sets whether libbpf has to auto-create
* BPF map during BPF object load phase.
* @param map the BPF map instance
* @param autocreate whether to create BPF map during BPF object load
* @return 0 on success; -EBUSY if BPF object was already loaded
*
* **bpf_map__set_autocreate()** allows to opt-out from libbpf auto-creating
* BPF map. By default, libbpf will attempt to create every single BPF map
* defined in BPF object file using BPF_MAP_CREATE command of bpf() syscall
* and fill in map FD in BPF instructions.
*
* This API allows to opt-out of this process for specific map instance. This
* can be useful if host kernel doesn't support such BPF map type or used
* combination of flags and user application wants to avoid creating such
* a map in the first place. User is still responsible to make sure that their
* BPF-side code that expects to use such missing BPF map is recognized by BPF
* verifier as dead code, otherwise BPF verifier will reject such BPF program.
*/
LIBBPF_API int bpf_map__set_autocreate(struct bpf_map *map, bool autocreate);
LIBBPF_API bool bpf_map__autocreate(const struct bpf_map *map);
/**
* @brief **bpf_map__fd()** gets the file descriptor of the passed
* BPF map
* @param map the BPF map instance
* @return the file descriptor; or -EINVAL in case of an error
*/
LIBBPF_API int bpf_map__fd(const struct bpf_map *map);
libbpf: Add a bunch of attribute getters/setters for map definitions Add a bunch of getter for various aspects of BPF map. Some of these attribute (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getter allowing to fetch them individually. bpf_map_def approach isn't very scalable, when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features, when each aspect of BPF map has separate getter/setter. Getters follow the common naming convention of not explicitly having "get" in its name: bpf_map__type() returns map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have set in their name: bpf_map__set_type(), bpf_map__set_key_size(). This patch ensures we now have a getter and a setter for the following map attributes: - type; - max_entries; - map_flags; - numa_node; - key_size; - value_size; - ifindex. bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons. Map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex() itself is converted from void function into error-returning one, similar to other setters. The only error returned right now is -EBUSY, if BPF map is already loaded and has corresponding FD. One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and both adds programmatic getter/setter, as well as adds support for numa_node field in BTF-defined map. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-21 06:21:12 +00:00
LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd);
/* get map name */
LIBBPF_API const char *bpf_map__name(const struct bpf_map *map);
libbpf: Add a bunch of attribute getters/setters for map definitions Add a bunch of getter for various aspects of BPF map. Some of these attribute (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getter allowing to fetch them individually. bpf_map_def approach isn't very scalable, when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features, when each aspect of BPF map has separate getter/setter. Getters follow the common naming convention of not explicitly having "get" in its name: bpf_map__type() returns map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have set in their name: bpf_map__set_type(), bpf_map__set_key_size(). This patch ensures we now have a getter and a setter for the following map attributes: - type; - max_entries; - map_flags; - numa_node; - key_size; - value_size; - ifindex. bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons. Map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex() itself is converted from void function into error-returning one, similar to other setters. The only error returned right now is -EBUSY, if BPF map is already loaded and has corresponding FD. One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and both adds programmatic getter/setter, as well as adds support for numa_node field in BTF-defined map. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-21 06:21:12 +00:00
/* get/set map type */
LIBBPF_API enum bpf_map_type bpf_map__type(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_type(struct bpf_map *map, enum bpf_map_type type);
/* get/set map size (max_entries) */
LIBBPF_API __u32 bpf_map__max_entries(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries);
/* get/set map flags */
LIBBPF_API __u32 bpf_map__map_flags(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_map_flags(struct bpf_map *map, __u32 flags);
/* get/set map NUMA node */
LIBBPF_API __u32 bpf_map__numa_node(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_numa_node(struct bpf_map *map, __u32 numa_node);
/* get/set map key size */
LIBBPF_API __u32 bpf_map__key_size(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_key_size(struct bpf_map *map, __u32 size);
/* get map value size */
libbpf: Add a bunch of attribute getters/setters for map definitions Add a bunch of getter for various aspects of BPF map. Some of these attribute (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getter allowing to fetch them individually. bpf_map_def approach isn't very scalable, when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features, when each aspect of BPF map has separate getter/setter. Getters follow the common naming convention of not explicitly having "get" in its name: bpf_map__type() returns map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have set in their name: bpf_map__set_type(), bpf_map__set_key_size(). This patch ensures we now have a getter and a setter for the following map attributes: - type; - max_entries; - map_flags; - numa_node; - key_size; - value_size; - ifindex. bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons. Map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex() itself is converted from void function into error-returning one, similar to other setters. The only error returned right now is -EBUSY, if BPF map is already loaded and has corresponding FD. One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and both adds programmatic getter/setter, as well as adds support for numa_node field in BTF-defined map. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-21 06:21:12 +00:00
LIBBPF_API __u32 bpf_map__value_size(const struct bpf_map *map);
/**
* @brief **bpf_map__set_value_size()** sets map value size.
* @param map the BPF map instance
* @return 0, on success; negative error, otherwise
*
* There is a special case for maps with associated memory-mapped regions, like
* the global data section maps (bss, data, rodata). When this function is used
* on such a map, the mapped region is resized. Afterward, an attempt is made to
* adjust the corresponding BTF info. This attempt is best-effort and can only
* succeed if the last variable of the data section map is an array. The array
* BTF type is replaced by a new BTF array type with a different length.
* Any previously existing pointers returned from bpf_map__initial_value() or
* corresponding data section skeleton pointer must be reinitialized.
*/
libbpf: Add a bunch of attribute getters/setters for map definitions Add a bunch of getter for various aspects of BPF map. Some of these attribute (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getter allowing to fetch them individually. bpf_map_def approach isn't very scalable, when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features, when each aspect of BPF map has separate getter/setter. Getters follow the common naming convention of not explicitly having "get" in its name: bpf_map__type() returns map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have set in their name: bpf_map__set_type(), bpf_map__set_key_size(). This patch ensures we now have a getter and a setter for the following map attributes: - type; - max_entries; - map_flags; - numa_node; - key_size; - value_size; - ifindex. bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons. Map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex() itself is converted from void function into error-returning one, similar to other setters. The only error returned right now is -EBUSY, if BPF map is already loaded and has corresponding FD. One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and both adds programmatic getter/setter, as well as adds support for numa_node field in BTF-defined map. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-21 06:21:12 +00:00
LIBBPF_API int bpf_map__set_value_size(struct bpf_map *map, __u32 size);
/* get map key/value BTF type IDs */
LIBBPF_API __u32 bpf_map__btf_key_type_id(const struct bpf_map *map);
LIBBPF_API __u32 bpf_map__btf_value_type_id(const struct bpf_map *map);
libbpf: Add a bunch of attribute getters/setters for map definitions Add a bunch of getter for various aspects of BPF map. Some of these attribute (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getter allowing to fetch them individually. bpf_map_def approach isn't very scalable, when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features, when each aspect of BPF map has separate getter/setter. Getters follow the common naming convention of not explicitly having "get" in its name: bpf_map__type() returns map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have set in their name: bpf_map__set_type(), bpf_map__set_key_size(). This patch ensures we now have a getter and a setter for the following map attributes: - type; - max_entries; - map_flags; - numa_node; - key_size; - value_size; - ifindex. bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons. Map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex() itself is converted from void function into error-returning one, similar to other setters. The only error returned right now is -EBUSY, if BPF map is already loaded and has corresponding FD. One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and both adds programmatic getter/setter, as well as adds support for numa_node field in BTF-defined map. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-21 06:21:12 +00:00
/* get/set map if_index */
LIBBPF_API __u32 bpf_map__ifindex(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
/* get/set map map_extra flags */
LIBBPF_API __u64 bpf_map__map_extra(const struct bpf_map *map);
LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u64 map_extra);
LIBBPF_API int bpf_map__set_initial_value(struct bpf_map *map,
const void *data, size_t size);
libbpf: Recognize __arena global variables. LLVM automatically places __arena variables into ".arena.1" ELF section. In order to use such global variables bpf program must include definition of arena map in ".maps" section, like: struct { __uint(type, BPF_MAP_TYPE_ARENA); __uint(map_flags, BPF_F_MMAPABLE); __uint(max_entries, 1000); /* number of pages */ __ulong(map_extra, 2ull << 44); /* start of mmap() region */ } arena SEC(".maps"); libbpf recognizes both uses of arena and creates single `struct bpf_map *` instance in libbpf APIs. ".arena.1" ELF section data is used as initial data image, which is exposed through skeleton and bpf_map__initial_value() to the user, if they need to tune it before the load phase. During load phase, this initial image is copied over into mmap()'ed region corresponding to arena, and discarded. Few small checks here and there had to be added to make sure this approach works with bpf_map__initial_value(), mostly due to hard-coded assumption that map->mmaped is set up with mmap() syscall and should be munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum arena size, so this smaller data size has to be tracked separately. Given it is enforced that there is only one arena for entire bpf_object instance, we just keep it in a separate field. This can be generalized if necessary later. All global variables from ".arena.1" section are accessible from user space via skel->arena->name_of_var. For bss/data/rodata the skeleton/libbpf perform the following sequence: 1. addr = mmap(MAP_ANONYMOUS) 2. user space optionally modifies global vars 3. map_fd = bpf_create_map() 4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel 5. mmap(addr, MAP_FIXED, map_fd) after step 5 user spaces see the values it wrote at step 2 at the same addresses arena doesn't support update_map_elem. Hence skeleton/libbpf do: 1. addr = malloc(sizeof SEC ".arena.1") 2. user space optionally modifies global vars 3. map_fd = bpf_create_map(MAP_TYPE_ARENA) 4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd) 5. memcpy(real_addr, addr) // this will fault-in and allocate pages At the end look and feel of global data vs __arena global data is the same from bpf prog pov. Another complication is: struct { __uint(type, BPF_MAP_TYPE_ARENA); } arena SEC(".maps"); int __arena foo; int bar; ptr1 = &foo; // relocation against ".arena.1" section ptr2 = &arena; // relocation against ".maps" section ptr3 = &bar; // relocation against ".bss" section Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd while ptr3 points to a different global array's map_fd. For the verifier: ptr1->type == unknown_scalar ptr2->type == const_ptr_to_map ptr3->type == ptr_to_map_value After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240308010812.89848-11-alexei.starovoitov@gmail.com
2024-03-08 01:08:08 +00:00
LIBBPF_API void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize);
/**
* @brief **bpf_map__is_internal()** tells the caller whether or not the
* passed map is a special map created by libbpf automatically for things like
* global variables, __ksym externs, Kconfig values, etc
* @param map the bpf_map
* @return true, if the map is an internal map; false, otherwise
*/
LIBBPF_API bool bpf_map__is_internal(const struct bpf_map *map);
/**
* @brief **bpf_map__set_pin_path()** sets the path attribute that tells where the
* BPF map should be pinned. This does not actually create the 'pin'.
* @param map The bpf_map
* @param path The path
* @return 0, on success; negative error, otherwise
*/
LIBBPF_API int bpf_map__set_pin_path(struct bpf_map *map, const char *path);
/**
* @brief **bpf_map__pin_path()** gets the path attribute that tells where the
* BPF map should be pinned.
* @param map The bpf_map
* @return The path string; which can be NULL
*/
LIBBPF_API const char *bpf_map__pin_path(const struct bpf_map *map);
/**
* @brief **bpf_map__is_pinned()** tells the caller whether or not the
* passed map has been pinned via a 'pin' file.
* @param map The bpf_map
* @return true, if the map is pinned; false, otherwise
*/
LIBBPF_API bool bpf_map__is_pinned(const struct bpf_map *map);
/**
* @brief **bpf_map__pin()** creates a file that serves as a 'pin'
* for the BPF map. This increments the reference count on the
* BPF map which will keep the BPF map loaded even after the
* userspace process which loaded it has exited.
* @param map The bpf_map to pin
* @param path A file path for the 'pin'
* @return 0, on success; negative error, otherwise
*
* If `path` is NULL the maps `pin_path` attribute will be used. If this is
* also NULL, an error will be returned and the map will not be pinned.
*/
LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
/**
* @brief **bpf_map__unpin()** removes the file that serves as a
* 'pin' for the BPF map.
* @param map The bpf_map to unpin
* @param path A file path for the 'pin'
* @return 0, on success; negative error, otherwise
*
* The `path` parameter can be NULL, in which case the `pin_path`
* map attribute is unpinned. If both the `path` parameter and
* `pin_path` map attribute are set, they must be equal.
*/
LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
LIBBPF_API int bpf_map__set_inner_map_fd(struct bpf_map *map, int fd);
LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
libbpf: Add safer high-level wrappers for map operations Add high-level API wrappers for most common and typical BPF map operations that works directly on instances of struct bpf_map * (so you don't have to call bpf_map__fd()) and validate key/value size expectations. These helpers require users to specify key (and value, where appropriate) sizes when performing lookup/update/delete/etc. This forces user to actually think and validate (for themselves) those. This is a good thing as user is expected by kernel to implicitly provide correct key/value buffer sizes and kernel will just read/write necessary amount of data. If it so happens that user doesn't set up buffers correctly (which bit people for per-CPU maps especially) kernel either randomly overwrites stack data or return -EFAULT, depending on user's luck and circumstances. These high-level APIs are meant to prevent such unpleasant and hard to debug bugs. This patch also adds bpf_map_delete_elem_flags() low-level API and requires passing flags to bpf_map__delete_elem() API for consistency across all similar APIs, even though currently kernel doesn't expect any extra flags for BPF_MAP_DELETE_ELEM operation. List of map operations that get these high-level APIs: - bpf_map_lookup_elem; - bpf_map_update_elem; - bpf_map_delete_elem; - bpf_map_lookup_and_delete_elem; - bpf_map_get_next_key. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220512220713.2617964-1-andrii@kernel.org
2022-05-12 22:07:12 +00:00
/**
* @brief **bpf_map__lookup_elem()** allows to lookup BPF map value
* corresponding to provided key.
* @param map BPF map to lookup element in
* @param key pointer to memory containing bytes of the key used for lookup
* @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
* @param value pointer to memory in which looked up value will be stored
* @param value_sz size in byte of value data memory; it has to match BPF map
* definition's **value_size**. For per-CPU BPF maps value size has to be
* a product of BPF map value size and number of possible CPUs in the system
* (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
* per-CPU values value size has to be aligned up to closest 8 bytes for
* alignment reasons, so expected size is: `round_up(value_size, 8)
* * libbpf_num_possible_cpus()`.
* @flags extra flags passed to kernel for this operation
* @return 0, on success; negative error, otherwise
*
* **bpf_map__lookup_elem()** is high-level equivalent of
* **bpf_map_lookup_elem()** API with added check for key and value size.
*/
LIBBPF_API int bpf_map__lookup_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
void *value, size_t value_sz, __u64 flags);
/**
* @brief **bpf_map__update_elem()** allows to insert or update value in BPF
* map that corresponds to provided key.
* @param map BPF map to insert to or update element in
* @param key pointer to memory containing bytes of the key
* @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
* @param value pointer to memory containing bytes of the value
* @param value_sz size in byte of value data memory; it has to match BPF map
* definition's **value_size**. For per-CPU BPF maps value size has to be
* a product of BPF map value size and number of possible CPUs in the system
* (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
* per-CPU values value size has to be aligned up to closest 8 bytes for
* alignment reasons, so expected size is: `round_up(value_size, 8)
* * libbpf_num_possible_cpus()`.
* @flags extra flags passed to kernel for this operation
* @return 0, on success; negative error, otherwise
*
* **bpf_map__update_elem()** is high-level equivalent of
* **bpf_map_update_elem()** API with added check for key and value size.
*/
LIBBPF_API int bpf_map__update_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
const void *value, size_t value_sz, __u64 flags);
/**
* @brief **bpf_map__delete_elem()** allows to delete element in BPF map that
* corresponds to provided key.
* @param map BPF map to delete element from
* @param key pointer to memory containing bytes of the key
* @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
* @flags extra flags passed to kernel for this operation
* @return 0, on success; negative error, otherwise
*
* **bpf_map__delete_elem()** is high-level equivalent of
* **bpf_map_delete_elem()** API with added check for key size.
*/
LIBBPF_API int bpf_map__delete_elem(const struct bpf_map *map,
const void *key, size_t key_sz, __u64 flags);
/**
* @brief **bpf_map__lookup_and_delete_elem()** allows to lookup BPF map value
* corresponding to provided key and atomically delete it afterwards.
* @param map BPF map to lookup element in
* @param key pointer to memory containing bytes of the key used for lookup
* @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
* @param value pointer to memory in which looked up value will be stored
* @param value_sz size in byte of value data memory; it has to match BPF map
* definition's **value_size**. For per-CPU BPF maps value size has to be
* a product of BPF map value size and number of possible CPUs in the system
* (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
* per-CPU values value size has to be aligned up to closest 8 bytes for
* alignment reasons, so expected size is: `round_up(value_size, 8)
* * libbpf_num_possible_cpus()`.
* @flags extra flags passed to kernel for this operation
* @return 0, on success; negative error, otherwise
*
* **bpf_map__lookup_and_delete_elem()** is high-level equivalent of
* **bpf_map_lookup_and_delete_elem()** API with added check for key and value size.
*/
LIBBPF_API int bpf_map__lookup_and_delete_elem(const struct bpf_map *map,
const void *key, size_t key_sz,
void *value, size_t value_sz, __u64 flags);
/**
* @brief **bpf_map__get_next_key()** allows to iterate BPF map keys by
* fetching next key that follows current key.
* @param map BPF map to fetch next key from
* @param cur_key pointer to memory containing bytes of current key or NULL to
* fetch the first key
* @param next_key pointer to memory to write next key into
* @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
* @return 0, on success; -ENOENT if **cur_key** is the last key in BPF map;
* negative error, otherwise
*
* **bpf_map__get_next_key()** is high-level equivalent of
* **bpf_map_get_next_key()** API with added check for key size.
*/
LIBBPF_API int bpf_map__get_next_key(const struct bpf_map *map,
const void *cur_key, void *next_key, size_t key_sz);
struct bpf_xdp_set_link_opts {
size_t sz;
int old_fd;
size_t :0;
};
#define bpf_xdp_set_link_opts__last_field old_fd
libbpf: streamline low-level XDP APIs Introduce 4 new netlink-based XDP APIs for attaching, detaching, and querying XDP programs: - bpf_xdp_attach; - bpf_xdp_detach; - bpf_xdp_query; - bpf_xdp_query_id. These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts, bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter don't follow a consistent naming pattern and some of them use non-extensible approaches (e.g., struct xdp_link_info which can't be modified without breaking libbpf ABI). The approach I took with these low-level XDP APIs is similar to what we did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching when -1 is specified for prog_fd for generality and convenience, but bpf_xdp_detach() is preferred due to clearer naming and associated semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same opts struct allowing to specify expected old_prog_fd. While doing the refactoring, I noticed that old APIs require users to specify opts with old_fd == -1 to declare "don't care about already attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is essentially never an intended behavior. So I made this behavior consistent with other kernel and libbpf APIs, in which zero FD means "no FD". This seems to be more in line with the latest thinking in BPF land and should cause less user confusion, hopefully. For querying, I left two APIs, both more generic bpf_xdp_query() allowing to query multiple IDs and attach mode, but also a specialization of it, bpf_xdp_query_id(), which returns only requested prog_id. Uses of prog_id returning bpf_get_link_xdp_id() were so prevalent across selftests and samples, that it seemed a very common use case and using bpf_xdp_query() for doing it felt very cumbersome with a highly branches if/else chain based on flags and attach mode. Old APIs are scheduled for deprecation in libbpf 0.8 release. [0] Closes: https://github.com/libbpf/libbpf/issues/309 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-20 06:14:19 +00:00
struct bpf_xdp_attach_opts {
size_t sz;
int old_prog_fd;
size_t :0;
};
#define bpf_xdp_attach_opts__last_field old_prog_fd
struct bpf_xdp_query_opts {
size_t sz;
__u32 prog_id; /* output */
__u32 drv_prog_id; /* output */
__u32 hw_prog_id; /* output */
__u32 skb_prog_id; /* output */
__u8 attach_mode; /* output */
__u64 feature_flags; /* output */
__u32 xdp_zc_max_segs; /* output */
libbpf: streamline low-level XDP APIs Introduce 4 new netlink-based XDP APIs for attaching, detaching, and querying XDP programs: - bpf_xdp_attach; - bpf_xdp_detach; - bpf_xdp_query; - bpf_xdp_query_id. These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts, bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter don't follow a consistent naming pattern and some of them use non-extensible approaches (e.g., struct xdp_link_info which can't be modified without breaking libbpf ABI). The approach I took with these low-level XDP APIs is similar to what we did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching when -1 is specified for prog_fd for generality and convenience, but bpf_xdp_detach() is preferred due to clearer naming and associated semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same opts struct allowing to specify expected old_prog_fd. While doing the refactoring, I noticed that old APIs require users to specify opts with old_fd == -1 to declare "don't care about already attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is essentially never an intended behavior. So I made this behavior consistent with other kernel and libbpf APIs, in which zero FD means "no FD". This seems to be more in line with the latest thinking in BPF land and should cause less user confusion, hopefully. For querying, I left two APIs, both more generic bpf_xdp_query() allowing to query multiple IDs and attach mode, but also a specialization of it, bpf_xdp_query_id(), which returns only requested prog_id. Uses of prog_id returning bpf_get_link_xdp_id() were so prevalent across selftests and samples, that it seemed a very common use case and using bpf_xdp_query() for doing it felt very cumbersome with a highly branches if/else chain based on flags and attach mode. Old APIs are scheduled for deprecation in libbpf 0.8 release. [0] Closes: https://github.com/libbpf/libbpf/issues/309 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-20 06:14:19 +00:00
size_t :0;
};
#define bpf_xdp_query_opts__last_field xdp_zc_max_segs
libbpf: streamline low-level XDP APIs Introduce 4 new netlink-based XDP APIs for attaching, detaching, and querying XDP programs: - bpf_xdp_attach; - bpf_xdp_detach; - bpf_xdp_query; - bpf_xdp_query_id. These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts, bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter don't follow a consistent naming pattern and some of them use non-extensible approaches (e.g., struct xdp_link_info which can't be modified without breaking libbpf ABI). The approach I took with these low-level XDP APIs is similar to what we did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching when -1 is specified for prog_fd for generality and convenience, but bpf_xdp_detach() is preferred due to clearer naming and associated semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same opts struct allowing to specify expected old_prog_fd. While doing the refactoring, I noticed that old APIs require users to specify opts with old_fd == -1 to declare "don't care about already attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is essentially never an intended behavior. So I made this behavior consistent with other kernel and libbpf APIs, in which zero FD means "no FD". This seems to be more in line with the latest thinking in BPF land and should cause less user confusion, hopefully. For querying, I left two APIs, both more generic bpf_xdp_query() allowing to query multiple IDs and attach mode, but also a specialization of it, bpf_xdp_query_id(), which returns only requested prog_id. Uses of prog_id returning bpf_get_link_xdp_id() were so prevalent across selftests and samples, that it seemed a very common use case and using bpf_xdp_query() for doing it felt very cumbersome with a highly branches if/else chain based on flags and attach mode. Old APIs are scheduled for deprecation in libbpf 0.8 release. [0] Closes: https://github.com/libbpf/libbpf/issues/309 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-20 06:14:19 +00:00
LIBBPF_API int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags,
const struct bpf_xdp_attach_opts *opts);
LIBBPF_API int bpf_xdp_detach(int ifindex, __u32 flags,
const struct bpf_xdp_attach_opts *opts);
LIBBPF_API int bpf_xdp_query(int ifindex, int flags, struct bpf_xdp_query_opts *opts);
LIBBPF_API int bpf_xdp_query_id(int ifindex, int flags, __u32 *prog_id);
libbpf: Add low level TC-BPF management API This adds functions that wrap the netlink API used for adding, manipulating, and removing traffic control filters. The API summary: A bpf_tc_hook represents a location where a TC-BPF filter can be attached. This means that creating a hook leads to creation of the backing qdisc, while destruction either removes all filters attached to a hook, or destroys qdisc if requested explicitly (as discussed below). The TC-BPF API functions operate on this bpf_tc_hook to attach, replace, query, and detach tc filters. All functions return 0 on success, and a negative error code on failure. bpf_tc_hook_create - Create a hook Parameters: @hook - Cannot be NULL, ifindex > 0, attach_point must be set to proper enum constant. Note that parent must be unset when attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a valid value for attach_point. Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM. bpf_tc_hook_destroy - Destroy a hook Parameters: @hook - Cannot be NULL. The behaviour depends on value of attach_point. If BPF_TC_INGRESS, all filters attached to the ingress hook will be detached. If BPF_TC_EGRESS, all filters attached to the egress hook will be detached. If BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be deleted, also detaching all filters. As before, parent must be unset for these attach_points, and set for BPF_TC_CUSTOM. It is advised that if the qdisc is operated on by many programs, then the program at least check that there are no other existing filters before deleting the clsact qdisc. An example is shown below: DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"), .attach_point = BPF_TC_INGRESS); /* set opts as NULL, as we're not really interested in * getting any info for a particular filter, but just * detecting its presence. */ r = bpf_tc_query(&hook, NULL); if (r == -ENOENT) { /* no filters */ hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS; return bpf_tc_hook_destroy(&hook); } else { /* failed or r == 0, the latter means filters do exist */ return r; } Note that there is a small race between checking for no filters and deleting the qdisc. This is currently unavoidable. Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM. bpf_tc_attach - Attach a filter to a hook Parameters: @hook - Cannot be NULL. Represents the hook the filter will be attached to. Requirements for ifindex and attach_point are same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM is also supported. In that case, parent must be set to the handle where the filter will be attached (using BPF_TC_PARENT). E.g. to set parent to 1:16 like in tc command line, the equivalent would be BPF_TC_PARENT(1, 16). @opts - Cannot be NULL. The following opts are optional: * handle - The handle of the filter * priority - The priority of the filter Must be >= 0 and <= UINT16_MAX Note that when left unset, they will be auto-allocated by the kernel. The following opts must be set: * prog_fd - The fd of the loaded SCHED_CLS prog The following opts must be unset: * prog_id - The ID of the BPF prog The following opts are optional: * flags - Currently only BPF_TC_F_REPLACE is allowed. It allows replacing an existing filter instead of failing with -EEXIST. The following opts will be filled by bpf_tc_attach on a successful attach operation if they are unset: * handle - The handle of the attached filter * priority - The priority of the attached filter * prog_id - The ID of the attached SCHED_CLS prog This way, the user can know what the auto allocated values for optional opts like handle and priority are for the newly attached filter, if they were unset. Note that some other attributes are set to fixed default values listed below (this holds for all bpf_tc_* APIs): protocol as ETH_P_ALL, direct action mode, chain index of 0, and class ID of 0 (this can be set by writing to the skb->tc_classid field from the BPF program). bpf_tc_detach Parameters: @hook - Cannot be NULL. Represents the hook the filter will be detached from. Requirements are same as described above in bpf_tc_attach. @opts - Cannot be NULL. The following opts must be set: * handle, priority The following opts must be unset: * prog_fd, prog_id, flags bpf_tc_query Parameters: @hook - Cannot be NULL. Represents the hook where the filter lookup will be performed. Requirements are same as described above in bpf_tc_attach(). @opts - Cannot be NULL. The following opts must be set: * handle, priority The following opts must be unset: * prog_fd, prog_id, flags The following fields will be filled by bpf_tc_query upon a successful lookup: * prog_id Some usage examples (using BPF skeleton infrastructure): BPF program (test_tc_bpf.c): #include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("classifier") int cls(struct __sk_buff *skb) { return 0; } Userspace loader: struct test_tc_bpf *skel = NULL; int fd, r; skel = test_tc_bpf__open_and_load(); if (!skel) return -ENOMEM; fd = bpf_program__fd(skel->progs.cls); DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = if_nametoindex("lo"), .attach_point = BPF_TC_INGRESS); /* Create clsact qdisc */ r = bpf_tc_hook_create(&hook); if (r < 0) goto end; DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd); r = bpf_tc_attach(&hook, &opts); if (r < 0) goto end; /* Print the auto allocated handle and priority */ printf("Handle=%u", opts.handle); printf("Priority=%u", opts.priority); opts.prog_fd = opts.prog_id = 0; bpf_tc_detach(&hook, &opts); end: test_tc_bpf__destroy(skel); This is equivalent to doing the following using tc command line: # tc qdisc add dev lo clsact # tc filter add dev lo ingress bpf obj foo.o sec classifier da # tc filter del dev lo ingress handle <h> prio <p> bpf ... where the handle and priority can be found using: # tc filter show dev lo ingress Another example replacing a filter (extending prior example): /* We can also choose both (or one), let's try replacing an * existing filter. */ DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle = opts.handle, .priority = opts.priority, .prog_fd = fd); r = bpf_tc_attach(&hook, &replace_opts); if (r == -EEXIST) { /* Expected, now use BPF_TC_F_REPLACE to replace it */ replace_opts.flags = BPF_TC_F_REPLACE; return bpf_tc_attach(&hook, &replace_opts); } else if (r < 0) { return r; } /* There must be no existing filter with these * attributes, so cleanup and return an error. */ replace_opts.prog_fd = replace_opts.prog_id = 0; bpf_tc_detach(&hook, &replace_opts); return -1; To obtain info of a particular filter: /* Find info for filter with handle 1 and priority 50 */ DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1, .priority = 50); r = bpf_tc_query(&hook, &info_opts); if (r == -ENOENT) printf("Filter not found"); else if (r < 0) return r; printf("Prog ID: %u", info_opts.prog_id); return 0; Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design [ Daniel: also did major patch cleanup ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
2021-05-12 23:41:22 +00:00
/* TC related API */
enum bpf_tc_attach_point {
BPF_TC_INGRESS = 1 << 0,
BPF_TC_EGRESS = 1 << 1,
BPF_TC_CUSTOM = 1 << 2,
};
#define BPF_TC_PARENT(a, b) \
((((a) << 16) & 0xFFFF0000U) | ((b) & 0x0000FFFFU))
enum bpf_tc_flags {
BPF_TC_F_REPLACE = 1 << 0,
};
struct bpf_tc_hook {
size_t sz;
int ifindex;
enum bpf_tc_attach_point attach_point;
__u32 parent;
size_t :0;
};
#define bpf_tc_hook__last_field parent
struct bpf_tc_opts {
size_t sz;
int prog_fd;
__u32 flags;
__u32 prog_id;
__u32 handle;
__u32 priority;
size_t :0;
};
#define bpf_tc_opts__last_field priority
LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook,
struct bpf_tc_opts *opts);
LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook,
const struct bpf_tc_opts *opts);
LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
struct bpf_tc_opts *opts);
libbpf: Add BPF ring buffer support Declaring and instantiating BPF ring buffer doesn't require any changes to libbpf, as it's just another type of maps. So using existing BTF-defined maps syntax with __uint(type, BPF_MAP_TYPE_RINGBUF) and __uint(max_elements, <size-of-ring-buf>) is all that's necessary to create and use BPF ring buffer. This patch adds BPF ring buffer consumer to libbpf. It is very similar to perf_buffer implementation in terms of API, but also attempts to fix some minor problems and inconveniences with existing perf_buffer API. ring_buffer support both single ring buffer use case (with just using ring_buffer__new()), as well as allows to add more ring buffers, each with its own callback and context. This allows to efficiently poll and consume multiple, potentially completely independent, ring buffers, using single epoll instance. The latter is actually a problem in practice for applications that are using multiple sets of perf buffers. They have to create multiple instances for struct perf_buffer and poll them independently or in a loop, each approach having its own problems (e.g., inability to use a common poll timeout). struct ring_buffer eliminates this problem by aggregating many independent ring buffer instances under the single "ring buffer manager". Second, perf_buffer's callback can't return error, so applications that need to stop polling due to error in data or data signalling the end, have to use extra mechanisms to signal that polling has to stop. ring_buffer's callback can return error, which will be passed through back to user code and can be acted upon appropariately. Two APIs allow to consume ring buffer data: - ring_buffer__poll(), which will wait for data availability notification and will consume data only from reported ring buffer(s); this API allows to efficiently use resources by reading data only when it becomes available; - ring_buffer__consume(), will attempt to read new records regardless of data availablity notification sub-system. This API is useful for cases when lowest latency is required, in expense of burning CPU resources. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200529075424.3139988-3-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-29 07:54:21 +00:00
/* Ring buffer APIs */
struct ring_buffer;
struct ring;
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
struct user_ring_buffer;
libbpf: Add BPF ring buffer support Declaring and instantiating BPF ring buffer doesn't require any changes to libbpf, as it's just another type of maps. So using existing BTF-defined maps syntax with __uint(type, BPF_MAP_TYPE_RINGBUF) and __uint(max_elements, <size-of-ring-buf>) is all that's necessary to create and use BPF ring buffer. This patch adds BPF ring buffer consumer to libbpf. It is very similar to perf_buffer implementation in terms of API, but also attempts to fix some minor problems and inconveniences with existing perf_buffer API. ring_buffer support both single ring buffer use case (with just using ring_buffer__new()), as well as allows to add more ring buffers, each with its own callback and context. This allows to efficiently poll and consume multiple, potentially completely independent, ring buffers, using single epoll instance. The latter is actually a problem in practice for applications that are using multiple sets of perf buffers. They have to create multiple instances for struct perf_buffer and poll them independently or in a loop, each approach having its own problems (e.g., inability to use a common poll timeout). struct ring_buffer eliminates this problem by aggregating many independent ring buffer instances under the single "ring buffer manager". Second, perf_buffer's callback can't return error, so applications that need to stop polling due to error in data or data signalling the end, have to use extra mechanisms to signal that polling has to stop. ring_buffer's callback can return error, which will be passed through back to user code and can be acted upon appropariately. Two APIs allow to consume ring buffer data: - ring_buffer__poll(), which will wait for data availability notification and will consume data only from reported ring buffer(s); this API allows to efficiently use resources by reading data only when it becomes available; - ring_buffer__consume(), will attempt to read new records regardless of data availablity notification sub-system. This API is useful for cases when lowest latency is required, in expense of burning CPU resources. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200529075424.3139988-3-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-29 07:54:21 +00:00
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
struct ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
libbpf: Add BPF ring buffer support Declaring and instantiating BPF ring buffer doesn't require any changes to libbpf, as it's just another type of maps. So using existing BTF-defined maps syntax with __uint(type, BPF_MAP_TYPE_RINGBUF) and __uint(max_elements, <size-of-ring-buf>) is all that's necessary to create and use BPF ring buffer. This patch adds BPF ring buffer consumer to libbpf. It is very similar to perf_buffer implementation in terms of API, but also attempts to fix some minor problems and inconveniences with existing perf_buffer API. ring_buffer support both single ring buffer use case (with just using ring_buffer__new()), as well as allows to add more ring buffers, each with its own callback and context. This allows to efficiently poll and consume multiple, potentially completely independent, ring buffers, using single epoll instance. The latter is actually a problem in practice for applications that are using multiple sets of perf buffers. They have to create multiple instances for struct perf_buffer and poll them independently or in a loop, each approach having its own problems (e.g., inability to use a common poll timeout). struct ring_buffer eliminates this problem by aggregating many independent ring buffer instances under the single "ring buffer manager". Second, perf_buffer's callback can't return error, so applications that need to stop polling due to error in data or data signalling the end, have to use extra mechanisms to signal that polling has to stop. ring_buffer's callback can return error, which will be passed through back to user code and can be acted upon appropariately. Two APIs allow to consume ring buffer data: - ring_buffer__poll(), which will wait for data availability notification and will consume data only from reported ring buffer(s); this API allows to efficiently use resources by reading data only when it becomes available; - ring_buffer__consume(), will attempt to read new records regardless of data availablity notification sub-system. This API is useful for cases when lowest latency is required, in expense of burning CPU resources. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200529075424.3139988-3-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-29 07:54:21 +00:00
};
#define ring_buffer_opts__last_field sz
LIBBPF_API struct ring_buffer *
ring_buffer__new(int map_fd, ring_buffer_sample_fn sample_cb, void *ctx,
const struct ring_buffer_opts *opts);
LIBBPF_API void ring_buffer__free(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx);
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
libbpf: Add BPF ring buffer support Declaring and instantiating BPF ring buffer doesn't require any changes to libbpf, as it's just another type of maps. So using existing BTF-defined maps syntax with __uint(type, BPF_MAP_TYPE_RINGBUF) and __uint(max_elements, <size-of-ring-buf>) is all that's necessary to create and use BPF ring buffer. This patch adds BPF ring buffer consumer to libbpf. It is very similar to perf_buffer implementation in terms of API, but also attempts to fix some minor problems and inconveniences with existing perf_buffer API. ring_buffer support both single ring buffer use case (with just using ring_buffer__new()), as well as allows to add more ring buffers, each with its own callback and context. This allows to efficiently poll and consume multiple, potentially completely independent, ring buffers, using single epoll instance. The latter is actually a problem in practice for applications that are using multiple sets of perf buffers. They have to create multiple instances for struct perf_buffer and poll them independently or in a loop, each approach having its own problems (e.g., inability to use a common poll timeout). struct ring_buffer eliminates this problem by aggregating many independent ring buffer instances under the single "ring buffer manager". Second, perf_buffer's callback can't return error, so applications that need to stop polling due to error in data or data signalling the end, have to use extra mechanisms to signal that polling has to stop. ring_buffer's callback can return error, which will be passed through back to user code and can be acted upon appropariately. Two APIs allow to consume ring buffer data: - ring_buffer__poll(), which will wait for data availability notification and will consume data only from reported ring buffer(s); this API allows to efficiently use resources by reading data only when it becomes available; - ring_buffer__consume(), will attempt to read new records regardless of data availablity notification sub-system. This API is useful for cases when lowest latency is required, in expense of burning CPU resources. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200529075424.3139988-3-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-29 07:54:21 +00:00
/**
* @brief **ring_buffer__ring()** returns the ringbuffer object inside a given
* ringbuffer manager representing a single BPF_MAP_TYPE_RINGBUF map instance.
*
* @param rb A ringbuffer manager object.
* @param idx An index into the ringbuffers contained within the ringbuffer
* manager object. The index is 0-based and corresponds to the order in which
* ring_buffer__add was called.
* @return A ringbuffer object on success; NULL and errno set if the index is
* invalid.
*/
LIBBPF_API struct ring *ring_buffer__ring(struct ring_buffer *rb,
unsigned int idx);
/**
* @brief **ring__consumer_pos()** returns the current consumer position in the
* given ringbuffer.
*
* @param r A ringbuffer object.
* @return The current consumer position.
*/
LIBBPF_API unsigned long ring__consumer_pos(const struct ring *r);
/**
* @brief **ring__producer_pos()** returns the current producer position in the
* given ringbuffer.
*
* @param r A ringbuffer object.
* @return The current producer position.
*/
LIBBPF_API unsigned long ring__producer_pos(const struct ring *r);
/**
* @brief **ring__avail_data_size()** returns the number of bytes in the
* ringbuffer not yet consumed. This has no locking associated with it, so it
* can be inaccurate if operations are ongoing while this is called. However, it
* should still show the correct trend over the long-term.
*
* @param r A ringbuffer object.
* @return The number of bytes not yet consumed.
*/
LIBBPF_API size_t ring__avail_data_size(const struct ring *r);
/**
* @brief **ring__size()** returns the total size of the ringbuffer's map data
* area (excluding special producer/consumer pages). Effectively this gives the
* amount of usable bytes of data inside the ringbuffer.
*
* @param r A ringbuffer object.
* @return The total size of the ringbuffer map data area.
*/
LIBBPF_API size_t ring__size(const struct ring *r);
/**
* @brief **ring__map_fd()** returns the file descriptor underlying the given
* ringbuffer.
*
* @param r A ringbuffer object.
* @return The underlying ringbuffer file descriptor
*/
LIBBPF_API int ring__map_fd(const struct ring *r);
/**
* @brief **ring__consume()** consumes available ringbuffer data without event
* polling.
*
* @param r A ringbuffer object.
* @return The number of records consumed (or INT_MAX, whichever is less), or
* a negative number if any of the callbacks return an error.
*/
LIBBPF_API int ring__consume(struct ring *r);
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
struct user_ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
};
#define user_ring_buffer_opts__last_field sz
/**
* @brief **user_ring_buffer__new()** creates a new instance of a user ring
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
* buffer.
*
* @param map_fd A file descriptor to a BPF_MAP_TYPE_USER_RINGBUF map.
* @param opts Options for how the ring buffer should be created.
* @return A user ring buffer on success; NULL and errno being set on a
* failure.
*/
LIBBPF_API struct user_ring_buffer *
user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts);
/**
* @brief **user_ring_buffer__reserve()** reserves a pointer to a sample in the
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
* user ring buffer.
* @param rb A pointer to a user ring buffer.
* @param size The size of the sample, in bytes.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize accessing
* this function if there are multiple producers. If a size is requested that
* is larger than the size of the entire ring buffer, errno will be set to
* E2BIG and NULL is returned. If the ring buffer could accommodate the size,
* but currently does not have enough space, errno is set to ENOSPC and NULL is
* returned.
*
* After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the kernel. Otherwise,
* the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
/**
* @brief **user_ring_buffer__reserve_blocking()** reserves a record in the
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
* ring buffer, possibly blocking for up to @timeout_ms until a sample becomes
* available.
* @param rb The user ring buffer.
* @param size The size of the sample, in bytes.
* @param timeout_ms The amount of time, in milliseconds, for which the caller
* should block when waiting for a sample. -1 causes the caller to block
* indefinitely.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize
* accessing this function if there are multiple producers
*
* If **timeout_ms** is -1, the function will block indefinitely until a sample
* becomes available. Otherwise, **timeout_ms** must be non-negative, or errno
* is set to EINVAL, and NULL is returned. If **timeout_ms** is 0, no blocking
* will occur and the function will return immediately after attempting to
* reserve a sample.
*
* If **size** is larger than the size of the entire ring buffer, errno is set
* to E2BIG and NULL is returned. If the ring buffer could accommodate
* **size**, but currently does not have enough space, the caller will block
* until at most **timeout_ms** has elapsed. If insufficient space is available
* at that time, errno is set to ENOSPC, and NULL is returned.
*
* The kernel guarantees that it will wake up this thread to check if
* sufficient space is available in the ring buffer at least once per
* invocation of the **bpf_ringbuf_drain()** helper function, provided that at
* least one sample is consumed, and the BPF program did not invoke the
* function with BPF_RB_NO_WAKEUP. A wakeup may occur sooner than that, but the
* kernel does not guarantee this. If the helper function is invoked with
* BPF_RB_FORCE_WAKEUP, a wakeup event will be sent even if no sample is
* consumed.
*
* When a sample of size **size** is found within **timeout_ms**, a pointer to
* the sample is returned. After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the ring buffer.
* Otherwise, the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
__u32 size,
int timeout_ms);
/**
* @brief **user_ring_buffer__submit()** submits a previously reserved sample
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
* into the ring buffer.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
/**
* @brief **user_ring_buffer__discard()** discards a previously reserved sample.
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
/**
* @brief **user_ring_buffer__free()** frees a ring buffer that was previously
bpf: Add libbpf logic for user-space ring buffer Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-20 00:00:59 +00:00
* created with **user_ring_buffer__new()**.
* @param rb The user ring buffer being freed.
*/
LIBBPF_API void user_ring_buffer__free(struct user_ring_buffer *rb);
libbpf: Add BPF ring buffer support Declaring and instantiating BPF ring buffer doesn't require any changes to libbpf, as it's just another type of maps. So using existing BTF-defined maps syntax with __uint(type, BPF_MAP_TYPE_RINGBUF) and __uint(max_elements, <size-of-ring-buf>) is all that's necessary to create and use BPF ring buffer. This patch adds BPF ring buffer consumer to libbpf. It is very similar to perf_buffer implementation in terms of API, but also attempts to fix some minor problems and inconveniences with existing perf_buffer API. ring_buffer support both single ring buffer use case (with just using ring_buffer__new()), as well as allows to add more ring buffers, each with its own callback and context. This allows to efficiently poll and consume multiple, potentially completely independent, ring buffers, using single epoll instance. The latter is actually a problem in practice for applications that are using multiple sets of perf buffers. They have to create multiple instances for struct perf_buffer and poll them independently or in a loop, each approach having its own problems (e.g., inability to use a common poll timeout). struct ring_buffer eliminates this problem by aggregating many independent ring buffer instances under the single "ring buffer manager". Second, perf_buffer's callback can't return error, so applications that need to stop polling due to error in data or data signalling the end, have to use extra mechanisms to signal that polling has to stop. ring_buffer's callback can return error, which will be passed through back to user code and can be acted upon appropariately. Two APIs allow to consume ring buffer data: - ring_buffer__poll(), which will wait for data availability notification and will consume data only from reported ring buffer(s); this API allows to efficiently use resources by reading data only when it becomes available; - ring_buffer__consume(), will attempt to read new records regardless of data availablity notification sub-system. This API is useful for cases when lowest latency is required, in expense of burning CPU resources. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200529075424.3139988-3-andriin@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-29 07:54:21 +00:00
/* Perf buffer APIs */
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
struct perf_buffer;
typedef void (*perf_buffer_sample_fn)(void *ctx, int cpu,
void *data, __u32 size);
typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
/* common use perf buffer options */
struct perf_buffer_opts {
size_t sz;
__u32 sample_period;
size_t :0;
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
};
#define perf_buffer_opts__last_field sample_period
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
/**
* @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
* BPF_PERF_EVENT_ARRAY map
* @param map_fd FD of BPF_PERF_EVENT_ARRAY BPF map that will be used by BPF
* code to send data over to user-space
* @param page_cnt number of memory pages allocated for each per-CPU buffer
* @param sample_cb function called on each received data record
* @param lost_cb function called when record loss has occurred
* @param ctx user-provided extra context passed into *sample_cb* and *lost_cb*
* @return a new instance of struct perf_buffer on success, NULL on error with
* *errno* containing an error code
*/
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
LIBBPF_API struct perf_buffer *
perf_buffer__new(int map_fd, size_t page_cnt,
perf_buffer_sample_fn sample_cb, perf_buffer_lost_fn lost_cb, void *ctx,
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
const struct perf_buffer_opts *opts);
enum bpf_perf_event_ret {
LIBBPF_PERF_EVENT_DONE = 0,
LIBBPF_PERF_EVENT_ERROR = -1,
LIBBPF_PERF_EVENT_CONT = -2,
};
struct perf_event_header;
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
typedef enum bpf_perf_event_ret
(*perf_buffer_event_fn)(void *ctx, int cpu, struct perf_event_header *event);
/* raw perf buffer options, giving most power and control */
struct perf_buffer_raw_opts {
size_t sz;
long :0;
long :0;
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
/* if cpu_cnt == 0, open all on all possible CPUs (up to the number of
* max_entries of given PERF_EVENT_ARRAY map)
*/
int cpu_cnt;
/* if cpu_cnt > 0, cpus is an array of CPUs to open ring buffers on */
int *cpus;
/* if cpu_cnt > 0, map_keys specify map keys to set per-CPU FDs for */
int *map_keys;
};
#define perf_buffer_raw_opts__last_field map_keys
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
struct perf_event_attr;
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
LIBBPF_API struct perf_buffer *
perf_buffer__new_raw(int map_fd, size_t page_cnt, struct perf_event_attr *attr,
perf_buffer_event_fn event_cb, void *ctx,
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
const struct perf_buffer_raw_opts *opts);
LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
libbpf: Add perf_buffer APIs for better integration with outside epoll loop Add a set of APIs to perf_buffer manage to allow applications to integrate perf buffer polling into existing epoll-based infrastructure. One example is applications using libevent already and wanting to plug perf_buffer polling, instead of relying on perf_buffer__poll() and waste an extra thread to do it. But perf_buffer is still extremely useful to set up and consume perf buffer rings even for such use cases. So to accomodate such new use cases, add three new APIs: - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by given instance of perf_buffer manager; - perf_buffer__buffer_fd() returns FD of perf_event corresponding to a specified per-CPU buffer; this FD is then polled independently; - perf_buffer__consume_buffer() consumes data from single per-CPU buffer, identified by its slot index. To support a simpler, but less efficient, way to integrate perf_buffer into external polling logic, also expose underlying epoll FD through perf_buffer__epoll_fd() API. It will need to be followed by perf_buffer__poll(), wasting extra syscall, or perf_buffer__consume(), wasting CPU to iterate buffers with no data. But could be simpler and more convenient for some cases. These APIs allow for great flexiblity, but do not sacrifice general usability of perf_buffer. Also exercise and check new APIs in perf_buffer selftest. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20200821165927.849538-1-andriin@fb.com
2020-08-21 16:59:27 +00:00
LIBBPF_API int perf_buffer__epoll_fd(const struct perf_buffer *pb);
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
libbpf: Add perf_buffer APIs for better integration with outside epoll loop Add a set of APIs to perf_buffer manage to allow applications to integrate perf buffer polling into existing epoll-based infrastructure. One example is applications using libevent already and wanting to plug perf_buffer polling, instead of relying on perf_buffer__poll() and waste an extra thread to do it. But perf_buffer is still extremely useful to set up and consume perf buffer rings even for such use cases. So to accomodate such new use cases, add three new APIs: - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by given instance of perf_buffer manager; - perf_buffer__buffer_fd() returns FD of perf_event corresponding to a specified per-CPU buffer; this FD is then polled independently; - perf_buffer__consume_buffer() consumes data from single per-CPU buffer, identified by its slot index. To support a simpler, but less efficient, way to integrate perf_buffer into external polling logic, also expose underlying epoll FD through perf_buffer__epoll_fd() API. It will need to be followed by perf_buffer__poll(), wasting extra syscall, or perf_buffer__consume(), wasting CPU to iterate buffers with no data. But could be simpler and more convenient for some cases. These APIs allow for great flexiblity, but do not sacrifice general usability of perf_buffer. Also exercise and check new APIs in perf_buffer selftest. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20200821165927.849538-1-andriin@fb.com
2020-08-21 16:59:27 +00:00
LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
libbpf: perfbuf: Add API to get the ring buffer Add support for writing a custom event reader, by exposing the ring buffer. With the new API perf_buffer__buffer() you will get access to the raw mmaped()'ed per-cpu underlying memory of the ring buffer. This region contains both the perf buffer data and header (struct perf_event_mmap_page), which manages the ring buffer state (head/tail positions, when accessing the head/tail position it's important to take into consideration SMP). With this type of low level access one can implement different types of consumers here are few simple examples where this API helps with: 1. perf_event_read_simple is allocating using malloc, perhaps you want to handle the wrap-around in some other way. 2. Since perf buf is per-cpu then the order of the events is not guarnteed, for example: Given 3 events where each event has a timestamp t0 < t1 < t2, and the events are spread on more than 1 CPU, then we can end up with the following state in the ring buf: CPU[0] => [t0, t2] CPU[1] => [t1] When you consume the events from CPU[0], you could know there is a t1 missing, (assuming there are no drops, and your event data contains a sequential index). So now one can simply do the following, for CPU[0], you can store the address of t0 and t2 in an array (without moving the tail, so there data is not perished) then move on the CPU[1] and set the address of t1 in the same array. So you end up with something like: void **arr[] = [&t0, &t1, &t2], now you can consume it orderely and move the tails as you process in order. 3. Assuming there are multiple CPUs and we want to start draining the messages from them, then we can "pick" with which one to start with according to the remaining free space in the ring buffer. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
2022-07-15 18:11:22 +00:00
/**
* @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying
* memory region of the ring buffer.
* This ring buffer can be used to implement a custom events consumer.
* The ring buffer starts with the *struct perf_event_mmap_page*, which
* holds the ring buffer managment fields, when accessing the header
* structure it's important to be SMP aware.
* You can refer to *perf_event_read_simple* for a simple example.
* @param pb the perf buffer structure
* @param buf_idx the buffer index to retreive
* @param buf (out) gets the base pointer of the mmap()'ed memory
* @param buf_size (out) gets the size of the mmap()'ed region
* @return 0 on success, negative error code for failure
*/
LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
size_t *buf_size);
libbpf: add perf buffer API BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program to user space for additional processing. libbpf already has very low-level API to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to use and requires a lot of code to set everything up. This patch adds perf_buffer abstraction on top of it, abstracting setting up and polling per-CPU logic into simple and convenient API, similar to what BCC provides. perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF map entries. It accepts two user-provided callbacks: one for handling raw samples and one for get notifications of lost samples due to buffer overflow. perf_buffer__new_raw() is similar, but provides more control over how perf events are set up (by accepting user-provided perf_event_attr), how they are handled (perf_event_header pointer is passed directly to user-provided callback), and on which CPUs ring buffers are created (it's possible to provide a list of CPUs and corresponding map keys to update). This API allows advanced users fuller control. perf_buffer__poll() is used to fetch ring buffer data across all CPUs, utilizing epoll instance. perf_buffer__free() does corresponding clean up and unsets FDs from BPF map. All APIs are not thread-safe. User should ensure proper locking/coordination if used in multi-threaded set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-06 18:06:24 +00:00
bpf: libbpf: bpftool: Print bpf_line_info during prog dump This patch adds print bpf_line_info function in 'prog dump jitted' and 'prog dump xlated': [root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv [...] int test_long_fname_2(struct dummy_tracepoint_args * arg): bpf_prog_44a040bf25481309_test_long_fname_2: ; static int test_long_fname_2(struct dummy_tracepoint_args *arg) 0: push %rbp 1: mov %rsp,%rbp 4: sub $0x30,%rsp b: sub $0x28,%rbp f: mov %rbx,0x0(%rbp) 13: mov %r13,0x8(%rbp) 17: mov %r14,0x10(%rbp) 1b: mov %r15,0x18(%rbp) 1f: xor %eax,%eax 21: mov %rax,0x20(%rbp) 25: xor %esi,%esi ; int key = 0; 27: mov %esi,-0x4(%rbp) ; if (!arg->sock) 2a: mov 0x8(%rdi),%rdi ; if (!arg->sock) 2e: cmp $0x0,%rdi 32: je 0x0000000000000070 34: mov %rbp,%rsi ; counts = bpf_map_lookup_elem(&btf_map, &key); 37: add $0xfffffffffffffffc,%rsi 3b: movabs $0xffff8881139d7480,%rdi 45: add $0x110,%rdi 4c: mov 0x0(%rsi),%eax 4f: cmp $0x4,%rax 53: jae 0x000000000000005e 55: shl $0x3,%rax 59: add %rdi,%rax 5c: jmp 0x0000000000000060 5e: xor %eax,%eax ; if (!counts) 60: cmp $0x0,%rax 64: je 0x0000000000000070 ; counts->v6++; 66: mov 0x4(%rax),%edi 69: add $0x1,%rdi 6d: mov %edi,0x4(%rax) 70: mov 0x0(%rbp),%rbx 74: mov 0x8(%rbp),%r13 78: mov 0x10(%rbp),%r14 7c: mov 0x18(%rbp),%r15 80: add $0x28,%rbp 84: leaveq 85: retq [...] With linum: [root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv linum int _dummy_tracepoint(struct dummy_tracepoint_args * arg): bpf_prog_b07ccb89267cf242__dummy_tracepoint: ; return test_long_fname_1(arg); [file:/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/test_btf_haskv.c line_num:54 line_col:9] 0: push %rbp 1: mov %rsp,%rbp 4: sub $0x28,%rsp b: sub $0x28,%rbp f: mov %rbx,0x0(%rbp) 13: mov %r13,0x8(%rbp) 17: mov %r14,0x10(%rbp) 1b: mov %r15,0x18(%rbp) 1f: xor %eax,%eax 21: mov %rax,0x20(%rbp) 25: callq 0x000000000000851e ; return test_long_fname_1(arg); [file:/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/test_btf_haskv.c line_num:54 line_col:2] 2a: xor %eax,%eax 2c: mov 0x0(%rbp),%rbx 30: mov 0x8(%rbp),%r13 34: mov 0x10(%rbp),%r14 38: mov 0x18(%rbp),%r15 3c: add $0x28,%rbp 40: leaveq 41: retq [...] Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-08 00:42:32 +00:00
struct bpf_prog_linfo;
struct bpf_prog_info;
LIBBPF_API void bpf_prog_linfo__free(struct bpf_prog_linfo *prog_linfo);
LIBBPF_API struct bpf_prog_linfo *
bpf_prog_linfo__new(const struct bpf_prog_info *info);
LIBBPF_API const struct bpf_line_info *
bpf_prog_linfo__lfind_addr_func(const struct bpf_prog_linfo *prog_linfo,
__u64 addr, __u32 func_idx, __u32 nr_skip);
LIBBPF_API const struct bpf_line_info *
bpf_prog_linfo__lfind(const struct bpf_prog_linfo *prog_linfo,
__u32 insn_off, __u32 nr_skip);
tools: bpftool: add probes for eBPF program types Introduce probes for supported BPF program types in libbpf, and call it from bpftool to test what types are available on the system. The probe simply consists in loading a very basic program of that type and see if the verifier complains or not. Sample output: # bpftool feature probe kernel ... Scanning eBPF program types... eBPF program_type socket_filter is available eBPF program_type kprobe is available eBPF program_type sched_cls is available ... # bpftool --json --pretty feature probe kernel { ... "program_types": { "have_socket_filter_prog_type": true, "have_kprobe_prog_type": true, "have_sched_cls_prog_type": true, ... } } v5: - In libbpf.map, move global symbol to a new LIBBPF_0.0.2 section. - Rename (non-API function) prog_load() as probe_load(). v3: - Get kernel version for checking kprobes availability from libbpf instead of from bpftool. Do not pass kernel_version as an argument when calling libbpf probes. - Use a switch with all enum values for setting specific program parameters just before probing, so that gcc complains at compile time (-Wswitch-enum) if new prog types were added to the kernel but libbpf was not updated. - Add a comment in libbpf.h about setrlimit() usage to allow many consecutive probe attempts. v2: - Move probes from bpftool to libbpf. - Remove C-style macros output from this patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-17 15:27:53 +00:00
/*
* Probe for supported system features
*
* Note that running many of these probes in a short amount of time can cause
* the kernel to reach the maximal size of lockable memory allowed for the
* user, causing subsequent probes to fail. In this case, the caller may want
* to adjust that limit with setrlimit().
*/
libbpf: Rework feature-probing APIs Create three extensible alternatives to inconsistently named feature-probing APIs: - libbpf_probe_bpf_prog_type() instead of bpf_probe_prog_type(); - libbpf_probe_bpf_map_type() instead of bpf_probe_map_type(); - libbpf_probe_bpf_helper() instead of bpf_probe_helper(). Set up return values such that libbpf can report errors (e.g., if some combination of input arguments isn't possible to validate, etc), in addition to whether the feature is supported (return value 1) or not supported (return value 0). Also schedule deprecation of those three APIs. Also schedule deprecation of bpf_probe_large_insn_limit(). Also fix all the existing detection logic for various program and map types that never worked: - BPF_PROG_TYPE_LIRC_MODE2; - BPF_PROG_TYPE_TRACING; - BPF_PROG_TYPE_LSM; - BPF_PROG_TYPE_EXT; - BPF_PROG_TYPE_SYSCALL; - BPF_PROG_TYPE_STRUCT_OPS; - BPF_MAP_TYPE_STRUCT_OPS; - BPF_MAP_TYPE_BLOOM_FILTER. Above prog/map types needed special setups and detection logic to work. Subsequent patch adds selftests that will make sure that all the detection logic keeps working for all current and future program and map types, avoiding otherwise inevitable bit rot. [0] Closes: https://github.com/libbpf/libbpf/issues/312 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Cc: Julia Kartseva <hex@fb.com> Link: https://lore.kernel.org/bpf/20211217171202.3352835-2-andrii@kernel.org
2021-12-17 17:12:00 +00:00
/**
* @brief **libbpf_probe_bpf_prog_type()** detects if host kernel supports
* BPF programs of a given type.
* @param prog_type BPF program type to detect kernel support for
* @param opts reserved for future extensibility, should be NULL
* @return 1, if given program type is supported; 0, if given program type is
* not supported; negative error code if feature detection failed or can't be
* performed
*
* Make sure the process has required set of CAP_* permissions (or runs as
* root) when performing feature checking.
*/
LIBBPF_API int libbpf_probe_bpf_prog_type(enum bpf_prog_type prog_type, const void *opts);
/**
* @brief **libbpf_probe_bpf_map_type()** detects if host kernel supports
* BPF maps of a given type.
* @param map_type BPF map type to detect kernel support for
* @param opts reserved for future extensibility, should be NULL
* @return 1, if given map type is supported; 0, if given map type is
* not supported; negative error code if feature detection failed or can't be
* performed
*
* Make sure the process has required set of CAP_* permissions (or runs as
* root) when performing feature checking.
*/
LIBBPF_API int libbpf_probe_bpf_map_type(enum bpf_map_type map_type, const void *opts);
/**
* @brief **libbpf_probe_bpf_helper()** detects if host kernel supports the
* use of a given BPF helper from specified BPF program type.
* @param prog_type BPF program type used to check the support of BPF helper
* @param helper_id BPF helper ID (enum bpf_func_id) to check support for
* @param opts reserved for future extensibility, should be NULL
* @return 1, if given combination of program type and helper is supported; 0,
* if the combination is not supported; negative error code if feature
* detection for provided input arguments failed or can't be performed
*
* Make sure the process has required set of CAP_* permissions (or runs as
* root) when performing feature checking.
*/
LIBBPF_API int libbpf_probe_bpf_helper(enum bpf_prog_type prog_type,
enum bpf_func_id helper_id, const void *opts);
/**
* @brief **libbpf_num_possible_cpus()** is a helper function to get the
* number of possible CPUs that the host kernel supports and expects.
* @return number of possible CPUs; or error code on failure
*
* Example usage:
*
* int ncpus = libbpf_num_possible_cpus();
* if (ncpus < 0) {
* // error handling
* }
* long values[ncpus];
* bpf_map_lookup_elem(per_cpu_map_fd, key, values);
*/
LIBBPF_API int libbpf_num_possible_cpus(void);
struct bpf_map_skeleton {
const char *name;
struct bpf_map **map;
void **mmaped;
};
struct bpf_prog_skeleton {
const char *name;
struct bpf_program **prog;
struct bpf_link **link;
};
struct bpf_object_skeleton {
size_t sz; /* size of this struct, for forward/backward compatibility */
const char *name;
const void *data;
size_t data_sz;
struct bpf_object **obj;
int map_cnt;
int map_skel_sz; /* sizeof(struct bpf_map_skeleton) */
struct bpf_map_skeleton *maps;
int prog_cnt;
int prog_skel_sz; /* sizeof(struct bpf_prog_skeleton) */
struct bpf_prog_skeleton *progs;
};
LIBBPF_API int
bpf_object__open_skeleton(struct bpf_object_skeleton *s,
const struct bpf_object_open_opts *opts);
LIBBPF_API int bpf_object__load_skeleton(struct bpf_object_skeleton *s);
LIBBPF_API int bpf_object__attach_skeleton(struct bpf_object_skeleton *s);
LIBBPF_API void bpf_object__detach_skeleton(struct bpf_object_skeleton *s);
LIBBPF_API void bpf_object__destroy_skeleton(struct bpf_object_skeleton *s);
struct bpf_var_skeleton {
const char *name;
struct bpf_map **map;
void **addr;
};
struct bpf_object_subskeleton {
size_t sz; /* size of this struct, for forward/backward compatibility */
const struct bpf_object *obj;
int map_cnt;
int map_skel_sz; /* sizeof(struct bpf_map_skeleton) */
struct bpf_map_skeleton *maps;
int prog_cnt;
int prog_skel_sz; /* sizeof(struct bpf_prog_skeleton) */
struct bpf_prog_skeleton *progs;
int var_cnt;
int var_skel_sz; /* sizeof(struct bpf_var_skeleton) */
struct bpf_var_skeleton *vars;
};
LIBBPF_API int
bpf_object__open_subskeleton(struct bpf_object_subskeleton *s);
LIBBPF_API void
bpf_object__destroy_subskeleton(struct bpf_object_subskeleton *s);
libbpf: Generate loader program out of BPF ELF file. The BPF program loading process performed by libbpf is quite complex and consists of the following steps: "open" phase: - parse elf file and remember relocations, sections - collect externs and ksyms including their btf_ids in prog's BTF - patch BTF datasec (since llvm couldn't do it) - init maps (old style map_def, BTF based, global data map, kconfig map) - collect relocations against progs and maps "load" phase: - probe kernel features - load vmlinux BTF - resolve externs (kconfig and ksym) - load program BTF - init struct_ops - create maps - apply CO-RE relocations - patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID - reposition subprograms and adjust call insns - sanitize and load progs During this process libbpf does sys_bpf() calls to load BTF, create maps, populate maps and finally load programs. Instead of actually doing the syscalls generate a trace of what libbpf would have done and represent it as the "loader program". The "loader program" consists of single map with: - union bpf_attr(s) - BTF bytes - map value bytes - insns bytes and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper. Executing such "loader program" via bpf_prog_test_run() command will replay the sequence of syscalls that libbpf would have done which will result the same maps created and programs loaded as specified in the elf file. The "loader program" removes libelf and majority of libbpf dependency from program loading process. kconfig, typeless ksym, struct_ops and CO-RE are not supported yet. The order of relocate_data and relocate_calls had to change, so that bpf_gen__prog_load() can see all relocations for a given program with correct insn_idx-es. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
2021-05-14 00:36:16 +00:00
struct gen_loader_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
libbpf: Generate loader program out of BPF ELF file. The BPF program loading process performed by libbpf is quite complex and consists of the following steps: "open" phase: - parse elf file and remember relocations, sections - collect externs and ksyms including their btf_ids in prog's BTF - patch BTF datasec (since llvm couldn't do it) - init maps (old style map_def, BTF based, global data map, kconfig map) - collect relocations against progs and maps "load" phase: - probe kernel features - load vmlinux BTF - resolve externs (kconfig and ksym) - load program BTF - init struct_ops - create maps - apply CO-RE relocations - patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID - reposition subprograms and adjust call insns - sanitize and load progs During this process libbpf does sys_bpf() calls to load BTF, create maps, populate maps and finally load programs. Instead of actually doing the syscalls generate a trace of what libbpf would have done and represent it as the "loader program". The "loader program" consists of single map with: - union bpf_attr(s) - BTF bytes - map value bytes - insns bytes and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper. Executing such "loader program" via bpf_prog_test_run() command will replay the sequence of syscalls that libbpf would have done which will result the same maps created and programs loaded as specified in the elf file. The "loader program" removes libelf and majority of libbpf dependency from program loading process. kconfig, typeless ksym, struct_ops and CO-RE are not supported yet. The order of relocate_data and relocate_calls had to change, so that bpf_gen__prog_load() can see all relocations for a given program with correct insn_idx-es. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
2021-05-14 00:36:16 +00:00
const char *data;
const char *insns;
__u32 data_sz;
__u32 insns_sz;
};
#define gen_loader_opts__last_field insns_sz
LIBBPF_API int bpf_object__gen_loader(struct bpf_object *obj,
struct gen_loader_opts *opts);
libbpf: Support libbpf-provided extern variables Add support for extern variables, provided to BPF program by libbpf. Currently the following extern variables are supported: - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte long; - CONFIG_xxx values; a set of values of actual kernel config. Tristate, boolean, strings, and integer values are supported. Set of possible values is determined by declared type of extern variable. Supported types of variables are: - Tristate values. Are represented as `enum libbpf_tristate`. Accepted values are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO, or TRI_MODULE, respectively. - Boolean values. Are represented as bool (_Bool) types. Accepted values are 'y' and 'n' only, turning into true/false values, respectively. - Single-character values. Can be used both as a substritute for bool/tristate, or as a small-range integer: - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm'; - integers in a range [-128, 127] or [0, 255] (depending on signedness of char in target architecture) are recognized and represented with respective values of char type. - Strings. String values are declared as fixed-length char arrays. String of up to that length will be accepted and put in first N bytes of char array, with the rest of bytes zeroed out. If config string value is longer than space alloted, it will be truncated and warning message emitted. Char array is always zero terminated. String literals in config have to be enclosed in double quotes, just like C-style string literals. - Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and unsigned variants. Libbpf enforces parsed config value to be in the supported range of corresponding integer type. Integers values in config can be: - decimal integers, with optional + and - signs; - hexadecimal integers, prefixed with 0x or 0X; - octal integers, starting with 0. Config file itself is searched in /boot/config-$(uname -r) location with fallback to /proc/config.gz, unless config path is specified explicitly through bpf_object_open_opts' kernel_config_path option. Both gzipped and plain text formats are supported. Libbpf adds explicit dependency on zlib because of this, but this shouldn't be a problem, given libelf already depends on zlib. All detected extern variables, are put into a separate .extern internal map. It, similarly to .rodata map, is marked as read-only from BPF program side, as well as is frozen on load. This allows BPF verifier to track extern values as constants and perform enhanced branch prediction and dead code elimination. This can be relied upon for doing kernel version/feature detection and using potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF program, while still having a single version of BPF program running on old and new kernels. Selftests are validating this explicitly for unexisting BPF helper. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com
2019-12-14 01:47:08 +00:00
enum libbpf_tristate {
TRI_NO = 0,
TRI_YES = 1,
TRI_MODULE = 2,
};
libbpf: Add BPF static linker APIs Introduce BPF static linker APIs to libbpf. BPF static linker allows to perform static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyways. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, one for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18 19:40:30 +00:00
struct bpf_linker_opts {
/* size of this struct, for forward/backward compatibility */
libbpf: Add BPF static linker APIs Introduce BPF static linker APIs to libbpf. BPF static linker allows to perform static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyways. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, one for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18 19:40:30 +00:00
size_t sz;
};
#define bpf_linker_opts__last_field sz
struct bpf_linker_file_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
};
#define bpf_linker_file_opts__last_field sz
libbpf: Add BPF static linker APIs Introduce BPF static linker APIs to libbpf. BPF static linker allows to perform static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyways. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, one for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18 19:40:30 +00:00
struct bpf_linker;
LIBBPF_API struct bpf_linker *bpf_linker__new(const char *filename, struct bpf_linker_opts *opts);
LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker,
const char *filename,
const struct bpf_linker_file_opts *opts);
libbpf: Add BPF static linker APIs Introduce BPF static linker APIs to libbpf. BPF static linker allows to perform static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyways. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, one for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18 19:40:30 +00:00
LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
libbpf: Support custom SEC() handlers Allow registering and unregistering custom handlers for BPF program. This allows user applications and libraries to plug into libbpf's declarative SEC() definition handling logic. This allows to offload complex and intricate custom logic into external libraries, but still provide a great user experience. One such example is USDT handling library, which has a lot of code and complexity which doesn't make sense to put into libbpf directly, but it would be really great for users to be able to specify BPF programs with something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>") and have correct BPF program type set (BPF_PROGRAM_TYPE_KPROBE, as it is uprobe) and even support BPF skeleton's auto-attach logic. In some cases, it might be even good idea to override libbpf's default handling, like for SEC("perf_event") programs. With custom library, it's possible to extend logic to support specifying perf event specification right there in SEC() definition without burdening libbpf with lots of custom logic or extra library dependecies (e.g., libpfm4). With current patch it's possible to override libbpf's SEC("perf_event") handling and specify a completely custom ones. Further, it's possible to specify a generic fallback handling for any SEC() that doesn't match any other custom or standard libbpf handlers. This allows to accommodate whatever legacy use cases there might be, if necessary. See doc comments for libbpf_register_prog_handler() and libbpf_unregister_prog_handler() for detailed semantics. This patch also bumps libbpf development version to v0.8 and adds new APIs there. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org
2022-03-05 01:01:28 +00:00
/*
* Custom handling of BPF program's SEC() definitions
*/
struct bpf_prog_load_opts; /* defined in bpf.h */
/* Called during bpf_object__open() for each recognized BPF program. Callback
* can use various bpf_program__set_*() setters to adjust whatever properties
* are necessary.
*/
typedef int (*libbpf_prog_setup_fn_t)(struct bpf_program *prog, long cookie);
/* Called right before libbpf performs bpf_prog_load() to load BPF program
* into the kernel. Callback can adjust opts as necessary.
*/
typedef int (*libbpf_prog_prepare_load_fn_t)(struct bpf_program *prog,
struct bpf_prog_load_opts *opts, long cookie);
/* Called during skeleton attach or through bpf_program__attach(). If
* auto-attach is not supported, callback should return 0 and set link to
* NULL (it's not considered an error during skeleton attach, but it will be
* an error for bpf_program__attach() calls). On error, error should be
* returned directly and link set to NULL. On success, return 0 and set link
* to a valid struct bpf_link.
*/
typedef int (*libbpf_prog_attach_fn_t)(const struct bpf_program *prog, long cookie,
struct bpf_link **link);
struct libbpf_prog_handler_opts {
/* size of this struct, for forward/backward compatibility */
libbpf: Support custom SEC() handlers Allow registering and unregistering custom handlers for BPF program. This allows user applications and libraries to plug into libbpf's declarative SEC() definition handling logic. This allows to offload complex and intricate custom logic into external libraries, but still provide a great user experience. One such example is USDT handling library, which has a lot of code and complexity which doesn't make sense to put into libbpf directly, but it would be really great for users to be able to specify BPF programs with something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>") and have correct BPF program type set (BPF_PROGRAM_TYPE_KPROBE, as it is uprobe) and even support BPF skeleton's auto-attach logic. In some cases, it might be even good idea to override libbpf's default handling, like for SEC("perf_event") programs. With custom library, it's possible to extend logic to support specifying perf event specification right there in SEC() definition without burdening libbpf with lots of custom logic or extra library dependecies (e.g., libpfm4). With current patch it's possible to override libbpf's SEC("perf_event") handling and specify a completely custom ones. Further, it's possible to specify a generic fallback handling for any SEC() that doesn't match any other custom or standard libbpf handlers. This allows to accommodate whatever legacy use cases there might be, if necessary. See doc comments for libbpf_register_prog_handler() and libbpf_unregister_prog_handler() for detailed semantics. This patch also bumps libbpf development version to v0.8 and adds new APIs there. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org
2022-03-05 01:01:28 +00:00
size_t sz;
/* User-provided value that is passed to prog_setup_fn,
* prog_prepare_load_fn, and prog_attach_fn callbacks. Allows user to
* register one set of callbacks for multiple SEC() definitions and
* still be able to distinguish them, if necessary. For example,
* libbpf itself is using this to pass necessary flags (e.g.,
* sleepable flag) to a common internal SEC() handler.
*/
long cookie;
/* BPF program initialization callback (see libbpf_prog_setup_fn_t).
* Callback is optional, pass NULL if it's not necessary.
*/
libbpf_prog_setup_fn_t prog_setup_fn;
/* BPF program loading callback (see libbpf_prog_prepare_load_fn_t).
* Callback is optional, pass NULL if it's not necessary.
*/
libbpf_prog_prepare_load_fn_t prog_prepare_load_fn;
/* BPF program attach callback (see libbpf_prog_attach_fn_t).
* Callback is optional, pass NULL if it's not necessary.
*/
libbpf_prog_attach_fn_t prog_attach_fn;
};
#define libbpf_prog_handler_opts__last_field prog_attach_fn
/**
* @brief **libbpf_register_prog_handler()** registers a custom BPF program
* SEC() handler.
* @param sec section prefix for which custom handler is registered
* @param prog_type BPF program type associated with specified section
* @param exp_attach_type Expected BPF attach type associated with specified section
* @param opts optional cookie, callbacks, and other extra options
* @return Non-negative handler ID is returned on success. This handler ID has
* to be passed to *libbpf_unregister_prog_handler()* to unregister such
* custom handler. Negative error code is returned on error.
*
* *sec* defines which SEC() definitions are handled by this custom handler
* registration. *sec* can have few different forms:
* - if *sec* is just a plain string (e.g., "abc"), it will match only
* SEC("abc"). If BPF program specifies SEC("abc/whatever") it will result
* in an error;
* - if *sec* is of the form "abc/", proper SEC() form is
* SEC("abc/something"), where acceptable "something" should be checked by
* *prog_init_fn* callback, if there are additional restrictions;
* - if *sec* is of the form "abc+", it will successfully match both
* SEC("abc") and SEC("abc/whatever") forms;
* - if *sec* is NULL, custom handler is registered for any BPF program that
* doesn't match any of the registered (custom or libbpf's own) SEC()
* handlers. There could be only one such generic custom handler registered
* at any given time.
*
* All custom handlers (except the one with *sec* == NULL) are processed
* before libbpf's own SEC() handlers. It is allowed to "override" libbpf's
* SEC() handlers by registering custom ones for the same section prefix
* (i.e., it's possible to have custom SEC("perf_event/LLC-load-misses")
* handler).
*
* Note, like much of global libbpf APIs (e.g., libbpf_set_print(),
* libbpf_set_strict_mode(), etc)) these APIs are not thread-safe. User needs
* to ensure synchronization if there is a risk of running this API from
* multiple threads simultaneously.
*/
LIBBPF_API int libbpf_register_prog_handler(const char *sec,
enum bpf_prog_type prog_type,
enum bpf_attach_type exp_attach_type,
const struct libbpf_prog_handler_opts *opts);
/**
* @brief *libbpf_unregister_prog_handler()* unregisters previously registered
* custom BPF program SEC() handler.
* @param handler_id handler ID returned by *libbpf_register_prog_handler()*
* after successful registration
* @return 0 on success, negative error code if handler isn't found
*
* Note, like much of global libbpf APIs (e.g., libbpf_set_print(),
* libbpf_set_strict_mode(), etc)) these APIs are not thread-safe. User needs
* to ensure synchronization if there is a risk of running this API from
* multiple threads simultaneously.
*/
LIBBPF_API int libbpf_unregister_prog_handler(int handler_id);
#ifdef __cplusplus
} /* extern "C" */
#endif
#endif /* __LIBBPF_LIBBPF_H */