Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper (see the sketch after
   this list), from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KIND_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.
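
As a rough illustration of item 7, here is a minimal BPF-side sketch of bpf_loop()
usage (not taken from the series itself; function and section names are made up, and
it assumes a libbpf whose bpf_helpers.h already declares the helper):

/* SPDX-License-Identifier: GPL-2.0 */
/* Illustrative only: sum 0..9 with bpf_loop() instead of an unrolled loop.
 * Assumed build: clang -O2 -g -target bpf -c loop_example.bpf.c
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct loop_ctx {
        long sum;
};

/* Called once per iteration; returning 1 would stop the loop early. */
static long sum_cb(__u32 index, void *data)
{
        struct loop_ctx *lc = data;

        lc->sum += index;
        return 0;
}

SEC("tp/syscalls/sys_enter_getpid")
int loop_example(void *ctx)
{
        struct loop_ctx lc = { .sum = 0 };

        /* The verifier checks the callback once instead of unrolling the loop,
         * and the iteration count may be a runtime value.
         */
        bpf_loop(10, sum_cb, &lc, 0);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";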

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski 2021-12-10 15:56:10 -08:00
commit be3158290d
181 changed files with 5750 additions and 2565 deletions


@ -3,7 +3,7 @@ BPF Type Format (BTF)
=====================
1. Introduction
***************
===============
BTF (BPF Type Format) is the metadata format which encodes the debug info
related to BPF program/map. The name BTF was used initially to describe data
@ -30,7 +30,7 @@ sections are discussed in details in :ref:`BTF_Type_String`.
.. _BTF_Type_String:
2. BTF Type and String Encoding
*******************************
===============================
The file ``include/uapi/linux/btf.h`` provides high-level definition of how
types/strings are encoded.
@ -57,13 +57,13 @@ little-endian target. The ``btf_header`` is designed to be extensible with
generated.
2.1 String Encoding
===================
-------------------
The first string in the string section must be a null string. The rest of
string table is a concatenation of other null-terminated strings.
2.2 Type Encoding
=================
-----------------
The type id ``0`` is reserved for ``void`` type. The type section is parsed
sequentially and type id is assigned to each recognized type starting from id
@ -504,7 +504,7 @@ valid index (starting from 0) pointing to a member or an argument.
* ``type``: the type with ``btf_type_tag`` attribute
3. BTF Kernel API
*****************
=================
The following bpf syscall command involves BTF:
* BPF_BTF_LOAD: load a blob of BTF data into kernel
@ -547,14 +547,14 @@ The workflow typically looks like:
3.1 BPF_BTF_LOAD
================
----------------
Load a blob of BTF data into kernel. A blob of data, described in
:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
is returned to a userspace.
3.2 BPF_MAP_CREATE
==================
------------------
A map can be created with ``btf_fd`` and specified key/value type id.::
@ -581,7 +581,7 @@ automatically.
.. _BPF_Prog_Load:
3.3 BPF_PROG_LOAD
=================
-----------------
During prog_load, func_info and line_info can be passed to kernel with proper
values for the following attributes:
@ -631,7 +631,7 @@ For line_info, the line number and column number are defined as below:
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)
3.4 BPF_{PROG,MAP}_GET_NEXT_ID
==============================
------------------------------
In kernel, every loaded program, map or btf has a unique id. The id won't
change during the lifetime of a program, map, or btf.
@ -641,13 +641,13 @@ each command, to user space, for bpf program or maps, respectively, so an
inspection tool can inspect all programs and maps.
3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
===============================
-------------------------------
An introspection tool cannot use id to get details about program or maps.
A file descriptor needs to be obtained first for reference-counting purpose.
3.6 BPF_OBJ_GET_INFO_BY_FD
==========================
--------------------------
Once a program/map fd is acquired, an introspection tool can get the detailed
information from kernel about this fd, some of which are BTF-related. For
@ -656,7 +656,7 @@ example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids.
bpf byte codes, and jited_line_info.
3.7 BPF_BTF_GET_FD_BY_ID
========================
------------------------
With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf
syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with
@ -668,10 +668,10 @@ tool has full btf knowledge and is able to pretty print map key/values, dump
func signatures and line info, along with byte/jit codes.
4. ELF File Format Interface
****************************
============================
4.1 .BTF section
================
----------------
The .BTF section contains type and string data. The format of this section is
same as the one describe in :ref:`BTF_Type_String`.
@ -679,7 +679,7 @@ same as the one describe in :ref:`BTF_Type_String`.
.. _BTF_Ext_Section:
4.2 .BTF.ext section
====================
--------------------
The .BTF.ext section encodes func_info and line_info which needs loader
manipulation before loading into the kernel.
@ -743,7 +743,7 @@ bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the
beginning of section (``btf_ext_info_sec->sec_name_off``).
4.2 .BTF_ids section
====================
--------------------
The .BTF_ids section encodes BTF ID values that are used within the kernel.
@ -804,10 +804,10 @@ All the BTF ID lists and sets are compiled in the .BTF_ids section and
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
5. Using BTF
************
============
5.1 bpftool map pretty print
============================
----------------------------
With BTF, the map key/value can be printed based on fields rather than simply
raw bytes. This is especially valuable for large structure or if your data
@ -849,7 +849,7 @@ bpftool is able to pretty print like below:
]
5.2 bpftool prog dump
=====================
---------------------
The following is an example showing how func_info and line_info can help prog
dump with better kernel symbol names, function prototypes and line
@ -883,7 +883,7 @@ information.::
[...]
5.3 Verifier Log
================
----------------
The following is an example of how line_info can help debugging verification
failure.::
@ -909,7 +909,7 @@ failure.::
R2 offset is outside of the packet
6. BTF Generation
*****************
=================
You need latest pahole
@ -1016,6 +1016,6 @@ format.::
.long 8206 # Line 8 Col 14
7. Testing
**********
==========
Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.

Documentation/bpf/faq.rst (new file, 11 lines added)

@ -0,0 +1,11 @@
================================
Frequently asked questions (FAQ)
================================
Two sets of Questions and Answers (Q&A) are maintained.
.. toctree::
:maxdepth: 1
bpf_design_QA
bpf_devel_QA


@ -0,0 +1,7 @@
Helper functions
================
* `bpf-helpers(7)`_ maintains a list of helpers available to eBPF programs.
.. Links
.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html


@ -5,104 +5,32 @@ BPF Documentation
This directory contains documentation for the BPF (Berkeley Packet
Filter) facility, with a focus on the extended BPF version (eBPF).
This kernel side documentation is still work in progress. The main
textual documentation is (for historical reasons) described in
:ref:`networking-filter`, which describe both classical and extended
BPF instruction-set.
This kernel side documentation is still work in progress.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.
libbpf
======
Documentation/bpf/libbpf/index.rst is a userspace library for loading and interacting with bpf programs.
BPF Type Format (BTF)
=====================
.. toctree::
:maxdepth: 1
instruction-set
verifier
libbpf/index
btf
Frequently asked questions (FAQ)
================================
Two sets of Questions and Answers (Q&A) are maintained.
.. toctree::
:maxdepth: 1
bpf_design_QA
bpf_devel_QA
Syscall API
===========
The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_. For more information about the userspace API, see
Documentation/userspace-api/ebpf/index.rst.
Helper functions
================
* `bpf-helpers(7)`_ maintains a list of helpers available to eBPF programs.
Program types
=============
.. toctree::
:maxdepth: 1
prog_cgroup_sockopt
prog_cgroup_sysctl
prog_flow_dissector
bpf_lsm
prog_sk_lookup
Map types
=========
.. toctree::
:maxdepth: 1
map_cgroup_storage
Testing and debugging BPF
=========================
.. toctree::
:maxdepth: 1
drgn
s390
Licensing
=========
.. toctree::
:maxdepth: 1
faq
syscall_api
helpers
programs
maps
bpf_licensing
test_debug
other
.. only:: subproject and html
Other
=====
Indices
=======
.. toctree::
:maxdepth: 1
ringbuf
llvm_reloc
* :ref:`genindex`
.. Links:
.. _networking-filter: ../networking/filter.rst
.. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/


@ -0,0 +1,467 @@
====================
eBPF Instruction Set
====================
eBPF is designed to be JITed with one to one mapping, which can also open up
the possibility for GCC/LLVM compilers to generate optimized eBPF code through
an eBPF backend that performs almost as fast as natively compiled code.
Some core changes of the eBPF format from classic BPF:
- Number of registers increases from 2 to 10:
The old format had two registers A and X, and a hidden frame pointer. The
new layout extends this to be 10 internal registers and a read-only frame
pointer. Since 64-bit CPUs are passing arguments to functions via registers
the number of args from eBPF program to in-kernel function is restricted
to 5 and one register is used to accept return value from an in-kernel
function. Natively, x86_64 passes first 6 arguments in registers, aarch64/
sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved
registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers.
Therefore, eBPF calling convention is defined as:
* R0 - return value from in-kernel function, and exit value for eBPF program
* R1 - R5 - arguments from eBPF program to in-kernel function
* R6 - R9 - callee saved registers that in-kernel function will preserve
* R10 - read-only frame pointer to access stack
Thus, all eBPF registers map one to one to HW registers on x86_64, aarch64,
etc, and eBPF calling convention maps directly to ABIs used by the kernel on
64-bit architectures.
On 32-bit architectures JIT may map programs that use only 32-bit arithmetic
and may let more complex programs be interpreted.
R0 - R5 are scratch registers and an eBPF program needs to spill/fill them if
necessary across calls. Note that there is only one eBPF program (== one
eBPF main routine) and it cannot call other eBPF functions, it can only
call predefined in-kernel functions, though.
- Register width increases from 32-bit to 64-bit:
Still, the semantics of the original 32-bit ALU operations are preserved
via 32-bit subregisters. All eBPF registers are 64-bit with 32-bit lower
subregisters that zero-extend into 64-bit if they are being written to.
That behavior maps directly to x86_64 and arm64 subregister definition, but
makes other JITs more difficult.
32-bit architectures run 64-bit eBPF programs via interpreter.
Their JITs may convert BPF programs that only use 32-bit subregisters into
the native instruction set and let the rest be interpreted.
Operation is 64-bit, because on 64-bit architectures, pointers are also
64-bit wide, and we want to pass 64-bit values in/out of kernel functions,
so 32-bit eBPF registers would otherwise require defining a register-pair
ABI; thus, it would not be possible to use a direct eBPF register to HW register
mapping, and the JIT would need to do combine/split/move operations for every
register in and out of the function, which is complex, bug prone and slow.
Another reason is the use of atomic 64-bit counters.
- Conditional jt/jf targets replaced with jt/fall-through:
While the original design has constructs such as ``if (cond) jump_true;
else jump_false;``, they are being replaced into alternative constructs like
``if (cond) jump_true; /* else fall-through */``.
- Introduces bpf_call insn and register passing convention for zero overhead
calls from/to other kernel functions:
Before an in-kernel function call, the eBPF program needs to
place function arguments into R1 to R5 registers to satisfy calling
convention, then the interpreter will take them from registers and pass
to in-kernel function. If R1 - R5 registers are mapped to CPU registers
that are used for argument passing on given architecture, the JIT compiler
doesn't need to emit extra moves. Function arguments will be in the correct
registers and BPF_CALL instruction will be JITed as single 'call' HW
instruction. This calling convention was picked to cover common call
situations without performance penalty.
After an in-kernel function call, R1 - R5 are reset to unreadable and R0 has
a return value of the function. Since R6 - R9 are callee saved, their state
is preserved across the call.
For example, consider three C functions::
u64 f1() { return (*_f2)(1); }
u64 f2(u64 a) { return f3(a + 1, a); }
u64 f3(u64 a, u64 b) { return a - b; }
GCC can compile f1, f3 into x86_64::
f1:
movl $1, %edi
movq _f2(%rip), %rax
jmp *%rax
f3:
movq %rdi, %rax
subq %rsi, %rax
ret
Function f2 in eBPF may look like::
f2:
bpf_mov R2, R1
bpf_add R1, 1
bpf_call f3
bpf_exit
If f2 is JITed and the pointer stored to ``_f2``, the calls f1 -> f2 -> f3 and
returns will be seamless. Without JIT, __bpf_prog_run() interpreter needs to
be used to call into f2.
For practical reasons all eBPF programs have only one argument 'ctx' which is
already placed into R1 (e.g. on __bpf_prog_run() startup) and the programs
can call kernel functions with up to 5 arguments. Calls with 6 or more arguments
are currently not supported, but these restrictions can be lifted if necessary
in the future.
On 64-bit architectures all registers map to HW registers one to one. For
example, x86_64 JIT compiler can map them as ...
::
R0 - rax
R1 - rdi
R2 - rsi
R3 - rdx
R4 - rcx
R5 - r8
R6 - rbx
R7 - r13
R8 - r14
R9 - r15
R10 - rbp
... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
and rbx, r12 - r15 are callee saved.
Then the following eBPF pseudo-program::
bpf_mov R6, R1 /* save ctx */
bpf_mov R2, 2
bpf_mov R3, 3
bpf_mov R4, 4
bpf_mov R5, 5
bpf_call foo
bpf_mov R7, R0 /* save foo() return value */
bpf_mov R1, R6 /* restore ctx for next call */
bpf_mov R2, 6
bpf_mov R3, 7
bpf_mov R4, 8
bpf_mov R5, 9
bpf_call bar
bpf_add R0, R7
bpf_exit
After JITing to x86_64 it may look like::
push %rbp
mov %rsp,%rbp
sub $0x228,%rsp
mov %rbx,-0x228(%rbp)
mov %r13,-0x220(%rbp)
mov %rdi,%rbx
mov $0x2,%esi
mov $0x3,%edx
mov $0x4,%ecx
mov $0x5,%r8d
callq foo
mov %rax,%r13
mov %rbx,%rdi
mov $0x6,%esi
mov $0x7,%edx
mov $0x8,%ecx
mov $0x9,%r8d
callq bar
add %r13,%rax
mov -0x228(%rbp),%rbx
mov -0x220(%rbp),%r13
leaveq
retq
Which is in this example equivalent in C to::
u64 bpf_filter(u64 ctx)
{
return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
}
In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper
registers and place their return value into ``%rax`` which is R0 in eBPF.
Prologue and epilogue are emitted by JIT and are implicit in the
interpreter. R0-R5 are scratch registers, so eBPF program needs to preserve
them across the calls as defined by calling convention.
For example the following program is invalid::
bpf_mov R1, 1
bpf_call foo
bpf_mov R0, R1
bpf_exit
After the call the registers R1-R5 contain junk values and cannot be read.
An in-kernel `eBPF verifier`_ is used to validate eBPF programs.
Also in the new design, eBPF is limited to 4096 insns, which means that any
program will terminate quickly and will only call a fixed number of kernel
functions. Both original BPF and eBPF use two-operand instructions,
which helps to do one-to-one mapping between eBPF insn and x86 insn during JIT.
The input context pointer for invoking the interpreter function is generic,
its content is defined by a specific use case. For seccomp register R1 points
to seccomp_data, for converted BPF filters R1 points to a skb.
A program that is translated internally consists of the following elements::
op:16, jt:8, jf:8, k:32 ==> op:8, dst_reg:4, src_reg:4, off:16, imm:32
So far 87 eBPF instructions were implemented. 8-bit 'op' opcode field
has room for new instructions. Some of them may use 16/24/32 byte encoding. New
instructions must be multiple of 8 bytes to preserve backward compatibility.
eBPF is a general purpose RISC instruction set. Not every register and
every instruction are used during translation from original BPF to eBPF.
For example, socket filters are not using the ``exclusive add`` instruction, but
tracing filters may do so to maintain counters of events. Register R9
is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack.
eBPF can be used as a generic assembler for last-step performance
optimizations; socket filters and seccomp are using it as an assembler. Tracing
filters may use it as an assembler to generate code from the kernel. In-kernel
usage may not be bounded by security considerations, since generated eBPF code
may be optimizing an internal code path without being exposed to user space.
Safety of eBPF can come from the `eBPF verifier`_. In such use cases as
described, it may be used as a safe instruction set.
Just like the original BPF, eBPF runs within a controlled environment,
is deterministic and the kernel can easily prove that. The safety of the program
can be determined in two steps: first step does depth-first-search to disallow
loops and other CFG validation; second step starts from the first insn and
descends all possible paths. It simulates execution of every insn and observes
the state change of registers and stack.
eBPF opcode encoding
====================
eBPF is reusing most of the opcode encoding from classic to simplify conversion
of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code'
field is divided into three parts::
+----------------+--------+--------------------+
| 4 bits | 1 bit | 3 bits |
| operation code | source | instruction class |
+----------------+--------+--------------------+
(MSB) (LSB)
Three LSB bits store instruction class which is one of:
=================== ===============
Classic BPF classes eBPF classes
=================== ===============
BPF_LD 0x00 BPF_LD 0x00
BPF_LDX 0x01 BPF_LDX 0x01
BPF_ST 0x02 BPF_ST 0x02
BPF_STX 0x03 BPF_STX 0x03
BPF_ALU 0x04 BPF_ALU 0x04
BPF_JMP 0x05 BPF_JMP 0x05
BPF_RET 0x06 BPF_JMP32 0x06
BPF_MISC 0x07 BPF_ALU64 0x07
=================== ===============
When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
::
BPF_K 0x00
BPF_X 0x08
* in classic BPF, this means::
BPF_SRC(code) == BPF_X - use register X as source operand
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
* in eBPF, this means::
BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
... and four MSB bits store operation code.
If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of::
BPF_ADD 0x00
BPF_SUB 0x10
BPF_MUL 0x20
BPF_DIV 0x30
BPF_OR 0x40
BPF_AND 0x50
BPF_LSH 0x60
BPF_RSH 0x70
BPF_NEG 0x80
BPF_MOD 0x90
BPF_XOR 0xa0
BPF_MOV 0xb0 /* eBPF only: mov reg to reg */
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
BPF_END 0xd0 /* eBPF only: endianness conversion */
If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of::
BPF_JA 0x00 /* BPF_JMP only */
BPF_JEQ 0x10
BPF_JGT 0x20
BPF_JGE 0x30
BPF_JSET 0x40
BPF_JNE 0x50 /* eBPF only: jump != */
BPF_JSGT 0x60 /* eBPF only: signed '>' */
BPF_JSGE 0x70 /* eBPF only: signed '>=' */
BPF_CALL 0x80 /* eBPF BPF_JMP only: function call */
BPF_EXIT 0x90 /* eBPF BPF_JMP only: function return */
BPF_JLT 0xa0 /* eBPF only: unsigned '<' */
BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */
BPF_JSLT 0xc0 /* eBPF only: signed '<' */
BPF_JSLE 0xd0 /* eBPF only: signed '<=' */
So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF
and eBPF. There are only two registers in classic BPF, so it means A += X.
In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly,
BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and analogous
dst_reg = (u32) dst_reg ^ (u32) imm32 in eBPF.
Classic BPF is using BPF_MISC class to represent A = X and X = A moves.
eBPF is using BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no
BPF_MISC operations in eBPF, the class 7 is used as BPF_ALU64 to mean
exactly the same operations as BPF_ALU, but with 64-bit wide operands
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
dst_reg = dst_reg + src_reg
Classic BPF wastes the whole BPF_RET class to represent a single ``ret``
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return
value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
operands for the comparisons instead.
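
As an illustrative, standalone C sketch (not part of the kernel sources), the 64-bit
add opcode described above can be composed and decoded with the macros from
``include/uapi/linux/bpf_common.h``::

#include <stdio.h>
#include <linux/bpf.h>          /* also pulls in linux/bpf_common.h */

int main(void)
{
        /* BPF_ADD | BPF_X | BPF_ALU64: dst_reg = dst_reg + src_reg (64-bit) */
        unsigned char code = BPF_ALU64 | BPF_X | BPF_ADD;

        printf("class=%#x op=%#x src=%s\n",
               BPF_CLASS(code),                 /* 0x07, i.e. BPF_ALU64 */
               BPF_OP(code),                    /* 0x00, i.e. BPF_ADD */
               BPF_SRC(code) == BPF_X ? "register" : "immediate");
        return 0;
}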
For load and store instructions the 8-bit 'code' field is divided as::
+--------+--------+-------------------+
| 3 bits | 2 bits | 3 bits |
| mode | size | instruction class |
+--------+--------+-------------------+
(MSB) (LSB)
Size modifier is one of ...
::
BPF_W 0x00 /* word */
BPF_H 0x08 /* half word */
BPF_B 0x10 /* byte */
BPF_DW 0x18 /* eBPF only, double word */
... which encodes size of load/store operation::
B - 1 byte
H - 2 byte
W - 4 byte
DW - 8 byte (eBPF only)
Mode modifier is one of::
BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
BPF_ABS 0x20
BPF_IND 0x40
BPF_MEM 0x60
BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */
BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */
BPF_ATOMIC 0xc0 /* eBPF only, atomic operations */
eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.
They had to be carried over from classic to have strong performance of
socket filters running in eBPF interpreter. These instructions can only
be used when interpreter context is a pointer to ``struct sk_buff`` and
have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers
and must not be used to store the data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.
These instructions have implicit program exit condition as well. When
eBPF program is trying to access the data beyond the packet boundary,
the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. src_reg and imm32 fields are
explicit inputs to these instructions.
For example::
BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 were scratched.
Unlike classic BPF instruction set, eBPF has generic load/store operations::
BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
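
As a concrete illustration (a sketch, not kernel code), the store
``*(u32 *)(r10 - 8) = r1`` can be written as a ``struct bpf_insn`` initializer using
the UAPI definitions; its opcode byte is the 0x63 that shows up in verifier logs::

#include <stdio.h>
#include <linux/bpf.h>

int main(void)
{
        /* BPF_MEM | BPF_W | BPF_STX: *(u32 *)(dst_reg + off) = src_reg,
         * here: *(u32 *)(r10 - 8) = r1
         */
        struct bpf_insn store_word = {
                .code    = BPF_STX | BPF_MEM | BPF_W,
                .dst_reg = BPF_REG_10,          /* frame pointer */
                .src_reg = BPF_REG_1,
                .off     = -8,
                .imm     = 0,
        };

        printf("opcode byte: %#x\n", store_word.code);  /* prints 0x63 */
        return 0;
}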
The instruction set also includes atomic operations, which use the immediate field for extra
encoding::
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
The basic atomic operations supported are::
BPF_ADD
BPF_AND
BPF_OR
BPF_XOR
Each of these has the same semantics as the ``BPF_ADD`` example above, that is: the
memory location addressed by ``dst_reg + off`` is atomically modified, with
``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the
immediate, then these operations also overwrite ``src_reg`` with the
value that was in memory before it was modified.
The more special operations are::
BPF_XCHG
This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
off``. ::
BPF_CMPXCHG
This atomically compares the value addressed by ``dst_reg + off`` with
``R0``. If they match it is replaced with ``src_reg``. In either case, the
value that was there before is zero-extended and loaded back to ``R0``.
Note that 1 and 2 byte atomic operations are not supported.
Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.
You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
referring to the exclusive-add operation encoded when the immediate field is
zero.
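
As a hedged illustration of the above (not taken from the kernel tree), a minimal
libbpf-style BPF C program whose counter increment compiles to a
``BPF_ATOMIC | BPF_DW | BPF_STX`` add; the map and section names are invented for
the example and it assumes ``clang -O2 -g -target bpf -mcpu=v3``::

/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
} counters SEC(".maps");

SEC("tp/syscalls/sys_enter_execve")
int count_execve(void *ctx)
{
        __u32 key = 0;
        __u64 *val = bpf_map_lookup_elem(&counters, &key);

        if (val)
                /* Atomic add on the 8-byte counter; BPF_FETCH would additionally
                 * be set only if the value returned here were actually used.
                 */
                __sync_fetch_and_add(val, 1);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";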
eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM``, which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a single
instruction that loads a 64-bit immediate value into a dst_reg.
Classic BPF has a similar instruction: ``BPF_LD | BPF_W | BPF_IMM``, which loads
a 32-bit immediate value into a register.
.. Links:
.. _eBPF verifier: verifier.rst


@ -3,8 +3,6 @@
libbpf
======
For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_.
.. toctree::
:maxdepth: 1
@ -14,6 +12,8 @@ For API documentation see the `versioned API documentation site <https://libbpf.
This is documentation for libbpf, a userspace library for loading and
interacting with bpf programs.
For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_.
All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to bpf@vger.kernel.org mailing list.
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the


@ -0,0 +1,52 @@
=========
eBPF maps
=========
'maps' is a generic storage of different types for sharing data between kernel
and userspace.
The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes
``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
returns process-local file descriptor or negative error
- lookup key in a given map
``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value
returns zero and stores found elem into value or negative error
- create or update key/value pair in a given map
``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key, attr->value
returns zero or negative error
- find and delete element by key in a given map
``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
using attr->map_fd, attr->key
- to delete map: close(fd)
Exiting process will delete maps automatically
userspace programs use this syscall to create/access maps that eBPF programs
are concurrently updating.
maps can have different types: hash, array, bloom filter, radix-tree, etc.
The map is defined by:
- type
- max number of elements
- key size in bytes
- value size in bytes
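
A hedged userspace sketch of the syscall sequence above, issuing the raw syscall
directly (illustrative only; real code would normally go through libbpf wrappers
such as bpf_map_create() and bpf_map_update_elem())::

#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
        return syscall(__NR_bpf, cmd, attr, size);
}

int main(void)
{
        union bpf_attr attr;
        __u32 key = 0;
        __u64 value = 42, out = 0;
        int map_fd;

        /* BPF_MAP_CREATE: map_type, key_size, value_size, max_entries */
        memset(&attr, 0, sizeof(attr));
        attr.map_type = BPF_MAP_TYPE_ARRAY;
        attr.key_size = sizeof(key);
        attr.value_size = sizeof(value);
        attr.max_entries = 1;
        map_fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
        if (map_fd < 0) {
                perror("BPF_MAP_CREATE");
                return 1;
        }

        /* BPF_MAP_UPDATE_ELEM: map_fd, key, value, flags */
        memset(&attr, 0, sizeof(attr));
        attr.map_fd = map_fd;
        attr.key = (__u64)(unsigned long)&key;
        attr.value = (__u64)(unsigned long)&value;
        attr.flags = BPF_ANY;
        if (sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr))) {
                perror("BPF_MAP_UPDATE_ELEM");
                return 1;
        }

        /* BPF_MAP_LOOKUP_ELEM: stores the found element into 'out' */
        attr.value = (__u64)(unsigned long)&out;
        if (sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr))) {
                perror("BPF_MAP_LOOKUP_ELEM");
                return 1;
        }

        printf("key %u -> value %llu\n", key, (unsigned long long)out);
        close(map_fd);  /* the map is freed once the last reference is gone */
        return 0;
}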
Map Types
=========
.. toctree::
:maxdepth: 1
:glob:
map_*


@ -0,0 +1,9 @@
=====
Other
=====
.. toctree::
:maxdepth: 1
ringbuf
llvm_reloc


@ -0,0 +1,9 @@
=============
Program Types
=============
.. toctree::
:maxdepth: 1
:glob:
prog_*


@ -0,0 +1,11 @@
===========
Syscall API
===========
The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_. For more information about the userspace API, see
Documentation/userspace-api/ebpf/index.rst.
.. Links:
.. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html


@ -0,0 +1,9 @@
=========================
Testing and debugging BPF
=========================
.. toctree::
:maxdepth: 1
drgn
s390


@ -0,0 +1,529 @@
=============
eBPF verifier
=============
The safety of the eBPF program is determined in two steps.
First step does DAG check to disallow loops and other CFG validation.
In particular it will detect programs that have unreachable instructions.
(though classic BPF checker allows them)
Second step starts from the first insn and descends all possible paths.
It simulates execution of every insn and observes the state change of
registers and stack.
At the start of the program the register R1 contains a pointer to context
and has type PTR_TO_CTX.
If verifier sees an insn that does R2=R1, then R2 has now type
PTR_TO_CTX as well and can be used on the right hand side of expression.
If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=SCALAR_VALUE,
since addition of two valid pointers makes invalid pointer.
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
sure that kernel addresses don't leak to unprivileged users)
If register was never written to, it's not readable::
bpf_mov R0 = R2
bpf_exit
will be rejected, since R2 is unreadable at the start of the program.
After kernel function call, R1-R5 are reset to unreadable and
R0 has a return type of the function.
Since R6-R9 are callee saved, their state is preserved across the call.
::
bpf_mov R6 = 1
bpf_call foo
bpf_mov R0 = R6
bpf_exit
is a correct program. If there was R1 instead of R6, it would have
been rejected.
load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
For example::
bpf_mov R1 = 1
bpf_mov R2 = 2
bpf_xadd *(u32 *)(R1 + 3) += R2
bpf_exit
will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.
At the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``)
A callback is used to customize verifier to restrict eBPF program access to only
certain fields within ctx structure with specified size and alignment.
For example, the following insn::
bpf_ld R0 = *(u32 *)(R6 + 8)
intends to load a word from address R6 + 8 and store it into R0
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
the verifier will reject the program.
If R6=PTR_TO_STACK, then access should be aligned and be within
stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
so it will fail verification, since it's out of bounds.
The verifier will allow eBPF program to read data from stack only after
it wrote into it.
Classic BPF verifier does similar check with M[0-15] memory slots.
For example::
bpf_ld R0 = *(u32 *)(R10 - 4)
bpf_exit
is an invalid program.
Though R10 is a correct read-only register and has type PTR_TO_STACK
and R10 - 4 is within stack bounds, there were no stores into that location.
Pointer register spill/fill is tracked as well, since four (R6-R9)
callee saved registers may not be enough for some programs.
Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
The eBPF verifier will check that registers match argument constraints.
After the call, register R0 will be set to the return type of the function.
Function calls are the main mechanism to extend the functionality of eBPF programs.
Socket filters may let programs call one set of functions, whereas tracing
filters may allow a completely different set.
If a function is made accessible to an eBPF program, it needs to be thought through
from a safety point of view. The verifier will guarantee that the function is
called with valid arguments.
seccomp and socket filters have different security restrictions for classic BPF.
Seccomp solves this with a two-stage verifier: the classic BPF verifier is followed
by the seccomp verifier. In the case of eBPF, one configurable verifier is shared for
all use cases.
See details of eBPF verifier in kernel/bpf/verifier.c
Register value tracking
=======================
In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot.
This is done with ``struct bpf_reg_state``, defined in include/linux/
bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
register state has a type, which is either NOT_INIT (the register has not been
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
pointer type. The types of pointers describe their base, as follows:
PTR_TO_CTX
Pointer to bpf_context.
CONST_PTR_TO_MAP
Pointer to struct bpf_map. "Const" because arithmetic
on these pointers is forbidden.
PTR_TO_MAP_VALUE
Pointer to the value stored in a map element.
PTR_TO_MAP_VALUE_OR_NULL
Either a pointer to a map value, or NULL; map accesses
(see maps.rst) return this type, which becomes a
PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
these pointers is forbidden.
PTR_TO_STACK
Frame pointer.
PTR_TO_PACKET
skb->data.
PTR_TO_PACKET_END
skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET
Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
operand) is added to a pointer, while the latter is used for values which are
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
the range of possible values in the register.
The verifier's knowledge about the variable offset consists of:
* minimum and maximum values as unsigned
* minimum and maximum values as signed
* knowledge of the values of individual bits, in the form of a 'tnum': a u64
'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
mask and value; no bit should ever be 1 in both. For example, if a byte is read
into a register from memory, the register's top 56 bits are known zero, while
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
0x1ff), because of potential carries.
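
The tnum arithmetic above can be reproduced with a small standalone C model (a
sketch that mirrors the helpers in kernel/bpf/tnum.c, not the kernel code itself)::

#include <stdint.h>
#include <stdio.h>

/* Tracked number: 1-bits in 'mask' are unknown; known bits live in 'value'. */
struct tnum { uint64_t value; uint64_t mask; };

static struct tnum tnum_or(struct tnum a, struct tnum b)
{
        uint64_t v = a.value | b.value;
        uint64_t mu = a.mask | b.mask;

        return (struct tnum){ .value = v, .mask = mu & ~v };
}

static struct tnum tnum_add(struct tnum a, struct tnum b)
{
        uint64_t sm = a.mask + b.mask;
        uint64_t sv = a.value + b.value;
        uint64_t sigma = sm + sv;
        uint64_t chi = sigma ^ sv;      /* bit positions a carry may disturb */
        uint64_t mu = chi | a.mask | b.mask;

        return (struct tnum){ .value = sv & ~mu, .mask = mu };
}

int main(void)
{
        struct tnum byte = { 0x0, 0xff };       /* byte loaded from memory */
        struct tnum ored = tnum_or(byte, (struct tnum){ 0x40, 0x0 });
        struct tnum sum = tnum_add(ored, (struct tnum){ 0x1, 0x0 });

        /* The results match the (0x40; 0xbf) and (0x0; 0x1ff) tnums above. */
        printf("(%#llx; %#llx)\n", (unsigned long long)ored.value,
               (unsigned long long)ored.mask);
        printf("(%#llx; %#llx)\n", (unsigned long long)sum.value,
               (unsigned long long)sum.mask);
        return 0;
}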
Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
BPF_JSGE) would instead update the signed minimum/maximum values. Information
from the signed and unsigned bounds can be combined; for instance if a value is
first tested < 8 and then tested s> 4, the verifier will conclude that the value
is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
pointers sharing that same variable offset. This is important for packet range
checks: after adding a variable to a packet pointer register A, if you then copy
it to another register B and then add a constant 4 to A, both registers will
share the same 'id' but A will have a fixed offset of +4. Then if A is
bounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
now known to have a safe range of at least 4 bytes. See 'Direct packet access',
below, for more on PTR_TO_PACKET ranges.
The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
the pointer returned from a map lookup. This means that when one copy is
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
As well as range-checking, the tracked information is also used for enforcing
alignment of pointer accesses. For instance, on most systems the packet pointer
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding ``struct sock``. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and, in
the non-NULL case, pass the valid reference to the socket release function.
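
For illustration, a hedged BPF C sketch of that pattern (a tc classifier; the tuple
values and names are placeholders)::

/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("tc")
int sk_lookup_example(struct __sk_buff *skb)
{
        struct bpf_sock_tuple tuple = {
                .ipv4.daddr = bpf_htonl(0x7f000001),    /* 127.0.0.1, placeholder */
                .ipv4.dport = bpf_htons(8080),          /* placeholder port */
        };
        struct bpf_sock *sk;

        /* Returns PTR_TO_SOCKET_OR_NULL: it must be NULL-checked ... */
        sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
                               BPF_F_CURRENT_NETNS, 0);
        if (sk)
                /* ... and the acquired reference must be released. */
                bpf_sk_release(sk);
        return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";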
Direct packet access
====================
In cls_bpf and act_bpf programs the verifier allows direct access to the packet
data via skb->data and skb->data_end pointers.
Ex::
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
2: r3 = *(u32 *)(r1 +76) /* load skb->data */
3: r5 = r3
4: r5 += 14
5: if r5 > r4 goto pc+16
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
This 2-byte load from the packet is safe to do, since the program author
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
id=0 means that no additional variables were added to the register.
off=0 means that no additional constants were added.
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
which is zero bytes.
More complex packet access may look like::
R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
7: r4 = *(u8 *)(r3 +12)
8: r4 *= 14
9: r3 = *(u32 *)(r1 +76) /* load skb->data */
10: r3 += r4
11: r2 = r1
12: r2 <<= 48
13: r2 >>= 48
14: r3 += r2
15: r2 = r3
16: r2 += 8
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
18: if r2 > r1 goto pc+2
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
19: r1 = *(u8 *)(r3 +4)
The state of the register R3 is R3=pkt(id=2,off=0,r=8)
id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
offset within a packet and since the program author did
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access.
Operation ``r3 += rX`` may overflow and become less than original skb->data,
therefore the verifier has to prevent that. So when it sees ``r3 += rX``
instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
against skb->data_end will not give us 'range' information, so attempts to read
through the pointer will give "invalid access to packet" error.
Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
of the register are guaranteed to be zero, and nothing is known about the lower
8 bits. After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
value by constant 14 will keep upper 52 bits as zero, also the least significant
bit will be zero as 14 is even. Similarly ``r2 >>= 48`` will make
R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
extending. This logic is implemented in adjust_reg_min_max_vals() function,
which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
versa) and adjust_scalar_min_max_vals() for operations on two scalars.
The end result is that a bpf program author can access the packet directly
using normal C code as::
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct eth_hdr *eth = data;
struct iphdr *iph = data + sizeof(*eth);
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
return 0;
if (eth->h_proto != htons(ETH_P_IP))
return 0;
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
return 0;
if (udp->dest == 53 || udp->source == 9)
...;
which makes such programs easier to write compared to the LD_ABS insn,
and significantly faster.
Pruning
=======
The verifier does not actually walk all possible paths through the program. For
each new branch to analyse, the verifier looks at all the states it's previously
been in when at this instruction. If any of them contain the current state as a
subset, the branch is 'pruned' - that is, the fact that the previous state was
accepted implies the current state would be as well. For instance, if in the
previous state, r1 held a packet-pointer, and in the current state, r1 holds a
packet-pointer with a range as long or longer and at least as strict an
alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before then it can't
have been used by any path from that point, so any value in r2 (including
another NOT_INIT) is safe. The implementation is in the function regsafe().
Pruning considers not only the registers but also the stack (and any spilled
registers it may hold). They must all be safe for the branch to be pruned.
This is implemented in states_equal().
Understanding eBPF verifier messages
====================================
The following are a few examples of invalid eBPF programs and verifier error
messages as seen in the log:
Program with unreachable instructions::
static struct bpf_insn prog[] = {
BPF_EXIT_INSN(),
BPF_EXIT_INSN(),
};
Error::
unreachable insn 1
Program that reads uninitialized register::
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
BPF_EXIT_INSN(),
Error::
0: (bf) r0 = r2
R2 !read_ok
Program that doesn't initialize R0 before exiting::
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
BPF_EXIT_INSN(),
Error::
0: (bf) r2 = r1
1: (95) exit
R0 !read_ok
Program that accesses stack out of bounds::
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
BPF_EXIT_INSN(),
Error::
0: (7a) *(u64 *)(r10 +8) = 0
invalid stack off=8 size=8
Program that doesn't initialize stack before passing its address into function::
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
Error::
0: (bf) r2 = r10
1: (07) r2 += -8
2: (b7) r1 = 0x0
3: (85) call 1
invalid indirect read from stack off -8+0 size 8
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 0x0
4: (85) call 1
fd 0 is not pointing to valid bpf_map
Program that doesn't check return value of map_lookup_elem() before accessing
map element::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 0x0
4: (85) call 1
5: (7a) *(u64 *)(r0 +0) = 0
R0 invalid mem access 'map_value_or_null'
Program that correctly checks map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(),
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 1
4: (85) call 1
5: (15) if r0 == 0x0 goto pc+1
R0=map_ptr R10=fp
6: (7a) *(u64 *)(r0 +4) = 0
misaligned access off 4 size 8
Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment in one side of 'if' branch, but fails
to do so in the other side of 'if' branch::
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
Error::
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 1
4: (85) call 1
5: (15) if r0 == 0x0 goto pc+2
R0=map_ptr R10=fp
6: (7a) *(u64 *)(r0 +0) = 0
7: (95) exit
from 5 to 8: R0=imm0 R10=fp
8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm'
Program that performs a socket lookup then sets the pointer to NULL without
checking it::
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Error::
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (b7) r0 = 0
9: (95) exit
Unreleased reference id=1, alloc_insn=7
Program that performs a socket lookup but does not NULL-check the returned
value::
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
Error::
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (95) exit
Unreleased reference id=1, alloc_insn=7

File diff suppressed because it is too large.


@ -3569,7 +3569,7 @@ R: Florent Revest <revest@chromium.org>
R: Brendan Jackman <jackmanb@chromium.org>
L: bpf@vger.kernel.org
S: Maintained
F: Documentation/bpf/bpf_lsm.rst
F: Documentation/bpf/prog_lsm.rst
F: include/linux/bpf_lsm.h
F: kernel/bpf/bpf_lsm.c
F: security/bpf/


@ -163,7 +163,7 @@ static const s8 bpf2a32[][2] = {
[BPF_REG_9] = {STACK_OFFSET(BPF_R9_HI), STACK_OFFSET(BPF_R9_LO)},
/* Read only Frame Pointer to access Stack */
[BPF_REG_FP] = {STACK_OFFSET(BPF_FP_HI), STACK_OFFSET(BPF_FP_LO)},
/* Temporary Register for internal BPF JIT, can be used
/* Temporary Register for BPF JIT, can be used
* for constant blindings and others.
*/
[TMP_REG_1] = {ARM_R7, ARM_R6},
@ -1199,7 +1199,8 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
/* tmp2[0] = array, tmp2[1] = index */
/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
/*
* if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
* goto out;
* tail_call_cnt++;
*/
@ -1208,7 +1209,7 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
tc = arm_bpf_get_reg64(tcc, tmp, ctx);
emit(ARM_CMP_I(tc[0], hi), ctx);
_emit(ARM_COND_EQ, ARM_CMP_I(tc[1], lo), ctx);
_emit(ARM_COND_HI, ARM_B(jmp_offset), ctx);
_emit(ARM_COND_CS, ARM_B(jmp_offset), ctx);
emit(ARM_ADDS_I(tc[1], tc[1], 1), ctx);
emit(ARM_ADC_I(tc[0], tc[0], 0), ctx);
arm_bpf_put_reg64(tcc, tmp, ctx);


@ -44,7 +44,7 @@ static const int bpf2a64[] = {
[BPF_REG_9] = A64_R(22),
/* read-only frame pointer to access stack */
[BPF_REG_FP] = A64_R(25),
/* temporary registers for internal BPF JIT */
/* temporary registers for BPF JIT */
[TMP_REG_1] = A64_R(10),
[TMP_REG_2] = A64_R(11),
[TMP_REG_3] = A64_R(12),
@ -287,13 +287,14 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
emit(A64_CMP(0, r3, tmp), ctx);
emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
/*
* if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
* goto out;
* tail_call_cnt++;
*/
emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
emit(A64_CMP(1, tcc, tmp), ctx);
emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
/* prog = array->ptrs[index];


@ -1381,8 +1381,7 @@ void build_prologue(struct jit_context *ctx)
* 16-byte area in the parent's stack frame. On a tail call, the
* calling function jumps into the prologue after these instructions.
*/
emit(ctx, ori, MIPS_R_T9, MIPS_R_ZERO,
min(MAX_TAIL_CALL_CNT + 1, 0xffff));
emit(ctx, ori, MIPS_R_T9, MIPS_R_ZERO, min(MAX_TAIL_CALL_CNT, 0xffff));
emit(ctx, sw, MIPS_R_T9, 0, MIPS_R_SP);
/*


@ -552,7 +552,7 @@ void build_prologue(struct jit_context *ctx)
* On a tail call, the calling function jumps into the prologue
* after this instruction.
*/
emit(ctx, addiu, tc, MIPS_R_ZERO, min(MAX_TAIL_CALL_CNT + 1, 0xffff));
emit(ctx, ori, tc, MIPS_R_ZERO, min(MAX_TAIL_CALL_CNT, 0xffff));
/* === Entry-point for tail calls === */


@ -221,13 +221,13 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
PPC_BCC(COND_GE, out);
/*
* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
* goto out;
*/
EMIT(PPC_RAW_CMPLWI(_R0, MAX_TAIL_CALL_CNT));
/* tail_call_cnt++; */
EMIT(PPC_RAW_ADDIC(_R0, _R0, 1));
PPC_BCC(COND_GT, out);
PPC_BCC(COND_GE, out);
/* prog = array->ptrs[index]; */
EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));


@ -228,12 +228,12 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
PPC_BCC(COND_GE, out);
/*
* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
* goto out;
*/
PPC_BPF_LL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
EMIT(PPC_RAW_CMPLWI(b2p[TMP_REG_1], MAX_TAIL_CALL_CNT));
PPC_BCC(COND_GT, out);
PPC_BCC(COND_GE, out);
/*
* tail_call_cnt++;


@ -799,11 +799,10 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
emit_bcc(BPF_JGE, lo(idx_reg), RV_REG_T1, off, ctx);
/*
* temp_tcc = tcc - 1;
* if (tcc < 0)
* if (--tcc < 0)
* goto out;
*/
emit(rv_addi(RV_REG_T1, RV_REG_TCC, -1), ctx);
emit(rv_addi(RV_REG_TCC, RV_REG_TCC, -1), ctx);
off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
emit_bcc(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
@ -829,7 +828,6 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
if (is_12b_check(off, insn))
return -1;
emit(rv_lw(RV_REG_T0, off, RV_REG_T0), ctx);
emit(rv_addi(RV_REG_TCC, RV_REG_T1, 0), ctx);
/* Epilogue jumps to *(t0 + 4). */
__build_epilogue(true, ctx);
return 0;


@ -327,12 +327,12 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
emit_branch(BPF_JGE, RV_REG_A2, RV_REG_T1, off, ctx);
/* if (TCC-- < 0)
/* if (--TCC < 0)
* goto out;
*/
emit_addi(RV_REG_T1, tcc, -1, ctx);
emit_addi(RV_REG_TCC, tcc, -1, ctx);
off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
emit_branch(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
/* prog = array->ptrs[index];
* if (!prog)
@ -352,7 +352,6 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
if (is_12b_check(off, insn))
return -1;
emit_ld(RV_REG_T3, off, RV_REG_T2, ctx);
emit_mv(RV_REG_TCC, RV_REG_T1, ctx);
__build_epilogue(true, ctx);
return 0;
}


@ -1369,7 +1369,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
jit->prg);
/*
* if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
* goto out;
*/
@ -1381,9 +1381,9 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT4_IMM(0xa7080000, REG_W0, 1);
/* laal %w1,%w0,off(%r15) */
EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W1, REG_W0, REG_15, off);
/* clij %w1,MAX_TAIL_CALL_CNT,0x2,out */
/* clij %w1,MAX_TAIL_CALL_CNT-1,0x2,out */
patch_2_clij = jit->prg;
EMIT6_PCREL_RIEC(0xec000000, 0x007f, REG_W1, MAX_TAIL_CALL_CNT,
EMIT6_PCREL_RIEC(0xec000000, 0x007f, REG_W1, MAX_TAIL_CALL_CNT - 1,
2, jit->prg);
/*


@ -227,7 +227,7 @@ static const int bpf2sparc[] = {
[BPF_REG_AX] = G7,
/* temporary register for internal BPF JIT */
/* temporary register for BPF JIT */
[TMP_REG_1] = G1,
[TMP_REG_2] = G2,
[TMP_REG_3] = G3,
@ -867,7 +867,7 @@ static void emit_tail_call(struct jit_ctx *ctx)
emit(LD32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
emit_cmpi(tmp, MAX_TAIL_CALL_CNT, ctx);
#define OFFSET2 13
emit_branch(BGU, ctx->idx, ctx->idx + OFFSET2, ctx);
emit_branch(BGEU, ctx->idx, ctx->idx + OFFSET2, ctx);
emit_nop(ctx);
emit_alu_K(ADD, tmp, 1, ctx);


@ -1,9 +1,9 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* bpf_jit_comp.c: BPF JIT compiler
* BPF JIT compiler
*
* Copyright (C) 2011-2013 Eric Dumazet (eric.dumazet@gmail.com)
* Internal BPF Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
*/
#include <linux/netdevice.h>
#include <linux/filter.h>
@ -412,7 +412,7 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
* ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
* if (index >= array->map.max_entries)
* goto out;
* if (++tail_call_cnt > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
* goto out;
* prog = array->ptrs[index];
* if (prog == NULL)
@ -446,14 +446,14 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
EMIT2(X86_JBE, offset); /* jbe out */
/*
* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
* goto out;
*/
EMIT2_off32(0x8B, 0x85, tcc_off); /* mov eax, dword ptr [rbp - tcc_off] */
EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */
offset = ctx->tail_call_indirect_label - (prog + 2 - start);
EMIT2(X86_JA, offset); /* ja out */
EMIT2(X86_JAE, offset); /* jae out */
EMIT3(0x83, 0xC0, 0x01); /* add eax, 1 */
EMIT2_off32(0x89, 0x85, tcc_off); /* mov dword ptr [rbp - tcc_off], eax */
@ -504,14 +504,14 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
int offset;
/*
* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
* goto out;
*/
EMIT2_off32(0x8B, 0x85, tcc_off); /* mov eax, dword ptr [rbp - tcc_off] */
EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */
offset = ctx->tail_call_direct_label - (prog + 2 - start);
EMIT2(X86_JA, offset); /* ja out */
EMIT2(X86_JAE, offset); /* jae out */
EMIT3(0x83, 0xC0, 0x01); /* add eax, 1 */
EMIT2_off32(0x89, 0x85, tcc_off); /* mov dword ptr [rbp - tcc_off], eax */
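Taken together, the JIT hunks above and the interpreter change later in this series converge on one semantic. A hedged sketch of the unified check, written as a hypothetical C helper (not code from any of these JITs), assuming MAX_TAIL_CALL_CNT from <linux/bpf.h>:

#include <linux/bpf.h>

/* Up to MAX_TAIL_CALL_CNT (now 33) tail calls are allowed per chain;
 * attempt number 34 is refused, matching the interpreter.
 */
static bool tail_call_allowed(u32 *tail_call_cnt)
{
	if (*tail_call_cnt >= MAX_TAIL_CALL_CNT)
		return false;
	(*tail_call_cnt)++;
	return true;
}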


@ -1323,7 +1323,7 @@ static void emit_bpf_tail_call(u8 **pprog, u8 *ip)
EMIT2(IA32_JBE, jmp_label(jmp_label1, 2));
/*
* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
* if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
* goto out;
*/
lo = (u32)MAX_TAIL_CALL_CNT;
@ -1337,7 +1337,7 @@ static void emit_bpf_tail_call(u8 **pprog, u8 *ip)
/* cmp ecx,lo */
EMIT3(0x83, add_1reg(0xF8, IA32_ECX), lo);
/* ja out */
/* jae out */
EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
/* add eax,0x1 */


@ -1082,7 +1082,7 @@ struct bpf_array {
};
#define BPF_COMPLEXITY_LIMIT_INSNS 1000000 /* yes. 1M insns */
#define MAX_TAIL_CALL_CNT 32
#define MAX_TAIL_CALL_CNT 33
#define BPF_F_ACCESS_MASK (BPF_F_RDONLY | \
BPF_F_RDONLY_PROG | \
@ -1722,6 +1722,14 @@ bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
const struct btf_func_model *
bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
const struct bpf_insn *insn);
struct bpf_core_ctx {
struct bpf_verifier_log *log;
const struct btf *btf;
};
int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
int relo_idx, void *insn);
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
@ -2154,6 +2162,7 @@ extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
extern const struct bpf_func_proto bpf_find_vma_proto;
extern const struct bpf_func_proto bpf_loop_proto;
const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);


@ -396,6 +396,13 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
log->level == BPF_LOG_KERNEL);
}
static inline bool
bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
{
return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);
}
#define BPF_MAX_SUBPROGS 256
struct bpf_subprog_info {


@ -144,6 +144,53 @@ static inline bool btf_type_is_enum(const struct btf_type *t)
return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
}
static inline bool str_is_empty(const char *s)
{
return !s || !s[0];
}
static inline u16 btf_kind(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info);
}
static inline bool btf_is_enum(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_ENUM;
}
static inline bool btf_is_composite(const struct btf_type *t)
{
u16 kind = btf_kind(t);
return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
}
static inline bool btf_is_array(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_ARRAY;
}
static inline bool btf_is_int(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_INT;
}
static inline bool btf_is_ptr(const struct btf_type *t)
{
return btf_kind(t) == BTF_KIND_PTR;
}
static inline u8 btf_int_offset(const struct btf_type *t)
{
return BTF_INT_OFFSET(*(u32 *)(t + 1));
}
static inline u8 btf_int_encoding(const struct btf_type *t)
{
return BTF_INT_ENCODING(*(u32 *)(t + 1));
}
static inline bool btf_type_is_scalar(const struct btf_type *t)
{
return btf_type_is_int(t) || btf_type_is_enum(t);
@ -184,6 +231,11 @@ static inline u16 btf_type_vlen(const struct btf_type *t)
return BTF_INFO_VLEN(t->info);
}
static inline u16 btf_vlen(const struct btf_type *t)
{
return btf_type_vlen(t);
}
static inline u16 btf_func_linkage(const struct btf_type *t)
{
return BTF_INFO_VLEN(t->info);
@ -194,25 +246,54 @@ static inline bool btf_type_kflag(const struct btf_type *t)
return BTF_INFO_KFLAG(t->info);
}
static inline u32 btf_member_bit_offset(const struct btf_type *struct_type,
const struct btf_member *member)
static inline u32 __btf_member_bit_offset(const struct btf_type *struct_type,
const struct btf_member *member)
{
return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset)
: member->offset;
}
static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type,
const struct btf_member *member)
static inline u32 __btf_member_bitfield_size(const struct btf_type *struct_type,
const struct btf_member *member)
{
return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset)
: 0;
}
static inline struct btf_member *btf_members(const struct btf_type *t)
{
return (struct btf_member *)(t + 1);
}
static inline u32 btf_member_bit_offset(const struct btf_type *t, u32 member_idx)
{
const struct btf_member *m = btf_members(t) + member_idx;
return __btf_member_bit_offset(t, m);
}
static inline u32 btf_member_bitfield_size(const struct btf_type *t, u32 member_idx)
{
const struct btf_member *m = btf_members(t) + member_idx;
return __btf_member_bitfield_size(t, m);
}
static inline const struct btf_member *btf_type_member(const struct btf_type *t)
{
return (const struct btf_member *)(t + 1);
}
static inline struct btf_array *btf_array(const struct btf_type *t)
{
return (struct btf_array *)(t + 1);
}
static inline struct btf_enum *btf_enum(const struct btf_type *t)
{
return (struct btf_enum *)(t + 1);
}
static inline const struct btf_var_secinfo *btf_type_var_secinfo(
const struct btf_type *t)
{


@ -1342,8 +1342,10 @@ union bpf_attr {
/* or valid module BTF object fd or 0 to attach to vmlinux */
__u32 attach_btf_obj_fd;
};
__u32 :32; /* pad */
__u32 core_relo_cnt; /* number of bpf_core_relo */
__aligned_u64 fd_array; /* array of FDs */
__aligned_u64 core_relos;
__u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
@ -1744,7 +1746,7 @@ union bpf_attr {
* if the maximum number of tail calls has been reached for this
* chain of programs. This limit is defined in the kernel by the
* macro **MAX_TAIL_CALL_CNT** (not accessible to user space),
* which is currently set to 32.
* which is currently set to 33.
* Return
* 0 on success, or a negative error in case of failure.
*
@ -4957,6 +4959,30 @@ union bpf_attr {
* **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
* **-EBUSY** if failed to try lock mmap_lock.
* **-EINVAL** for invalid **flags**.
*
* long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags)
* Description
* For **nr_loops**, call **callback_fn** function
* with **callback_ctx** as the context parameter.
* The **callback_fn** should be a static function and
* the **callback_ctx** should be a pointer to the stack.
* The **flags** is used to control certain aspects of the helper.
* Currently, the **flags** must be 0. Currently, nr_loops is
* limited to 1 << 23 (~8 million) loops.
*
* long (\*callback_fn)(u32 index, void \*ctx);
*
* where **index** is the current index in the loop. The index
* is zero-indexed.
*
* If **callback_fn** returns 0, the helper will continue to the next
* loop. If return value is 1, the helper will skip the rest of
* the loops and return. Other return values are not used now,
* and will be rejected by the verifier.
*
* Return
* The number of loops performed, **-EINVAL** for invalid **flags**,
* **-E2BIG** if **nr_loops** exceeds the maximum number of loops.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5140,6 +5166,7 @@ union bpf_attr {
FN(skc_to_unix_sock), \
FN(kallsyms_lookup_name), \
FN(find_vma), \
FN(loop), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -6349,4 +6376,78 @@ enum {
BTF_F_ZERO = (1ULL << 3),
};
/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
* has to be adjusted by relocations. It is emitted by llvm and passed to
* libbpf and later to the kernel.
*/
enum bpf_core_relo_kind {
BPF_CORE_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_CORE_FIELD_BYTE_SIZE = 1, /* field size in bytes */
BPF_CORE_FIELD_EXISTS = 2, /* field existence in target kernel */
BPF_CORE_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */
BPF_CORE_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */
BPF_CORE_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */
BPF_CORE_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */
BPF_CORE_TYPE_ID_TARGET = 7, /* type ID in target kernel */
BPF_CORE_TYPE_EXISTS = 8, /* type existence in target kernel */
BPF_CORE_TYPE_SIZE = 9, /* type size in bytes */
BPF_CORE_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */
BPF_CORE_ENUMVAL_VALUE = 11, /* enum value integer value */
};
/*
* "struct bpf_core_relo" is used to pass relocation data from LLVM to libbpf
* and from libbpf to the kernel.
*
* CO-RE relocation captures the following data:
* - insn_off - instruction offset (in bytes) within a BPF program that needs
* its insn->imm field to be relocated with actual field info;
* - type_id - BTF type ID of the "root" (containing) entity of a relocatable
* type or field;
* - access_str_off - offset into corresponding .BTF string section. String
* interpretation depends on specific relocation kind:
* - for field-based relocations, string encodes an accessed field using
* a sequence of field and array indices, separated by colon (:). It's
* conceptually very close to LLVM's getelementptr ([0]) instruction's
* arguments for identifying offset to a field.
* - for type-based relocations, string is expected to be just "0";
* - for enum value-based relocations, string contains an index of enum
* value within its enum type;
* - kind - one of enum bpf_core_relo_kind;
*
* Example:
* struct sample {
* int a;
* struct {
* int b[10];
* };
* };
*
* struct sample *s = ...;
* int *x = &s->a; // encoded as "0:0" (a is field #0)
* int *y = &s->b[5]; // encoded as "0:1:0:5" (anon struct is field #1,
* // b is field #0 inside anon struct, accessing elem #5)
* int *z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
*
* type_id for all relocs in this example will capture BTF type id of
* `struct sample`.
*
* Such relocation is emitted when using __builtin_preserve_access_index()
* Clang built-in, passing expression that captures field address, e.g.:
*
* bpf_probe_read(&dst, sizeof(dst),
* __builtin_preserve_access_index(&src->a.b.c));
*
* In this case Clang will emit field relocation recording necessary data to
* be able to find offset of embedded `a.b.c` field within `src` struct.
*
* [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
*/
struct bpf_core_relo {
__u32 insn_off;
__u32 type_id;
__u32 access_str_off;
enum bpf_core_relo_kind kind;
};
#endif /* _UAPI__LINUX_BPF_H__ */
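For reference, a hedged user-space sketch of the three BPF_PROG_LOAD attributes added above (core_relos, core_relo_cnt, core_relo_rec_size). The function name is illustrative, and a real load additionally needs valid instructions and program BTF, which the light skeleton normally prepares:

#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

static int prog_load_with_relos(union bpf_attr *attr,
				const struct bpf_core_relo *relos,
				__u32 relo_cnt)
{
	/* point the prog-load attributes at an array of CO-RE records */
	attr->core_relos = (__u64)(unsigned long)relos;
	attr->core_relo_cnt = relo_cnt;
	attr->core_relo_rec_size = sizeof(struct bpf_core_relo);
	return syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(*attr));
}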


@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
obj-${CONFIG_BPF_LSM} += bpf_lsm.o
endif
obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
$(call if_changed_rule,cc_o_c)


@ -714,3 +714,38 @@ const struct bpf_func_proto bpf_for_each_map_elem_proto = {
.arg3_type = ARG_PTR_TO_STACK_OR_NULL,
.arg4_type = ARG_ANYTHING,
};
/* maximum number of loops */
#define MAX_LOOPS BIT(23)
BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
u64, flags)
{
bpf_callback_t callback = (bpf_callback_t)callback_fn;
u64 ret;
u32 i;
if (flags)
return -EINVAL;
if (nr_loops > MAX_LOOPS)
return -E2BIG;
for (i = 0; i < nr_loops; i++) {
ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
/* return value: 0 - continue, 1 - stop and return */
if (ret)
return i + 1;
}
return i;
}
const struct bpf_func_proto bpf_loop_proto = {
.func = bpf_loop,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_ANYTHING,
.arg2_type = ARG_PTR_TO_FUNC,
.arg3_type = ARG_PTR_TO_STACK_OR_NULL,
.arg4_type = ARG_ANYTHING,
};
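The bpf_iter.c hunk above is the implementation behind the bpf_loop() helper documented in the uapi header earlier in this series. A minimal BPF-C sketch of a caller, assuming the usual libbpf build setup; the section, function, and variable names are illustrative:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

static long accumulate(__u32 index, void *ctx)
{
	long *sum = ctx;

	*sum += index;
	return 0;	/* 0 = continue, 1 = stop early */
}

SEC("tracepoint/syscalls/sys_enter_getpid")
int sum_indices(void *ctx)
{
	long sum = 0;

	/* run accumulate() 100 times; flags must be 0 for now */
	bpf_loop(100, accumulate, &sum, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";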


@ -165,7 +165,7 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log)
break;
}
if (btf_member_bitfield_size(t, member)) {
if (__btf_member_bitfield_size(t, member)) {
pr_warn("bit field member %s in struct %s is not supported\n",
mname, st_ops->name);
break;
@ -296,7 +296,7 @@ static int check_zero_holes(const struct btf_type *t, void *data)
const struct btf_type *mtype;
for_each_member(i, t, member) {
moff = btf_member_bit_offset(t, member) / 8;
moff = __btf_member_bit_offset(t, member) / 8;
if (moff > prev_mend &&
memchr_inv(data + prev_mend, 0, moff - prev_mend))
return -EINVAL;
@ -387,7 +387,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
struct bpf_prog *prog;
u32 moff;
moff = btf_member_bit_offset(t, member) / 8;
moff = __btf_member_bit_offset(t, member) / 8;
ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL);
if (ptype == module_type) {
if (*(void **)(udata + moff))


@ -25,6 +25,7 @@
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <net/sock.h>
#include "../tools/lib/bpf/relo_core.h"
/* BTF (BPF Type Format) is the meta data format which describes
* the data types of BPF program/map. Hence, it basically focus
@ -836,7 +837,7 @@ static const char *btf_show_name(struct btf_show *show)
const char *ptr_suffix = &ptr_suffixes[strlen(ptr_suffixes)];
const char *name = NULL, *prefix = "", *parens = "";
const struct btf_member *m = show->state.member;
const struct btf_type *t = show->state.type;
const struct btf_type *t;
const struct btf_array *array;
u32 id = show->state.type_id;
const char *member = NULL;
@ -2969,7 +2970,7 @@ static s32 btf_struct_check_meta(struct btf_verifier_env *env,
return -EINVAL;
}
offset = btf_member_bit_offset(t, member);
offset = __btf_member_bit_offset(t, member);
if (is_union && offset) {
btf_verifier_log_member(env, t, member,
"Invalid member bits_offset");
@ -3094,7 +3095,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
if (off != -ENOENT)
/* only one such field is allowed */
return -E2BIG;
off = btf_member_bit_offset(t, member);
off = __btf_member_bit_offset(t, member);
if (off % 8)
/* valid C code cannot generate such BTF */
return -EINVAL;
@ -3184,8 +3185,8 @@ static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
btf_show_start_member(show, member);
member_offset = btf_member_bit_offset(t, member);
bitfield_size = btf_member_bitfield_size(t, member);
member_offset = __btf_member_bit_offset(t, member);
bitfield_size = __btf_member_bitfield_size(t, member);
bytes_offset = BITS_ROUNDDOWN_BYTES(member_offset);
bits8_offset = BITS_PER_BYTE_MASKED(member_offset);
if (bitfield_size) {
@ -4472,8 +4473,7 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
log->len_total = log_size;
/* log attributes have to be sane */
if (log->len_total < 128 || log->len_total > UINT_MAX >> 8 ||
!log->level || !log->ubuf) {
if (!bpf_verifier_log_attr_valid(log)) {
err = -EINVAL;
goto errout;
}
@ -5060,7 +5060,7 @@ again:
if (array_elem->nelems != 0)
goto error;
moff = btf_member_bit_offset(t, member) / 8;
moff = __btf_member_bit_offset(t, member) / 8;
if (off < moff)
goto error;
@ -5083,14 +5083,14 @@ error:
for_each_member(i, t, member) {
/* offset of the field in bytes */
moff = btf_member_bit_offset(t, member) / 8;
moff = __btf_member_bit_offset(t, member) / 8;
if (off + size <= moff)
/* won't find anything, field is already too far */
break;
if (btf_member_bitfield_size(t, member)) {
u32 end_bit = btf_member_bit_offset(t, member) +
btf_member_bitfield_size(t, member);
if (__btf_member_bitfield_size(t, member)) {
u32 end_bit = __btf_member_bit_offset(t, member) +
__btf_member_bitfield_size(t, member);
/* off <= moff instead of off == moff because clang
* does not generate a BTF member for anonymous
@ -6169,6 +6169,8 @@ btf_module_read(struct file *file, struct kobject *kobj,
return len;
}
static void purge_cand_cache(struct btf *btf);
static int btf_module_notify(struct notifier_block *nb, unsigned long op,
void *module)
{
@ -6203,6 +6205,7 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
goto out;
}
purge_cand_cache(NULL);
mutex_lock(&btf_module_mutex);
btf_mod->module = module;
btf_mod->btf = btf;
@ -6245,6 +6248,7 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
list_del(&btf_mod->list);
if (btf_mod->sysfs_attr)
sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
purge_cand_cache(btf_mod->btf);
btf_put(btf_mod->btf);
kfree(btf_mod->sysfs_attr);
kfree(btf_mod);
@ -6406,3 +6410,385 @@ DEFINE_KFUNC_BTF_ID_LIST(bpf_tcp_ca_kfunc_list);
DEFINE_KFUNC_BTF_ID_LIST(prog_test_kfunc_list);
#endif
int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
const struct btf *targ_btf, __u32 targ_id)
{
return -EOPNOTSUPP;
}
static bool bpf_core_is_flavor_sep(const char *s)
{
/* check X___Y name pattern, where X and Y are not underscores */
return s[0] != '_' && /* X */
s[1] == '_' && s[2] == '_' && s[3] == '_' && /* ___ */
s[4] != '_'; /* Y */
}
size_t bpf_core_essential_name_len(const char *name)
{
size_t n = strlen(name);
int i;
for (i = n - 5; i >= 0; i--) {
if (bpf_core_is_flavor_sep(name + i))
return i + 1;
}
return n;
}
struct bpf_cand_cache {
const char *name;
u32 name_len;
u16 kind;
u16 cnt;
struct {
const struct btf *btf;
u32 id;
} cands[];
};
static void bpf_free_cands(struct bpf_cand_cache *cands)
{
if (!cands->cnt)
/* empty candidate array was allocated on stack */
return;
kfree(cands);
}
static void bpf_free_cands_from_cache(struct bpf_cand_cache *cands)
{
kfree(cands->name);
kfree(cands);
}
#define VMLINUX_CAND_CACHE_SIZE 31
static struct bpf_cand_cache *vmlinux_cand_cache[VMLINUX_CAND_CACHE_SIZE];
#define MODULE_CAND_CACHE_SIZE 31
static struct bpf_cand_cache *module_cand_cache[MODULE_CAND_CACHE_SIZE];
static DEFINE_MUTEX(cand_cache_mutex);
static void __print_cand_cache(struct bpf_verifier_log *log,
struct bpf_cand_cache **cache,
int cache_size)
{
struct bpf_cand_cache *cc;
int i, j;
for (i = 0; i < cache_size; i++) {
cc = cache[i];
if (!cc)
continue;
bpf_log(log, "[%d]%s(", i, cc->name);
for (j = 0; j < cc->cnt; j++) {
bpf_log(log, "%d", cc->cands[j].id);
if (j < cc->cnt - 1)
bpf_log(log, " ");
}
bpf_log(log, "), ");
}
}
static void print_cand_cache(struct bpf_verifier_log *log)
{
mutex_lock(&cand_cache_mutex);
bpf_log(log, "vmlinux_cand_cache:");
__print_cand_cache(log, vmlinux_cand_cache, VMLINUX_CAND_CACHE_SIZE);
bpf_log(log, "\nmodule_cand_cache:");
__print_cand_cache(log, module_cand_cache, MODULE_CAND_CACHE_SIZE);
bpf_log(log, "\n");
mutex_unlock(&cand_cache_mutex);
}
static u32 hash_cands(struct bpf_cand_cache *cands)
{
return jhash(cands->name, cands->name_len, 0);
}
static struct bpf_cand_cache *check_cand_cache(struct bpf_cand_cache *cands,
struct bpf_cand_cache **cache,
int cache_size)
{
struct bpf_cand_cache *cc = cache[hash_cands(cands) % cache_size];
if (cc && cc->name_len == cands->name_len &&
!strncmp(cc->name, cands->name, cands->name_len))
return cc;
return NULL;
}
static size_t sizeof_cands(int cnt)
{
return offsetof(struct bpf_cand_cache, cands[cnt]);
}
static struct bpf_cand_cache *populate_cand_cache(struct bpf_cand_cache *cands,
struct bpf_cand_cache **cache,
int cache_size)
{
struct bpf_cand_cache **cc = &cache[hash_cands(cands) % cache_size], *new_cands;
if (*cc) {
bpf_free_cands_from_cache(*cc);
*cc = NULL;
}
new_cands = kmalloc(sizeof_cands(cands->cnt), GFP_KERNEL);
if (!new_cands) {
bpf_free_cands(cands);
return ERR_PTR(-ENOMEM);
}
memcpy(new_cands, cands, sizeof_cands(cands->cnt));
/* strdup the name, since it will stay in cache.
* the cands->name points to strings in prog's BTF and the prog can be unloaded.
*/
new_cands->name = kmemdup_nul(cands->name, cands->name_len, GFP_KERNEL);
bpf_free_cands(cands);
if (!new_cands->name) {
kfree(new_cands);
return ERR_PTR(-ENOMEM);
}
*cc = new_cands;
return new_cands;
}
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
static void __purge_cand_cache(struct btf *btf, struct bpf_cand_cache **cache,
int cache_size)
{
struct bpf_cand_cache *cc;
int i, j;
for (i = 0; i < cache_size; i++) {
cc = cache[i];
if (!cc)
continue;
if (!btf) {
/* when new module is loaded purge all of module_cand_cache,
* since new module might have candidates with the name
* that matches cached cands.
*/
bpf_free_cands_from_cache(cc);
cache[i] = NULL;
continue;
}
/* when module is unloaded purge cache entries
* that match module's btf
*/
for (j = 0; j < cc->cnt; j++)
if (cc->cands[j].btf == btf) {
bpf_free_cands_from_cache(cc);
cache[i] = NULL;
break;
}
}
}
static void purge_cand_cache(struct btf *btf)
{
mutex_lock(&cand_cache_mutex);
__purge_cand_cache(btf, module_cand_cache, MODULE_CAND_CACHE_SIZE);
mutex_unlock(&cand_cache_mutex);
}
#endif
static struct bpf_cand_cache *
bpf_core_add_cands(struct bpf_cand_cache *cands, const struct btf *targ_btf,
int targ_start_id)
{
struct bpf_cand_cache *new_cands;
const struct btf_type *t;
const char *targ_name;
size_t targ_essent_len;
int n, i;
n = btf_nr_types(targ_btf);
for (i = targ_start_id; i < n; i++) {
t = btf_type_by_id(targ_btf, i);
if (btf_kind(t) != cands->kind)
continue;
targ_name = btf_name_by_offset(targ_btf, t->name_off);
if (!targ_name)
continue;
/* the resched point is before strncmp to make sure that search
* for non-existing name will have a chance to schedule().
*/
cond_resched();
if (strncmp(cands->name, targ_name, cands->name_len) != 0)
continue;
targ_essent_len = bpf_core_essential_name_len(targ_name);
if (targ_essent_len != cands->name_len)
continue;
/* most of the time there is only one candidate for a given kind+name pair */
new_cands = kmalloc(sizeof_cands(cands->cnt + 1), GFP_KERNEL);
if (!new_cands) {
bpf_free_cands(cands);
return ERR_PTR(-ENOMEM);
}
memcpy(new_cands, cands, sizeof_cands(cands->cnt));
bpf_free_cands(cands);
cands = new_cands;
cands->cands[cands->cnt].btf = targ_btf;
cands->cands[cands->cnt].id = i;
cands->cnt++;
}
return cands;
}
static struct bpf_cand_cache *
bpf_core_find_cands(struct bpf_core_ctx *ctx, u32 local_type_id)
{
struct bpf_cand_cache *cands, *cc, local_cand = {};
const struct btf *local_btf = ctx->btf;
const struct btf_type *local_type;
const struct btf *main_btf;
size_t local_essent_len;
struct btf *mod_btf;
const char *name;
int id;
main_btf = bpf_get_btf_vmlinux();
if (IS_ERR(main_btf))
return (void *)main_btf;
local_type = btf_type_by_id(local_btf, local_type_id);
if (!local_type)
return ERR_PTR(-EINVAL);
name = btf_name_by_offset(local_btf, local_type->name_off);
if (str_is_empty(name))
return ERR_PTR(-EINVAL);
local_essent_len = bpf_core_essential_name_len(name);
cands = &local_cand;
cands->name = name;
cands->kind = btf_kind(local_type);
cands->name_len = local_essent_len;
cc = check_cand_cache(cands, vmlinux_cand_cache, VMLINUX_CAND_CACHE_SIZE);
/* cands is a pointer to stack here */
if (cc) {
if (cc->cnt)
return cc;
goto check_modules;
}
/* Attempt to find target candidates in vmlinux BTF first */
cands = bpf_core_add_cands(cands, main_btf, 1);
if (IS_ERR(cands))
return cands;
/* cands is a pointer to kmalloced memory here if cands->cnt > 0 */
/* populate cache even when cands->cnt == 0 */
cc = populate_cand_cache(cands, vmlinux_cand_cache, VMLINUX_CAND_CACHE_SIZE);
if (IS_ERR(cc))
return cc;
/* if vmlinux BTF has any candidate, don't go for module BTFs */
if (cc->cnt)
return cc;
check_modules:
/* cands is a pointer to stack here and cands->cnt == 0 */
cc = check_cand_cache(cands, module_cand_cache, MODULE_CAND_CACHE_SIZE);
if (cc)
/* if cache has it return it even if cc->cnt == 0 */
return cc;
/* If candidate is not found in vmlinux's BTF then search in module's BTFs */
spin_lock_bh(&btf_idr_lock);
idr_for_each_entry(&btf_idr, mod_btf, id) {
if (!btf_is_module(mod_btf))
continue;
/* linear search could be slow hence unlock/lock
* the IDR to avoid holding it for too long
*/
btf_get(mod_btf);
spin_unlock_bh(&btf_idr_lock);
cands = bpf_core_add_cands(cands, mod_btf, btf_nr_types(main_btf));
if (IS_ERR(cands)) {
btf_put(mod_btf);
return cands;
}
spin_lock_bh(&btf_idr_lock);
btf_put(mod_btf);
}
spin_unlock_bh(&btf_idr_lock);
/* cands is a pointer to kmalloced memory here if cands->cnt > 0
* or pointer to stack if cands->cnt == 0.
* Copy it into the cache even when cands->cnt == 0 and
* return the result.
*/
return populate_cand_cache(cands, module_cand_cache, MODULE_CAND_CACHE_SIZE);
}
int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
int relo_idx, void *insn)
{
bool need_cands = relo->kind != BPF_CORE_TYPE_ID_LOCAL;
struct bpf_core_cand_list cands = {};
struct bpf_core_spec *specs;
int err;
/* ~4k of temp memory necessary to convert LLVM spec like "0:1:0:5"
* into arrays of btf_ids of struct fields and array indices.
*/
specs = kcalloc(3, sizeof(*specs), GFP_KERNEL);
if (!specs)
return -ENOMEM;
if (need_cands) {
struct bpf_cand_cache *cc;
int i;
mutex_lock(&cand_cache_mutex);
cc = bpf_core_find_cands(ctx, relo->type_id);
if (IS_ERR(cc)) {
bpf_log(ctx->log, "target candidate search failed for %d\n",
relo->type_id);
err = PTR_ERR(cc);
goto out;
}
if (cc->cnt) {
cands.cands = kcalloc(cc->cnt, sizeof(*cands.cands), GFP_KERNEL);
if (!cands.cands) {
err = -ENOMEM;
goto out;
}
}
for (i = 0; i < cc->cnt; i++) {
bpf_log(ctx->log,
"CO-RE relocating %s %s: found target candidate [%d]\n",
btf_kind_str[cc->kind], cc->name, cc->cands[i].id);
cands.cands[i].btf = cc->cands[i].btf;
cands.cands[i].id = cc->cands[i].id;
}
cands.len = cc->cnt;
/* cand_cache_mutex needs to span the cache lookup and
* copy of btf pointer into bpf_core_cand_list,
* since module can be unloaded while bpf_core_apply_relo_insn
* is working with module's btf.
*/
}
err = bpf_core_apply_relo_insn((void *)ctx->log, insn, relo->insn_off / 8,
relo, relo_idx, ctx->btf, &cands, specs);
out:
kfree(specs);
if (need_cands) {
kfree(cands.cands);
mutex_unlock(&cand_cache_mutex);
if (ctx->log->level & BPF_LOG_LEVEL2)
print_cand_cache(ctx->log);
}
return err;
}


@ -1574,7 +1574,8 @@ select_insn:
if (unlikely(index >= array->map.max_entries))
goto out;
if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
if (unlikely(tail_call_cnt >= MAX_TAIL_CALL_CNT))
goto out;
tail_call_cnt++;
@ -1891,7 +1892,7 @@ static void bpf_prog_select_func(struct bpf_prog *fp)
/**
* bpf_prog_select_runtime - select exec runtime for BPF program
* @fp: bpf_prog populated with internal BPF program
* @fp: bpf_prog populated with BPF program
* @err: pointer to error variable
*
* Try to JIT eBPF program, if JIT is not available, use interpreter.
@ -2300,7 +2301,6 @@ static void bpf_prog_free_deferred(struct work_struct *work)
}
}
/* Free internal BPF program */
void bpf_prog_free(struct bpf_prog *fp)
{
struct bpf_prog_aux *aux = fp->aux;


@ -1376,6 +1376,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_ringbuf_query_proto;
case BPF_FUNC_for_each_map_elem:
return &bpf_for_each_map_elem_proto;
case BPF_FUNC_loop:
return &bpf_loop_proto;
default:
break;
}


@ -2198,7 +2198,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
}
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD fd_array
#define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size
static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
{
@ -4819,7 +4819,7 @@ const struct bpf_func_proto bpf_kallsyms_lookup_name_proto = {
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_MEM,
.arg2_type = ARG_CONST_SIZE,
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
.arg3_type = ARG_ANYTHING,
.arg4_type = ARG_PTR_TO_LONG,
};


@ -293,13 +293,15 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1,
"verifier log line truncated - local buffer too short\n");
n = min(log->len_total - log->len_used - 1, n);
log->kbuf[n] = '\0';
if (log->level == BPF_LOG_KERNEL) {
pr_err("BPF:%s\n", log->kbuf);
bool newline = n > 0 && log->kbuf[n - 1] == '\n';
pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n");
return;
}
n = min(log->len_total - log->len_used - 1, n);
log->kbuf[n] = '\0';
if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
log->len_used += n;
else
@ -6101,6 +6103,27 @@ static int set_map_elem_callback_state(struct bpf_verifier_env *env,
return 0;
}
static int set_loop_callback_state(struct bpf_verifier_env *env,
struct bpf_func_state *caller,
struct bpf_func_state *callee,
int insn_idx)
{
/* bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx,
* u64 flags);
* callback_fn(u32 index, void *callback_ctx);
*/
callee->regs[BPF_REG_1].type = SCALAR_VALUE;
callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3];
/* unused */
__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_callback_fn = true;
return 0;
}
static int set_timer_callback_state(struct bpf_verifier_env *env,
struct bpf_func_state *caller,
struct bpf_func_state *callee,
@ -6474,13 +6497,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return err;
}
if (func_id == BPF_FUNC_tail_call) {
err = check_reference_leak(env);
if (err) {
verbose(env, "tail_call would lead to reference leak\n");
return err;
}
} else if (is_release_function(func_id)) {
if (is_release_function(func_id)) {
err = release_reference(env, meta.ref_obj_id);
if (err) {
verbose(env, "func %s#%d reference has not been acquired before\n",
@ -6491,41 +6508,46 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
regs = cur_regs(env);
/* check that flags argument in get_local_storage(map, flags) is 0,
* this is required because get_local_storage() can't return an error.
*/
if (func_id == BPF_FUNC_get_local_storage &&
!register_is_null(&regs[BPF_REG_2])) {
verbose(env, "get_local_storage() doesn't support non-zero flags\n");
return -EINVAL;
}
if (func_id == BPF_FUNC_for_each_map_elem) {
switch (func_id) {
case BPF_FUNC_tail_call:
err = check_reference_leak(env);
if (err) {
verbose(env, "tail_call would lead to reference leak\n");
return err;
}
break;
case BPF_FUNC_get_local_storage:
/* check that flags argument in get_local_storage(map, flags) is 0,
* this is required because get_local_storage() can't return an error.
*/
if (!register_is_null(&regs[BPF_REG_2])) {
verbose(env, "get_local_storage() doesn't support non-zero flags\n");
return -EINVAL;
}
break;
case BPF_FUNC_for_each_map_elem:
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_map_elem_callback_state);
if (err < 0)
return -EINVAL;
}
if (func_id == BPF_FUNC_timer_set_callback) {
break;
case BPF_FUNC_timer_set_callback:
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_timer_callback_state);
if (err < 0)
return -EINVAL;
}
if (func_id == BPF_FUNC_find_vma) {
break;
case BPF_FUNC_find_vma:
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_find_vma_callback_state);
if (err < 0)
return -EINVAL;
break;
case BPF_FUNC_snprintf:
err = check_bpf_snprintf_call(env, regs);
break;
case BPF_FUNC_loop:
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_loop_callback_state);
break;
}
if (func_id == BPF_FUNC_snprintf) {
err = check_bpf_snprintf_call(env, regs);
if (err < 0)
return err;
}
if (err)
return err;
/* reset caller saved regs */
for (i = 0; i < CALLER_SAVED_REGS; i++) {
@ -10267,6 +10289,78 @@ err_free:
return err;
}
#define MIN_CORE_RELO_SIZE sizeof(struct bpf_core_relo)
#define MAX_CORE_RELO_SIZE MAX_FUNCINFO_REC_SIZE
static int check_core_relo(struct bpf_verifier_env *env,
const union bpf_attr *attr,
bpfptr_t uattr)
{
u32 i, nr_core_relo, ncopy, expected_size, rec_size;
struct bpf_core_relo core_relo = {};
struct bpf_prog *prog = env->prog;
const struct btf *btf = prog->aux->btf;
struct bpf_core_ctx ctx = {
.log = &env->log,
.btf = btf,
};
bpfptr_t u_core_relo;
int err;
nr_core_relo = attr->core_relo_cnt;
if (!nr_core_relo)
return 0;
if (nr_core_relo > INT_MAX / sizeof(struct bpf_core_relo))
return -EINVAL;
rec_size = attr->core_relo_rec_size;
if (rec_size < MIN_CORE_RELO_SIZE ||
rec_size > MAX_CORE_RELO_SIZE ||
rec_size % sizeof(u32))
return -EINVAL;
u_core_relo = make_bpfptr(attr->core_relos, uattr.is_kernel);
expected_size = sizeof(struct bpf_core_relo);
ncopy = min_t(u32, expected_size, rec_size);
/* Unlike func_info and line_info, copy and apply each CO-RE
* relocation record one at a time.
*/
for (i = 0; i < nr_core_relo; i++) {
/* future proofing when sizeof(bpf_core_relo) changes */
err = bpf_check_uarg_tail_zero(u_core_relo, expected_size, rec_size);
if (err) {
if (err == -E2BIG) {
verbose(env, "nonzero tailing record in core_relo");
if (copy_to_bpfptr_offset(uattr,
offsetof(union bpf_attr, core_relo_rec_size),
&expected_size, sizeof(expected_size)))
err = -EFAULT;
}
break;
}
if (copy_from_bpfptr(&core_relo, u_core_relo, ncopy)) {
err = -EFAULT;
break;
}
if (core_relo.insn_off % 8 || core_relo.insn_off / 8 >= prog->len) {
verbose(env, "Invalid core_relo[%u].insn_off:%u prog->len:%u\n",
i, core_relo.insn_off, prog->len);
err = -EINVAL;
break;
}
err = bpf_core_apply(&ctx, &core_relo, i,
&prog->insnsi[core_relo.insn_off / 8]);
if (err)
break;
bpfptr_add(&u_core_relo, rec_size);
}
return err;
}
static int check_btf_info(struct bpf_verifier_env *env,
const union bpf_attr *attr,
bpfptr_t uattr)
@ -10297,6 +10391,10 @@ static int check_btf_info(struct bpf_verifier_env *env,
if (err)
return err;
err = check_core_relo(env, attr, uattr);
if (err)
return err;
return 0;
}
@ -13975,11 +14073,11 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr)
log->ubuf = (char __user *) (unsigned long) attr->log_buf;
log->len_total = attr->log_size;
ret = -EINVAL;
/* log attributes have to be sane */
if (log->len_total < 128 || log->len_total > UINT_MAX >> 2 ||
!log->level || !log->ubuf || log->level & ~BPF_LOG_MASK)
if (!bpf_verifier_log_attr_valid(log)) {
ret = -EINVAL;
goto err_unlock;
}
}
if (IS_ERR(btf_vmlinux)) {


@ -1402,9 +1402,6 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
BPF_CALL_4(bpf_read_branch_records, struct bpf_perf_event_data_kern *, ctx,
void *, buf, u32, size, u64, flags)
{
#ifndef CONFIG_X86
return -ENOENT;
#else
static const u32 br_entry_size = sizeof(struct perf_branch_entry);
struct perf_branch_stack *br_stack = ctx->data->br_stack;
u32 to_copy;
@ -1413,7 +1410,7 @@ BPF_CALL_4(bpf_read_branch_records, struct bpf_perf_event_data_kern *, ctx,
return -EINVAL;
if (unlikely(!br_stack))
return -EINVAL;
return -ENOENT;
if (flags & BPF_F_GET_BRANCH_RECORDS_SIZE)
return br_stack->nr * br_entry_size;
@ -1425,7 +1422,6 @@ BPF_CALL_4(bpf_read_branch_records, struct bpf_perf_event_data_kern *, ctx,
memcpy(buf, br_stack->entries, to_copy);
return to_copy;
#endif
}
static const struct bpf_func_proto bpf_read_branch_records_proto = {
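With the hunk above, bpf_read_branch_records() now reports -ENOENT at run time on systems without a branch stack instead of being compiled out on non-x86. A hedged BPF-C sketch of the usual call pattern (buffer size and section name are illustrative):

#include <linux/bpf.h>
#include <linux/perf_event.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

struct perf_branch_entry entries[64];

SEC("perf_event")
int on_event(struct bpf_perf_event_data *ctx)
{
	/* query the size first; NULL buffer + size flag is allowed */
	long sz = bpf_read_branch_records(ctx, NULL, 0,
					  BPF_F_GET_BRANCH_RECORDS_SIZE);

	if (sz <= 0)	/* -ENOENT: no branch stack on this system */
		return 0;

	bpf_read_branch_records(ctx, entries, sizeof(entries), 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";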


@ -14683,7 +14683,7 @@ static struct tail_call_test tail_call_tests[] = {
BPF_EXIT_INSN(),
},
.flags = FLAG_NEED_STATE | FLAG_RESULT_IN_STATE,
.result = (MAX_TAIL_CALL_CNT + 1 + 1) * MAX_TESTRUNS,
.result = (MAX_TAIL_CALL_CNT + 1) * MAX_TESTRUNS,
},
{
"Tail call count preserved across function calls",
@ -14705,7 +14705,7 @@ static struct tail_call_test tail_call_tests[] = {
},
.stack_depth = 8,
.flags = FLAG_NEED_STATE | FLAG_RESULT_IN_STATE,
.result = (MAX_TAIL_CALL_CNT + 1 + 1) * MAX_TESTRUNS,
.result = (MAX_TAIL_CALL_CNT + 1) * MAX_TESTRUNS,
},
{
"Tail call error path, NULL target",


@ -1242,10 +1242,9 @@ static struct bpf_prog *bpf_migrate_filter(struct bpf_prog *fp)
int err, new_len, old_len = fp->len;
bool seen_ld_abs = false;
/* We are free to overwrite insns et al right here as it
* won't be used at this point in time anymore internally
* after the migration to the internal BPF instruction
* representation.
/* We are free to overwrite insns et al right here as it won't be used at
* this point in time anymore internally after the migration to the eBPF
* instruction representation.
*/
BUILD_BUG_ON(sizeof(struct sock_filter) !=
sizeof(struct bpf_insn));
@ -1336,8 +1335,8 @@ static struct bpf_prog *bpf_prepare_filter(struct bpf_prog *fp,
*/
bpf_jit_compile(fp);
/* JIT compiler couldn't process this filter, so do the
* internal BPF translation for the optimized interpreter.
/* JIT compiler couldn't process this filter, so do the eBPF translation
* for the optimized interpreter.
*/
if (!fp->jited)
fp = bpf_migrate_filter(fp);


@ -169,7 +169,7 @@ static u32 prog_ops_moff(const struct bpf_prog *prog)
t = bpf_tcp_congestion_ops.type;
m = &btf_type_member(t)[midx];
return btf_member_bit_offset(t, m) / 8;
return __btf_member_bit_offset(t, m) / 8;
}
static const struct bpf_func_proto *
@ -246,7 +246,7 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t,
utcp_ca = (const struct tcp_congestion_ops *)udata;
tcp_ca = (struct tcp_congestion_ops *)kdata;
moff = btf_member_bit_offset(t, member) / 8;
moff = __btf_member_bit_offset(t, member) / 8;
switch (moff) {
case offsetof(struct tcp_congestion_ops, flags):
if (utcp_ca->flags & ~TCP_CONG_MASK)
@ -276,7 +276,7 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t,
static int bpf_tcp_ca_check_member(const struct btf_type *t,
const struct btf_member *member)
{
if (is_unsupported(btf_member_bit_offset(t, member) / 8))
if (is_unsupported(__btf_member_bit_offset(t, member) / 8))
return -ENOTSUPP;
return 0;
}


@ -215,6 +215,11 @@ TPROGS_LDFLAGS := -L$(SYSROOT)/usr/lib
endif
TPROGS_LDLIBS += $(LIBBPF) -lelf -lz
TPROGLDLIBS_xdp_monitor += -lm
TPROGLDLIBS_xdp_redirect += -lm
TPROGLDLIBS_xdp_redirect_cpu += -lm
TPROGLDLIBS_xdp_redirect_map += -lm
TPROGLDLIBS_xdp_redirect_map_multi += -lm
TPROGLDLIBS_tracex4 += -lrt
TPROGLDLIBS_trace_output += -lrt
TPROGLDLIBS_map_perf_test += -lrt
@ -328,7 +333,7 @@ $(BPF_SAMPLES_PATH)/*.c: verify_target_bpf $(LIBBPF)
$(src)/*.c: verify_target_bpf $(LIBBPF)
libbpf_hdrs: $(LIBBPF)
$(obj)/$(TRACE_HELPERS): | libbpf_hdrs
$(obj)/$(TRACE_HELPERS) $(obj)/$(CGROUP_HELPERS) $(obj)/$(XDP_SAMPLE): | libbpf_hdrs
.PHONY: libbpf_hdrs
@ -343,6 +348,17 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
$(obj)/hbm.o: $(src)/hbm.h
$(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
# Override includes for xdp_sample_user.o because $(srctree)/usr/include in
# TPROGS_CFLAGS causes conflicts
XDP_SAMPLE_CFLAGS += -Wall -O2 \
-I$(src)/../../tools/include \
-I$(src)/../../tools/include/uapi \
-I$(LIBBPF_INCLUDE) \
-I$(src)/../../tools/testing/selftests/bpf
$(obj)/$(XDP_SAMPLE): TPROGS_CFLAGS = $(XDP_SAMPLE_CFLAGS)
$(obj)/$(XDP_SAMPLE): $(src)/xdp_sample_user.h $(src)/xdp_sample_shared.h
-include $(BPF_SAMPLES_PATH)/Makefile.target
VMLINUX_BTF_PATHS ?= $(abspath $(if $(O),$(O)/vmlinux)) \


@ -73,14 +73,3 @@ quiet_cmd_tprog-cobjs = CC $@
cmd_tprog-cobjs = $(CC) $(tprogc_flags) -c -o $@ $<
$(tprog-cobjs): $(obj)/%.o: $(src)/%.c FORCE
$(call if_changed_dep,tprog-cobjs)
# Override includes for xdp_sample_user.o because $(srctree)/usr/include in
# TPROGS_CFLAGS causes conflicts
XDP_SAMPLE_CFLAGS += -Wall -O2 -lm \
-I./tools/include \
-I./tools/include/uapi \
-I./tools/lib \
-I./tools/testing/selftests/bpf
$(obj)/xdp_sample_user.o: $(src)/xdp_sample_user.c \
$(src)/xdp_sample_user.h $(src)/xdp_sample_shared.h
$(CC) $(XDP_SAMPLE_CFLAGS) -c -o $@ $<


@ -67,8 +67,8 @@ static bool test_finish;
static void maps_create(void)
{
map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(uint32_t),
sizeof(struct stats), 100, 0);
map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(uint32_t),
sizeof(struct stats), 100, NULL);
if (map_fd < 0)
error(1, errno, "map create failed!\n");
}
@ -157,9 +157,13 @@ static void prog_load(void)
offsetof(struct __sk_buff, len)),
BPF_EXIT_INSN(),
};
prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog,
ARRAY_SIZE(prog), "GPL", 0,
log_buf, sizeof(log_buf));
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = log_buf,
.log_size = sizeof(log_buf),
);
prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL",
prog, ARRAY_SIZE(prog), &opts);
if (prog_fd < 0)
error(1, errno, "failed to load prog\n%s\n", log_buf);
}


@ -46,12 +46,6 @@ static void usage(void)
printf(" -h Display this help.\n");
}
static int bpf_map_create(void)
{
return bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
sizeof(uint32_t), 1024, 0);
}
static int bpf_prog_create(const char *object)
{
static struct bpf_insn insns[] = {
@ -60,16 +54,22 @@ static int bpf_prog_create(const char *object)
};
size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn);
struct bpf_object *obj;
int prog_fd;
int err;
if (object) {
assert(!bpf_prog_load(object, BPF_PROG_TYPE_UNSPEC,
&obj, &prog_fd));
return prog_fd;
obj = bpf_object__open_file(object, NULL);
assert(!libbpf_get_error(obj));
err = bpf_object__load(obj);
assert(!err);
return bpf_program__fd(bpf_object__next_program(obj, NULL));
} else {
return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER,
insns, insns_cnt, "GPL", 0,
bpf_log_buf, BPF_LOG_BUF_SIZE);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,
);
return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL",
insns, insns_cnt, &opts);
}
}
@ -79,7 +79,8 @@ static int bpf_do_map(const char *file, uint32_t flags, uint32_t key,
int fd, ret;
if (flags & BPF_F_PIN) {
fd = bpf_map_create();
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(uint32_t),
sizeof(uint32_t), 1024, NULL);
printf("bpf: map fd:%d (%s)\n", fd, strerror(errno));
assert(fd > 0);


@ -16,13 +16,6 @@
#include <uapi/linux/in.h>
#include <bpf/bpf_helpers.h>
# define printk(fmt, ...) \
({ \
char ____fmt[] = fmt; \
bpf_trace_printk(____fmt, sizeof(____fmt), \
##__VA_ARGS__); \
})
struct bpf_elf_map {
__u32 type;
__u32 size_key;


@ -134,19 +134,22 @@ static void do_test_lru(enum test_type test, int cpu)
*/
int outer_fd = map_fd[array_of_lru_hashs_idx];
unsigned int mycpu, mynode;
LIBBPF_OPTS(bpf_map_create_opts, opts,
.map_flags = BPF_F_NUMA_NODE,
);
assert(cpu < MAX_NR_CPUS);
ret = syscall(__NR_getcpu, &mycpu, &mynode, NULL);
assert(!ret);
opts.numa_node = mynode;
inner_lru_map_fds[cpu] =
bpf_create_map_node(BPF_MAP_TYPE_LRU_HASH,
test_map_names[INNER_LRU_HASH_PREALLOC],
sizeof(uint32_t),
sizeof(long),
inner_lru_hash_size, 0,
mynode);
bpf_map_create(BPF_MAP_TYPE_LRU_HASH,
test_map_names[INNER_LRU_HASH_PREALLOC],
sizeof(uint32_t),
sizeof(long),
inner_lru_hash_size, &opts);
if (inner_lru_map_fds[cpu] == -1) {
printf("cannot create BPF_MAP_TYPE_LRU_HASH %s(%d)\n",
strerror(errno), errno);


@ -37,8 +37,8 @@ static int test_sock(void)
int sock = -1, map_fd, prog_fd, i, key;
long long value = 0, tcp_cnt, udp_cnt, icmp_cnt;
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value),
256, 0);
map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(key), sizeof(value),
256, NULL);
if (map_fd < 0) {
printf("failed to create map '%s'\n", strerror(errno));
goto cleanup;
@ -59,9 +59,13 @@ static int test_sock(void)
BPF_EXIT_INSN(),
};
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,
);
prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, insns_cnt,
"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL",
prog, insns_cnt, &opts);
if (prog_fd < 0) {
printf("failed to load prog '%s'\n", strerror(errno));
goto cleanup;
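The conversions in this and the surrounding samples all follow the same libbpf v0.6+ pattern: bpf_map_create() and the OPTS-based bpf_prog_load() replace the deprecated bpf_create_map()/bpf_load_program(). A standalone hedged sketch of that pattern; the map and program contents are illustrative, and the insn macros are assumed to come from the samples' local bpf_insn.h helper header:

#include <linux/bpf.h>
#include <bpf/bpf.h>
#include "bpf_insn.h"	/* samples/bpf macros for hand-written insns */

static char log_buf[64 * 1024];

static int load_example(void)
{
	const struct bpf_insn prog[] = {
		BPF_MOV64_IMM(BPF_REG_0, 0),
		BPF_EXIT_INSN(),
	};
	LIBBPF_OPTS(bpf_prog_load_opts, opts,
		.log_buf = log_buf,
		.log_size = sizeof(log_buf),
	);
	int map_fd;

	map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(__u32),
				sizeof(__u64), 256, NULL);
	if (map_fd < 0)
		return map_fd;

	return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL",
			     prog, sizeof(prog) / sizeof(prog[0]), &opts);
}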


@ -11,17 +11,26 @@
int main(int ac, char **argv)
{
struct bpf_object *obj;
struct bpf_program *prog;
int map_fd, prog_fd;
char filename[256];
int i, sock;
int i, sock, err;
FILE *f;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (bpf_prog_load(filename, BPF_PROG_TYPE_SOCKET_FILTER,
&obj, &prog_fd))
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj))
return 1;
prog = bpf_object__next_program(obj, NULL);
bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER);
err = bpf_object__load(obj);
if (err)
return 1;
prog_fd = bpf_program__fd(prog);
map_fd = bpf_object__find_map_fd_by_name(obj, "my_map");
sock = open_raw_sock("lo");


@ -16,18 +16,26 @@ struct pair {
int main(int ac, char **argv)
{
struct bpf_program *prog;
struct bpf_object *obj;
int map_fd, prog_fd;
char filename[256];
int i, sock;
int i, sock, err;
FILE *f;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (bpf_prog_load(filename, BPF_PROG_TYPE_SOCKET_FILTER,
&obj, &prog_fd))
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj))
return 1;
prog = bpf_object__next_program(obj, NULL);
bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER);
err = bpf_object__load(obj);
if (err)
return 1;
prog_fd = bpf_program__fd(prog);
map_fd = bpf_object__find_map_fd_by_name(obj, "hash_map");
sock = open_raw_sock("lo");
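Both socket-filter samples above now use the generic libbpf object API instead of the deprecated bpf_prog_load(file, ...) wrapper. The open/set-type/load sequence they share, as a self-contained hedged sketch with error handling added for illustration:

#include <linux/bpf.h>
#include <bpf/libbpf.h>

static int open_and_load(const char *path)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	int err;

	obj = bpf_object__open_file(path, NULL);
	if (libbpf_get_error(obj))
		return -1;

	/* single-program objects: grab the first program and set its type */
	prog = bpf_object__next_program(obj, NULL);
	bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER);

	err = bpf_object__load(obj);
	if (err) {
		bpf_object__close(obj);
		return err;
	}
	return bpf_program__fd(prog);
}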


@ -64,9 +64,9 @@ int main(int argc, char **argv)
}
if (create_array) {
array_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY,
array_fd = bpf_map_create(BPF_MAP_TYPE_CGROUP_ARRAY, NULL,
sizeof(uint32_t), sizeof(uint32_t),
1, 0);
1, NULL);
if (array_fd < 0) {
fprintf(stderr,
"bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY,...): %s(%d)\n",


@ -71,10 +71,13 @@ static int prog_load(int map_fd, int verdict)
BPF_EXIT_INSN(),
};
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,
);
return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
prog, insns_cnt, "GPL", 0,
bpf_log_buf, BPF_LOG_BUF_SIZE);
return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB, NULL, "GPL",
prog, insns_cnt, &opts);
}
static int usage(const char *argv0)
@ -90,9 +93,9 @@ static int attach_filter(int cg_fd, int type, int verdict)
int prog_fd, map_fd, ret, key;
long long pkt_cnt, byte_cnt;
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY,
map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL,
sizeof(key), sizeof(byte_cnt),
256, 0);
256, NULL);
if (map_fd < 0) {
printf("Failed to create map: '%s'\n", strerror(errno));
return EXIT_FAILURE;


@ -70,6 +70,10 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, priority)),
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, priority)),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,
);
struct bpf_insn *prog;
size_t insns_cnt;
@ -115,8 +119,8 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
insns_cnt /= sizeof(struct bpf_insn);
ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
ret = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL",
prog, insns_cnt, &opts);
free(prog);


@ -105,10 +105,10 @@ struct pfect_lru {
static void pfect_lru_init(struct pfect_lru *lru, unsigned int lru_size,
unsigned int nr_possible_elems)
{
lru->map_fd = bpf_create_map(BPF_MAP_TYPE_HASH,
lru->map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL,
sizeof(unsigned long long),
sizeof(struct pfect_lru_node *),
nr_possible_elems, 0);
nr_possible_elems, NULL);
assert(lru->map_fd != -1);
lru->free_nodes = malloc(lru_size * sizeof(struct pfect_lru_node));
@ -207,10 +207,13 @@ static unsigned int read_keys(const char *dist_file,
static int create_map(int map_type, int map_flags, unsigned int size)
{
LIBBPF_OPTS(bpf_map_create_opts, opts,
.map_flags = map_flags,
);
int map_fd;
map_fd = bpf_create_map(map_type, sizeof(unsigned long long),
sizeof(unsigned long long), size, map_flags);
map_fd = bpf_map_create(map_type, NULL, sizeof(unsigned long long),
sizeof(unsigned long long), size, &opts);
if (map_fd == -1)
perror("bpf_create_map");


@ -43,7 +43,6 @@ static void print_bpf_output(void *ctx, int cpu, void *data, __u32 size)
int main(int argc, char **argv)
{
struct perf_buffer_opts pb_opts = {};
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct perf_buffer *pb;
@ -84,8 +83,7 @@ int main(int argc, char **argv)
goto cleanup;
}
pb_opts.sample_cb = print_bpf_output;
pb = perf_buffer__new(map_fd, 8, &pb_opts);
pb = perf_buffer__new(map_fd, 8, print_bpf_output, NULL, NULL, NULL);
ret = libbpf_get_error(pb);
if (ret) {
printf("failed to setup perf_buffer: %d\n", ret);


@ -100,7 +100,6 @@ u16 get_dest_port_ipv4_udp(struct xdp_md *ctx, u64 nh_off)
void *data = (void *)(long)ctx->data;
struct iphdr *iph = data + nh_off;
struct udphdr *udph;
u16 dport;
if (iph + 1 > data_end)
return 0;
@ -111,8 +110,7 @@ u16 get_dest_port_ipv4_udp(struct xdp_md *ctx, u64 nh_off)
if (udph + 1 > data_end)
return 0;
dport = bpf_ntohs(udph->dest);
return dport;
return bpf_ntohs(udph->dest);
}
static __always_inline


@ -110,12 +110,9 @@ static void usage(const char *prog)
int main(int argc, char **argv)
{
struct bpf_prog_load_attr prog_load_attr = {
.prog_type = BPF_PROG_TYPE_XDP,
};
struct perf_buffer_opts pb_opts = {};
const char *optstr = "FS";
int prog_fd, map_fd, opt;
struct bpf_program *prog;
struct bpf_object *obj;
struct bpf_map *map;
char filename[256];
@ -144,15 +141,19 @@ int main(int argc, char **argv)
}
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
prog_load_attr.file = filename;
if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj))
return 1;
if (!prog_fd) {
printf("bpf_prog_load_xattr: %s\n", strerror(errno));
prog = bpf_object__next_program(obj, NULL);
bpf_program__set_type(prog, BPF_PROG_TYPE_XDP);
err = bpf_object__load(obj);
if (err)
return 1;
}
prog_fd = bpf_program__fd(prog);
map = bpf_object__next_map(obj, NULL);
if (!map) {
@ -181,8 +182,7 @@ int main(int argc, char **argv)
return 1;
}
pb_opts.sample_cb = print_bpf_output;
pb = perf_buffer__new(map_fd, 8, &pb_opts);
pb = perf_buffer__new(map_fd, 8, print_bpf_output, NULL, NULL, NULL);
err = libbpf_get_error(pb);
if (err) {
perror("perf_buffer setup failed");


@ -45,7 +45,9 @@ const char *get_driver_name(int ifindex);
int get_mac_addr(int ifindex, void *mac_addr);
#pragma GCC diagnostic push
#ifndef __clang__
#pragma GCC diagnostic ignored "-Wstringop-truncation"
#endif
__attribute__((unused))
static inline char *safe_strncpy(char *dst, const char *src, size_t size)
{


@ -15,6 +15,9 @@
#include <bpf/xsk.h>
#include "xdpsock.h"
/* libbpf APIs for AF_XDP are deprecated starting from v0.7 */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
static const char *opt_if = "";
static struct option long_options[] = {


@ -36,6 +36,9 @@
#include <bpf/bpf.h>
#include "xdpsock.h"
/* libbpf APIs for AF_XDP are deprecated starting from v0.7 */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#ifndef SOL_XDP
#define SOL_XDP 283
#endif


@ -27,6 +27,9 @@
#include <bpf/xsk.h>
#include <bpf/bpf.h>
/* libbpf APIs for AF_XDP are deprecated starting from v0.7 */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
typedef __u64 u64;


@ -24,7 +24,7 @@ man: man8
man8: $(DOC_MAN8)
RST2MAN_DEP := $(shell command -v rst2man 2>/dev/null)
RST2MAN_OPTS += --verbose
RST2MAN_OPTS += --verbose --strip-comments
list_pages = $(sort $(basename $(filter-out $(1),$(MAN8_RST))))
see_also = $(subst " ",, \


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-btf
================
@ -7,13 +9,14 @@ tool for inspection of BTF data
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **btf** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | {**-d** | **--debug** } |
{ **-B** | **--base-btf** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } }
*COMMANDS* := { **dump** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-cgroup
================
@ -7,13 +9,14 @@ tool for inspection and simple manipulation of eBPF progs
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **cgroup** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } |
{ **-f** | **--bpffs** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } }
*COMMANDS* :=
{ **show** | **list** | **tree** | **attach** | **detach** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
===============
bpftool-feature
===============
@ -7,12 +9,14 @@ tool for inspection of eBPF-related parameters for Linux kernel or net device
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **feature** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } }
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* := { **probe** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-gen
================
@ -7,13 +9,14 @@ tool for BPF code-generation
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **gen** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } |
{ **-L** | **--use-loader** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-L** | **--use-loader** } }
*COMMAND* := { **object** | **skeleton** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
============
bpftool-iter
============
@ -7,12 +9,14 @@ tool to create BPF iterators
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **iter** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } }
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* := { **pin** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-link
================
@ -7,13 +9,14 @@ tool for inspection and simple manipulation of eBPF links
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **link** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } |
{ **-f** | **--bpffs** } | { **-n** | **--nomount** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } }
*COMMANDS* := { **show** | **list** | **pin** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-map
================
@ -7,13 +9,14 @@ tool for inspection and simple manipulation of eBPF maps
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **map** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } |
{ **-f** | **--bpffs** } | { **-n** | **--nomount** } }
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } }
*COMMANDS* :=
{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-net
================
@ -7,12 +9,14 @@ tool for inspection of netdev/tc related bpf prog attachments
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **net** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } }
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* :=
{ **show** | **list** | **attach** | **detach** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-perf
================
@ -7,12 +9,14 @@ tool for inspection of perf related bpf prog attachments
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **perf** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } }
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* :=
{ **show** | **list** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
bpftool-prog
================
@ -7,12 +9,14 @@ tool for inspection and simple manipulation of eBPF progs
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **prog** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } |
*OPTIONS* := { |COMMON_OPTIONS| |
{ **-f** | **--bpffs** } | { **-m** | **--mapcompat** } | { **-n** | **--nomount** } |
{ **-L** | **--use-loader** } }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
==================
bpftool-struct_ops
==================
@ -7,12 +9,14 @@ tool to register/unregister/introspect BPF struct_ops
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
**bpftool** [*OPTIONS*] **struct_ops** *COMMAND*
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } }
*OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* :=
{ **show** | **list** | **dump** | **register** | **unregister** | **help** }


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
================
BPFTOOL
================
@ -7,6 +9,8 @@ tool for inspection and simple manipulation of eBPF programs and maps
:Manual section: 8
.. include:: substitutions.rst
SYNOPSIS
========
@ -18,8 +22,7 @@ SYNOPSIS
*OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** | **feature** }
*OPTIONS* := { { **-V** | **--version** } |
{ **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } }
*OPTIONS* := { { **-V** | **--version** } | |COMMON_OPTIONS| }
*MAP-COMMANDS* :=
{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |


@ -1,3 +1,5 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
-h, --help
Print short help message (similar to **bpftool help**).


@ -0,0 +1,3 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
.. |COMMON_OPTIONS| replace:: { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | { **-l** | **--legacy** }


@ -486,7 +486,6 @@ static void codegen_destroy(struct bpf_object *obj, const char *obj_name)
static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *header_guard)
{
struct bpf_object_load_attr load_attr = {};
DECLARE_LIBBPF_OPTS(gen_loader_opts, opts);
struct bpf_map *map;
char ident[256];
@ -496,12 +495,7 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h
if (err)
return err;
load_attr.obj = obj;
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
load_attr.log_level = 1 + 2 + 4;
err = bpf_object__load_xattr(&load_attr);
err = bpf_object__load(obj);
if (err) {
p_err("failed to load object file");
goto out;
@ -719,6 +713,9 @@ static int do_skeleton(int argc, char **argv)
if (obj_name[0] == '\0')
get_obj_name(obj_name, file);
opts.object_name = obj_name;
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_mem(obj_data, file_sz, &opts);
err = libbpf_get_error(obj);
if (err) {


@ -93,6 +93,7 @@ static int do_version(int argc, char **argv)
jsonw_name(json_wtr, "features");
jsonw_start_object(json_wtr); /* features */
jsonw_bool_field(json_wtr, "libbfd", has_libbfd);
jsonw_bool_field(json_wtr, "libbpf_strict", !legacy_libbpf);
jsonw_bool_field(json_wtr, "skeletons", has_skeletons);
jsonw_end_object(json_wtr); /* features */
@ -106,6 +107,10 @@ static int do_version(int argc, char **argv)
printf(" libbfd");
nb_features++;
}
if (!legacy_libbpf) {
printf("%s libbpf_strict", nb_features++ ? "," : "");
nb_features++;
}
if (has_skeletons)
printf("%s skeletons", nb_features++ ? "," : "");
printf("\n");
@ -400,6 +405,7 @@ int main(int argc, char **argv)
{ "legacy", no_argument, NULL, 'l' },
{ 0 }
};
bool version_requested = false;
int opt, ret;
last_do_help = do_help;
@ -414,7 +420,8 @@ int main(int argc, char **argv)
options, NULL)) >= 0) {
switch (opt) {
case 'V':
return do_version(argc, argv);
version_requested = true;
break;
case 'h':
return do_help(argc, argv);
case 'p':
@ -479,6 +486,9 @@ int main(int argc, char **argv)
if (argc < 0)
usage();
if (version_requested)
return do_version(argc, argv);
ret = cmd_select(cmds, argc, argv, do_help);
if (json_output)


@ -1261,7 +1261,10 @@ static int do_pin(int argc, char **argv)
static int do_create(int argc, char **argv)
{
struct bpf_create_map_attr attr = { NULL, };
LIBBPF_OPTS(bpf_map_create_opts, attr);
enum bpf_map_type map_type = BPF_MAP_TYPE_UNSPEC;
__u32 key_size = 0, value_size = 0, max_entries = 0;
const char *map_name = NULL;
const char *pinfile;
int err = -1, fd;
@ -1276,30 +1279,30 @@ static int do_create(int argc, char **argv)
if (is_prefix(*argv, "type")) {
NEXT_ARG();
if (attr.map_type) {
if (map_type) {
p_err("map type already specified");
goto exit;
}
attr.map_type = map_type_from_str(*argv);
if ((int)attr.map_type < 0) {
map_type = map_type_from_str(*argv);
if ((int)map_type < 0) {
p_err("unrecognized map type: %s", *argv);
goto exit;
}
NEXT_ARG();
} else if (is_prefix(*argv, "name")) {
NEXT_ARG();
attr.name = GET_ARG();
map_name = GET_ARG();
} else if (is_prefix(*argv, "key")) {
if (parse_u32_arg(&argc, &argv, &attr.key_size,
if (parse_u32_arg(&argc, &argv, &key_size,
"key size"))
goto exit;
} else if (is_prefix(*argv, "value")) {
if (parse_u32_arg(&argc, &argv, &attr.value_size,
if (parse_u32_arg(&argc, &argv, &value_size,
"value size"))
goto exit;
} else if (is_prefix(*argv, "entries")) {
if (parse_u32_arg(&argc, &argv, &attr.max_entries,
if (parse_u32_arg(&argc, &argv, &max_entries,
"max entries"))
goto exit;
} else if (is_prefix(*argv, "flags")) {
@ -1340,14 +1343,14 @@ static int do_create(int argc, char **argv)
}
}
if (!attr.name) {
if (!map_name) {
p_err("map name not specified");
goto exit;
}
set_max_rlimit();
fd = bpf_create_map_xattr(&attr);
fd = bpf_map_create(map_type, map_name, key_size, value_size, max_entries, &attr);
if (fd < 0) {
p_err("map create failed: %s", strerror(errno));
goto exit;


@ -1464,7 +1464,6 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts,
.relaxed_maps = relaxed_maps,
);
struct bpf_object_load_attr load_attr = { 0 };
enum bpf_attach_type expected_attach_type;
struct map_replace *map_replace = NULL;
struct bpf_program *prog = NULL, *pos;
@ -1598,6 +1597,10 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
set_max_rlimit();
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
open_opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_file(file, &open_opts);
if (libbpf_get_error(obj)) {
p_err("failed to open object file");
@ -1677,12 +1680,7 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
goto err_close_obj;
}
load_attr.obj = obj;
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
load_attr.log_level = 1 + 2 + 4;
err = bpf_object__load_xattr(&load_attr);
err = bpf_object__load(obj);
if (err) {
p_err("failed to load object file");
goto err_close_obj;
@ -1774,17 +1772,19 @@ static int try_loader(struct gen_loader_opts *gen)
sizeof(struct bpf_prog_desc));
int log_buf_sz = (1u << 24) - 1;
int err, fds_before, fd_delta;
char *log_buf;
char *log_buf = NULL;
ctx = alloca(ctx_sz);
memset(ctx, 0, ctx_sz);
ctx->sz = ctx_sz;
ctx->log_level = 1;
ctx->log_size = log_buf_sz;
log_buf = malloc(log_buf_sz);
if (!log_buf)
return -ENOMEM;
ctx->log_buf = (long) log_buf;
if (verifier_logs) {
ctx->log_level = 1 + 2 + 4;
ctx->log_size = log_buf_sz;
log_buf = malloc(log_buf_sz);
if (!log_buf)
return -ENOMEM;
ctx->log_buf = (long) log_buf;
}
opts.ctx = ctx;
opts.data = gen->data;
opts.data_sz = gen->data_sz;
@ -1793,9 +1793,9 @@ static int try_loader(struct gen_loader_opts *gen)
fds_before = count_open_fds();
err = bpf_load_and_run(&opts);
fd_delta = count_open_fds() - fds_before;
if (err < 0) {
if (err < 0 || verifier_logs) {
fprintf(stderr, "err %d\n%s\n%s", err, opts.errstr, log_buf);
if (fd_delta)
if (fd_delta && err < 0)
fprintf(stderr, "loader prog leaked %d FDs\n",
fd_delta);
}
@ -1807,7 +1807,6 @@ static int do_loader(int argc, char **argv)
{
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts);
DECLARE_LIBBPF_OPTS(gen_loader_opts, gen);
struct bpf_object_load_attr load_attr = {};
struct bpf_object *obj;
const char *file;
int err = 0;
@ -1816,6 +1815,10 @@ static int do_loader(int argc, char **argv)
return -1;
file = GET_ARG();
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
open_opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_file(file, &open_opts);
if (libbpf_get_error(obj)) {
p_err("failed to open object file");
@ -1826,12 +1829,7 @@ static int do_loader(int argc, char **argv)
if (err)
goto err_close_obj;
load_attr.obj = obj;
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
load_attr.log_level = 1 + 2 + 4;
err = bpf_object__load_xattr(&load_attr);
err = bpf_object__load(obj);
if (err) {
p_err("failed to load object file");
goto err_close_obj;


@ -479,7 +479,7 @@ static int do_unregister(int argc, char **argv)
static int do_register(int argc, char **argv)
{
struct bpf_object_load_attr load_attr = {};
LIBBPF_OPTS(bpf_object_open_opts, open_opts);
const struct bpf_map_def *def;
struct bpf_map_info info = {};
__u32 info_len = sizeof(info);
@ -494,18 +494,17 @@ static int do_register(int argc, char **argv)
file = GET_ARG();
obj = bpf_object__open(file);
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
open_opts.kernel_log_level = 1 + 2 + 4;
obj = bpf_object__open_file(file, &open_opts);
if (libbpf_get_error(obj))
return -1;
set_max_rlimit();
load_attr.obj = obj;
if (verifier_logs)
/* log_level1 + log_level2 + stats, but not stable UAPI */
load_attr.log_level = 1 + 2 + 4;
if (bpf_object__load_xattr(&load_attr)) {
if (bpf_object__load(obj)) {
bpf_object__close(obj);
return -1;
}


@ -168,7 +168,7 @@ static struct btf_id *btf_id__find(struct rb_root *root, const char *name)
return NULL;
}
static struct btf_id*
static struct btf_id *
btf_id__add(struct rb_root *root, char *name, bool unique)
{
struct rb_node **p = &root->rb_node;
@ -732,7 +732,8 @@ int main(int argc, const char **argv)
if (obj.efile.idlist_shndx == -1 ||
obj.efile.symbols_shndx == -1) {
pr_debug("Cannot find .BTF_ids or symbols sections, nothing to do\n");
return 0;
err = 0;
goto out;
}
if (symbols_collect(&obj))


@ -14,6 +14,12 @@
# define __NR_bpf 349
# elif defined(__s390__)
# define __NR_bpf 351
# elif defined(__mips__) && defined(_ABIO32)
# define __NR_bpf 4355
# elif defined(__mips__) && defined(_ABIN32)
# define __NR_bpf 6319
# elif defined(__mips__) && defined(_ABI64)
# define __NR_bpf 5315
# else
# error __NR_bpf not defined. libbpf does not support your arch.
# endif


@ -1342,8 +1342,10 @@ union bpf_attr {
/* or valid module BTF object fd or 0 to attach to vmlinux */
__u32 attach_btf_obj_fd;
};
__u32 :32; /* pad */
__u32 core_relo_cnt; /* number of bpf_core_relo */
__aligned_u64 fd_array; /* array of FDs */
__aligned_u64 core_relos;
__u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
@ -1744,7 +1746,7 @@ union bpf_attr {
* if the maximum number of tail calls has been reached for this
* chain of programs. This limit is defined in the kernel by the
* macro **MAX_TAIL_CALL_CNT** (not accessible to user space),
* which is currently set to 32.
* which is currently set to 33.
* Return
* 0 on success, or a negative error in case of failure.
*
@ -4957,6 +4959,30 @@ union bpf_attr {
* **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
* **-EBUSY** if failed to try lock mmap_lock.
* **-EINVAL** for invalid **flags**.
*
* long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags)
* Description
* For **nr_loops**, call **callback_fn** function
* with **callback_ctx** as the context parameter.
* The **callback_fn** should be a static function and
* the **callback_ctx** should be a pointer to the stack.
* The **flags** is used to control certain aspects of the helper.
* Currently, the **flags** must be 0, and **nr_loops** is
* limited to 1 << 23 (~8 million) loops.
*
* long (\*callback_fn)(u32 index, void \*ctx);
*
* where **index** is the current index in the loop. The index
* is zero-indexed.
*
* If **callback_fn** returns 0, the helper will continue to the next
* loop. If return value is 1, the helper will skip the rest of
* the loops and return. Other return values are not used now,
* and will be rejected by the verifier.
*
* Return
* The number of loops performed, **-EINVAL** for invalid **flags**,
* **-E2BIG** if **nr_loops** exceeds the maximum number of loops.
*/
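
A sketch of how a BPF program might use this new helper; the section name, callback body, and context struct below are illustrative, not taken from this series:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct loop_ctx {
	long sum;
};

/* callback must be a static function; returning 0 continues, 1 breaks out */
static long add_index(__u32 index, void *ctx)
{
	struct loop_ctx *c = ctx;

	c->sum += index;
	return 0;
}

SEC("tracepoint/syscalls/sys_enter_getpid")
int sum_indices(void *ctx)
{
	struct loop_ctx c = { .sum = 0 };

	/* run the callback 100 times; flags must currently be 0 */
	bpf_loop(100, add_index, &c, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
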
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5140,6 +5166,7 @@ union bpf_attr {
FN(skc_to_unix_sock), \
FN(kallsyms_lookup_name), \
FN(find_vma), \
FN(loop), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -6349,4 +6376,78 @@ enum {
BTF_F_ZERO = (1ULL << 3),
};
/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
* has to be adjusted by relocations. It is emitted by llvm and passed to
* libbpf and later to the kernel.
*/
enum bpf_core_relo_kind {
BPF_CORE_FIELD_BYTE_OFFSET = 0, /* field byte offset */
BPF_CORE_FIELD_BYTE_SIZE = 1, /* field size in bytes */
BPF_CORE_FIELD_EXISTS = 2, /* field existence in target kernel */
BPF_CORE_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */
BPF_CORE_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */
BPF_CORE_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */
BPF_CORE_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */
BPF_CORE_TYPE_ID_TARGET = 7, /* type ID in target kernel */
BPF_CORE_TYPE_EXISTS = 8, /* type existence in target kernel */
BPF_CORE_TYPE_SIZE = 9, /* type size in bytes */
BPF_CORE_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */
BPF_CORE_ENUMVAL_VALUE = 11, /* enum value integer value */
};
/*
* "struct bpf_core_relo" is used to pass relocation data form LLVM to libbpf
* and from libbpf to the kernel.
*
* CO-RE relocation captures the following data:
* - insn_off - instruction offset (in bytes) within a BPF program that needs
* its insn->imm field to be relocated with actual field info;
* - type_id - BTF type ID of the "root" (containing) entity of a relocatable
* type or field;
* - access_str_off - offset into corresponding .BTF string section. String
* interpretation depends on specific relocation kind:
* - for field-based relocations, string encodes an accessed field using
* a sequence of field and array indices, separated by colon (:). It's
* conceptually very close to LLVM's getelementptr ([0]) instruction's
* arguments for identifying offset to a field.
* - for type-based relocations, the string is expected to be just "0";
* - for enum value-based relocations, string contains an index of enum
* value within its enum type;
* - kind - one of enum bpf_core_relo_kind;
*
* Example:
* struct sample {
* int a;
* struct {
* int b[10];
* };
* };
*
* struct sample *s = ...;
* int *x = &s->a; // encoded as "0:0" (a is field #0)
* int *y = &s->b[5]; // encoded as "0:1:0:5" (anon struct is field #1,
* // b is field #0 inside anon struct, accessing elem #5)
* int *z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
*
* type_id for all relocs in this example will capture BTF type id of
* `struct sample`.
*
* Such relocation is emitted when using __builtin_preserve_access_index()
* Clang built-in, passing expression that captures field address, e.g.:
*
* bpf_probe_read(&dst, sizeof(dst),
* __builtin_preserve_access_index(&src->a.b.c));
*
* In this case Clang will emit field relocation recording necessary data to
* be able to find offset of embedded `a.b.c` field within `src` struct.
*
* [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
*/
struct bpf_core_relo {
__u32 insn_off;
__u32 type_id;
__u32 access_str_off;
enum bpf_core_relo_kind kind;
};
#endif /* _UAPI__LINUX_BPF_H__ */


@ -50,6 +50,12 @@
# define __NR_bpf 351
# elif defined(__arc__)
# define __NR_bpf 280
# elif defined(__mips__) && defined(_ABIO32)
# define __NR_bpf 4355
# elif defined(__mips__) && defined(_ABIN32)
# define __NR_bpf 6319
# elif defined(__mips__) && defined(_ABI64)
# define __NR_bpf 5315
# else
# error __NR_bpf not defined. libbpf does not support your arch.
# endif
@ -88,146 +94,122 @@ static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int
return fd;
}
int libbpf__bpf_create_map_xattr(const struct bpf_create_map_params *create_attr)
int bpf_map_create(enum bpf_map_type map_type,
const char *map_name,
__u32 key_size,
__u32 value_size,
__u32 max_entries,
const struct bpf_map_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
union bpf_attr attr;
int fd;
memset(&attr, '\0', sizeof(attr));
memset(&attr, 0, attr_sz);
attr.map_type = create_attr->map_type;
attr.key_size = create_attr->key_size;
attr.value_size = create_attr->value_size;
attr.max_entries = create_attr->max_entries;
attr.map_flags = create_attr->map_flags;
if (create_attr->name)
memcpy(attr.map_name, create_attr->name,
min(strlen(create_attr->name), BPF_OBJ_NAME_LEN - 1));
attr.numa_node = create_attr->numa_node;
attr.btf_fd = create_attr->btf_fd;
attr.btf_key_type_id = create_attr->btf_key_type_id;
attr.btf_value_type_id = create_attr->btf_value_type_id;
attr.map_ifindex = create_attr->map_ifindex;
if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
attr.btf_vmlinux_value_type_id =
create_attr->btf_vmlinux_value_type_id;
else
attr.inner_map_fd = create_attr->inner_map_fd;
attr.map_extra = create_attr->map_extra;
if (!OPTS_VALID(opts, bpf_map_create_opts))
return libbpf_err(-EINVAL);
fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, sizeof(attr));
attr.map_type = map_type;
if (map_name)
strncat(attr.map_name, map_name, sizeof(attr.map_name) - 1);
attr.key_size = key_size;
attr.value_size = value_size;
attr.max_entries = max_entries;
attr.btf_fd = OPTS_GET(opts, btf_fd, 0);
attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0);
attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0);
attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0);
attr.inner_map_fd = OPTS_GET(opts, inner_map_fd, 0);
attr.map_flags = OPTS_GET(opts, map_flags, 0);
attr.map_extra = OPTS_GET(opts, map_extra, 0);
attr.numa_node = OPTS_GET(opts, numa_node, 0);
attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
return libbpf_err_errno(fd);
}
int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
{
struct bpf_create_map_params p = {};
LIBBPF_OPTS(bpf_map_create_opts, p);
p.map_type = create_attr->map_type;
p.key_size = create_attr->key_size;
p.value_size = create_attr->value_size;
p.max_entries = create_attr->max_entries;
p.map_flags = create_attr->map_flags;
p.name = create_attr->name;
p.numa_node = create_attr->numa_node;
p.btf_fd = create_attr->btf_fd;
p.btf_key_type_id = create_attr->btf_key_type_id;
p.btf_value_type_id = create_attr->btf_value_type_id;
p.map_ifindex = create_attr->map_ifindex;
if (p.map_type == BPF_MAP_TYPE_STRUCT_OPS)
p.btf_vmlinux_value_type_id =
create_attr->btf_vmlinux_value_type_id;
if (create_attr->map_type == BPF_MAP_TYPE_STRUCT_OPS)
p.btf_vmlinux_value_type_id = create_attr->btf_vmlinux_value_type_id;
else
p.inner_map_fd = create_attr->inner_map_fd;
return libbpf__bpf_create_map_xattr(&p);
return bpf_map_create(create_attr->map_type, create_attr->name,
create_attr->key_size, create_attr->value_size,
create_attr->max_entries, &p);
}
int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
int key_size, int value_size, int max_entries,
__u32 map_flags, int node)
{
struct bpf_create_map_attr map_attr = {};
LIBBPF_OPTS(bpf_map_create_opts, opts);
map_attr.name = name;
map_attr.map_type = map_type;
map_attr.map_flags = map_flags;
map_attr.key_size = key_size;
map_attr.value_size = value_size;
map_attr.max_entries = max_entries;
opts.map_flags = map_flags;
if (node >= 0) {
map_attr.numa_node = node;
map_attr.map_flags |= BPF_F_NUMA_NODE;
opts.numa_node = node;
opts.map_flags |= BPF_F_NUMA_NODE;
}
return bpf_create_map_xattr(&map_attr);
return bpf_map_create(map_type, name, key_size, value_size, max_entries, &opts);
}
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries, __u32 map_flags)
{
struct bpf_create_map_attr map_attr = {};
LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = map_flags);
map_attr.map_type = map_type;
map_attr.map_flags = map_flags;
map_attr.key_size = key_size;
map_attr.value_size = value_size;
map_attr.max_entries = max_entries;
return bpf_create_map_xattr(&map_attr);
return bpf_map_create(map_type, NULL, key_size, value_size, max_entries, &opts);
}
int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
int key_size, int value_size, int max_entries,
__u32 map_flags)
{
struct bpf_create_map_attr map_attr = {};
LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = map_flags);
map_attr.name = name;
map_attr.map_type = map_type;
map_attr.map_flags = map_flags;
map_attr.key_size = key_size;
map_attr.value_size = value_size;
map_attr.max_entries = max_entries;
return bpf_create_map_xattr(&map_attr);
return bpf_map_create(map_type, name, key_size, value_size, max_entries, &opts);
}
int bpf_create_map_in_map_node(enum bpf_map_type map_type, const char *name,
int key_size, int inner_map_fd, int max_entries,
__u32 map_flags, int node)
{
union bpf_attr attr;
int fd;
memset(&attr, '\0', sizeof(attr));
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = 4;
attr.inner_map_fd = inner_map_fd;
attr.max_entries = max_entries;
attr.map_flags = map_flags;
if (name)
memcpy(attr.map_name, name,
min(strlen(name), BPF_OBJ_NAME_LEN - 1));
LIBBPF_OPTS(bpf_map_create_opts, opts);
opts.inner_map_fd = inner_map_fd;
opts.map_flags = map_flags;
if (node >= 0) {
attr.map_flags |= BPF_F_NUMA_NODE;
attr.numa_node = node;
opts.map_flags |= BPF_F_NUMA_NODE;
opts.numa_node = node;
}
fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, sizeof(attr));
return libbpf_err_errno(fd);
return bpf_map_create(map_type, name, key_size, 4, max_entries, &opts);
}
int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
int key_size, int inner_map_fd, int max_entries,
__u32 map_flags)
{
return bpf_create_map_in_map_node(map_type, name, key_size,
inner_map_fd, max_entries, map_flags,
-1);
LIBBPF_OPTS(bpf_map_create_opts, opts,
.inner_map_fd = inner_map_fd,
.map_flags = map_flags,
);
return bpf_map_create(map_type, name, key_size, 4, max_entries, &opts);
}
static void *
@ -321,10 +303,6 @@ int bpf_prog_load_v0_6_0(enum bpf_prog_type prog_type,
if (log_level && !log_buf)
return libbpf_err(-EINVAL);
attr.log_level = log_level;
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_size;
func_info_rec_size = OPTS_GET(opts, func_info_rec_size, 0);
func_info = OPTS_GET(opts, func_info, NULL);
attr.func_info_rec_size = func_info_rec_size;
@ -339,6 +317,12 @@ int bpf_prog_load_v0_6_0(enum bpf_prog_type prog_type,
attr.fd_array = ptr_to_u64(OPTS_GET(opts, fd_array, NULL));
if (log_level) {
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_size;
attr.log_level = log_level;
}
fd = sys_bpf_prog_load(&attr, sizeof(attr), attempts);
if (fd >= 0)
return fd;
@ -384,16 +368,17 @@ int bpf_prog_load_v0_6_0(enum bpf_prog_type prog_type,
goto done;
}
if (log_level || !log_buf)
goto done;
if (log_level == 0 && log_buf) {
/* log_level == 0 with non-NULL log_buf requires retrying on error
* with log_level == 1 and log_buf/log_buf_size set, to get details of
* failure
*/
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_size;
attr.log_level = 1;
/* Try again with log */
log_buf[0] = 0;
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_size;
attr.log_level = 1;
fd = sys_bpf_prog_load(&attr, sizeof(attr), attempts);
fd = sys_bpf_prog_load(&attr, sizeof(attr), attempts);
}
done:
/* free() doesn't affect errno, so we don't need to restore it */
free(finfo);
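
From the caller's point of view, this fix means a log_buf passed with log_level 0 stays quiet on success but is still filled in on failure. A minimal sketch of relying on that behavior; the program type, name, and the insns/insn_cnt arguments are illustrative:

#include <stdio.h>
#include <bpf/bpf.h>

int load_prog_quiet(const struct bpf_insn *insns, size_t insn_cnt)
{
	static char log[64 * 1024];
	LIBBPF_OPTS(bpf_prog_load_opts, opts,
		.log_buf = log,
		.log_size = sizeof(log),
		/* log_level stays 0: no log on success, retried with level 1 on error */
	);
	int fd;

	fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "quiet_prog", "GPL",
			   insns, insn_cnt, &opts);
	if (fd < 0)
		fprintf(stderr, "load failed: %s\n", log);
	return fd;
}
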
@ -1062,24 +1047,65 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
return libbpf_err_errno(fd);
}
int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
bool do_log)
int bpf_btf_load(const void *btf_data, size_t btf_size, const struct bpf_btf_load_opts *opts)
{
union bpf_attr attr = {};
const size_t attr_sz = offsetofend(union bpf_attr, btf_log_level);
union bpf_attr attr;
char *log_buf;
size_t log_size;
__u32 log_level;
int fd;
attr.btf = ptr_to_u64(btf);
memset(&attr, 0, attr_sz);
if (!OPTS_VALID(opts, bpf_btf_load_opts))
return libbpf_err(-EINVAL);
log_buf = OPTS_GET(opts, log_buf, NULL);
log_size = OPTS_GET(opts, log_size, 0);
log_level = OPTS_GET(opts, log_level, 0);
if (log_size > UINT_MAX)
return libbpf_err(-EINVAL);
if (log_size && !log_buf)
return libbpf_err(-EINVAL);
attr.btf = ptr_to_u64(btf_data);
attr.btf_size = btf_size;
/* log_level == 0 and log_buf != NULL means "try loading without
* log_buf, but retry with log_buf and log_level=1 on error", which is
* consistent across low-level and high-level BTF and program loading
* APIs within libbpf and provides a sensible behavior in practice
*/
if (log_level) {
attr.btf_log_buf = ptr_to_u64(log_buf);
attr.btf_log_size = (__u32)log_size;
attr.btf_log_level = log_level;
}
fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, attr_sz);
if (fd < 0 && log_buf && log_level == 0) {
attr.btf_log_buf = ptr_to_u64(log_buf);
attr.btf_log_size = (__u32)log_size;
attr.btf_log_level = 1;
fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, attr_sz);
}
return libbpf_err_errno(fd);
}
int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, bool do_log)
{
LIBBPF_OPTS(bpf_btf_load_opts, opts);
int fd;
retry:
if (do_log && log_buf && log_buf_size) {
attr.btf_log_level = 1;
attr.btf_log_size = log_buf_size;
attr.btf_log_buf = ptr_to_u64(log_buf);
opts.log_buf = log_buf;
opts.log_size = log_buf_size;
opts.log_level = 1;
}
fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, sizeof(attr));
fd = bpf_btf_load(btf, btf_size, &opts);
if (fd < 0 && !do_log && log_buf && log_buf_size) {
do_log = true;
goto retry;


@ -35,6 +35,30 @@
extern "C" {
#endif
struct bpf_map_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 btf_fd;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 btf_vmlinux_value_type_id;
__u32 inner_map_fd;
__u32 map_flags;
__u64 map_extra;
__u32 numa_node;
__u32 map_ifindex;
};
#define bpf_map_create_opts__last_field map_ifindex
LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
const char *map_name,
__u32 key_size,
__u32 value_size,
__u32 max_entries,
const struct bpf_map_create_opts *opts);
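
A minimal sketch of calling the new OPTS-based API; the map name, key/value sizes, and flags are illustrative:

#include <bpf/bpf.h>

int create_example_map(void)
{
	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);

	/* key: __u32, value: __u64, up to 1024 entries; returns a map FD or -errno */
	return bpf_map_create(BPF_MAP_TYPE_HASH, "example_map",
			      sizeof(__u32), sizeof(__u64), 1024, &opts);
}
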
struct bpf_create_map_attr {
const char *name;
enum bpf_map_type map_type;
@ -53,20 +77,25 @@ struct bpf_create_map_attr {
};
};
LIBBPF_API int
bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
LIBBPF_API int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
LIBBPF_API int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
int key_size, int value_size,
int max_entries, __u32 map_flags, int node);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
LIBBPF_API int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
int key_size, int value_size,
int max_entries, __u32 map_flags);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
LIBBPF_API int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries, __u32 map_flags);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
LIBBPF_API int bpf_create_map_in_map_node(enum bpf_map_type map_type,
const char *name, int key_size,
int inner_map_fd, int max_entries,
__u32 map_flags, int node);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
LIBBPF_API int bpf_create_map_in_map(enum bpf_map_type map_type,
const char *name, int key_size,
int inner_map_fd, int max_entries,
@ -166,8 +195,9 @@ struct bpf_load_program_attr {
/* Flags to direct loading requirements */
#define MAPS_RELAX_COMPAT 0x01
/* Recommend log buffer size */
/* Recommended log buffer size */
#define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_prog_load() instead")
LIBBPF_API int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
char *log_buf, size_t log_buf_sz);
@ -184,6 +214,23 @@ LIBBPF_API int bpf_verify_program(enum bpf_prog_type type,
char *log_buf, size_t log_buf_sz,
int log_level);
struct bpf_btf_load_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
/* kernel log options */
char *log_buf;
__u32 log_level;
__u32 log_size;
};
#define bpf_btf_load_opts__last_field log_size
LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
const struct bpf_btf_load_opts *opts);
LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_btf_load() instead")
LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
__u32 log_buf_size, bool do_log);
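
A minimal sketch of the new OPTS-based BTF loading API with a caller-provided log buffer; raw_btf and raw_size are assumed to hold valid raw BTF data:

#include <stdio.h>
#include <bpf/bpf.h>

int load_btf_with_log(const void *raw_btf, size_t raw_size)
{
	static char log[64 * 1024];
	LIBBPF_OPTS(bpf_btf_load_opts, opts,
		.log_buf = log,
		.log_size = sizeof(log),
		/* log_level 0: quiet on success, retried with level 1 on failure */
	);
	int fd;

	fd = bpf_btf_load(raw_btf, raw_size, &opts);
	if (fd < 0)
		fprintf(stderr, "BTF load failed: %s\n", log);
	return fd;
}
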
LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
@ -311,8 +358,6 @@ LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
__u32 query_flags, __u32 *attach_flags,
__u32 *prog_ids, __u32 *prog_cnt);
LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
__u32 log_buf_size, bool do_log);
LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
__u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
__u64 *probe_offset, __u64 *probe_addr);


@ -39,6 +39,8 @@ struct bpf_gen {
int error;
struct ksym_relo_desc *relos;
int relo_cnt;
struct bpf_core_relo *core_relos;
int core_relo_cnt;
char attach_target[128];
int attach_kind;
struct ksym_desc *ksyms;
@ -51,7 +53,10 @@ void bpf_gen__init(struct bpf_gen *gen, int log_level, int nr_progs, int nr_maps
int bpf_gen__finish(struct bpf_gen *gen, int nr_progs, int nr_maps);
void bpf_gen__free(struct bpf_gen *gen);
void bpf_gen__load_btf(struct bpf_gen *gen, const void *raw_data, __u32 raw_size);
void bpf_gen__map_create(struct bpf_gen *gen, struct bpf_create_map_params *map_attr, int map_idx);
void bpf_gen__map_create(struct bpf_gen *gen,
enum bpf_map_type map_type, const char *map_name,
__u32 key_size, __u32 value_size, __u32 max_entries,
struct bpf_map_create_opts *map_attr, int map_idx);
void bpf_gen__prog_load(struct bpf_gen *gen,
enum bpf_prog_type prog_type, const char *prog_name,
const char *license, struct bpf_insn *insns, size_t insn_cnt,
@ -61,5 +66,7 @@ void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx);
void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *name, enum bpf_attach_type type);
void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
bool is_typeless, int kind, int insn_idx);
void bpf_gen__record_relo_core(struct bpf_gen *gen, const struct bpf_core_relo *core_relo);
void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int key, int inner_map_idx);
#endif


@ -454,7 +454,7 @@ const struct btf *btf__base_btf(const struct btf *btf)
}
/* internal helper returning non-const pointer to a type */
struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id)
struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id)
{
if (type_id == 0)
return &btf_void;
@ -610,6 +610,7 @@ __s64 btf__resolve_size(const struct btf *btf, __u32 type_id)
case BTF_KIND_RESTRICT:
case BTF_KIND_VAR:
case BTF_KIND_DECL_TAG:
case BTF_KIND_TYPE_TAG:
type_id = t->type;
break;
case BTF_KIND_ARRAY:
@ -1123,54 +1124,86 @@ struct btf *btf__parse_split(const char *path, struct btf *base_btf)
static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
int btf__load_into_kernel(struct btf *btf)
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level)
{
__u32 log_buf_size = 0, raw_size;
char *log_buf = NULL;
LIBBPF_OPTS(bpf_btf_load_opts, opts);
__u32 buf_sz = 0, raw_size;
char *buf = NULL, *tmp;
void *raw_data;
int err = 0;
if (btf->fd >= 0)
return libbpf_err(-EEXIST);
if (log_sz && !log_buf)
return libbpf_err(-EINVAL);
retry_load:
if (log_buf_size) {
log_buf = malloc(log_buf_size);
if (!log_buf)
return libbpf_err(-ENOMEM);
*log_buf = 0;
}
/* cache native raw data representation */
raw_data = btf_get_raw_data(btf, &raw_size, false);
if (!raw_data) {
err = -ENOMEM;
goto done;
}
/* cache native raw data representation */
btf->raw_size = raw_size;
btf->raw_data = raw_data;
btf->fd = bpf_load_btf(raw_data, raw_size, log_buf, log_buf_size, false);
if (btf->fd < 0) {
if (!log_buf || errno == ENOSPC) {
log_buf_size = max((__u32)BPF_LOG_BUF_SIZE,
log_buf_size << 1);
free(log_buf);
goto retry_load;
retry_load:
/* if log_level is 0, we won't provide log_buf/log_size to the kernel
* initially. Only if BTF loading fails do we bump log_level to 1 and
* retry, using either an auto-allocated or a custom log_buf. This way
* a non-NULL custom log_buf provides a buffer just in case, while we
* still hope for a successful load with no need for the log.
*/
if (log_level) {
/* if caller didn't provide custom log_buf, we'll keep
* allocating our own progressively bigger buffers for BTF
* verification log
*/
if (!log_buf) {
buf_sz = max((__u32)BPF_LOG_BUF_SIZE, buf_sz * 2);
tmp = realloc(buf, buf_sz);
if (!tmp) {
err = -ENOMEM;
goto done;
}
buf = tmp;
buf[0] = '\0';
}
opts.log_buf = log_buf ? log_buf : buf;
opts.log_size = log_buf ? log_sz : buf_sz;
opts.log_level = log_level;
}
btf->fd = bpf_btf_load(raw_data, raw_size, &opts);
if (btf->fd < 0) {
/* time to turn on verbose mode and try again */
if (log_level == 0) {
log_level = 1;
goto retry_load;
}
/* only retry if caller didn't provide custom log_buf, but
* make sure we can never overflow buf_sz
*/
if (!log_buf && errno == ENOSPC && buf_sz <= UINT_MAX / 2)
goto retry_load;
err = -errno;
pr_warn("Error loading BTF: %s(%d)\n", strerror(errno), errno);
if (*log_buf)
pr_warn("%s\n", log_buf);
goto done;
pr_warn("BTF loading error: %d\n", err);
/* don't print out contents of custom log_buf */
if (!log_buf && buf[0])
pr_warn("-- BEGIN BTF LOAD LOG ---\n%s\n-- END BTF LOAD LOG --\n", buf);
}
done:
free(log_buf);
free(buf);
return libbpf_err(err);
}
int btf__load_into_kernel(struct btf *btf)
{
return btf_load_into_kernel(btf, NULL, 0, 0);
}
int btf__load(struct btf *) __attribute__((alias("btf__load_into_kernel")));
int btf__fd(const struct btf *btf)
@ -2730,15 +2763,11 @@ void btf_ext__free(struct btf_ext *btf_ext)
free(btf_ext);
}
struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
struct btf_ext *btf_ext__new(const __u8 *data, __u32 size)
{
struct btf_ext *btf_ext;
int err;
err = btf_ext_parse_hdr(data, size);
if (err)
return libbpf_err_ptr(err);
btf_ext = calloc(1, sizeof(struct btf_ext));
if (!btf_ext)
return libbpf_err_ptr(-ENOMEM);
@ -2751,6 +2780,10 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
}
memcpy(btf_ext->data, data, size);
err = btf_ext_parse_hdr(btf_ext->data, size);
if (err)
goto done;
if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, line_info_len)) {
err = -EINVAL;
goto done;
@ -3074,7 +3107,7 @@ done:
return libbpf_err(err);
}
COMPAT_VERSION(bpf__dedup_deprecated, btf__dedup, LIBBPF_0.0.2)
COMPAT_VERSION(btf__dedup_deprecated, btf__dedup, LIBBPF_0.0.2)
int btf__dedup_deprecated(struct btf *btf, struct btf_ext *btf_ext, const void *unused_opts)
{
LIBBPF_OPTS(btf_dedup_opts, opts, .btf_ext = btf_ext);
@ -3476,8 +3509,8 @@ static long btf_hash_struct(struct btf_type *t)
}
/*
* Check structural compatibility of two FUNC_PROTOs, ignoring referenced type
* IDs. This check is performed during type graph equivalence check and
* Check structural compatibility of two STRUCTs/UNIONs, ignoring referenced
* type IDs. This check is performed during type graph equivalence check and
* referenced types equivalence is checked separately.
*/
static bool btf_shallow_equal_struct(struct btf_type *t1, struct btf_type *t2)
@ -3850,6 +3883,31 @@ static int btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2)
return btf_equal_array(t1, t2);
}
/* Check if given two types are identical STRUCT/UNION definitions */
static bool btf_dedup_identical_structs(struct btf_dedup *d, __u32 id1, __u32 id2)
{
const struct btf_member *m1, *m2;
struct btf_type *t1, *t2;
int n, i;
t1 = btf_type_by_id(d->btf, id1);
t2 = btf_type_by_id(d->btf, id2);
if (!btf_is_composite(t1) || btf_kind(t1) != btf_kind(t2))
return false;
if (!btf_shallow_equal_struct(t1, t2))
return false;
m1 = btf_members(t1);
m2 = btf_members(t2);
for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) {
if (m1->type != m2->type)
return false;
}
return true;
}
/*
* Check equivalence of BTF type graph formed by candidate struct/union (we'll
* call it "candidate graph" in this description for brevity) to a type graph
@ -3961,6 +4019,8 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
hypot_type_id = d->hypot_map[canon_id];
if (hypot_type_id <= BTF_MAX_NR_TYPES) {
if (hypot_type_id == cand_id)
return 1;
/* In some cases compiler will generate different DWARF types
* for *identical* array type definitions and use them for
* different fields within the *same* struct. This breaks type
@ -3969,8 +4029,18 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
* types within a single CU. So work around that by explicitly
* allowing identical array types here.
*/
return hypot_type_id == cand_id ||
btf_dedup_identical_arrays(d, hypot_type_id, cand_id);
if (btf_dedup_identical_arrays(d, hypot_type_id, cand_id))
return 1;
/* It turns out that similar situation can happen with
* struct/union sometimes, sigh... Handle the case where
* structs/unions are exactly the same, down to the referenced
* type IDs. Anything more complicated (e.g., if referenced
* types are different, but equivalent) is *way more*
* complicated and requires a many-to-many equivalence mapping.
*/
if (btf_dedup_identical_structs(d, hypot_type_id, cand_id))
return 1;
return 0;
}
if (btf_dedup_hypot_map_add(d, canon_id, cand_id))
@ -4023,6 +4093,7 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
case BTF_KIND_PTR:
case BTF_KIND_TYPEDEF:
case BTF_KIND_FUNC:
case BTF_KIND_TYPE_TAG:
if (cand_type->info != canon_type->info)
return 0;
return btf_dedup_is_equiv(d, cand_type->type, canon_type->type);

View File

@ -157,7 +157,7 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
__u32 expected_value_size,
__u32 *key_type_id, __u32 *value_type_id);
LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
LIBBPF_API struct btf_ext *btf_ext__new(const __u8 *data, __u32 size);
LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
LIBBPF_API const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext,
__u32 *size);

View File

@ -2216,7 +2216,7 @@ static int btf_dump_dump_type_data(struct btf_dump *d,
__u8 bits_offset,
__u8 bit_sz)
{
int size, err;
int size, err = 0;
size = btf_dump_type_data_check_overflow(d, t, id, data, bits_offset);
if (size < 0)

View File

@ -445,47 +445,33 @@ void bpf_gen__load_btf(struct bpf_gen *gen, const void *btf_raw_data,
}
void bpf_gen__map_create(struct bpf_gen *gen,
struct bpf_create_map_params *map_attr, int map_idx)
enum bpf_map_type map_type,
const char *map_name,
__u32 key_size, __u32 value_size, __u32 max_entries,
struct bpf_map_create_opts *map_attr, int map_idx)
{
int attr_size = offsetofend(union bpf_attr, btf_vmlinux_value_type_id);
int attr_size = offsetofend(union bpf_attr, map_extra);
bool close_inner_map_fd = false;
int map_create_attr, idx;
union bpf_attr attr;
memset(&attr, 0, attr_size);
attr.map_type = map_attr->map_type;
attr.key_size = map_attr->key_size;
attr.value_size = map_attr->value_size;
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.map_flags = map_attr->map_flags;
attr.map_extra = map_attr->map_extra;
memcpy(attr.map_name, map_attr->name,
min((unsigned)strlen(map_attr->name), BPF_OBJ_NAME_LEN - 1));
if (map_name)
memcpy(attr.map_name, map_name,
min((unsigned)strlen(map_name), BPF_OBJ_NAME_LEN - 1));
attr.numa_node = map_attr->numa_node;
attr.map_ifindex = map_attr->map_ifindex;
attr.max_entries = map_attr->max_entries;
switch (attr.map_type) {
case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
case BPF_MAP_TYPE_CGROUP_ARRAY:
case BPF_MAP_TYPE_STACK_TRACE:
case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH_OF_MAPS:
case BPF_MAP_TYPE_DEVMAP:
case BPF_MAP_TYPE_DEVMAP_HASH:
case BPF_MAP_TYPE_CPUMAP:
case BPF_MAP_TYPE_XSKMAP:
case BPF_MAP_TYPE_SOCKMAP:
case BPF_MAP_TYPE_SOCKHASH:
case BPF_MAP_TYPE_QUEUE:
case BPF_MAP_TYPE_STACK:
case BPF_MAP_TYPE_RINGBUF:
break;
default:
attr.btf_key_type_id = map_attr->btf_key_type_id;
attr.btf_value_type_id = map_attr->btf_value_type_id;
}
attr.max_entries = max_entries;
attr.btf_key_type_id = map_attr->btf_key_type_id;
attr.btf_value_type_id = map_attr->btf_value_type_id;
pr_debug("gen: map_create: %s idx %d type %d value_type_id %d\n",
attr.map_name, map_idx, map_attr->map_type, attr.btf_value_type_id);
attr.map_name, map_idx, map_type, attr.btf_value_type_id);
map_create_attr = add_data(gen, &attr, attr_size);
if (attr.btf_value_type_id)
@ -512,7 +498,7 @@ void bpf_gen__map_create(struct bpf_gen *gen,
/* emit MAP_CREATE command */
emit_sys_bpf(gen, BPF_MAP_CREATE, map_create_attr, attr_size);
debug_ret(gen, "map_create %s idx %d type %d value_size %d value_btf_id %d",
attr.map_name, map_idx, map_attr->map_type, attr.value_size,
attr.map_name, map_idx, map_type, value_size,
attr.btf_value_type_id);
emit_check_err(gen);
/* remember map_fd in the stack, if successful */
@ -701,27 +687,29 @@ static void emit_relo_kfunc_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo
return;
}
kdesc->off = btf_fd_idx;
/* set a default value for imm */
/* jump to success case */
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
/* set value for imm, off as 0 */
emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, offsetof(struct bpf_insn, imm), 0));
/* skip success case store if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 1));
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
/* skip success case for ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 10));
/* store btf_id into insn[insn_idx].imm */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7, offsetof(struct bpf_insn, imm)));
/* obtain fd in BPF_REG_9 */
emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
/* jump to fd_array store if fd denotes module BTF */
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 2));
/* set the default value for off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
/* skip BTF fd store for vmlinux BTF */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 4));
/* load fd_array slot pointer */
emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_0, BPF_PSEUDO_MAP_IDX_VALUE,
0, 0, 0, blob_fd_array_off(gen, btf_fd_idx)));
/* skip store of BTF fd if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 3));
/* store BTF fd in slot */
emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_9, 0));
/* set a default value for off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
/* skip insn->off store if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 2));
/* skip if vmlinux BTF */
emit(gen, BPF_JMP_IMM(BPF_JEQ, BPF_REG_9, 0, 1));
/* store index into insn[insn_idx].off */
emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), btf_fd_idx));
log:
@ -820,9 +808,8 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
kdesc->insn + offsetof(struct bpf_insn, imm));
move_blob2blob(gen, insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 4,
kdesc->insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm));
emit(gen, BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_8, offsetof(struct bpf_insn, imm)));
/* jump over src_reg adjustment if imm is not 0 */
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 3));
/* jump over src_reg adjustment if imm is not 0, reuse BPF_REG_0 from move_blob2blob */
emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3));
goto clear_src_reg;
}
/* remember insn offset, so we can copy BTF ID and FD later */
@ -830,17 +817,20 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
emit_bpf_find_by_name_kind(gen, relo);
if (!relo->is_weak)
emit_check_err(gen);
/* set default values as 0 */
/* jump to success case */
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
/* set values for insn[insn_idx].imm, insn[insn_idx + 1].imm as 0 */
emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, offsetof(struct bpf_insn, imm), 0));
emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 0));
/* skip success case stores if ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 4));
/* skip success case for ret < 0 */
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 4));
/* store btf_id into insn[insn_idx].imm */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7, offsetof(struct bpf_insn, imm)));
/* store btf_obj_fd into insn[insn_idx + 1].imm */
emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_7, 32));
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7,
sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm)));
/* skip src_reg adjustment */
emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
clear_src_reg:
/* clear bpf_object__relocate_data's src_reg assignment, otherwise we get a verifier failure */
@ -852,6 +842,22 @@ clear_src_reg:
emit_ksym_relo_log(gen, relo, kdesc->ref);
}
void bpf_gen__record_relo_core(struct bpf_gen *gen,
const struct bpf_core_relo *core_relo)
{
struct bpf_core_relo *relos;
relos = libbpf_reallocarray(gen->core_relos, gen->core_relo_cnt + 1, sizeof(*relos));
if (!relos) {
gen->error = -ENOMEM;
return;
}
gen->core_relos = relos;
relos += gen->core_relo_cnt;
memcpy(relos, core_relo, sizeof(*relos));
gen->core_relo_cnt++;
}
static void emit_relo(struct bpf_gen *gen, struct ksym_relo_desc *relo, int insns)
{
int insn;
@ -884,6 +890,15 @@ static void emit_relos(struct bpf_gen *gen, int insns)
emit_relo(gen, gen->relos + i, insns);
}
static void cleanup_core_relo(struct bpf_gen *gen)
{
if (!gen->core_relo_cnt)
return;
free(gen->core_relos);
gen->core_relo_cnt = 0;
gen->core_relos = NULL;
}
static void cleanup_relos(struct bpf_gen *gen, int insns)
{
int i, insn;
@ -911,6 +926,7 @@ static void cleanup_relos(struct bpf_gen *gen, int insns)
gen->relo_cnt = 0;
gen->relos = NULL;
}
cleanup_core_relo(gen);
}
void bpf_gen__prog_load(struct bpf_gen *gen,
@ -918,12 +934,13 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
const char *license, struct bpf_insn *insns, size_t insn_cnt,
struct bpf_prog_load_opts *load_attr, int prog_idx)
{
int attr_size = offsetofend(union bpf_attr, fd_array);
int prog_load_attr, license_off, insns_off, func_info, line_info;
int prog_load_attr, license_off, insns_off, func_info, line_info, core_relos;
int attr_size = offsetofend(union bpf_attr, core_relo_rec_size);
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: prog_load: type %d insns_cnt %zd\n", prog_type, insn_cnt);
pr_debug("gen: prog_load: type %d insns_cnt %zd progi_idx %d\n",
prog_type, insn_cnt, prog_idx);
/* add license string to blob of bytes */
license_off = add_data(gen, license, strlen(license) + 1);
/* add insns to blob of bytes */
@ -947,6 +964,11 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
line_info = add_data(gen, load_attr->line_info,
attr.line_info_cnt * attr.line_info_rec_size);
attr.core_relo_rec_size = sizeof(struct bpf_core_relo);
attr.core_relo_cnt = gen->core_relo_cnt;
core_relos = add_data(gen, gen->core_relos,
attr.core_relo_cnt * attr.core_relo_rec_size);
memcpy(attr.prog_name, prog_name,
min((unsigned)strlen(prog_name), BPF_OBJ_NAME_LEN - 1));
prog_load_attr = add_data(gen, &attr, attr_size);
@ -963,6 +985,9 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
/* populate union bpf_attr with a pointer to line_info */
emit_rel_store(gen, attr_field(prog_load_attr, line_info), line_info);
/* populate union bpf_attr with a pointer to core_relos */
emit_rel_store(gen, attr_field(prog_load_attr, core_relos), core_relos);
/* populate union bpf_attr fd_array with a pointer to data where map_fds are saved */
emit_rel_store(gen, attr_field(prog_load_attr, fd_array), gen->fd_array);
@ -993,9 +1018,11 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
debug_ret(gen, "prog_load %s insn_cnt %d", attr.prog_name, attr.insn_cnt);
/* successful or not, close btf module FDs used in extern ksyms and attach_btf_obj_fd */
cleanup_relos(gen, insns_off);
if (gen->attach_kind)
if (gen->attach_kind) {
emit_sys_close_blob(gen,
attr_field(prog_load_attr, attach_btf_obj_fd));
gen->attach_kind = 0;
}
emit_check_err(gen);
/* remember prog_fd in the stack, if successful */
emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_7,
@ -1041,6 +1068,33 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *pvalue,
emit_check_err(gen);
}
void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int slot,
int inner_map_idx)
{
int attr_size = offsetofend(union bpf_attr, flags);
int map_update_attr, key;
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: populate_outer_map: outer %d key %d inner %d\n",
outer_map_idx, slot, inner_map_idx);
key = add_data(gen, &slot, sizeof(slot));
map_update_attr = add_data(gen, &attr, attr_size);
move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4,
blob_fd_array_off(gen, outer_map_idx));
emit_rel_store(gen, attr_field(map_update_attr, key), key);
emit_rel_store(gen, attr_field(map_update_attr, value),
blob_fd_array_off(gen, inner_map_idx));
/* emit MAP_UPDATE_ELEM command */
emit_sys_bpf(gen, BPF_MAP_UPDATE_ELEM, map_update_attr, attr_size);
debug_ret(gen, "populate_outer_map outer %d key %d inner %d",
outer_map_idx, slot, inner_map_idx);
emit_check_err(gen);
}
void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx)
{
int attr_size = offsetofend(union bpf_attr, map_fd);

File diff suppressed because it is too large.


@ -24,6 +24,10 @@
extern "C" {
#endif
LIBBPF_API __u32 libbpf_major_version(void);
LIBBPF_API __u32 libbpf_minor_version(void);
LIBBPF_API const char *libbpf_version_string(void);
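
A minimal sketch of querying the new version APIs at runtime:

#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
	printf("linked against libbpf %u.%u (%s)\n",
	       libbpf_major_version(), libbpf_minor_version(),
	       libbpf_version_string());
	return 0;
}
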
enum libbpf_errno {
__LIBBPF_ERRNO__START = 4000,
@ -104,12 +108,73 @@ struct bpf_object_open_opts {
* struct_ops, etc) will need actual kernel BTF at /sys/kernel/btf/vmlinux.
*/
const char *btf_custom_path;
/* Pointer to a buffer for storing kernel logs for applicable BPF
* commands. A valid kernel_log_size has to be specified as well; both are
* passed through to the bpf() syscall. Keep in mind that the kernel might
* fail the operation with -ENOSPC if the provided buffer is too small
* to contain the entire log output.
* See the comment below for kernel_log_level for interaction between
* log_buf and log_level settings.
*
* If specified, this log buffer will be passed for:
* - each BPF program load (BPF_PROG_LOAD) attempt, unless overridden
* with bpf_program__set_log_buf() on a per-program level, to get
* BPF verifier log output.
* - during BPF object's BTF load into kernel (BPF_BTF_LOAD) to get
* BTF sanity checking log.
*
* Each BPF command (BPF_BTF_LOAD or BPF_PROG_LOAD) will overwrite
* previous contents, so if you need more fine-grained control, set
* per-program buffer with bpf_program__set_log_buf() to preserve each
* individual program's verification log. Keep using kernel_log_buf
* for BTF verification log, if necessary.
*/
char *kernel_log_buf;
size_t kernel_log_size;
/*
* Log level can be set independently from log buffer. Log_level=0
* means that libbpf will attempt loading BTF or program without any
* logging requested, but will retry with either its own or custom log
* buffer, if provided, and log_level=1 on any error.
* And vice versa, setting log_level>0 will request BTF or prog
* loading with verbose log from the first attempt (and as such also
* for successfully loaded BTF or program), and the actual log buffer
* could be either libbpf's own auto-allocated log buffer, if
* kernel_log_buf is NULL, or the user-provided custom kernel_log_buf.
* If the user didn't provide a custom log buffer, libbpf will emit captured
* logs through its print callback.
*/
__u32 kernel_log_level;
size_t :0;
};
#define bpf_object_open_opts__last_field btf_custom_path
#define bpf_object_open_opts__last_field kernel_log_level
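
A minimal sketch of wiring up these new fields when opening an object; the buffer size and object path are illustrative:

#include <bpf/libbpf.h>

static char kernel_log[1024 * 1024];

struct bpf_object *open_with_kernel_log(void)
{
	LIBBPF_OPTS(bpf_object_open_opts, opts,
		.kernel_log_buf = kernel_log,
		.kernel_log_size = sizeof(kernel_log),
		.kernel_log_level = 1,	/* verbose log from the first attempt */
	);

	return bpf_object__open_file("example_prog.bpf.o", &opts);
}
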
LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
/**
* @brief **bpf_object__open_file()** creates a bpf_object by opening
* the BPF ELF object file pointed to by the passed path and loading it
* into memory.
* @param path BPF object file path
* @param opts options for how to load the bpf object, this parameter is
* optional and can be set to NULL
* @return pointer to the new bpf_object; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_object *
bpf_object__open_file(const char *path, const struct bpf_object_open_opts *opts);
/**
* @brief **bpf_object__open_mem()** creates a bpf_object by reading
* the BPF objects raw bytes from a memory buffer containing a valid
* BPF ELF object file.
* @param obj_buf pointer to the buffer containing ELF file bytes
* @param obj_buf_sz number of bytes in the buffer
* @param opts options for how to load the bpf object
* @return pointer to the new bpf_object; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_object *
bpf_object__open_mem(const void *obj_buf, size_t obj_buf_sz,
const struct bpf_object_open_opts *opts);
@ -149,6 +214,7 @@ struct bpf_object_load_attr {
/* Load/unload object into/from kernel */
LIBBPF_API int bpf_object__load(struct bpf_object *obj);
LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_object__load() instead")
LIBBPF_API int bpf_object__load_xattr(struct bpf_object_load_attr *attr);
LIBBPF_DEPRECATED_SINCE(0, 6, "bpf_object__unload() is deprecated, use bpf_object__close() instead")
LIBBPF_API int bpf_object__unload(struct bpf_object *obj);
@ -344,10 +410,41 @@ struct bpf_uprobe_opts {
};
#define bpf_uprobe_opts__last_field retprobe
/**
* @brief **bpf_program__attach_uprobe()** attaches a BPF program
* to the userspace function which is found by binary path and
* offset. You can optionally specify a particular process to attach
* to. You can also optionally attach the program to the function
* exit instead of entry.
*
* @param prog BPF program to attach
* @param retprobe Attach to function exit
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary that contains the function symbol
* @param func_offset Offset within the binary of the function symbol
* @return Reference to the newly created BPF link; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe(const struct bpf_program *prog, bool retprobe,
pid_t pid, const char *binary_path,
size_t func_offset);
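
A usage sketch for the documented API; the library path is illustrative and, in practice, func_offset would be resolved from the binary's symbol table:

#include <bpf/libbpf.h>

struct bpf_link *attach_entry_uprobe(struct bpf_program *prog, size_t func_offset)
{
	/* retprobe = false: attach at function entry; pid = -1: all processes */
	return bpf_program__attach_uprobe(prog, false, -1,
					  "/usr/lib/x86_64-linux-gnu/libc.so.6",
					  func_offset);
}
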
/**
* @brief **bpf_program__attach_uprobe_opts()** is just like
* bpf_program__attach_uprobe() except with an options struct
* for various configurations.
*
* @param prog BPF program to attach
* @param pid Process ID to attach the uprobe to, 0 for self (own process),
* -1 for all processes
* @param binary_path Path to binary that contains the function symbol
* @param func_offset Offset within the binary of the function symbol
* @param opts Options for altering program attachment
* @return Reference to the newly created BPF link; or NULL is returned on error,
* error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
const char *binary_path, size_t func_offset,
@ -494,7 +591,16 @@ bpf_program__set_expected_attach_type(struct bpf_program *prog,
enum bpf_attach_type type);
LIBBPF_API __u32 bpf_program__flags(const struct bpf_program *prog);
LIBBPF_API int bpf_program__set_extra_flags(struct bpf_program *prog, __u32 extra_flags);
LIBBPF_API int bpf_program__set_flags(struct bpf_program *prog, __u32 flags);
/* Per-program log level and log buffer getters/setters.
* See bpf_object_open_opts comments regarding log_level and log_buf
* interactions.
*/
LIBBPF_API __u32 bpf_program__log_level(const struct bpf_program *prog);
LIBBPF_API int bpf_program__set_log_level(struct bpf_program *prog, __u32 log_level);
LIBBPF_API const char *bpf_program__log_buf(const struct bpf_program *prog, size_t *log_size);
LIBBPF_API int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log_size);
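A sketch of how these setters might be combined with object loading; the 64 KiB buffer size and log level 2 are arbitrary choices, and both calls must happen before bpf_object__load():

#include <bpf/libbpf.h>
#include <stdio.h>

static char verifier_log[64 * 1024];

static int load_with_captured_log(struct bpf_object *obj, struct bpf_program *prog)
{
	size_t sz;
	int err;

	/* configure before load; log_level 2 asks for the verbose verifier log */
	bpf_program__set_log_level(prog, 2);
	bpf_program__set_log_buf(prog, verifier_log, sizeof(verifier_log));

	err = bpf_object__load(obj);
	if (err)
		fprintf(stderr, "verifier log:\n%s\n", bpf_program__log_buf(prog, &sz));
	return err;
}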
LIBBPF_API int
bpf_program__set_attach_target(struct bpf_program *prog, int attach_prog_fd,
@ -676,6 +782,7 @@ struct bpf_prog_load_attr {
int prog_flags;
};
LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_object__open() and bpf_object__load() instead")
LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
struct bpf_object **pobj, int *prog_fd);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_object__open() and bpf_object__load() instead")
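A rough replacement flow for the deprecated bpf_prog_load_xattr(), sketched with placeholder names ("prog.bpf.o", "my_prog"); the returned object must stay alive for the program FD to remain valid:

#include <bpf/libbpf.h>

static struct bpf_object *open_and_load(int *prog_fd)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open_file("prog.bpf.o", NULL);
	if (!obj)
		return NULL;

	if (bpf_object__load(obj))
		goto err_out;

	prog = bpf_object__find_program_by_name(obj, "my_prog");
	if (!prog)
		goto err_out;

	*prog_fd = bpf_program__fd(prog);
	return obj;

err_out:
	bpf_object__close(obj);
	return NULL;
}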
@ -1031,11 +1138,11 @@ struct bpf_object_skeleton {
struct bpf_object **obj;
int map_cnt;
int map_skel_sz; /* sizeof(struct bpf_skeleton_map) */
int map_skel_sz; /* sizeof(struct bpf_map_skeleton) */
struct bpf_map_skeleton *maps;
int prog_cnt;
int prog_skel_sz; /* sizeof(struct bpf_skeleton_prog) */
int prog_skel_sz; /* sizeof(struct bpf_prog_skeleton) */
struct bpf_prog_skeleton *progs;
};


@ -391,6 +391,7 @@ LIBBPF_0.6.0 {
global:
bpf_map__map_extra;
bpf_map__set_map_extra;
bpf_map_create;
bpf_object__next_map;
bpf_object__next_program;
bpf_object__prev_map;
@ -400,7 +401,7 @@ LIBBPF_0.6.0 {
bpf_program__flags;
bpf_program__insn_cnt;
bpf_program__insns;
bpf_program__set_extra_flags;
bpf_program__set_flags;
btf__add_btf;
btf__add_decl_tag;
btf__add_type_tag;
@ -410,8 +411,20 @@ LIBBPF_0.6.0 {
btf__type_cnt;
btf_dump__new;
btf_dump__new_deprecated;
libbpf_major_version;
libbpf_minor_version;
libbpf_version_string;
perf_buffer__new;
perf_buffer__new_deprecated;
perf_buffer__new_raw;
perf_buffer__new_raw_deprecated;
} LIBBPF_0.5.0;
LIBBPF_0.7.0 {
global:
bpf_btf_load;
bpf_program__log_buf;
bpf_program__log_level;
bpf_program__set_log_buf;
bpf_program__set_log_level;
};
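For illustration, the three new version symbols allow a runtime floor check such as the following; the 0.7 requirement is just an example tied to the symbols added in this release:

#include <bpf/libbpf.h>
#include <stdio.h>

static int require_libbpf_0_7(void)
{
	__u32 major = libbpf_major_version();
	__u32 minor = libbpf_minor_version();

	if (major == 0 && minor < 7) {
		fprintf(stderr, "libbpf %s is too old, need >= 0.7\n",
			libbpf_version_string());
		return -1;
	}
	return 0;
}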


@ -40,6 +40,11 @@
#else
#define __LIBBPF_MARK_DEPRECATED_0_7(X)
#endif
#if __LIBBPF_CURRENT_VERSION_GEQ(0, 8)
#define __LIBBPF_MARK_DEPRECATED_0_8(X) X
#else
#define __LIBBPF_MARK_DEPRECATED_0_8(X)
#endif
/* This set of internal macros allows "function overloading" based on the
* number of arguments provided by the user in a backwards-compatible way during the


@ -172,7 +172,7 @@ static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size)
struct btf;
struct btf_type;
struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id);
struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
const char *btf_kind_str(const struct btf_type *t);
const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
@ -277,27 +277,7 @@ int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
struct bpf_create_map_params {
const char *name;
enum bpf_map_type map_type;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
__u32 max_entries;
__u32 numa_node;
__u32 btf_fd;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 map_ifindex;
union {
__u32 inner_map_fd;
__u32 btf_vmlinux_value_type_id;
};
__u64 map_extra;
};
int libbpf__bpf_create_map_xattr(const struct bpf_create_map_params *create_attr);
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,


@ -164,7 +164,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);
btf_fd = bpf_load_btf(raw_btf, btf_len, NULL, 0, false);
btf_fd = bpf_btf_load(raw_btf, btf_len, NULL);
free(raw_btf);
return btf_fd;
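Callers outside this internal helper can also hand bpf_btf_load() a log buffer; a sketch under the assumption that bpf_btf_load_opts exposes log_buf/log_size/log_level fields in this release:

#include <errno.h>
#include <bpf/bpf.h>
#include <bpf/btf.h>

static int load_btf_with_log(struct btf *btf, char *log_buf, __u32 log_sz)
{
	__u32 raw_sz;
	const void *raw = btf__raw_data(btf, &raw_sz);
	LIBBPF_OPTS(bpf_btf_load_opts, opts,
		.log_buf = log_buf,
		.log_size = log_sz,
		.log_level = 1);

	if (!raw)
		return -EINVAL;
	return bpf_btf_load(raw, raw_sz, &opts);
}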
@ -201,7 +201,6 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
{
int key_size, value_size, max_entries, map_flags;
__u32 btf_key_type_id = 0, btf_value_type_id = 0;
struct bpf_create_map_attr attr = {};
int fd = -1, btf_fd = -1, fd_inner;
key_size = sizeof(__u32);
@ -271,34 +270,35 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
if (map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
LIBBPF_OPTS(bpf_map_create_opts, opts);
/* TODO: probe for device, once libbpf has a function to create
* map-in-map for offload
*/
if (ifindex)
return false;
fd_inner = bpf_create_map(BPF_MAP_TYPE_HASH,
sizeof(__u32), sizeof(__u32), 1, 0);
fd_inner = bpf_map_create(BPF_MAP_TYPE_HASH, NULL,
sizeof(__u32), sizeof(__u32), 1, NULL);
if (fd_inner < 0)
return false;
fd = bpf_create_map_in_map(map_type, NULL, sizeof(__u32),
fd_inner, 1, 0);
opts.inner_map_fd = fd_inner;
fd = bpf_map_create(map_type, NULL, sizeof(__u32), sizeof(__u32), 1, &opts);
close(fd_inner);
} else {
LIBBPF_OPTS(bpf_map_create_opts, opts);
/* Note: No other restriction on map type probes for offload */
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.max_entries = max_entries;
attr.map_flags = map_flags;
attr.map_ifindex = ifindex;
opts.map_flags = map_flags;
opts.map_ifindex = ifindex;
if (btf_fd >= 0) {
attr.btf_fd = btf_fd;
attr.btf_key_type_id = btf_key_type_id;
attr.btf_value_type_id = btf_value_type_id;
opts.btf_fd = btf_fd;
opts.btf_key_type_id = btf_key_type_id;
opts.btf_value_type_id = btf_value_type_id;
}
fd = bpf_create_map_xattr(&attr);
fd = bpf_map_create(map_type, NULL, key_size, value_size, max_entries, &opts);
}
if (fd >= 0)
close(fd);
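A usage sketch of the replacement bpf_map_create() API shown above; the map name, sizes, and flags are arbitrary example values:

#include <bpf/bpf.h>
#include <linux/bpf.h>

static int create_counter_map(void)
{
	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);

	/* returns a map FD on success, a negative error on failure */
	return bpf_map_create(BPF_MAP_TYPE_HASH, "counters",
			      sizeof(__u32), sizeof(__u64), 1024, &opts);
}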


@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 0
#define LIBBPF_MINOR_VERSION 6
#define LIBBPF_MINOR_VERSION 7
#endif /* __LIBBPF_VERSION_H */
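The bumped macros also enable a compile-time guard; a small illustrative check (the 0.7 floor is only an example):

#include <bpf/libbpf.h>

#if LIBBPF_MAJOR_VERSION == 0 && LIBBPF_MINOR_VERSION < 7
#error "libbpf >= 0.7 expected (bpf_btf_load, per-program log buffer APIs)"
#endif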

Some files were not shown because too many files have changed in this diff.