Daniel Borkmann says:

====================
pull-request: bpf-next 2022-10-03

We've added 143 non-merge commits during the last 27 day(s) which contain
a total of 151 files changed, 8321 insertions(+), 1402 deletions(-).

The main changes are:

1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu.

2) Add support for struct-based arguments for trampoline-based BPF programs,
   from Yonghong Song.

3) Fix entry IP for kprobe-multi and trampoline probes with IBT enabled, from Jiri Olsa.

4) Batch of improvements to the veristat selftest tool, in particular adding CSV output,
   a comparison mode for CSV outputs, and filtering, from Andrii Nakryiko.

5) Add preparatory changes needed for the BPF core for upcoming BPF HID support,
   from Benjamin Tissoires.

6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program
   types, from Daniel Xu.

7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler.

8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer /
   single-kernel-consumer semantics for BPF ring buffer, from David Vernet.

9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF
   hashtab's bucket lock, from Hou Tao.

10) Allow creating an iterator that loops through only the resources of one
    task/thread instead of all, from Kui-Feng Lee.

11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi.

12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT
    entry which is not yet inserted, from Lorenzo Bianconi.

13) Remove invalid recursion check for struct_ops for TCP congestion control BPF
    programs, from Martin KaFai Lau.

14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu.

15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa.

16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen.

17) Add invocation of cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu.

18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta.

19) Various libbpf fixes and cleanups including a fix for a NULL pointer deref in libbpf, from Xin Liu.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits)
  net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
  Documentation: bpf: Add implementation notes documentations to table of contents
  bpf, docs: Delete misformatted table.
  selftests/xsk: Fix double free
  bpftool: Fix error message of strerror
  libbpf: Fix overrun in netlink attribute iteration
  selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged"
  samples/bpf: Fix typo in xdp_router_ipv4 sample
  bpftool: Remove unused struct event_ring_info
  bpftool: Remove unused struct btf_attach_point
  bpf, docs: Add TOC and fix formatting.
  bpf, docs: Add Clang note about BPF_ALU
  bpf, docs: Move Clang notes to a separate file
  bpf, docs: Linux byteswap note
  bpf, docs: Move legacy packet instructions to a separate file
  selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
  bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
  bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
  bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
  bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
  ...
====================

Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

View File

@ -102,6 +102,9 @@ Values:
- 1 - enable JIT hardening for unprivileged users only
- 2 - enable JIT hardening for all users
where "privileged user" in this context means a process having
CAP_BPF or CAP_SYS_ADMIN in the root user namespace.
bpf_jit_kallsyms
----------------

View File

@ -0,0 +1,30 @@
.. contents::
.. sectnum::
==========================
Clang implementation notes
==========================
This document provides more details specific to the Clang/LLVM implementation of the eBPF instruction set.
Versions
========
Clang defines "CPU" versions, where a CPU version of 3 corresponds to the current eBPF ISA.
Clang can select the eBPF ISA version using ``-mcpu=v3``, for example, to select version 3.
Arithmetic instructions
=======================
For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.
Atomic operations
=================
Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.
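To make the flag combinations above concrete, here is a minimal, hypothetical BPF C
object and the corresponding compiler invocations (the program and file names are
illustrative only, not part of this note)::

/* prog.c -- build with either of:
 *   clang -O2 -target bpf -mcpu=v3 -c prog.c -o prog.o
 *   clang -O2 -target bpf -mcpu=v2 -Xclang -target-feature -Xclang +alu32 -c prog.c -o prog.o
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
	/* With -mcpu=v3, Clang may emit 32-bit ALU and atomic instructions here. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";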

View File

@ -26,6 +26,8 @@ that goes into great technical depth about the BPF Architecture.
classic_vs_extended.rst
bpf_licensing
test_debug
clang-notes
linux-notes
other
.. only:: subproject and html

View File

@ -1,7 +1,12 @@
.. contents::
.. sectnum::
========================================
eBPF Instruction Set Specification, v1.0
========================================
This document specifies version 1.0 of the eBPF instruction set.
====================
eBPF Instruction Set
====================
Registers and calling convention
================================
@ -11,10 +16,10 @@ all of which are 64-bits wide.
The eBPF calling convention is defined as:
* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack
* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack
R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
necessary across calls.
@ -24,17 +29,17 @@ Instruction encoding
eBPF has two instruction encodings:
* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate value
(imm64) after the basic instruction for a total of 128 bits.
* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate value
(imm64) after the basic instruction for a total of 128 bits.
The basic instruction encoding looks as follows:
============= ======= =============== ==================== ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
============= ======= =============== ==================== ============
immediate offset source register destination register opcode
============= ======= =============== ==================== ============
============= ======= =============== ==================== ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
============= ======= =============== ==================== ============
immediate offset source register destination register opcode
============= ======= =============== ==================== ============
Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
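For reference, the Linux UAPI headers describe this basic encoding with the following
C structure (``struct bpf_insn`` from ``include/uapi/linux/bpf.h``); the fields run
from the LSB (opcode) upward, mirroring the table above::

#include <linux/types.h>

struct bpf_insn {
	__u8	code;		/* opcode */
	__u8	dst_reg:4;	/* dest register */
	__u8	src_reg:4;	/* source register */
	__s16	off;		/* signed offset */
	__s32	imm;		/* signed immediate constant */
};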
@ -44,30 +49,30 @@ Instruction classes
The three LSB bits of the 'opcode' field store the instruction class:
========= ===== ===============================
class value description
========= ===== ===============================
BPF_LD 0x00 non-standard load operations
BPF_LDX 0x01 load into register operations
BPF_ST 0x02 store from immediate operations
BPF_STX 0x03 store from register operations
BPF_ALU 0x04 32-bit arithmetic operations
BPF_JMP 0x05 64-bit jump operations
BPF_JMP32 0x06 32-bit jump operations
BPF_ALU64 0x07 64-bit arithmetic operations
========= ===== ===============================
========= ===== =============================== ===================================
class value description reference
========= ===== =============================== ===================================
BPF_LD 0x00 non-standard load operations `Load and store instructions`_
BPF_LDX 0x01 load into register operations `Load and store instructions`_
BPF_ST 0x02 store from immediate operations `Load and store instructions`_
BPF_STX 0x03 store from register operations `Load and store instructions`_
BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump instructions`_
BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump instructions`_
BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump instructions`_
BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump instructions`_
========= ===== =============================== ===================================
Arithmetic and jump instructions
================================
For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:
For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
============== ====== =================
4 bits (MSB) 1 bit 3 bits (LSB)
============== ====== =================
operation code source instruction class
============== ====== =================
============== ====== =================
4 bits (MSB) 1 bit 3 bits (LSB)
============== ====== =================
operation code source instruction class
============== ====== =================
The 4th bit encodes the source operand:
@ -84,51 +89,51 @@ The four MSB bits store the operation code.
Arithmetic instructions
-----------------------
BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:
The 'code' field encodes the operation as below:
======== ===== =================================================
code value description
======== ===== =================================================
BPF_ADD 0x00 dst += src
BPF_SUB 0x10 dst -= src
BPF_MUL 0x20 dst \*= src
BPF_DIV 0x30 dst /= src
BPF_OR 0x40 dst \|= src
BPF_AND 0x50 dst &= src
BPF_LSH 0x60 dst <<= src
BPF_RSH 0x70 dst >>= src
BPF_NEG 0x80 dst = ~src
BPF_MOD 0x90 dst %= src
BPF_XOR 0xa0 dst ^= src
BPF_MOV 0xb0 dst = src
BPF_ARSH 0xc0 sign extending shift right
BPF_END 0xd0 byte swap operations (see separate section below)
======== ===== =================================================
======== ===== ==========================================================
code value description
======== ===== ==========================================================
BPF_ADD 0x00 dst += src
BPF_SUB 0x10 dst -= src
BPF_MUL 0x20 dst \*= src
BPF_DIV 0x30 dst /= src
BPF_OR 0x40 dst \|= src
BPF_AND 0x50 dst &= src
BPF_LSH 0x60 dst <<= src
BPF_RSH 0x70 dst >>= src
BPF_NEG 0x80 dst = ~src
BPF_MOD 0x90 dst %= src
BPF_XOR 0xa0 dst ^= src
BPF_MOV 0xb0 dst = src
BPF_ARSH 0xc0 sign extending shift right
BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
======== ===== ==========================================================
BPF_ADD | BPF_X | BPF_ALU means::
``BPF_ADD | BPF_X | BPF_ALU`` means::
dst_reg = (u32) dst_reg + (u32) src_reg;
BPF_ADD | BPF_X | BPF_ALU64 means::
``BPF_ADD | BPF_X | BPF_ALU64`` means::
dst_reg = dst_reg + src_reg
BPF_XOR | BPF_K | BPF_ALU means::
``BPF_XOR | BPF_K | BPF_ALU`` means::
src_reg = (u32) src_reg ^ (u32) imm32
BPF_XOR | BPF_K | BPF_ALU64 means::
``BPF_XOR | BPF_K | BPF_ALU64`` means::
src_reg = src_reg ^ imm32
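As an illustration, the kernel's instruction-building macros (``include/linux/filter.h``,
mirrored under ``tools/include``) encode the examples above as follows; the register and
immediate choices are arbitrary and only for this sketch::

#include <linux/filter.h>

/* dst_reg = (u32) dst_reg + (u32) src_reg   (BPF_ADD | BPF_X | BPF_ALU)   */
struct bpf_insn add32 = BPF_ALU32_REG(BPF_ADD, BPF_REG_2, BPF_REG_3);

/* dst_reg = dst_reg + src_reg               (BPF_ADD | BPF_X | BPF_ALU64) */
struct bpf_insn add64 = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3);

/* 32-bit xor with an immediate              (BPF_XOR | BPF_K | BPF_ALU)   */
struct bpf_insn xor32 = BPF_ALU32_IMM(BPF_XOR, BPF_REG_2, 0x1234);

/* 64-bit xor with an immediate              (BPF_XOR | BPF_K | BPF_ALU64) */
struct bpf_insn xor64 = BPF_ALU64_IMM(BPF_XOR, BPF_REG_2, 0x1234);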
Byte swap instructions
----------------------
~~~~~~~~~~~~~~~~~~~~~~
The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
code field of ``BPF_END``.
'code' field of ``BPF_END``.
The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.
@ -136,14 +141,14 @@ only and do not use a separate source register or immediate value.
The 1-bit source operand field in the opcode is used to select what byte
order the operation converts from or to:
========= ===== =================================================
source value description
========= ===== =================================================
BPF_TO_LE 0x00 convert between host byte order and little endian
BPF_TO_BE 0x08 convert between host byte order and big endian
========= ===== =================================================
========= ===== =================================================
source value description
========= ===== =================================================
BPF_TO_LE 0x00 convert between host byte order and little endian
BPF_TO_BE 0x08 convert between host byte order and big endian
========= ===== =================================================
The imm field encodes the width of the swap operations. The following widths
The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64.
Examples:
@ -156,35 +161,31 @@ Examples:
dst_reg = htobe64(dst_reg)
``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
``BPF_TO_BE`` respectively.
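For illustration, the last example above can be built with the kernel's ``BPF_ENDIAN()``
macro from ``include/linux/filter.h`` (register choice arbitrary, a sketch only)::

/* dst_reg = htobe64(dst_reg): class BPF_ALU, code BPF_END, source BPF_TO_BE, imm = 64 */
struct bpf_insn swap64 = BPF_ENDIAN(BPF_TO_BE, BPF_REG_1, 64);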
Jump instructions
-----------------
BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:
The 'code' field encodes the operation as below:
======== ===== ========================= ============
code value description notes
======== ===== ========================= ============
BPF_JA 0x00 PC += off BPF_JMP only
BPF_JEQ 0x10 PC += off if dst == src
BPF_JGT 0x20 PC += off if dst > src unsigned
BPF_JGE 0x30 PC += off if dst >= src unsigned
BPF_JSET 0x40 PC += off if dst & src
BPF_JNE 0x50 PC += off if dst != src
BPF_JSGT 0x60 PC += off if dst > src signed
BPF_JSGE 0x70 PC += off if dst >= src signed
BPF_CALL 0x80 function call
BPF_EXIT 0x90 function / program return BPF_JMP only
BPF_JLT 0xa0 PC += off if dst < src unsigned
BPF_JLE 0xb0 PC += off if dst <= src unsigned
BPF_JSLT 0xc0 PC += off if dst < src signed
BPF_JSLE 0xd0 PC += off if dst <= src signed
======== ===== ========================= ============
======== ===== ========================= ============
code value description notes
======== ===== ========================= ============
BPF_JA 0x00 PC += off BPF_JMP only
BPF_JEQ 0x10 PC += off if dst == src
BPF_JGT 0x20 PC += off if dst > src unsigned
BPF_JGE 0x30 PC += off if dst >= src unsigned
BPF_JSET 0x40 PC += off if dst & src
BPF_JNE 0x50 PC += off if dst != src
BPF_JSGT 0x60 PC += off if dst > src signed
BPF_JSGE 0x70 PC += off if dst >= src signed
BPF_CALL 0x80 function call
BPF_EXIT 0x90 function / program return BPF_JMP only
BPF_JLT 0xa0 PC += off if dst < src unsigned
BPF_JLE 0xb0 PC += off if dst <= src unsigned
BPF_JSLT 0xc0 PC += off if dst < src signed
BPF_JSLE 0xd0 PC += off if dst <= src signed
======== ===== ========================= ============
The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
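A minimal instruction sequence that honours this rule, sketched with the kernel's
macros from ``include/linux/filter.h`` (a loader would submit these instructions via
the ``BPF_PROG_LOAD`` command)::

#include <linux/filter.h>

struct bpf_insn prog[] = {
	BPF_MOV64_IMM(BPF_REG_0, 0),	/* r0 = 0: return value must be in R0 */
	BPF_EXIT_INSN(),		/* BPF_EXIT (class BPF_JMP): return r0 */
};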
@ -193,14 +194,26 @@ BPF_EXIT.
Load and store instructions
===========================
For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:
============ ====== =================
3 bits (MSB) 2 bits 3 bits (LSB)
============ ====== =================
mode size instruction class
============ ====== =================
============ ====== =================
3 bits (MSB) 2 bits 3 bits (LSB)
============ ====== =================
mode size instruction class
============ ====== =================
The mode modifier is one of:
============= ===== ==================================== =============
mode modifier value description reference
============= ===== ==================================== =============
BPF_IMM 0x00 64-bit immediate instructions `64-bit immediate instructions`_
BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_
BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_
BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_
BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_
============= ===== ==================================== =============
The size modifier is one of:
@ -213,19 +226,6 @@ The size modifier is one of:
BPF_DW 0x18 double word (8 bytes)
============= ===== =====================
The mode modifier is one of:
============= ===== ====================================
mode modifier value description
============= ===== ====================================
BPF_IMM 0x00 64-bit immediate instructions
BPF_ABS 0x20 legacy BPF packet access (absolute)
BPF_IND 0x40 legacy BPF packet access (indirect)
BPF_MEM 0x60 regular load and store operations
BPF_ATOMIC 0xc0 atomic operations
============= ===== ====================================
Regular load and store operations
---------------------------------
@ -256,44 +256,42 @@ by other eBPF programs or means outside of this specification.
All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:
* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.
* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.
The imm field is used to encode the actual atomic operation.
The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the imm field to encode the atomic operation:
arithmetic operations in the 'imm' field to encode the atomic operation:
======== ===== ===========
imm value description
======== ===== ===========
BPF_ADD 0x00 atomic add
BPF_OR 0x40 atomic or
BPF_AND 0x50 atomic and
BPF_XOR 0xa0 atomic xor
======== ===== ===========
======== ===== ===========
imm value description
======== ===== ===========
BPF_ADD 0x00 atomic add
BPF_OR 0x40 atomic or
BPF_AND 0x50 atomic and
BPF_XOR 0xa0 atomic xor
======== ===== ===========
``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means::
``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::
*(u32 *)(dst_reg + off16) += src_reg
``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF ADD means::
``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::
*(u64 *)(dst_reg + off16) += src_reg
``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.
In addition to the simple atomic operations, there also is a modifier and
two complex atomic operations:
=========== ================ ===========================
imm value description
=========== ================ ===========================
BPF_FETCH 0x01 modifier: return old value
BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
=========== ================ ===========================
=========== ================ ===========================
imm value description
=========== ================ ===========================
BPF_FETCH 0x01 modifier: return old value
BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
=========== ================ ===========================
The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
@ -309,16 +307,10 @@ The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
value that was at ``dst_reg + off`` before the operation is zero-extended
and loaded back to ``R0``.
Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.
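To illustrate how these encodings are reached from C, here is a hedged sketch of BPF
program code; with ``-mcpu=v3``, Clang lowers the ``__sync_*`` builtins below to
``BPF_ATOMIC`` instructions (the exact instruction selection is the compiler's choice)::

/* Fragment of a BPF C program; 'val' points into a map value. */
static void atomic_examples(long *val)
{
	__sync_fetch_and_add(val, 1);		/* BPF_ADD (BPF_FETCH only if the result is used) */

	long old = __sync_fetch_and_add(val, 5);		/* BPF_ADD | BPF_FETCH */
	long swapped = __sync_lock_test_and_set(val, 42);	/* BPF_XCHG            */
	long prev = __sync_val_compare_and_swap(val, swapped, old); /* BPF_CMPXCHG     */

	(void)prev;
}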
64-bit immediate instructions
-----------------------------
Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding for an extra imm64 value.
There is currently only one such instruction.
@ -331,36 +323,6 @@ There is currently only one such instruction.
Legacy BPF Packet access instructions
-------------------------------------
eBPF has special instructions for access to packet data that have been
carried over from classic BPF to retain the performance of legacy socket
filters running in the eBPF interpreter.
The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
``BPF_IND | <size> | BPF_LD``.
These instructions are used to access packet data and can only be used when
the program context is a pointer to networking packet. ``BPF_ABS``
accesses packet data at an absolute offset specified by the immediate data
and ``BPF_IND`` access packet data at an offset that includes the value of
a register in addition to the immediate data.
These instructions have seven implicit operands:
* Register R6 is an implicit input that must contain pointer to a
struct sk_buff.
* Register R0 is an implicit output which contains the data fetched from
the packet.
* Registers R1-R5 are scratch registers that are clobbered after a call to
``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.
These instructions have an implicit program exit condition as well. When an
eBPF program is trying to access the data beyond the packet boundary, the
program execution will be aborted.
``BPF_ABS | BPF_W | BPF_LD`` means::
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))
``BPF_IND | BPF_W | BPF_LD`` means::
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
eBPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.

View File

@ -137,14 +137,22 @@ KF_ACQUIRE and KF_RET_NULL flags.
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that the all pointer arguments will always be refcounted, and have
their offset set to 0. It can be used to enforce that a pointer to a refcounted
object acquired from a kfunc or BPF helper is passed as an argument to this
kfunc without any modifications (e.g. pointer arithmetic) such that it is
trusted and points to the original object. This flag is often used for kfuncs
that operate (change some property, perform some operation) on an object that
was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to
ensure the integrity of the operation being performed on the expected object.
indicates that all pointer arguments will always have a guaranteed lifetime,
and pointers to kernel objects are always passed to helpers in their unmodified
form (as obtained from acquire kfuncs).
It can be used to enforce that a pointer to a refcounted object acquired from a
kfunc or BPF helper is passed as an argument to this kfunc without any
modifications (e.g. pointer arithmetic) such that it is trusted and points to
the original object.
Meanwhile, it is also allowed to pass pointers to normal memory to such kfuncs;
those can have a non-zero offset.
This flag is often used for kfuncs that operate (change some property, perform
some operation) on an object that was obtained using an acquire kfunc. Such
kfuncs need an unchanged pointer to ensure the integrity of the operation being
performed on the expected object.
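For context, the flag is attached when a subsystem registers its kfuncs. A minimal
sketch, assuming a hypothetical ``struct frob_obj`` and ``bpf_frob_obj_set_state()``
kfunc (names invented purely for illustration; the registration interface is the BTF
id-set one used elsewhere in the tree)::

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/module.h>

struct frob_obj { int state; };		/* hypothetical object type */

/* Hypothetical kfunc: because of KF_TRUSTED_ARGS, the verifier only accepts
 * an unmodified, refcounted pointer obtained from an acquire kfunc here.
 */
noinline void bpf_frob_obj_set_state(struct frob_obj *obj, int state)
{
	obj->state = state;
}

BTF_SET8_START(frob_kfunc_ids)
BTF_ID_FLAGS(func, bpf_frob_obj_set_state, KF_TRUSTED_ARGS)
BTF_SET8_END(frob_kfunc_ids)

static const struct btf_kfunc_id_set frob_kfunc_set = {
	.owner	= THIS_MODULE,
	.set	= &frob_kfunc_ids,
};

static int __init frob_init(void)
{
	/* Expose the kfunc to, e.g., tc (sched_cls) programs. */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &frob_kfunc_set);
}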
2.4.6 KF_SLEEPABLE flag
-----------------------

View File

@ -0,0 +1,53 @@
.. contents::
.. sectnum::
==========================
Linux implementation notes
==========================
This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
Byte swap instructions
======================
``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
Legacy BPF Packet access instructions
=====================================
As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_,
Linux has special eBPF instructions for access to packet data that have been
carried over from classic BPF to retain the performance of legacy socket
filters running in the eBPF interpreter.
The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
``BPF_IND | <size> | BPF_LD``.
These instructions are used to access packet data and can only be used when
the program context is a pointer to a networking packet. ``BPF_ABS``
accesses packet data at an absolute offset specified by the immediate data
and ``BPF_IND`` accesses packet data at an offset that includes the value of
a register in addition to the immediate data.
These instructions have seven implicit operands:
* Register R6 is an implicit input that must contain a pointer to a
struct sk_buff.
* Register R0 is an implicit output which contains the data fetched from
the packet.
* Registers R1-R5 are scratch registers that are clobbered by the
instruction.
These instructions have an implicit program exit condition as well. If an
eBPF program attempts to access data beyond the packet boundary, the
program execution will be aborted.
``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))

View File

@ -1970,7 +1970,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
u32 flags, struct bpf_tramp_links *tlinks,
void *orig_call)
{
int ret;
int i, ret;
int nargs = m->nr_args;
int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
struct jit_ctx ctx = {
@ -1982,6 +1982,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
if (nargs > 8)
return -ENOTSUPP;
/* don't support struct argument */
for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) {
if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
return -ENOTSUPP;
}
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
if (ret < 0)
return ret;

View File

@ -284,6 +284,7 @@ config X86
select PROC_PID_ARCH_STATUS if PROC_FS
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
config INSTRUCTION_DECODER
def_bool y

View File

@ -662,7 +662,7 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
*/
emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
} else {
/* movabsq %rax, imm64 */
/* movabsq rax, imm64 */
EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
EMIT(imm32_lo, 4);
EMIT(imm32_hi, 4);
@ -1751,34 +1751,60 @@ emit_jmp:
static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
int stack_size)
{
int i;
int i, j, arg_size, nr_regs;
/* Store function arguments to stack.
* For a function that accepts two pointers the sequence will be:
* mov QWORD PTR [rbp-0x10],rdi
* mov QWORD PTR [rbp-0x8],rsi
*/
for (i = 0; i < min(nr_args, 6); i++)
emit_stx(prog, bytes_to_bpf_size(m->arg_size[i]),
BPF_REG_FP,
i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
-(stack_size - i * 8));
for (i = 0, j = 0; i < min(nr_args, 6); i++) {
if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) {
nr_regs = (m->arg_size[i] + 7) / 8;
arg_size = 8;
} else {
nr_regs = 1;
arg_size = m->arg_size[i];
}
while (nr_regs) {
emit_stx(prog, bytes_to_bpf_size(arg_size),
BPF_REG_FP,
j == 5 ? X86_REG_R9 : BPF_REG_1 + j,
-(stack_size - j * 8));
nr_regs--;
j++;
}
}
}
static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
int stack_size)
{
int i;
int i, j, arg_size, nr_regs;
/* Restore function arguments from stack.
* For a function that accepts two pointers the sequence will be:
* EMIT4(0x48, 0x8B, 0x7D, 0xF0); mov rdi,QWORD PTR [rbp-0x10]
* EMIT4(0x48, 0x8B, 0x75, 0xF8); mov rsi,QWORD PTR [rbp-0x8]
*/
for (i = 0; i < min(nr_args, 6); i++)
emit_ldx(prog, bytes_to_bpf_size(m->arg_size[i]),
i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
BPF_REG_FP,
-(stack_size - i * 8));
for (i = 0, j = 0; i < min(nr_args, 6); i++) {
if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) {
nr_regs = (m->arg_size[i] + 7) / 8;
arg_size = 8;
} else {
nr_regs = 1;
arg_size = m->arg_size[i];
}
while (nr_regs) {
emit_ldx(prog, bytes_to_bpf_size(arg_size),
j == 5 ? X86_REG_R9 : BPF_REG_1 + j,
BPF_REG_FP,
-(stack_size - j * 8));
nr_regs--;
j++;
}
}
}
static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
@ -1810,6 +1836,9 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
if (p->aux->sleepable) {
enter = __bpf_prog_enter_sleepable;
exit = __bpf_prog_exit_sleepable;
} else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) {
enter = __bpf_prog_enter_struct_ops;
exit = __bpf_prog_exit_struct_ops;
} else if (p->expected_attach_type == BPF_LSM_CGROUP) {
enter = __bpf_prog_enter_lsm_cgroup;
exit = __bpf_prog_exit_lsm_cgroup;
@ -2013,13 +2042,14 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
struct bpf_tramp_links *tlinks,
void *orig_call)
void *func_addr)
{
int ret, i, nr_args = m->nr_args;
int ret, i, nr_args = m->nr_args, extra_nregs = 0;
int regs_off, ip_off, args_off, stack_size = nr_args * 8, run_ctx_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
void *orig_call = func_addr;
u8 **branches = NULL;
u8 *prog;
bool save_ret;
@ -2028,6 +2058,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
if (nr_args > 6)
return -ENOTSUPP;
for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) {
if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
extra_nregs += (m->arg_size[i] + 7) / 8 - 1;
}
if (nr_args + extra_nregs > 6)
return -ENOTSUPP;
stack_size += extra_nregs * 8;
/* Generated trampoline stack layout:
*
* RBP + 8 [ return address ]
@ -2040,7 +2078,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
* [ ... ]
* RBP - regs_off [ reg_arg1 ] program's ctx pointer
*
* RBP - args_off [ args count ] always
* RBP - args_off [ arg regs count ] always
*
* RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
*
@ -2083,21 +2121,19 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
EMIT1(0x53); /* push rbx */
/* Store number of arguments of the traced function:
* mov rax, nr_args
/* Store number of argument registers of the traced function:
* mov rax, nr_args + extra_nregs
* mov QWORD PTR [rbp - args_off], rax
*/
emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args);
emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args + extra_nregs);
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -args_off);
if (flags & BPF_TRAMP_F_IP_ARG) {
/* Store IP address of the traced function:
* mov rax, QWORD PTR [rbp + 8]
* sub rax, X86_PATCH_SIZE
* movabsq rax, func_addr
* mov QWORD PTR [rbp - ip_off], rax
*/
emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
EMIT4(0x48, 0x83, 0xe8, X86_PATCH_SIZE);
emit_mov_imm64(&prog, BPF_REG_0, (long) func_addr >> 32, (u32) (long) func_addr);
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ip_off);
}
@ -2209,7 +2245,7 @@ cleanup:
return ret;
}
static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image, u8 *buf)
{
u8 *jg_reloc, *prog = *pprog;
int pivot, err, jg_bytes = 1;
@ -2225,12 +2261,12 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
EMIT2_off32(0x81, add_1reg(0xF8, BPF_REG_3),
progs[a]);
err = emit_cond_near_jump(&prog, /* je func */
(void *)progs[a], prog,
(void *)progs[a], image + (prog - buf),
X86_JE);
if (err)
return err;
emit_indirect_jump(&prog, 2 /* rdx */, prog);
emit_indirect_jump(&prog, 2 /* rdx */, image + (prog - buf));
*pprog = prog;
return 0;
@ -2255,7 +2291,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
jg_reloc = prog;
err = emit_bpf_dispatcher(&prog, a, a + pivot, /* emit lower_part */
progs);
progs, image, buf);
if (err)
return err;
@ -2269,7 +2305,7 @@ static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
emit_code(jg_reloc - jg_bytes, jg_offset, jg_bytes);
err = emit_bpf_dispatcher(&prog, a + pivot + 1, /* emit upper_part */
b, progs);
b, progs, image, buf);
if (err)
return err;
@ -2289,12 +2325,12 @@ static int cmp_ips(const void *a, const void *b)
return 0;
}
int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs)
{
u8 *prog = image;
u8 *prog = buf;
sort(funcs, num_funcs, sizeof(funcs[0]), cmp_ips, NULL);
return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs);
return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs, image, buf);
}
struct x64_jit_data {

View File

@ -154,6 +154,14 @@
#define MEM_DISCARD(sec) *(.mem##sec)
#endif
#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
#define KEEP_PATCHABLE KEEP(*(__patchable_function_entries))
#define PATCHABLE_DISCARDS
#else
#define KEEP_PATCHABLE
#define PATCHABLE_DISCARDS *(__patchable_function_entries)
#endif
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
/*
* The ftrace call sites are logged to a section whose name depends on the
@ -172,7 +180,7 @@
#define MCOUNT_REC() . = ALIGN(8); \
__start_mcount_loc = .; \
KEEP(*(__mcount_loc)) \
KEEP(*(__patchable_function_entries)) \
KEEP_PATCHABLE \
__stop_mcount_loc = .; \
ftrace_stub_graph = ftrace_stub; \
ftrace_ops_list_func = arch_ftrace_ops_list_func;
@ -1023,6 +1031,7 @@
#define COMMON_DISCARDS \
SANITIZER_DISCARDS \
PATCHABLE_DISCARDS \
*(.discard) \
*(.discard.*) \
*(.modinfo) \

View File

@ -280,14 +280,33 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
}
}
/* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
* forced to use 'long' read/writes to try to atomically copy long counters.
* Best-effort only. No barriers here, since it _will_ race with concurrent
* updates from BPF programs. Called from bpf syscall and mostly used with
* size 8 or 16 bytes, so ask compiler to inline it.
*/
static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
{
const long *lsrc = src;
long *ldst = dst;
size /= sizeof(long);
while (size--)
*ldst++ = *lsrc++;
}
/* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
static inline void __copy_map_value(struct bpf_map *map, void *dst, void *src, bool long_memcpy)
{
u32 curr_off = 0;
int i;
if (likely(!map->off_arr)) {
memcpy(dst, src, map->value_size);
if (long_memcpy)
bpf_long_memcpy(dst, src, round_up(map->value_size, 8));
else
memcpy(dst, src, map->value_size);
return;
}
@ -299,6 +318,36 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
}
memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off);
}
static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
{
__copy_map_value(map, dst, src, false);
}
static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src)
{
__copy_map_value(map, dst, src, true);
}
static inline void zero_map_value(struct bpf_map *map, void *dst)
{
u32 curr_off = 0;
int i;
if (likely(!map->off_arr)) {
memset(dst, 0, map->value_size);
return;
}
for (i = 0; i < map->off_arr->cnt; i++) {
u32 next_off = map->off_arr->field_off[i];
memset(dst + curr_off, 0, next_off - curr_off);
curr_off += map->off_arr->field_sz[i];
}
memset(dst + curr_off, 0, map->value_size - curr_off);
}
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src);
void bpf_timer_cancel_and_free(void *timer);
@ -402,7 +451,7 @@ enum bpf_type_flag {
/* DYNPTR points to memory local to the bpf program. */
DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS),
/* DYNPTR points to a ringbuf record. */
/* DYNPTR points to a kernel-produced ringbuf record. */
DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS),
/* Size is known at compile time. */
@ -607,6 +656,7 @@ enum bpf_reg_type {
PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_FUNC, /* reg points to a bpf program function */
PTR_TO_DYNPTR, /* reg points to a dynptr */
__BPF_REG_TYPE_MAX,
/* Extended reg_types. */
@ -727,10 +777,14 @@ enum bpf_cgroup_storage_type {
*/
#define MAX_BPF_FUNC_REG_ARGS 5
/* The argument is a structure. */
#define BTF_FMODEL_STRUCT_ARG BIT(0)
struct btf_func_model {
u8 ret_size;
u8 nr_args;
u8 arg_size[MAX_BPF_FUNC_ARGS];
u8 arg_flags[MAX_BPF_FUNC_ARGS];
};
/* Restore arguments before returning from trampoline to let original function
@ -810,6 +864,10 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
struct bpf_tramp_run_ctx *run_ctx);
void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
struct bpf_tramp_run_ctx *run_ctx);
u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
struct bpf_tramp_run_ctx *run_ctx);
void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
struct bpf_tramp_run_ctx *run_ctx);
void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr);
void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr);
@ -892,6 +950,7 @@ struct bpf_dispatcher {
struct bpf_dispatcher_prog progs[BPF_DISPATCHER_MAX];
int num_progs;
void *image;
void *rw_image;
u32 image_off;
struct bpf_ksym ksym;
};
@ -910,7 +969,7 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampolin
struct bpf_trampoline *bpf_trampoline_get(u64 key,
struct bpf_attach_target_info *tgt_info);
void bpf_trampoline_put(struct bpf_trampoline *tr);
int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs);
#define BPF_DISPATCHER_INIT(_name) { \
.mutex = __MUTEX_INITIALIZER(_name.mutex), \
.func = &_name##_func, \
@ -924,7 +983,14 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
}, \
}
#ifdef CONFIG_X86_64
#define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5)))
#else
#define BPF_DISPATCHER_ATTRIBUTES
#endif
#define DEFINE_BPF_DISPATCHER(name) \
notrace BPF_DISPATCHER_ATTRIBUTES \
noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \
const void *ctx, \
const struct bpf_insn *insnsi, \
@ -946,7 +1012,6 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs);
void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
struct bpf_prog *to);
/* Called only from JIT-enabled code, so there's no need for stubs. */
void *bpf_jit_alloc_exec_page(void);
void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym);
void bpf_image_ksym_del(struct bpf_ksym *ksym);
void bpf_ksym_add(struct bpf_ksym *ksym);
@ -1334,6 +1399,11 @@ struct bpf_array {
#define BPF_MAP_CAN_READ BIT(0)
#define BPF_MAP_CAN_WRITE BIT(1)
/* Maximum number of user-producer ring buffer samples that can be drained in
* a call to bpf_user_ringbuf_drain().
*/
#define BPF_MAX_USER_RINGBUF_SAMPLES (128 * 1024)
static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
{
u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
@ -1730,6 +1800,27 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
extern int bpf_iter_ ## target(args); \
int __init bpf_iter_ ## target(args) { return 0; }
/*
* The task type of iterators.
*
* BPF task iterators can be parameterized to visit only a subset of
* tasks.
*
* BPF_TASK_ITER_ALL (default)
* Iterate over resources of every task.
*
* BPF_TASK_ITER_TID
* Iterate over resources of a task/tid.
*
* BPF_TASK_ITER_TGID
* Iterate over resources of every task of a process / task group.
*/
enum bpf_iter_task_type {
BPF_TASK_ITER_ALL = 0,
BPF_TASK_ITER_TID,
BPF_TASK_ITER_TGID,
};
struct bpf_iter_aux_info {
/* for map_elem iter */
struct bpf_map *map;
@ -1739,6 +1830,10 @@ struct bpf_iter_aux_info {
struct cgroup *start; /* starting cgroup */
enum bpf_cgroup_iter_order order;
} cgroup;
struct {
enum bpf_iter_task_type type;
u32 pid;
} task;
};
typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
@ -1823,22 +1918,6 @@ int bpf_get_file_flag(int flags);
int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size,
size_t actual_size);
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
* forced to use 'long' read/writes to try to atomically copy long counters.
* Best-effort only. No barriers here, since it _will_ race with concurrent
* updates from BPF programs. Called from bpf syscall and mostly used with
* size 8 or 16 bytes, so ask compiler to inline it.
*/
static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
{
const long *lsrc = src;
long *ldst = dst;
size /= sizeof(long);
while (size--)
*ldst++ = *lsrc++;
}
/* verify correctness of eBPF program */
int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr);
@ -1940,13 +2019,22 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
const char *func_name,
struct btf_func_model *m);
struct bpf_kfunc_arg_meta {
u64 r0_size;
bool r0_rdonly;
int ref_obj_id;
u32 flags;
};
struct bpf_reg_state;
int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs);
int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs);
int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs,
u32 kfunc_flags);
struct bpf_kfunc_arg_meta *meta);
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg);
int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
@ -1983,6 +2071,8 @@ static inline bool has_current_bpf_ctx(void)
{
return !!current->bpf_ctx;
}
void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog);
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
@ -2165,6 +2255,15 @@ static inline struct bpf_prog *bpf_prog_by_id(u32 id)
return ERR_PTR(-ENOTSUPP);
}
static inline int btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
u32 *next_btf_id, enum bpf_type_flag *flag)
{
return -EACCES;
}
static inline const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
{
@ -2196,6 +2295,10 @@ static inline bool has_current_bpf_ctx(void)
{
return false;
}
static inline void bpf_prog_inc_misses_counter(struct bpf_prog *prog)
{
}
#endif /* CONFIG_BPF_SYSCALL */
void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
@ -2433,6 +2536,7 @@ extern const struct bpf_func_proto bpf_loop_proto;
extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
extern const struct bpf_func_proto bpf_set_retval_proto;
extern const struct bpf_func_proto bpf_get_retval_proto;
extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
@ -2577,7 +2681,7 @@ enum bpf_dynptr_type {
BPF_DYNPTR_TYPE_INVALID,
/* Points to memory that is local to the bpf program */
BPF_DYNPTR_TYPE_LOCAL,
/* Underlying data is a ringbuf record */
/* Underlying data is a kernel-produced ringbuf record */
BPF_DYNPTR_TYPE_RINGBUF,
};
@ -2585,6 +2689,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
enum bpf_dynptr_type type, u32 offset, u32 size);
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
int bpf_dynptr_check_size(u32 size);
u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr);
#ifdef CONFIG_BPF_LSM
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
@ -2594,4 +2699,12 @@ static inline void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype) {}
static inline void bpf_cgroup_atype_put(int cgroup_atype) {}
#endif /* CONFIG_BPF_LSM */
struct key;
#ifdef CONFIG_KEYS
struct bpf_key {
struct key *key;
bool has_ref;
};
#endif /* CONFIG_KEYS */
#endif /* _LINUX_BPF_H */
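The struct bpf_key above backs the new PKCS#7 signature-verification kfuncs from this
series. A hedged sketch of a sleepable LSM program using them, modelled on the
selftests; the kfunc prototypes are reproduced from memory of the series, so treat them
as assumptions and check the merged headers:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
extern void bpf_key_put(struct bpf_key *bkey) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

__u32 user_keyring_serial;	/* filled in by user space before attach */
char data[4096];		/* blob to verify */
char sig[1024];			/* detached PKCS#7 signature over 'data' */

char _license[] SEC("license") = "GPL";

SEC("lsm.s/bpf")		/* sleepable hook: key lookup may sleep */
int BPF_PROG(check_bpf_syscall, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_dynptr data_ptr, sig_ptr;
	struct bpf_key *trusted_keyring;
	int ret;

	bpf_dynptr_from_mem(data, sizeof(data), 0, &data_ptr);
	bpf_dynptr_from_mem(sig, sizeof(sig), 0, &sig_ptr);

	trusted_keyring = bpf_lookup_user_key(user_keyring_serial, 0);
	if (!trusted_keyring)
		return 0;

	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, trusted_keyring);
	bpf_key_put(trusted_keyring);
	return ret;
}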

View File

@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)

View File

@ -248,6 +248,7 @@ struct bpf_func_state {
*/
u32 async_entry_cnt;
bool in_callback_fn;
struct tnum callback_ret_range;
bool in_async_callback_fn;
/* The following fields should be last. See copy_func_state() */
@ -348,6 +349,27 @@ struct bpf_verifier_state {
iter < frame->allocated_stack / BPF_REG_SIZE; \
iter++, reg = bpf_get_spilled_reg(iter, frame))
/* Invoke __expr over registers in __vst, setting __state and __reg */
#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
({ \
struct bpf_verifier_state *___vstate = __vst; \
int ___i, ___j; \
for (___i = 0; ___i <= ___vstate->curframe; ___i++) { \
struct bpf_reg_state *___regs; \
__state = ___vstate->frame[___i]; \
___regs = __state->regs; \
for (___j = 0; ___j < MAX_BPF_REG; ___j++) { \
__reg = &___regs[___j]; \
(void)(__expr); \
} \
bpf_for_each_spilled_reg(___j, __state, __reg) { \
if (!__reg) \
continue; \
(void)(__expr); \
} \
} \
})
/* linked list of verifier states used to prune search */
struct bpf_verifier_state_list {
struct bpf_verifier_state state;
@ -571,6 +593,11 @@ int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state
u32 regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno, u32 mem_size);
bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env,
struct bpf_reg_state *reg);
bool is_dynptr_type_expected(struct bpf_verifier_env *env,
struct bpf_reg_state *reg,
enum bpf_arg_type arg_type);
/* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
@ -598,6 +625,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
struct bpf_attach_target_info *tgt_info);
void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab);
int mark_chain_precision(struct bpf_verifier_env *env, int regno);
#define BPF_BASE_TYPE_MASK GENMASK(BPF_BASE_TYPE_BITS - 1, 0)
/* extract base type from bpf_{arg, return, reg}_type. */

View File

@ -52,6 +52,15 @@
#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */
/*
* Return the name of the passed struct if it exists, or halt the build if,
* for example, the structure gets renamed. In this way, developers have to
* revisit the code using that structure name and update it accordingly.
*/
#define stringify_struct(x) \
({ BUILD_BUG_ON(sizeof(struct x) < 0); \
__stringify(x); })
struct btf;
struct btf_member;
struct btf_type;
@ -441,4 +450,14 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt
}
#endif
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
{
if (!btf_type_is_ptr(t))
return false;
t = btf_type_skip_modifiers(btf, t->type, NULL);
return btf_type_is_struct(t);
}
#endif

View File

@ -567,6 +567,12 @@ struct sk_filter {
DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
extern struct mutex nf_conn_btf_access_lock;
extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype, u32 *next_btf_id,
enum bpf_type_flag *flag);
typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx,
const struct bpf_insn *insnsi,
unsigned int (*bpf_func)(const void *,
@ -1017,6 +1023,8 @@ extern long bpf_jit_limit_max;
typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
void bpf_jit_fill_hole_with_zero(void *area, unsigned int size);
struct bpf_binary_header *
bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
unsigned int alignment,
@ -1029,6 +1037,9 @@ void bpf_jit_free(struct bpf_prog *fp);
struct bpf_binary_header *
bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns);
void bpf_prog_pack_free(struct bpf_binary_header *hdr);
static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
{
return list_empty(&fp->aux->ksym.lnode) ||
@ -1099,7 +1110,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
return false;
if (!bpf_jit_harden)
return false;
if (bpf_jit_harden == 1 && capable(CAP_SYS_ADMIN))
if (bpf_jit_harden == 1 && bpf_capable())
return false;
return true;

View File

@ -88,6 +88,12 @@ enum key_need_perm {
KEY_DEFER_PERM_CHECK, /* Special: permission check is deferred */
};
enum key_lookup_flag {
KEY_LOOKUP_CREATE = 0x01,
KEY_LOOKUP_PARTIAL = 0x02,
KEY_LOOKUP_ALL = (KEY_LOOKUP_CREATE | KEY_LOOKUP_PARTIAL),
};
struct seq_file;
struct user_struct;
struct signal_struct;

View File

@ -103,6 +103,7 @@ struct kprobe {
* this flag is only for optimized_kprobe.
*/
#define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */
#define KPROBE_FLAG_ON_FUNC_ENTRY 16 /* probe is on the function entry */
/* Has this kprobe gone ? */
static inline bool kprobe_gone(struct kprobe *p)

View File

@ -81,4 +81,7 @@
/********** net/core/page_pool.c **********/
#define PP_SIGNATURE (0x40 + POISON_POINTER_DELTA)
/********** kernel/bpf/ **********/
#define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA))
#endif

View File

@ -388,6 +388,12 @@ struct tcp_sock {
u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
* values defined in uapi/linux/tcp.h
*/
u8 bpf_chg_cc_inprogress:1; /* In the middle of
* bpf_setsockopt(TCP_CONGESTION);
* used to keep bpf_tcp_cc->init()
* from recursively calling
* bpf_setsockopt(TCP_CONGESTION, "itself").
*/
#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
#else
#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0

View File

@ -17,6 +17,14 @@
#define VERIFY_USE_SECONDARY_KEYRING ((struct key *)1UL)
#define VERIFY_USE_PLATFORM_KEYRING ((struct key *)2UL)
static inline int system_keyring_id_check(u64 id)
{
if (id > (unsigned long)VERIFY_USE_PLATFORM_KEYRING)
return -EINVAL;
return 0;
}
/*
* The use to which an asymmetric key is being put.
*/

View File

@ -3,13 +3,18 @@
#ifndef _NF_CONNTRACK_BPF_H
#define _NF_CONNTRACK_BPF_H
#include <linux/btf.h>
#include <linux/kconfig.h>
#include <net/netfilter/nf_conntrack.h>
struct nf_conn___init {
struct nf_conn ct;
};
#if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
(IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
extern int register_nf_conntrack_bpf(void);
extern void cleanup_nf_conntrack_bpf(void);
#else
@ -18,6 +23,24 @@ static inline int register_nf_conntrack_bpf(void)
return 0;
}
static inline void cleanup_nf_conntrack_bpf(void)
{
}
#endif
#if (IS_BUILTIN(CONFIG_NF_NAT) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
(IS_MODULE(CONFIG_NF_NAT) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
extern int register_nf_nat_bpf(void);
#else
static inline int register_nf_nat_bpf(void)
{
return 0;
}
#endif
#endif /* _NF_CONNTRACK_BPF_H */

View File

@ -110,6 +110,12 @@ union bpf_iter_link_info {
__u32 cgroup_fd;
__u64 cgroup_id;
} cgroup;
/* Parameters of task iterators. */
struct {
__u32 tid;
__u32 pid;
__u32 pid_fd;
} task;
};
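From user space, these new task fields are typically filled in through libbpf's
iterator attach options; a minimal, hedged sketch (error handling omitted):

#include <linux/bpf.h>
#include <bpf/libbpf.h>

/* Attach an iter/task* program so it only visits one thread's resources. */
static struct bpf_link *attach_tid_iter(struct bpf_program *prog, __u32 tid)
{
	union bpf_iter_link_info linfo = {};
	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);

	linfo.task.tid = tid;			/* new in this series */
	opts.link_info = &linfo;
	opts.link_info_len = sizeof(linfo);

	return bpf_program__attach_iter(prog, &opts);
}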
/* BPF syscall commands, see bpf(2) man-page for more details. */
@ -928,6 +934,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
};
/* Note that tracing related programs such as
@ -4950,6 +4957,7 @@ union bpf_attr {
* Get address of the traced function (for tracing and kprobe programs).
* Return
* Address of the traced function.
* 0 for kprobes placed within the function (not at the entry).
*
* u64 bpf_get_attach_cookie(void *ctx)
* Description
@ -5079,12 +5087,12 @@ union bpf_attr {
*
* long bpf_get_func_arg(void *ctx, u32 n, u64 *value)
* Description
* Get **n**-th argument (zero based) of the traced function (for tracing programs)
* Get **n**-th argument register (zero based) of the traced function (for tracing programs)
* returned in **value**.
*
* Return
* 0 on success.
* **-EINVAL** if n >= arguments count of traced function.
* **-EINVAL** if n >= argument register count of traced function.
*
* long bpf_get_func_ret(void *ctx, u64 *value)
* Description
@ -5097,10 +5105,11 @@ union bpf_attr {
*
* long bpf_get_func_arg_cnt(void *ctx)
* Description
* Get number of arguments of the traced function (for tracing programs).
* Get the number of argument registers of the traced function (for tracing
* programs), i.e. the registers in which the function's arguments are stored.
*
* Return
* The number of arguments of the traced function.
* The number of argument registers of the traced function.
*
* int bpf_get_retval(void)
* Description
@ -5386,6 +5395,43 @@ union bpf_attr {
* Return
* Current *ktime*.
*
* long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags)
* Description
* Drain samples from the specified user ring buffer, and invoke
* the provided callback for each such sample:
*
* long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx);
*
* If **callback_fn** returns 0, the helper will continue to try
* and drain the next sample, up to a maximum of
* BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1,
* the helper will skip the rest of the samples and return. Other
* return values are not used now, and will be rejected by the
* verifier.
* Return
* The number of drained samples if no error was encountered while
* draining samples, or 0 if no samples were present in the ring
* buffer. If a user-space producer was epoll-waiting on this map,
* and at least one sample was drained, they will receive an event
* notification notifying them of available space in the ring
* buffer. If the BPF_RB_NO_WAKEUP flag is passed to this
* function, no wakeup notification will be sent. If the
* BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will
* be sent even if no sample was drained.
*
* On failure, the returned value is one of the following:
*
* **-EBUSY** if the ring buffer is contended, and another calling
* context was concurrently draining the ring buffer.
*
* **-EINVAL** if user-space is not properly tracking the ring
* buffer due to the producer position not being aligned to 8
* bytes, a sample not being aligned to 8 bytes, or the producer
* position not matching the advertised length of a sample.
*
* **-E2BIG** if user-space has tried to publish a sample which is
* larger than the size of the ring buffer, or which cannot fit
* within a struct bpf_dynptr.
*/
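A hedged sketch of the BPF side of this helper, pairing a BPF_MAP_TYPE_USER_RINGBUF map
with the dynptr callback described above (sample layout and attach point are
illustrative only):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_USER_RINGBUF);
	__uint(max_entries, 256 * 1024);	/* power-of-2 multiple of the page size */
} user_rb SEC(".maps");

long drained;					/* readable from user space via skeleton */

struct user_sample {				/* hypothetical producer sample layout */
	__u64 id;
};

static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
{
	struct user_sample *s = bpf_dynptr_data(dynptr, 0, sizeof(*s));

	if (!s)
		return 0;			/* sample too short; keep draining */
	drained++;
	return 0;				/* 0: drain next sample, 1: stop */
}

SEC("tp/syscalls/sys_enter_getpgid")		/* arbitrary attach point for the sketch */
int drain_user_rb(void *ctx)
{
	long n = bpf_user_ringbuf_drain(&user_rb, handle_sample, NULL, 0);

	if (n < 0)
		bpf_printk("user_ringbuf_drain failed: %ld", n);
	return 0;
}

char _license[] SEC("license") = "GPL";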
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5597,6 +5643,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
FN(user_ringbuf_drain), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -6218,6 +6265,10 @@ struct bpf_link_info {
__u64 cgroup_id;
__u32 order;
} cgroup;
struct {
__u32 tid;
__u32 pid;
} task;
};
} iter;
struct {

View File

@ -279,7 +279,8 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
rcu_read_lock();
pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) {
bpf_long_memcpy(value + off, per_cpu_ptr(pptr, cpu), size);
copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
check_and_init_map_value(map, value + off);
off += size;
}
rcu_read_unlock();
@ -338,8 +339,9 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
return -EINVAL;
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
memcpy(this_cpu_ptr(array->pptrs[index & array->index_mask]),
value, map->value_size);
val = this_cpu_ptr(array->pptrs[index & array->index_mask]);
copy_map_value(map, val, value);
check_and_free_fields(array, val);
} else {
val = array->value +
(u64)array->elem_size * (index & array->index_mask);
@ -383,7 +385,8 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
rcu_read_lock();
pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) {
bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off, size);
copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
check_and_free_fields(array, per_cpu_ptr(pptr, cpu));
off += size;
}
rcu_read_unlock();
@ -421,8 +424,20 @@ static void array_map_free(struct bpf_map *map)
int i;
if (map_value_has_kptrs(map)) {
for (i = 0; i < array->map.max_entries; i++)
bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
for (i = 0; i < array->map.max_entries; i++) {
void __percpu *pptr = array->pptrs[i & array->index_mask];
int cpu;
for_each_possible_cpu(cpu) {
bpf_map_free_kptrs(map, per_cpu_ptr(pptr, cpu));
cond_resched();
}
}
} else {
for (i = 0; i < array->map.max_entries; i++)
bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
}
bpf_map_free_kptr_off_tab(map);
}
@ -608,9 +623,9 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
pptr = v;
size = array->elem_size;
for_each_possible_cpu(cpu) {
bpf_long_memcpy(info->percpu_value_buf + off,
per_cpu_ptr(pptr, cpu),
size);
copy_map_value_long(map, info->percpu_value_buf + off,
per_cpu_ptr(pptr, cpu));
check_and_init_map_value(map, info->percpu_value_buf + off);
off += size;
}
ctx.value = info->percpu_value_buf;


@ -208,7 +208,7 @@ enum btf_kfunc_hook {
};
enum {
BTF_KFUNC_SET_MAX_CNT = 32,
BTF_KFUNC_SET_MAX_CNT = 256,
BTF_DTOR_KFUNC_MAX_CNT = 256,
};
@ -818,6 +818,7 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
return NULL;
return btf->types[type_id];
}
EXPORT_SYMBOL_GPL(btf_type_by_id);
/*
* Regular int is not a bit field and it must be either
@ -1396,7 +1397,6 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
const char *fmt, ...)
{
struct bpf_verifier_log *log = &env->log;
u8 kind = BTF_INFO_KIND(t->info);
struct btf *btf = env->btf;
va_list args;
@ -1412,7 +1412,7 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
__btf_verifier_log(log, "[%u] %s %s%s",
env->log_type_id,
btf_kind_str[kind],
btf_type_str(t),
__btf_name_by_offset(btf, t->name_off),
log_details ? " " : "");
@ -4854,7 +4854,6 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
u32 hdr_len, hdr_copy, btf_data_size;
const struct btf_header *hdr;
struct btf *btf;
int err;
btf = env->btf;
btf_data_size = btf->data_size;
@ -4911,11 +4910,7 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
return -EINVAL;
}
err = btf_check_sec_info(env, btf_data_size);
if (err)
return err;
return 0;
return btf_check_sec_info(env, btf_data_size);
}
static int btf_check_type_tags(struct btf_verifier_env *env,
@ -5328,6 +5323,34 @@ static bool is_int_ptr(struct btf *btf, const struct btf_type *t)
return btf_type_is_int(t);
}
static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
int off)
{
const struct btf_param *args;
const struct btf_type *t;
u32 offset = 0, nr_args;
int i;
if (!func_proto)
return off / 8;
nr_args = btf_type_vlen(func_proto);
args = (const struct btf_param *)(func_proto + 1);
for (i = 0; i < nr_args; i++) {
t = btf_type_skip_modifiers(btf, args[i].type, NULL);
offset += btf_type_is_ptr(t) ? 8 : roundup(t->size, 8);
if (off < offset)
return i;
}
t = btf_type_skip_modifiers(btf, func_proto->type, NULL);
offset += btf_type_is_ptr(t) ? 8 : roundup(t->size, 8);
if (off < offset)
return nr_args;
return nr_args + 1;
}
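
With struct arguments occupying one or two 8-byte slots, the byte offset into the tracing ctx no longer maps to the argument index as off / 8. A worked example with a hypothetical prototype:

/*
 *	int foo(int a, struct { long x; long y; } b, void *c);
 *
 * occupies the following 8-byte ctx slots:
 *
 *	off  0.. 7  ->  arg 0  (a, rounded up to 8 bytes)
 *	off  8..23  ->  arg 1  (b, a 16-byte struct spanning two slots)
 *	off 24..31  ->  arg 2  (c, pointer)
 *
 * so get_ctx_arg_idx() returns 1 for both off == 8 and off == 16, whereas the
 * old "arg = off / 8" calculation would have wrongly returned 2 for off == 16.
 */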
bool btf_ctx_access(int off, int size, enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
@ -5347,7 +5370,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
tname, off);
return false;
}
arg = off / 8;
arg = get_ctx_arg_idx(btf, t, off);
args = (const struct btf_param *)(t + 1);
/* if (t == NULL) Fall back to default BPF prog with
* MAX_BPF_FUNC_REG_ARGS u64 arguments.
@ -5398,7 +5421,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
if (!btf_type_is_small_int(t)) {
bpf_log(log,
"ret type %s not allowed for fmod_ret\n",
btf_kind_str[BTF_INFO_KIND(t->info)]);
btf_type_str(t));
return false;
}
break;
@ -5417,7 +5440,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
/* skip modifiers */
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type);
if (btf_type_is_small_int(t) || btf_is_any_enum(t))
if (btf_type_is_small_int(t) || btf_is_any_enum(t) || __btf_type_is_struct(t))
/* accessing a scalar */
return true;
if (!btf_type_is_ptr(t)) {
@ -5425,7 +5448,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
"func '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
tname, arg,
__btf_name_by_offset(btf, t->name_off),
btf_kind_str[BTF_INFO_KIND(t->info)]);
btf_type_str(t));
return false;
}
@ -5509,11 +5532,11 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
if (!btf_type_is_struct(t)) {
bpf_log(log,
"func '%s' arg%d type %s is not a struct\n",
tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
tname, arg, btf_type_str(t));
return false;
}
bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
tname, arg, info->btf_id, btf_kind_str[BTF_INFO_KIND(t->info)],
tname, arg, info->btf_id, btf_type_str(t),
__btf_name_by_offset(btf, t->name_off));
return true;
}
@ -5881,7 +5904,7 @@ static int __get_type_size(struct btf *btf, u32 btf_id,
if (btf_type_is_ptr(t))
/* kernel size of pointer. Not BPF's size of pointer*/
return sizeof(void *);
if (btf_type_is_int(t) || btf_is_any_enum(t))
if (btf_type_is_int(t) || btf_is_any_enum(t) || __btf_type_is_struct(t))
return t->size;
return -EINVAL;
}
@ -5901,8 +5924,10 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
/* BTF function prototype doesn't match the verifier types.
* Fall back to MAX_BPF_FUNC_REG_ARGS u64 args.
*/
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++)
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
m->arg_size[i] = 8;
m->arg_flags[i] = 0;
}
m->ret_size = 8;
m->nr_args = MAX_BPF_FUNC_REG_ARGS;
return 0;
@ -5916,10 +5941,10 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
return -EINVAL;
}
ret = __get_type_size(btf, func->type, &t);
if (ret < 0) {
if (ret < 0 || __btf_type_is_struct(t)) {
bpf_log(log,
"The function %s return type %s is unsupported.\n",
tname, btf_kind_str[BTF_INFO_KIND(t->info)]);
tname, btf_type_str(t));
return -EINVAL;
}
m->ret_size = ret;
@ -5932,10 +5957,12 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
return -EINVAL;
}
ret = __get_type_size(btf, args[i].type, &t);
if (ret < 0) {
/* No support of struct argument size greater than 16 bytes */
if (ret < 0 || ret > 16) {
bpf_log(log,
"The function %s arg%d type %s is unsupported.\n",
tname, i, btf_kind_str[BTF_INFO_KIND(t->info)]);
tname, i, btf_type_str(t));
return -EINVAL;
}
if (ret == 0) {
@ -5945,6 +5972,7 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
return -EINVAL;
}
m->arg_size[i] = ret;
m->arg_flags[i] = __btf_type_is_struct(t) ? BTF_FMODEL_STRUCT_ARG : 0;
}
m->nr_args = nargs;
return 0;
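
With the BTF_FMODEL_STRUCT_ARG handling above, a tracing program can attach to a function that takes a small struct by value (at most 16 bytes). A sketch; the traced function and struct below are made up for illustration:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct two_ints {
	int a;
	int b;
};	/* 8 bytes: passed by value in a single register */

/* Hypothetical target: void my_traced_func(struct two_ints p, int c) */
SEC("fentry/my_traced_func")
int on_enter(void *ctx)
{
	__u64 p_raw, c;
	struct two_ints p;

	/* bpf_get_func_arg() indexes argument *registers*: an 8-byte struct
	 * uses one register, a 16-byte struct would use two.
	 */
	if (bpf_get_func_arg(ctx, 0, &p_raw) || bpf_get_func_arg(ctx, 1, &c))
		return 0;
	__builtin_memcpy(&p, &p_raw, sizeof(p));

	bpf_printk("a=%d b=%d c=%llu", p.a, p.b, c);
	return 0;
}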
@ -6166,14 +6194,40 @@ static bool is_kfunc_arg_mem_size(const struct btf *btf,
return true;
}
static bool btf_is_kfunc_arg_mem_size(const struct btf *btf,
const struct btf_param *arg,
const struct bpf_reg_state *reg,
const char *name)
{
int len, target_len = strlen(name);
const struct btf_type *t;
const char *param_name;
t = btf_type_skip_modifiers(btf, arg->type, NULL);
if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
return false;
param_name = btf_name_by_offset(btf, arg->name_off);
if (str_is_empty(param_name))
return false;
len = strlen(param_name);
if (len != target_len)
return false;
if (strcmp(param_name, name))
return false;
return true;
}
static int btf_check_func_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs,
bool ptr_to_mem_ok,
u32 kfunc_flags)
struct bpf_kfunc_arg_meta *kfunc_meta,
bool processing_call)
{
enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
bool rel = false, kptr_get = false, trusted_arg = false;
bool rel = false, kptr_get = false, trusted_args = false;
bool sleepable = false;
struct bpf_verifier_log *log = &env->log;
u32 i, nargs, ref_id, ref_obj_id = 0;
@ -6207,12 +6261,12 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
return -EINVAL;
}
if (is_kfunc) {
if (is_kfunc && kfunc_meta) {
/* Only kfunc can be release func */
rel = kfunc_flags & KF_RELEASE;
kptr_get = kfunc_flags & KF_KPTR_GET;
trusted_arg = kfunc_flags & KF_TRUSTED_ARGS;
sleepable = kfunc_flags & KF_SLEEPABLE;
rel = kfunc_meta->flags & KF_RELEASE;
kptr_get = kfunc_meta->flags & KF_KPTR_GET;
trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
sleepable = kfunc_meta->flags & KF_SLEEPABLE;
}
/* check that BTF function arguments match actual types that the
@ -6222,9 +6276,42 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
enum bpf_arg_type arg_type = ARG_DONTCARE;
u32 regno = i + 1;
struct bpf_reg_state *reg = &regs[regno];
bool obj_ptr = false;
t = btf_type_skip_modifiers(btf, args[i].type, NULL);
if (btf_type_is_scalar(t)) {
if (is_kfunc && kfunc_meta) {
bool is_buf_size = false;
/* check for any const scalar parameter of name "rdonly_buf_size"
* or "rdwr_buf_size"
*/
if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
"rdonly_buf_size")) {
kfunc_meta->r0_rdonly = true;
is_buf_size = true;
} else if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
"rdwr_buf_size"))
is_buf_size = true;
if (is_buf_size) {
if (kfunc_meta->r0_size) {
bpf_log(log, "2 or more rdonly/rdwr_buf_size parameters for kfunc");
return -EINVAL;
}
if (!tnum_is_const(reg->var_off)) {
bpf_log(log, "R%d is not a const\n", regno);
return -EINVAL;
}
kfunc_meta->r0_size = reg->var_off.value;
ret = mark_chain_precision(env, regno);
if (ret)
return ret;
}
}
if (reg->type == SCALAR_VALUE)
continue;
bpf_log(log, "R%d is not a scalar\n", regno);
@ -6237,10 +6324,17 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
return -EINVAL;
}
/* These register types have special constraints wrt ref_obj_id
* and offset checks. The rest of trusted args don't.
*/
obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
reg2btf_ids[base_type(reg->type)];
/* Check if argument must be a referenced pointer, args + i has
* been verified to be a pointer (after skipping modifiers).
* PTR_TO_CTX is ok without having non-zero ref_obj_id.
*/
if (is_kfunc && trusted_arg && !reg->ref_obj_id) {
if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
bpf_log(log, "R%d must be referenced\n", regno);
return -EINVAL;
}
@ -6249,12 +6343,23 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
ref_tname = btf_name_by_offset(btf, ref_t->name_off);
/* Trusted args have the same offset checks as release arguments */
if (trusted_arg || (rel && reg->ref_obj_id))
if ((trusted_args && obj_ptr) || (rel && reg->ref_obj_id))
arg_type |= OBJ_RELEASE;
ret = check_func_arg_reg_off(env, reg, regno, arg_type);
if (ret < 0)
return ret;
if (is_kfunc && reg->ref_obj_id) {
/* Ensure only one argument is referenced PTR_TO_BTF_ID */
if (ref_obj_id) {
bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
regno, reg->ref_obj_id, ref_obj_id);
return -EFAULT;
}
ref_regno = regno;
ref_obj_id = reg->ref_obj_id;
}
/* kptr_get is only true for kfunc */
if (i == 0 && kptr_get) {
struct bpf_map_value_off_desc *off_desc;
@ -6327,16 +6432,6 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
if (reg->type == PTR_TO_BTF_ID) {
reg_btf = reg->btf;
reg_ref_id = reg->btf_id;
/* Ensure only one argument is referenced PTR_TO_BTF_ID */
if (reg->ref_obj_id) {
if (ref_obj_id) {
bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
regno, reg->ref_obj_id, ref_obj_id);
return -EFAULT;
}
ref_regno = regno;
ref_obj_id = reg->ref_obj_id;
}
} else {
reg_btf = btf_vmlinux;
reg_ref_id = *reg2btf_ids[base_type(reg->type)];
@ -6348,7 +6443,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
reg_ref_t->name_off);
if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
reg->off, btf, ref_id,
trusted_arg || (rel && reg->ref_obj_id))) {
trusted_args || (rel && reg->ref_obj_id))) {
bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
func_name, i,
btf_type_str(ref_t), ref_tname,
@ -6356,21 +6451,26 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
reg_ref_tname);
return -EINVAL;
}
} else if (ptr_to_mem_ok) {
} else if (ptr_to_mem_ok && processing_call) {
const struct btf_type *resolve_ret;
u32 type_size;
if (is_kfunc) {
bool arg_mem_size = i + 1 < nargs && is_kfunc_arg_mem_size(btf, &args[i + 1], &regs[regno + 1]);
bool arg_dynptr = btf_type_is_struct(ref_t) &&
!strcmp(ref_tname,
stringify_struct(bpf_dynptr_kern));
/* Permit pointer to mem, but only when argument
* type is pointer to scalar, or struct composed
* (recursively) of scalars.
* When arg_mem_size is true, the pointer can be
* void *.
* Also permit initialized local dynamic pointers.
*/
if (!btf_type_is_scalar(ref_t) &&
!__btf_type_is_scalar_struct(log, btf, ref_t, 0) &&
!arg_dynptr &&
(arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
bpf_log(log,
"arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
@ -6378,6 +6478,34 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
return -EINVAL;
}
if (arg_dynptr) {
if (reg->type != PTR_TO_STACK) {
bpf_log(log, "arg#%d pointer type %s %s not to stack\n",
i, btf_type_str(ref_t),
ref_tname);
return -EINVAL;
}
if (!is_dynptr_reg_valid_init(env, reg)) {
bpf_log(log,
"arg#%d pointer type %s %s must be valid and initialized\n",
i, btf_type_str(ref_t),
ref_tname);
return -EINVAL;
}
if (!is_dynptr_type_expected(env, reg,
ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
bpf_log(log,
"arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
i, btf_type_str(ref_t),
ref_tname);
return -EINVAL;
}
continue;
}
/* Check for mem, len pair */
if (arg_mem_size) {
if (check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1)) {
@ -6427,11 +6555,14 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
return -EINVAL;
}
if (kfunc_meta && ref_obj_id)
kfunc_meta->ref_obj_id = ref_obj_id;
/* returns argument register number > 0 in case of reference release kfunc */
return rel ? ref_regno : 0;
}
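
Kfuncs can now take a pointer to struct bpf_dynptr_kern; the verifier only accepts an initialized, stack-allocated local dynptr for such an argument. A sketch with a made-up kfunc name (the BPF-side declaration uses struct bpf_dynptr, as the selftests do) and a placeholder hook:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Hypothetical kfunc taking a dynptr argument. */
extern int bpf_example_process_buf(struct bpf_dynptr *data) __ksym;

SEC("lsm.s/bpf")	/* placeholder hook */
int BPF_PROG(check_buf, int cmd, union bpf_attr *attr, unsigned int size)
{
	char buf[64] = {};
	struct bpf_dynptr dptr;
	int ret;

	/* Must be a local dynptr on the stack, initialized before the call. */
	if (bpf_dynptr_from_mem(buf, sizeof(buf), 0, &dptr))
		return 0;

	ret = bpf_example_process_buf(&dptr);
	bpf_printk("kfunc returned %d", ret);
	return 0;
}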
/* Compare BTF of a function with given bpf_reg_state.
/* Compare BTF of a function declaration with given bpf_reg_state.
* Returns:
* EFAULT - there is a verifier bug. Abort verification.
* EINVAL - there is a type mismatch or BTF is not available.
@ -6458,7 +6589,50 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
return -EINVAL;
is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, 0);
err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, false);
/* Compiler optimizations can remove arguments from static functions
* or mismatched type can be passed into a global function.
* In such cases mark the function as unreliable from BTF point of view.
*/
if (err)
prog->aux->func_info_aux[subprog].unreliable = true;
return err;
}
/* Compare BTF of a function call with given bpf_reg_state.
* Returns:
* EFAULT - there is a verifier bug. Abort verification.
* EINVAL - there is a type mismatch or BTF is not available.
* 0 - BTF matches with what bpf_reg_state expects.
* Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
*
* NOTE: the code is duplicated from btf_check_subprog_arg_match()
* because btf_check_func_arg_match() is still doing both. Once that
* function is split in 2, we can call from here btf_check_subprog_arg_match()
* first, and then treat the calling part in a new code path.
*/
int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs)
{
struct bpf_prog *prog = env->prog;
struct btf *btf = prog->aux->btf;
bool is_global;
u32 btf_id;
int err;
if (!prog->aux->func_info)
return -EINVAL;
btf_id = prog->aux->func_info[subprog].type_id;
if (!btf_id)
return -EFAULT;
if (prog->aux->func_info_aux[subprog].unreliable)
return -EINVAL;
is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, true);
/* Compiler optimizations can remove arguments from static functions
* or mismatched type can be passed into a global function.
@ -6472,9 +6646,9 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs,
u32 kfunc_flags)
struct bpf_kfunc_arg_meta *meta)
{
return btf_check_func_arg_match(env, btf, func_id, regs, true, kfunc_flags);
return btf_check_func_arg_match(env, btf, func_id, regs, true, meta, true);
}
/* Convert BTF of a function into bpf_reg_state if possible
@ -6588,7 +6762,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
continue;
}
bpf_log(log, "Arg#%d type %s in %s() is not supported yet.\n",
i, btf_kind_str[BTF_INFO_KIND(t->info)], tname);
i, btf_type_str(t), tname);
return -EINVAL;
}
return 0;
@ -7243,6 +7417,7 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
case BPF_PROG_TYPE_STRUCT_OPS:
return BTF_KFUNC_HOOK_STRUCT_OPS;
case BPF_PROG_TYPE_TRACING:
case BPF_PROG_TYPE_LSM:
return BTF_KFUNC_HOOK_TRACING;
case BPF_PROG_TYPE_SYSCALL:
return BTF_KFUNC_HOOK_SYSCALL;


@ -825,6 +825,11 @@ struct bpf_prog_pack {
unsigned long bitmap[];
};
void bpf_jit_fill_hole_with_zero(void *area, unsigned int size)
{
memset(area, 0, size);
}
#define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)
static DEFINE_MUTEX(pack_mutex);
@ -864,7 +869,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
return pack;
}
static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
{
unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size);
struct bpf_prog_pack *pack;
@ -905,7 +910,7 @@ out:
return ptr;
}
static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
void bpf_prog_pack_free(struct bpf_binary_header *hdr)
{
struct bpf_prog_pack *pack = NULL, *tmp;
unsigned int nbits;


@ -85,12 +85,12 @@ static bool bpf_dispatcher_remove_prog(struct bpf_dispatcher *d,
return false;
}
int __weak arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
int __weak arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs)
{
return -ENOTSUPP;
}
static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image)
static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image, void *buf)
{
s64 ips[BPF_DISPATCHER_MAX] = {}, *ipsp = &ips[0];
int i;
@ -99,12 +99,12 @@ static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image)
if (d->progs[i].prog)
*ipsp++ = (s64)(uintptr_t)d->progs[i].prog->bpf_func;
}
return arch_prepare_bpf_dispatcher(image, &ips[0], d->num_progs);
return arch_prepare_bpf_dispatcher(image, buf, &ips[0], d->num_progs);
}
static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
{
void *old, *new;
void *old, *new, *tmp;
u32 noff;
int err;
@ -117,8 +117,14 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
}
new = d->num_progs ? d->image + noff : NULL;
tmp = d->num_progs ? d->rw_image + noff : NULL;
if (new) {
if (bpf_dispatcher_prepare(d, new))
/* Prepare the dispatcher in d->rw_image. Then use
* bpf_arch_text_copy to update d->image, which is RO+X.
*/
if (bpf_dispatcher_prepare(d, new, tmp))
return;
if (IS_ERR(bpf_arch_text_copy(new, tmp, PAGE_SIZE / 2)))
return;
}
@ -140,9 +146,18 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
mutex_lock(&d->mutex);
if (!d->image) {
d->image = bpf_jit_alloc_exec_page();
d->image = bpf_prog_pack_alloc(PAGE_SIZE, bpf_jit_fill_hole_with_zero);
if (!d->image)
goto out;
d->rw_image = bpf_jit_alloc_exec(PAGE_SIZE);
if (!d->rw_image) {
u32 size = PAGE_SIZE;
bpf_arch_text_copy(d->image, &size, sizeof(size));
bpf_prog_pack_free((struct bpf_binary_header *)d->image);
d->image = NULL;
goto out;
}
bpf_image_ksym_add(d->image, &d->ksym);
}


@ -68,24 +68,16 @@
* In theory the BPF locks could be converted to regular spinlocks as well,
* but the bucket locks and percpu_freelist locks can be taken from
* arbitrary contexts (perf, kprobes, tracepoints) which are required to be
* atomic contexts even on RT. These mechanisms require preallocated maps,
* so there is no need to invoke memory allocations within the lock held
* sections.
*
* BPF maps which need dynamic allocation are only used from (forced)
* thread context on RT and can therefore use regular spinlocks which in
* turn allows to invoke memory allocations from the lock held section.
*
* On a non RT kernel this distinction is neither possible nor required.
* spinlock maps to raw_spinlock and the extra code is optimized out by the
* compiler.
* atomic contexts even on RT. Before the introduction of bpf_mem_alloc,
* it is only safe to use raw spinlock for preallocated hash map on a RT kernel,
* because there is no memory allocation within the lock held sections. However
* after hash map was fully converted to use bpf_mem_alloc, there will be
* non-synchronous memory allocation for non-preallocated hash map, so it is
* safe to always use raw spinlock for bucket lock.
*/
struct bucket {
struct hlist_nulls_head head;
union {
raw_spinlock_t raw_lock;
spinlock_t lock;
};
raw_spinlock_t raw_lock;
};
#define HASHTAB_MAP_LOCK_COUNT 8
@ -141,26 +133,15 @@ static inline bool htab_is_prealloc(const struct bpf_htab *htab)
return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
}
static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
{
return (!IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab));
}
static void htab_init_buckets(struct bpf_htab *htab)
{
unsigned int i;
for (i = 0; i < htab->n_buckets; i++) {
INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
if (htab_use_raw_lock(htab)) {
raw_spin_lock_init(&htab->buckets[i].raw_lock);
lockdep_set_class(&htab->buckets[i].raw_lock,
raw_spin_lock_init(&htab->buckets[i].raw_lock);
lockdep_set_class(&htab->buckets[i].raw_lock,
&htab->lockdep_key);
} else {
spin_lock_init(&htab->buckets[i].lock);
lockdep_set_class(&htab->buckets[i].lock,
&htab->lockdep_key);
}
cond_resched();
}
}
@ -170,28 +151,17 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
unsigned long *pflags)
{
unsigned long flags;
bool use_raw_lock;
hash = hash & HASHTAB_MAP_LOCK_MASK;
use_raw_lock = htab_use_raw_lock(htab);
if (use_raw_lock)
preempt_disable();
else
migrate_disable();
preempt_disable();
if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
__this_cpu_dec(*(htab->map_locked[hash]));
if (use_raw_lock)
preempt_enable();
else
migrate_enable();
preempt_enable();
return -EBUSY;
}
if (use_raw_lock)
raw_spin_lock_irqsave(&b->raw_lock, flags);
else
spin_lock_irqsave(&b->lock, flags);
raw_spin_lock_irqsave(&b->raw_lock, flags);
*pflags = flags;
return 0;
@ -201,18 +171,10 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
struct bucket *b, u32 hash,
unsigned long flags)
{
bool use_raw_lock = htab_use_raw_lock(htab);
hash = hash & HASHTAB_MAP_LOCK_MASK;
if (use_raw_lock)
raw_spin_unlock_irqrestore(&b->raw_lock, flags);
else
spin_unlock_irqrestore(&b->lock, flags);
raw_spin_unlock_irqrestore(&b->raw_lock, flags);
__this_cpu_dec(*(htab->map_locked[hash]));
if (use_raw_lock)
preempt_enable();
else
migrate_enable();
preempt_enable();
}
static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
@ -622,6 +584,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
free_prealloc:
prealloc_destroy(htab);
free_map_locked:
if (htab->use_percpu_counter)
percpu_counter_destroy(&htab->pcount);
for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
free_percpu(htab->map_locked[i]);
bpf_map_area_free(htab->buckets);


@ -15,6 +15,7 @@
#include <linux/ctype.h>
#include <linux/jiffies.h>
#include <linux/pid_namespace.h>
#include <linux/poison.h>
#include <linux/proc_ns.h>
#include <linux/security.h>
#include <linux/btf_ids.h>
@ -1376,10 +1377,9 @@ BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
}
/* Unlike other PTR_TO_BTF_ID helpers the btf_id in bpf_kptr_xchg()
* helper is determined dynamically by the verifier.
* helper is determined dynamically by the verifier. Use BPF_PTR_POISON to
* denote type that verifier will determine.
*/
#define BPF_PTR_POISON ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
static const struct bpf_func_proto bpf_kptr_xchg_proto = {
.func = bpf_kptr_xchg,
.gpl_only = false,
@ -1408,7 +1408,7 @@ static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_typ
ptr->size |= type << DYNPTR_TYPE_SHIFT;
}
static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
{
return ptr->size & DYNPTR_SIZE_MASK;
}
@ -1446,6 +1446,8 @@ BPF_CALL_4(bpf_dynptr_from_mem, void *, data, u32, size, u64, flags, struct bpf_
{
int err;
BTF_TYPE_EMIT(struct bpf_dynptr);
err = bpf_dynptr_check_size(size);
if (err)
goto error;
@ -1659,6 +1661,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_for_each_map_elem_proto;
case BPF_FUNC_loop:
return &bpf_loop_proto;
case BPF_FUNC_user_ringbuf_drain:
return &bpf_user_ringbuf_drain_proto;
default:
break;
}


@ -277,7 +277,8 @@ static void free_bulk(struct bpf_mem_cache *c)
local_dec(&c->active);
if (IS_ENABLED(CONFIG_PREEMPT_RT))
local_irq_restore(flags);
enque_to_free(c, llnode);
if (llnode)
enque_to_free(c, llnode);
} while (cnt > (c->high_watermark + c->low_watermark) / 2);
/* and drain free_llist_extra */
@ -610,7 +611,7 @@ void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
if (!ptr)
return;
idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
if (idx < 0)
return;


@ -58,23 +58,21 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
{
int cpu, orig_cpu;
orig_cpu = cpu = raw_smp_processor_id();
orig_cpu = raw_smp_processor_id();
while (1) {
struct pcpu_freelist_head *head;
for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
struct pcpu_freelist_head *head;
head = per_cpu_ptr(s->freelist, cpu);
if (raw_spin_trylock(&head->lock)) {
pcpu_freelist_push_node(head, node);
raw_spin_unlock(&head->lock);
return;
head = per_cpu_ptr(s->freelist, cpu);
if (raw_spin_trylock(&head->lock)) {
pcpu_freelist_push_node(head, node);
raw_spin_unlock(&head->lock);
return;
}
}
cpu = cpumask_next(cpu, cpu_possible_mask);
if (cpu >= nr_cpu_ids)
cpu = 0;
/* cannot lock any per cpu lock, try extralist */
if (cpu == orig_cpu &&
pcpu_freelist_try_push_extra(s, node))
if (pcpu_freelist_try_push_extra(s, node))
return;
}
}
@ -125,13 +123,12 @@ static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
{
struct pcpu_freelist_head *head;
struct pcpu_freelist_node *node;
int orig_cpu, cpu;
int cpu;
orig_cpu = cpu = raw_smp_processor_id();
while (1) {
for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
head = per_cpu_ptr(s->freelist, cpu);
if (!READ_ONCE(head->first))
goto next_cpu;
continue;
raw_spin_lock(&head->lock);
node = head->first;
if (node) {
@ -140,12 +137,6 @@ static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
return node;
}
raw_spin_unlock(&head->lock);
next_cpu:
cpu = cpumask_next(cpu, cpu_possible_mask);
if (cpu >= nr_cpu_ids)
cpu = 0;
if (cpu == orig_cpu)
break;
}
/* per cpu lists are all empty, try extralist */
@ -164,13 +155,12 @@ ___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
{
struct pcpu_freelist_head *head;
struct pcpu_freelist_node *node;
int orig_cpu, cpu;
int cpu;
orig_cpu = cpu = raw_smp_processor_id();
while (1) {
for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
head = per_cpu_ptr(s->freelist, cpu);
if (!READ_ONCE(head->first))
goto next_cpu;
continue;
if (raw_spin_trylock(&head->lock)) {
node = head->first;
if (node) {
@ -180,12 +170,6 @@ ___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
}
raw_spin_unlock(&head->lock);
}
next_cpu:
cpu = cpumask_next(cpu, cpu_possible_mask);
if (cpu >= nr_cpu_ids)
cpu = 0;
if (cpu == orig_cpu)
break;
}
/* cannot pop from per cpu lists, try extralist */


@ -38,10 +38,43 @@ struct bpf_ringbuf {
struct page **pages;
int nr_pages;
spinlock_t spinlock ____cacheline_aligned_in_smp;
/* Consumer and producer counters are put into separate pages to allow
* mapping consumer page as r/w, but restrict producer page to r/o.
* This protects producer position from being modified by user-space
* application and ruining in-kernel position tracking.
/* For user-space producer ring buffers, an atomic_t busy bit is used
* to synchronize access to the ring buffers in the kernel, rather than
* the spinlock that is used for kernel-producer ring buffers. This is
* done because the ring buffer must hold a lock across a BPF program's
* callback:
*
* __bpf_user_ringbuf_peek() // lock acquired
* -> program callback_fn()
* -> __bpf_user_ringbuf_sample_release() // lock released
*
* It is unsafe and incorrect to hold an IRQ spinlock across what could
* be a long execution window, so we instead simply disallow concurrent
* access to the ring buffer by kernel consumers, and return -EBUSY from
* __bpf_user_ringbuf_peek() if the busy bit is held by another task.
*/
atomic_t busy ____cacheline_aligned_in_smp;
/* Consumer and producer counters are put into separate pages to
* allow each position to be mapped with different permissions.
* This prevents a user-space application from modifying the
* position and ruining in-kernel tracking. The permissions of the
* pages depend on who is producing samples: user-space or the
* kernel.
*
* Kernel-producer
* ---------------
* The producer position and data pages are mapped as r/o in
* userspace. For this approach, bits in the header of samples are
* used to signal to user-space, and to other producers, whether a
* sample is currently being written.
*
* User-space producer
* -------------------
* Only the page containing the consumer position is mapped r/o in
* user-space. User-space producers also use bits of the header to
* communicate to the kernel, but the kernel must carefully check and
* validate each sample to ensure that they're correctly formatted, and
* fully contained within the ring buffer.
*/
unsigned long consumer_pos __aligned(PAGE_SIZE);
unsigned long producer_pos __aligned(PAGE_SIZE);
@ -136,6 +169,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
return NULL;
spin_lock_init(&rb->spinlock);
atomic_set(&rb->busy, 0);
init_waitqueue_head(&rb->waitq);
init_irq_work(&rb->work, bpf_ringbuf_notify);
@ -224,7 +258,7 @@ static int ringbuf_map_get_next_key(struct bpf_map *map, void *key,
return -ENOTSUPP;
}
static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma)
{
struct bpf_ringbuf_map *rb_map;
@ -242,6 +276,26 @@ static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
vma->vm_pgoff + RINGBUF_PGOFF);
}
static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma)
{
struct bpf_ringbuf_map *rb_map;
rb_map = container_of(map, struct bpf_ringbuf_map, map);
if (vma->vm_flags & VM_WRITE) {
if (vma->vm_pgoff == 0)
/* Disallow writable mappings to the consumer pointer,
* and allow writable mappings to both the producer
* position, and the ring buffer data itself.
*/
return -EPERM;
} else {
vma->vm_flags &= ~VM_MAYWRITE;
}
/* remap_vmalloc_range() checks size and offset constraints */
return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
}
static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
{
unsigned long cons_pos, prod_pos;
@ -251,8 +305,13 @@ static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
return prod_pos - cons_pos;
}
static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts)
static u32 ringbuf_total_data_sz(const struct bpf_ringbuf *rb)
{
return rb->mask + 1;
}
static __poll_t ringbuf_map_poll_kern(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts)
{
struct bpf_ringbuf_map *rb_map;
@ -264,13 +323,26 @@ static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
return 0;
}
static __poll_t ringbuf_map_poll_user(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts)
{
struct bpf_ringbuf_map *rb_map;
rb_map = container_of(map, struct bpf_ringbuf_map, map);
poll_wait(filp, &rb_map->rb->waitq, pts);
if (ringbuf_avail_data_sz(rb_map->rb) < ringbuf_total_data_sz(rb_map->rb))
return EPOLLOUT | EPOLLWRNORM;
return 0;
}
BTF_ID_LIST_SINGLE(ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
const struct bpf_map_ops ringbuf_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = ringbuf_map_alloc,
.map_free = ringbuf_map_free,
.map_mmap = ringbuf_map_mmap,
.map_poll = ringbuf_map_poll,
.map_mmap = ringbuf_map_mmap_kern,
.map_poll = ringbuf_map_poll_kern,
.map_lookup_elem = ringbuf_map_lookup_elem,
.map_update_elem = ringbuf_map_update_elem,
.map_delete_elem = ringbuf_map_delete_elem,
@ -278,6 +350,20 @@ const struct bpf_map_ops ringbuf_map_ops = {
.map_btf_id = &ringbuf_map_btf_ids[0],
};
BTF_ID_LIST_SINGLE(user_ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
const struct bpf_map_ops user_ringbuf_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
.map_alloc = ringbuf_map_alloc,
.map_free = ringbuf_map_free,
.map_mmap = ringbuf_map_mmap_user,
.map_poll = ringbuf_map_poll_user,
.map_lookup_elem = ringbuf_map_lookup_elem,
.map_update_elem = ringbuf_map_update_elem,
.map_delete_elem = ringbuf_map_delete_elem,
.map_get_next_key = ringbuf_map_get_next_key,
.map_btf_id = &user_ringbuf_map_btf_ids[0],
};
/* Given pointer to ring buffer record metadata and struct bpf_ringbuf itself,
* calculate offset from record metadata to ring buffer in pages, rounded
* down. This page offset is stored as part of record metadata and allows to
@ -312,7 +398,7 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
return NULL;
len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
if (len > rb->mask + 1)
if (len > ringbuf_total_data_sz(rb))
return NULL;
cons_pos = smp_load_acquire(&rb->consumer_pos);
@ -459,7 +545,7 @@ BPF_CALL_2(bpf_ringbuf_query, struct bpf_map *, map, u64, flags)
case BPF_RB_AVAIL_DATA:
return ringbuf_avail_data_sz(rb);
case BPF_RB_RING_SIZE:
return rb->mask + 1;
return ringbuf_total_data_sz(rb);
case BPF_RB_CONS_POS:
return smp_load_acquire(&rb->consumer_pos);
case BPF_RB_PROD_POS:
@ -553,3 +639,138 @@ const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = {
.arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};
static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *size)
{
int err;
u32 hdr_len, sample_len, total_len, flags, *hdr;
u64 cons_pos, prod_pos;
/* Synchronizes with smp_store_release() in user-space producer. */
prod_pos = smp_load_acquire(&rb->producer_pos);
if (prod_pos % 8)
return -EINVAL;
/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
cons_pos = smp_load_acquire(&rb->consumer_pos);
if (cons_pos >= prod_pos)
return -ENODATA;
hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
/* Synchronizes with smp_store_release() in user-space producer. */
hdr_len = smp_load_acquire(hdr);
flags = hdr_len & (BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT);
sample_len = hdr_len & ~flags;
total_len = round_up(sample_len + BPF_RINGBUF_HDR_SZ, 8);
/* The sample must fit within the region advertised by the producer position. */
if (total_len > prod_pos - cons_pos)
return -EINVAL;
/* The sample must fit within the data region of the ring buffer. */
if (total_len > ringbuf_total_data_sz(rb))
return -E2BIG;
/* The sample must fit into a struct bpf_dynptr. */
err = bpf_dynptr_check_size(sample_len);
if (err)
return -E2BIG;
if (flags & BPF_RINGBUF_DISCARD_BIT) {
/* If the discard bit is set, the sample should be skipped.
*
* Update the consumer pos, and return -EAGAIN so the caller
* knows to skip this sample and try to read the next one.
*/
smp_store_release(&rb->consumer_pos, cons_pos + total_len);
return -EAGAIN;
}
if (flags & BPF_RINGBUF_BUSY_BIT)
return -ENODATA;
*sample = (void *)((uintptr_t)rb->data +
(uintptr_t)((cons_pos + BPF_RINGBUF_HDR_SZ) & rb->mask));
*size = sample_len;
return 0;
}
static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
{
u64 consumer_pos;
u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
/* Using smp_load_acquire() is unnecessary here, as the busy-bit
* prevents another task from writing to consumer_pos after it was read
* by this task with smp_load_acquire() in __bpf_user_ringbuf_peek().
*/
consumer_pos = rb->consumer_pos;
/* Synchronizes with smp_load_acquire() in user-space producer. */
smp_store_release(&rb->consumer_pos, consumer_pos + rounded_size);
}
BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
void *, callback_fn, void *, callback_ctx, u64, flags)
{
struct bpf_ringbuf *rb;
long samples, discarded_samples = 0, ret = 0;
bpf_callback_t callback = (bpf_callback_t)callback_fn;
u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP;
int busy = 0;
if (unlikely(flags & ~wakeup_flags))
return -EINVAL;
rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
/* If another consumer is already consuming a sample, wait for them to finish. */
if (!atomic_try_cmpxchg(&rb->busy, &busy, 1))
return -EBUSY;
for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) {
int err;
u32 size;
void *sample;
struct bpf_dynptr_kern dynptr;
err = __bpf_user_ringbuf_peek(rb, &sample, &size);
if (err) {
if (err == -ENODATA) {
break;
} else if (err == -EAGAIN) {
discarded_samples++;
continue;
} else {
ret = err;
goto schedule_work_return;
}
}
bpf_dynptr_init(&dynptr, sample, BPF_DYNPTR_TYPE_LOCAL, 0, size);
ret = callback((uintptr_t)&dynptr, (uintptr_t)callback_ctx, 0, 0, 0);
__bpf_user_ringbuf_sample_release(rb, size, flags);
}
ret = samples - discarded_samples;
schedule_work_return:
/* Prevent the clearing of the busy-bit from being reordered before the
* storing of any rb consumer or producer positions.
*/
smp_mb__before_atomic();
atomic_set(&rb->busy, 0);
if (flags & BPF_RB_FORCE_WAKEUP)
irq_work_queue(&rb->work);
else if (!(flags & BPF_RB_NO_WAKEUP) && samples > 0)
irq_work_queue(&rb->work);
return ret;
}
const struct bpf_func_proto bpf_user_ringbuf_drain_proto = {
.func = bpf_user_ringbuf_drain,
.ret_type = RET_INTEGER,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_FUNC,
.arg3_type = ARG_PTR_TO_STACK_OR_NULL,
.arg4_type = ARG_ANYTHING,
};
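
On the user-space producer side, libbpf gains a matching user_ring_buffer API alongside this kernel change. A trimmed sketch; the function name and map fd wiring are placeholders, and error handling is reduced to the minimum:

#include <errno.h>
#include <bpf/libbpf.h>

/* map_fd refers to a BPF_MAP_TYPE_USER_RINGBUF map owned by the caller. */
static int publish_one_sample(int map_fd)
{
	struct user_ring_buffer *rb;
	__u32 *sample;

	rb = user_ring_buffer__new(map_fd, NULL);
	if (!rb)
		return -errno;

	/* Reserve space; the kernel only sees the sample once it is
	 * submitted and the busy bit in its header is cleared.
	 */
	sample = user_ring_buffer__reserve(rb, sizeof(*sample));
	if (!sample) {
		user_ring_buffer__free(rb);
		return -errno;
	}
	*sample = 42;
	user_ring_buffer__submit(rb, sample);

	/* user_ring_buffer__reserve_blocking(rb, size, timeout_ms) can be
	 * used instead to wait for the kernel to drain samples and free up
	 * space (see ringbuf_map_poll_user() above).
	 */
	user_ring_buffer__free(rb);
	return 0;
}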


@ -598,7 +598,7 @@ void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
if (off_desc->type == BPF_KPTR_UNREF) {
u64 *p = (u64 *)btf_id_ptr;
WRITE_ONCE(p, 0);
WRITE_ONCE(*p, 0);
continue;
}
old_ptr = xchg(btf_id_ptr, 0);
@ -1049,7 +1049,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
}
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) {
ret = -EOPNOTSUPP;
goto free_map_tab;
}
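
With BPF_MAP_TYPE_PERCPU_ARRAY now accepted here, a per-CPU array value can carry kptr fields. A sketch; the __kptr tag is defined the way the BPF selftests define it, and the attach point is a placeholder:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

#define __kptr __attribute__((btf_type_tag("kptr")))

struct map_value {
	struct task_struct __kptr *task;	/* unreferenced kptr */
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct map_value);
} pcpu_kptr_map SEC(".maps");

SEC("tp/syscalls/sys_enter_getpgid")	/* placeholder attach point */
int store_task(void *ctx)
{
	struct map_value *v;
	int key = 0;

	v = bpf_map_lookup_elem(&pcpu_kptr_map, &key);
	if (v)
		/* Unreferenced kptrs can be stored with a plain assignment;
		 * referenced kptrs would go through bpf_kptr_xchg().
		 */
		v->task = bpf_get_current_task_btf();
	return 0;
}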
@ -1416,19 +1417,14 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
}
value_size = bpf_map_value_size(map);
err = -ENOMEM;
value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
if (!value)
value = kvmemdup_bpfptr(uvalue, value_size);
if (IS_ERR(value)) {
err = PTR_ERR(value);
goto free_key;
err = -EFAULT;
if (copy_from_bpfptr(value, uvalue, value_size) != 0)
goto free_value;
}
err = bpf_map_update_value(map, f, key, value, attr->flags);
free_value:
kvfree(value);
free_key:
kvfree(key);
@ -2097,6 +2093,17 @@ struct bpf_prog_kstats {
u64 misses;
};
void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog)
{
struct bpf_prog_stats *stats;
unsigned int flags;
stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->misses);
u64_stats_update_end_irqrestore(&stats->syncp, flags);
}
static void bpf_prog_get_stats(const struct bpf_prog *prog,
struct bpf_prog_kstats *stats)
{


@ -10,8 +10,17 @@
#include <linux/btf_ids.h>
#include "mmap_unlock_work.h"
static const char * const iter_task_type_names[] = {
"ALL",
"TID",
"PID",
};
struct bpf_iter_seq_task_common {
struct pid_namespace *ns;
enum bpf_iter_task_type type;
u32 pid;
u32 pid_visiting;
};
struct bpf_iter_seq_task_info {
@ -22,18 +31,115 @@ struct bpf_iter_seq_task_info {
u32 tid;
};
static struct task_struct *task_seq_get_next(struct pid_namespace *ns,
static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_common *common,
u32 *tid,
bool skip_if_dup_files)
{
struct task_struct *task, *next_task;
struct pid *pid;
u32 saved_tid;
if (!*tid) {
/* The first time, the iterator calls this function. */
pid = find_pid_ns(common->pid, common->ns);
if (!pid)
return NULL;
task = get_pid_task(pid, PIDTYPE_TGID);
if (!task)
return NULL;
*tid = common->pid;
common->pid_visiting = common->pid;
return task;
}
/* If the control returns to user space and comes back to the
* kernel again, *tid and common->pid_visiting should be the
* same for task_seq_start() to pick up the correct task.
*/
if (*tid == common->pid_visiting) {
pid = find_pid_ns(common->pid_visiting, common->ns);
task = get_pid_task(pid, PIDTYPE_PID);
return task;
}
pid = find_pid_ns(common->pid_visiting, common->ns);
if (!pid)
return NULL;
task = get_pid_task(pid, PIDTYPE_PID);
if (!task)
return NULL;
retry:
if (!pid_alive(task)) {
put_task_struct(task);
return NULL;
}
next_task = next_thread(task);
put_task_struct(task);
if (!next_task)
return NULL;
saved_tid = *tid;
*tid = __task_pid_nr_ns(next_task, PIDTYPE_PID, common->ns);
if (!*tid || *tid == common->pid) {
/* Run out of tasks of a process. The tasks of a
* thread_group are linked as circular linked list.
*/
*tid = saved_tid;
return NULL;
}
get_task_struct(next_task);
common->pid_visiting = *tid;
if (skip_if_dup_files && task->files == task->group_leader->files) {
task = next_task;
goto retry;
}
return next_task;
}
static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common,
u32 *tid,
bool skip_if_dup_files)
{
struct task_struct *task = NULL;
struct pid *pid;
if (common->type == BPF_TASK_ITER_TID) {
if (*tid && *tid != common->pid)
return NULL;
rcu_read_lock();
pid = find_pid_ns(common->pid, common->ns);
if (pid) {
task = get_pid_task(pid, PIDTYPE_TGID);
*tid = common->pid;
}
rcu_read_unlock();
return task;
}
if (common->type == BPF_TASK_ITER_TGID) {
rcu_read_lock();
task = task_group_seq_get_next(common, tid, skip_if_dup_files);
rcu_read_unlock();
return task;
}
rcu_read_lock();
retry:
pid = find_ge_pid(*tid, ns);
pid = find_ge_pid(*tid, common->ns);
if (pid) {
*tid = pid_nr_ns(pid, ns);
*tid = pid_nr_ns(pid, common->ns);
task = get_pid_task(pid, PIDTYPE_PID);
if (!task) {
++*tid;
@ -56,7 +162,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
struct bpf_iter_seq_task_info *info = seq->private;
struct task_struct *task;
task = task_seq_get_next(info->common.ns, &info->tid, false);
task = task_seq_get_next(&info->common, &info->tid, false);
if (!task)
return NULL;
@ -73,7 +179,7 @@ static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
++*pos;
++info->tid;
put_task_struct((struct task_struct *)v);
task = task_seq_get_next(info->common.ns, &info->tid, false);
task = task_seq_get_next(&info->common, &info->tid, false);
if (!task)
return NULL;
@ -117,6 +223,41 @@ static void task_seq_stop(struct seq_file *seq, void *v)
put_task_struct((struct task_struct *)v);
}
static int bpf_iter_attach_task(struct bpf_prog *prog,
union bpf_iter_link_info *linfo,
struct bpf_iter_aux_info *aux)
{
unsigned int flags;
struct pid *pid;
pid_t tgid;
if ((!!linfo->task.tid + !!linfo->task.pid + !!linfo->task.pid_fd) > 1)
return -EINVAL;
aux->task.type = BPF_TASK_ITER_ALL;
if (linfo->task.tid != 0) {
aux->task.type = BPF_TASK_ITER_TID;
aux->task.pid = linfo->task.tid;
}
if (linfo->task.pid != 0) {
aux->task.type = BPF_TASK_ITER_TGID;
aux->task.pid = linfo->task.pid;
}
if (linfo->task.pid_fd != 0) {
aux->task.type = BPF_TASK_ITER_TGID;
pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
if (IS_ERR(pid))
return PTR_ERR(pid);
tgid = pid_nr_ns(pid, task_active_pid_ns(current));
aux->task.pid = tgid;
put_pid(pid);
}
return 0;
}
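
From user space, the tid/pid/pid_fd parameters checked above are supplied via union bpf_iter_link_info at link creation time. With libbpf this might look like the following sketch, where the program handle and error handling are placeholders:

#include <errno.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* 'prog' is an already-loaded "iter/task" program. */
static int dump_one_process(struct bpf_program *prog)
{
	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
	union bpf_iter_link_info linfo = {};
	struct bpf_link *link;
	char buf[4096];
	ssize_t len;
	int iter_fd;

	/* Visit only the threads of the current process. Setting .task.tid
	 * instead would visit a single thread, and .task.pid_fd accepts a pidfd.
	 */
	linfo.task.pid = getpid();
	opts.link_info = &linfo;
	opts.link_info_len = sizeof(linfo);

	link = bpf_program__attach_iter(prog, &opts);
	if (!link)
		return -errno;

	iter_fd = bpf_iter_create(bpf_link__fd(link));
	if (iter_fd < 0) {
		bpf_link__destroy(link);
		return iter_fd;
	}
	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
		if (write(STDOUT_FILENO, buf, len) < 0)
			break;

	close(iter_fd);
	bpf_link__destroy(link);
	return 0;
}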
static const struct seq_operations task_seq_ops = {
.start = task_seq_start,
.next = task_seq_next,
@ -137,8 +278,7 @@ struct bpf_iter_seq_task_file_info {
static struct file *
task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
{
struct pid_namespace *ns = info->common.ns;
u32 curr_tid = info->tid;
u32 saved_tid = info->tid;
struct task_struct *curr_task;
unsigned int curr_fd = info->fd;
@ -151,21 +291,18 @@ again:
curr_task = info->task;
curr_fd = info->fd;
} else {
curr_task = task_seq_get_next(ns, &curr_tid, true);
curr_task = task_seq_get_next(&info->common, &info->tid, true);
if (!curr_task) {
info->task = NULL;
info->tid = curr_tid;
return NULL;
}
/* set info->task and info->tid */
/* set info->task */
info->task = curr_task;
if (curr_tid == info->tid) {
if (saved_tid == info->tid)
curr_fd = info->fd;
} else {
info->tid = curr_tid;
else
curr_fd = 0;
}
}
rcu_read_lock();
@ -186,9 +323,15 @@ again:
/* the current task is done, go to the next task */
rcu_read_unlock();
put_task_struct(curr_task);
if (info->common.type == BPF_TASK_ITER_TID) {
info->task = NULL;
return NULL;
}
info->task = NULL;
info->fd = 0;
curr_tid = ++(info->tid);
saved_tid = ++(info->tid);
goto again;
}
@ -269,6 +412,9 @@ static int init_seq_pidns(void *priv_data, struct bpf_iter_aux_info *aux)
struct bpf_iter_seq_task_common *common = priv_data;
common->ns = get_pid_ns(task_active_pid_ns(current));
common->type = aux->task.type;
common->pid = aux->task.pid;
return 0;
}
@ -307,11 +453,10 @@ enum bpf_task_vma_iter_find_op {
static struct vm_area_struct *
task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
{
struct pid_namespace *ns = info->common.ns;
enum bpf_task_vma_iter_find_op op;
struct vm_area_struct *curr_vma;
struct task_struct *curr_task;
u32 curr_tid = info->tid;
u32 saved_tid = info->tid;
/* If this function returns a non-NULL vma, it holds a reference to
* the task_struct, and holds read lock on vma->mm->mmap_lock.
@ -371,14 +516,13 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
}
} else {
again:
curr_task = task_seq_get_next(ns, &curr_tid, true);
curr_task = task_seq_get_next(&info->common, &info->tid, true);
if (!curr_task) {
info->tid = curr_tid + 1;
info->tid++;
goto finish;
}
if (curr_tid != info->tid) {
info->tid = curr_tid;
if (saved_tid != info->tid) {
/* new task, process the first vma */
op = task_vma_iter_first_vma;
} else {
@ -430,9 +574,12 @@ again:
return curr_vma;
next_task:
if (info->common.type == BPF_TASK_ITER_TID)
goto finish;
put_task_struct(curr_task);
info->task = NULL;
curr_tid++;
info->tid++;
goto again;
finish:
@ -531,8 +678,33 @@ static const struct bpf_iter_seq_info task_seq_info = {
.seq_priv_size = sizeof(struct bpf_iter_seq_task_info),
};
static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
{
switch (aux->task.type) {
case BPF_TASK_ITER_TID:
info->iter.task.tid = aux->task.pid;
break;
case BPF_TASK_ITER_TGID:
info->iter.task.pid = aux->task.pid;
break;
default:
break;
}
return 0;
}
static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
{
seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
if (aux->task.type == BPF_TASK_ITER_TID)
seq_printf(seq, "tid:\t%u\n", aux->task.pid);
else if (aux->task.type == BPF_TASK_ITER_TGID)
seq_printf(seq, "pid:\t%u\n", aux->task.pid);
}
static struct bpf_iter_reg task_reg_info = {
.target = "task",
.attach_target = bpf_iter_attach_task,
.feature = BPF_ITER_RESCHED,
.ctx_arg_info_size = 1,
.ctx_arg_info = {
@ -540,6 +712,8 @@ static struct bpf_iter_reg task_reg_info = {
PTR_TO_BTF_ID_OR_NULL },
},
.seq_info = &task_seq_info,
.fill_link_info = bpf_iter_fill_link_info,
.show_fdinfo = bpf_iter_task_show_fdinfo,
};
static const struct bpf_iter_seq_info task_file_seq_info = {
@ -551,6 +725,7 @@ static const struct bpf_iter_seq_info task_file_seq_info = {
static struct bpf_iter_reg task_file_reg_info = {
.target = "task_file",
.attach_target = bpf_iter_attach_task,
.feature = BPF_ITER_RESCHED,
.ctx_arg_info_size = 2,
.ctx_arg_info = {
@ -560,6 +735,8 @@ static struct bpf_iter_reg task_file_reg_info = {
PTR_TO_BTF_ID_OR_NULL },
},
.seq_info = &task_file_seq_info,
.fill_link_info = bpf_iter_fill_link_info,
.show_fdinfo = bpf_iter_task_show_fdinfo,
};
static const struct bpf_iter_seq_info task_vma_seq_info = {
@ -571,6 +748,7 @@ static const struct bpf_iter_seq_info task_vma_seq_info = {
static struct bpf_iter_reg task_vma_reg_info = {
.target = "task_vma",
.attach_target = bpf_iter_attach_task,
.feature = BPF_ITER_RESCHED,
.ctx_arg_info_size = 2,
.ctx_arg_info = {
@ -580,6 +758,8 @@ static struct bpf_iter_reg task_vma_reg_info = {
PTR_TO_BTF_ID_OR_NULL },
},
.seq_info = &task_vma_seq_info,
.fill_link_info = bpf_iter_fill_link_info,
.show_fdinfo = bpf_iter_task_show_fdinfo,
};
BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,


@ -116,22 +116,6 @@ bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
(ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC);
}
void *bpf_jit_alloc_exec_page(void)
{
void *image;
image = bpf_jit_alloc_exec(PAGE_SIZE);
if (!image)
return NULL;
set_vm_flush_reset_perms(image);
/* Keep image as writeable. The alternative is to keep flipping ro/rw
* every time new program is attached or detached.
*/
set_memory_x((long)image, 1);
return image;
}
void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym)
{
ksym->start = (unsigned long) data;
@ -404,9 +388,10 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, u32 idx)
goto out_free_im;
err = -ENOMEM;
im->image = image = bpf_jit_alloc_exec_page();
im->image = image = bpf_jit_alloc_exec(PAGE_SIZE);
if (!image)
goto out_uncharge;
set_vm_flush_reset_perms(image);
err = percpu_ref_init(&im->pcref, __bpf_tramp_image_release, 0, GFP_KERNEL);
if (err)
@ -483,6 +468,9 @@ again:
if (err < 0)
goto out;
set_memory_ro((long)im->image, 1);
set_memory_x((long)im->image, 1);
WARN_ON(tr->cur_image && tr->selector == 0);
WARN_ON(!tr->cur_image && tr->selector);
if (tr->cur_image)
@ -863,17 +851,6 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
return start;
}
static void notrace inc_misses_counter(struct bpf_prog *prog)
{
struct bpf_prog_stats *stats;
unsigned int flags;
stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->misses);
u64_stats_update_end_irqrestore(&stats->syncp, flags);
}
/* The logic is similar to bpf_prog_run(), but with an explicit
* rcu_read_lock() and migrate_disable() which are required
* for the trampoline. The macro is split into
@ -896,7 +873,7 @@ u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *ru
run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
inc_misses_counter(prog);
bpf_prog_inc_misses_counter(prog);
return 0;
}
return bpf_prog_start_time();
@ -967,7 +944,7 @@ u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_r
might_fault();
if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
inc_misses_counter(prog);
bpf_prog_inc_misses_counter(prog);
return 0;
}
@ -987,6 +964,29 @@ void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
rcu_read_unlock_trace();
}
u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
struct bpf_tramp_run_ctx *run_ctx)
__acquires(RCU)
{
rcu_read_lock();
migrate_disable();
run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
return bpf_prog_start_time();
}
void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
struct bpf_tramp_run_ctx *run_ctx)
__releases(RCU)
{
bpf_reset_run_ctx(run_ctx->saved_run_ctx);
update_prog_stats(prog, start);
migrate_enable();
rcu_read_unlock();
}
void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr)
{
percpu_ref_get(&tr->pcref);


@ -23,6 +23,7 @@
#include <linux/error-injection.h>
#include <linux/bpf_lsm.h>
#include <linux/btf_ids.h>
#include <linux/poison.h>
#include "disasm.h"
@ -370,6 +371,7 @@ __printf(2, 3) void bpf_log(struct bpf_verifier_log *log,
bpf_verifier_vlog(log, fmt, args);
va_end(args);
}
EXPORT_SYMBOL_GPL(bpf_log);
static const char *ltrim(const char *s)
{
@ -561,6 +563,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
[PTR_TO_BUF] = "buf",
[PTR_TO_FUNC] = "func",
[PTR_TO_MAP_KEY] = "map_key",
[PTR_TO_DYNPTR] = "dynptr_ptr",
};
if (type & PTR_MAYBE_NULL) {
@ -779,8 +782,8 @@ static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_
return true;
}
static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
enum bpf_arg_type arg_type)
bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env,
struct bpf_reg_state *reg)
{
struct bpf_func_state *state = func(env, reg);
int spi = get_spi(reg->off);
@ -796,11 +799,24 @@ static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_re
return false;
}
return true;
}
bool is_dynptr_type_expected(struct bpf_verifier_env *env,
struct bpf_reg_state *reg,
enum bpf_arg_type arg_type)
{
struct bpf_func_state *state = func(env, reg);
enum bpf_dynptr_type dynptr_type;
int spi = get_spi(reg->off);
/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
if (arg_type == ARG_PTR_TO_DYNPTR)
return true;
return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
dynptr_type = arg_to_dynptr_type(arg_type);
return state->stack[spi].spilled_ptr.dynptr.type == dynptr_type;
}
/* The reg state of a pointer or a bounded scalar was saved when
@ -1749,6 +1765,7 @@ static void init_func_state(struct bpf_verifier_env *env,
state->callsite = callsite;
state->frameno = frameno;
state->subprogno = subprogno;
state->callback_ret_range = tnum_range(0, 0);
init_reg_state(env, state);
mark_verifier_state_scratched(env);
}
@ -2908,7 +2925,7 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
return 0;
}
static int mark_chain_precision(struct bpf_verifier_env *env, int regno)
int mark_chain_precision(struct bpf_verifier_env *env, int regno)
{
return __mark_chain_precision(env, regno, -1);
}
@ -5233,6 +5250,25 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
env,
regno, reg->off, access_size,
zero_size_allowed, ACCESS_HELPER, meta);
case PTR_TO_CTX:
/* in case the function doesn't know how to access the context,
* (because we are in a program of type SYSCALL for example), we
* can not statically check its size.
* Dynamically check it now.
*/
if (!env->ops->convert_ctx_access) {
enum bpf_access_type atype = meta && meta->raw_mode ? BPF_WRITE : BPF_READ;
int offset = access_size - 1;
/* Allow zero-byte read from PTR_TO_CTX */
if (access_size == 0)
return zero_size_allowed ? 0 : -EACCES;
return check_mem_access(env, env->insn_idx, regno, offset, BPF_B,
atype, -1, false);
}
fallthrough;
default: /* scalar_value or invalid ptr */
/* Allow zero-byte read from NULL, regardless of pointer type */
if (zero_size_allowed && access_size == 0 &&
@ -5666,6 +5702,12 @@ static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK }
static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types dynptr_types = {
.types = {
PTR_TO_STACK,
PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
}
};
static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
[ARG_PTR_TO_MAP_KEY] = &map_key_value_types,
@ -5692,7 +5734,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
[ARG_PTR_TO_CONST_STR] = &const_str_ptr_types,
[ARG_PTR_TO_TIMER] = &timer_types,
[ARG_PTR_TO_KPTR] = &kptr_types,
[ARG_PTR_TO_DYNPTR] = &stack_ptr_types,
[ARG_PTR_TO_DYNPTR] = &dynptr_types,
};
static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
@ -5761,13 +5803,22 @@ found:
if (meta->func_id == BPF_FUNC_kptr_xchg) {
if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno))
return -EACCES;
} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
btf_vmlinux, *arg_btf_id,
strict_type_match)) {
verbose(env, "R%d is of type %s but %s is expected\n",
regno, kernel_type_name(reg->btf, reg->btf_id),
kernel_type_name(btf_vmlinux, *arg_btf_id));
return -EACCES;
} else {
if (arg_btf_id == BPF_PTR_POISON) {
verbose(env, "verifier internal error:");
verbose(env, "R%d has non-overwritten BPF_PTR_POISON type\n",
regno);
return -EACCES;
}
if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
btf_vmlinux, *arg_btf_id,
strict_type_match)) {
verbose(env, "R%d is of type %s but %s is expected\n",
regno, kernel_type_name(reg->btf, reg->btf_id),
kernel_type_name(btf_vmlinux, *arg_btf_id));
return -EACCES;
}
}
}
@ -6035,6 +6086,13 @@ skip_type_check:
err = check_mem_size_reg(env, reg, regno, true, meta);
break;
case ARG_PTR_TO_DYNPTR:
/* We only need to check for initialized / uninitialized helper
* dynptr args if the dynptr is not PTR_TO_DYNPTR; if it is, the
* assumption is that a helper function initialized the dynptr
* on behalf of the BPF program.
*/
if (base_type(reg->type) == PTR_TO_DYNPTR)
break;
if (arg_type & MEM_UNINIT) {
if (!is_dynptr_reg_valid_uninit(env, reg)) {
verbose(env, "Dynptr has to be an uninitialized dynptr\n");
@ -6050,21 +6108,27 @@ skip_type_check:
}
meta->uninit_dynptr_regno = regno;
} else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
} else if (!is_dynptr_reg_valid_init(env, reg)) {
verbose(env,
"Expected an initialized dynptr as arg #%d\n",
arg + 1);
return -EINVAL;
} else if (!is_dynptr_type_expected(env, reg, arg_type)) {
const char *err_extra = "";
switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
case DYNPTR_TYPE_LOCAL:
err_extra = "local ";
err_extra = "local";
break;
case DYNPTR_TYPE_RINGBUF:
err_extra = "ringbuf ";
err_extra = "ringbuf";
break;
default:
err_extra = "<unknown>";
break;
}
verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
verbose(env,
"Expected a dynptr of type %s as arg #%d\n",
err_extra, arg + 1);
return -EINVAL;
}
@ -6209,6 +6273,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_ringbuf_discard_dynptr)
goto error;
break;
case BPF_MAP_TYPE_USER_RINGBUF:
if (func_id != BPF_FUNC_user_ringbuf_drain)
goto error;
break;
case BPF_MAP_TYPE_STACK_TRACE:
if (func_id != BPF_FUNC_get_stackid)
goto error;
@ -6328,6 +6396,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
if (map->map_type != BPF_MAP_TYPE_RINGBUF)
goto error;
break;
case BPF_FUNC_user_ringbuf_drain:
if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
goto error;
break;
case BPF_FUNC_get_stackid:
if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
goto error;
@ -6494,31 +6566,15 @@ static int check_func_proto(const struct bpf_func_proto *fn, int func_id)
/* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
* are now invalid, so turn them into unknown SCALAR_VALUE.
*/
static void __clear_all_pkt_pointers(struct bpf_verifier_env *env,
struct bpf_func_state *state)
{
struct bpf_reg_state *regs = state->regs, *reg;
int i;
for (i = 0; i < MAX_BPF_REG; i++)
if (reg_is_pkt_pointer_any(&regs[i]))
mark_reg_unknown(env, regs, i);
bpf_for_each_spilled_reg(i, state, reg) {
if (!reg)
continue;
if (reg_is_pkt_pointer_any(reg))
__mark_reg_unknown(env, reg);
}
}
static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
{
struct bpf_verifier_state *vstate = env->cur_state;
int i;
struct bpf_func_state *state;
struct bpf_reg_state *reg;
for (i = 0; i <= vstate->curframe; i++)
__clear_all_pkt_pointers(env, vstate->frame[i]);
bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
if (reg_is_pkt_pointer_any(reg))
__mark_reg_unknown(env, reg);
}));
}
enum {
@ -6547,41 +6603,24 @@ static void mark_pkt_end(struct bpf_verifier_state *vstate, int regn, bool range
reg->range = AT_PKT_END;
}
static void release_reg_references(struct bpf_verifier_env *env,
struct bpf_func_state *state,
int ref_obj_id)
{
struct bpf_reg_state *regs = state->regs, *reg;
int i;
for (i = 0; i < MAX_BPF_REG; i++)
if (regs[i].ref_obj_id == ref_obj_id)
mark_reg_unknown(env, regs, i);
bpf_for_each_spilled_reg(i, state, reg) {
if (!reg)
continue;
if (reg->ref_obj_id == ref_obj_id)
__mark_reg_unknown(env, reg);
}
}
/* The pointer with the specified id has released its reference to kernel
* resources. Identify all copies of the same pointer and clear the reference.
*/
static int release_reference(struct bpf_verifier_env *env,
int ref_obj_id)
{
struct bpf_verifier_state *vstate = env->cur_state;
struct bpf_func_state *state;
struct bpf_reg_state *reg;
int err;
int i;
err = release_reference_state(cur_func(env), ref_obj_id);
if (err)
return err;
for (i = 0; i <= vstate->curframe; i++)
release_reg_references(env, vstate->frame[i], ref_obj_id);
bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
if (reg->ref_obj_id == ref_obj_id)
__mark_reg_unknown(env, reg);
}));
return 0;
}
@ -6629,7 +6668,7 @@ static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn
func_info_aux = env->prog->aux->func_info_aux;
if (func_info_aux)
is_global = func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
err = btf_check_subprog_arg_match(env, subprog, caller->regs);
err = btf_check_subprog_call(env, subprog, caller->regs);
if (err == -EFAULT)
return err;
if (is_global) {
@ -6803,6 +6842,7 @@ static int set_map_elem_callback_state(struct bpf_verifier_env *env,
return err;
callee->in_callback_fn = true;
callee->callback_ret_range = tnum_range(0, 1);
return 0;
}
@ -6824,6 +6864,7 @@ static int set_loop_callback_state(struct bpf_verifier_env *env,
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_callback_fn = true;
callee->callback_ret_range = tnum_range(0, 1);
return 0;
}
@ -6853,6 +6894,7 @@ static int set_timer_callback_state(struct bpf_verifier_env *env,
__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_async_callback_fn = true;
callee->callback_ret_range = tnum_range(0, 1);
return 0;
}
@ -6879,6 +6921,30 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
/* unused */
__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_callback_fn = true;
callee->callback_ret_range = tnum_range(0, 1);
return 0;
}
static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
struct bpf_func_state *caller,
struct bpf_func_state *callee,
int insn_idx)
{
/* bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void
* callback_ctx, u64 flags);
* callback_fn(struct bpf_dynptr_t* dynptr, void *callback_ctx);
*/
__mark_reg_not_init(env, &callee->regs[BPF_REG_0]);
callee->regs[BPF_REG_1].type = PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL;
__mark_reg_known_zero(&callee->regs[BPF_REG_1]);
callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3];
/* unused */
__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_callback_fn = true;
return 0;
}
@ -6907,7 +6973,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
caller = state->frame[state->curframe];
if (callee->in_callback_fn) {
/* enforce R0 return value range [0, 1]. */
struct tnum range = tnum_range(0, 1);
struct tnum range = callee->callback_ret_range;
if (r0->type != SCALAR_VALUE) {
verbose(env, "R0 not a scalar value\n");
@ -7342,12 +7408,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
case BPF_FUNC_dynptr_data:
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
if (arg_type_is_dynptr(fn->arg_type[i])) {
struct bpf_reg_state *reg = &regs[BPF_REG_1 + i];
if (meta.ref_obj_id) {
verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
return -EFAULT;
}
/* Find the id of the dynptr we're tracking the reference of */
meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
if (base_type(reg->type) != PTR_TO_DYNPTR)
/* Find the id of the dynptr we're
* tracking the reference of
*/
meta.ref_obj_id = stack_slot_get_id(env, reg);
break;
}
}
@ -7356,6 +7428,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return -EFAULT;
}
break;
case BPF_FUNC_user_ringbuf_drain:
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_user_ringbuf_callback_state);
break;
}
if (err)
@ -7465,6 +7541,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
ret_btf = meta.kptr_off_desc->kptr.btf;
ret_btf_id = meta.kptr_off_desc->kptr.btf_id;
} else {
if (fn->ret_btf_id == BPF_PTR_POISON) {
verbose(env, "verifier internal error:");
verbose(env, "func %s has non-overwritten BPF_PTR_POISON return type\n",
func_id_name(func_id));
return -EINVAL;
}
ret_btf = btf_vmlinux;
ret_btf_id = *fn->ret_btf_id;
}
@ -7576,6 +7658,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
{
const struct btf_type *t, *func, *func_proto, *ptr_type;
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_kfunc_arg_meta meta = { 0 };
const char *func_name, *ptr_type_name;
u32 i, nargs, func_id, ptr_type_id;
int err, insn_idx = *insn_idx_p;
@ -7610,8 +7693,10 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
acq = *kfunc_flags & KF_ACQUIRE;
meta.flags = *kfunc_flags;
/* Check the arguments */
err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, *kfunc_flags);
err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, &meta);
if (err < 0)
return err;
/* In case of release function, we get register number of refcounted
@ -7632,7 +7717,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
/* Check return type */
t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
if (acq && !btf_type_is_ptr(t)) {
if (acq && !btf_type_is_struct_ptr(desc_btf, t)) {
verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
return -EINVAL;
}
@ -7644,17 +7729,33 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
&ptr_type_id);
if (!btf_type_is_struct(ptr_type)) {
ptr_type_name = btf_name_by_offset(desc_btf,
ptr_type->name_off);
verbose(env, "kernel function %s returns pointer type %s %s is not supported\n",
func_name, btf_type_str(ptr_type),
ptr_type_name);
return -EINVAL;
if (!meta.r0_size) {
ptr_type_name = btf_name_by_offset(desc_btf,
ptr_type->name_off);
verbose(env,
"kernel function %s returns pointer type %s %s is not supported\n",
func_name,
btf_type_str(ptr_type),
ptr_type_name);
return -EINVAL;
}
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_MEM;
regs[BPF_REG_0].mem_size = meta.r0_size;
if (meta.r0_rdonly)
regs[BPF_REG_0].type |= MEM_RDONLY;
/* Ensures we don't access the memory after a release_reference() */
if (meta.ref_obj_id)
regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
} else {
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].btf = desc_btf;
regs[BPF_REG_0].type = PTR_TO_BTF_ID;
regs[BPF_REG_0].btf_id = ptr_type_id;
}
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].btf = desc_btf;
regs[BPF_REG_0].type = PTR_TO_BTF_ID;
regs[BPF_REG_0].btf_id = ptr_type_id;
if (*kfunc_flags & KF_RET_NULL) {
regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
@ -9297,34 +9398,14 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
return 0;
}
static void __find_good_pkt_pointers(struct bpf_func_state *state,
struct bpf_reg_state *dst_reg,
enum bpf_reg_type type, int new_range)
{
struct bpf_reg_state *reg;
int i;
for (i = 0; i < MAX_BPF_REG; i++) {
reg = &state->regs[i];
if (reg->type == type && reg->id == dst_reg->id)
/* keep the maximum range already checked */
reg->range = max(reg->range, new_range);
}
bpf_for_each_spilled_reg(i, state, reg) {
if (!reg)
continue;
if (reg->type == type && reg->id == dst_reg->id)
reg->range = max(reg->range, new_range);
}
}
static void find_good_pkt_pointers(struct bpf_verifier_state *vstate,
struct bpf_reg_state *dst_reg,
enum bpf_reg_type type,
bool range_right_open)
{
int new_range, i;
struct bpf_func_state *state;
struct bpf_reg_state *reg;
int new_range;
if (dst_reg->off < 0 ||
(dst_reg->off == 0 && range_right_open))
@ -9389,9 +9470,11 @@ static void find_good_pkt_pointers(struct bpf_verifier_state *vstate,
* the range won't allow anything.
* dst_reg->off is known < MAX_PACKET_OFF, therefore it fits in a u16.
*/
for (i = 0; i <= vstate->curframe; i++)
__find_good_pkt_pointers(vstate->frame[i], dst_reg, type,
new_range);
bpf_for_each_reg_in_vstate(vstate, state, reg, ({
if (reg->type == type && reg->id == dst_reg->id)
/* keep the maximum range already checked */
reg->range = max(reg->range, new_range);
}));
}
static int is_branch32_taken(struct bpf_reg_state *reg, u32 val, u8 opcode)
@ -9880,7 +9963,7 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
if (!reg_may_point_to_spin_lock(reg)) {
/* For not-NULL ptr, reg->ref_obj_id will be reset
* in release_reg_references().
* in release_reference().
*
* reg->id is still used by spin_lock ptr. Other
* than spin_lock ptr type, reg->id can be reset.
@ -9890,22 +9973,6 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
}
}
static void __mark_ptr_or_null_regs(struct bpf_func_state *state, u32 id,
bool is_null)
{
struct bpf_reg_state *reg;
int i;
for (i = 0; i < MAX_BPF_REG; i++)
mark_ptr_or_null_reg(state, &state->regs[i], id, is_null);
bpf_for_each_spilled_reg(i, state, reg) {
if (!reg)
continue;
mark_ptr_or_null_reg(state, reg, id, is_null);
}
}
/* The logic is similar to find_good_pkt_pointers(), both could eventually
* be folded together at some point.
*/
@ -9913,10 +9980,9 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
bool is_null)
{
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *regs = state->regs;
struct bpf_reg_state *regs = state->regs, *reg;
u32 ref_obj_id = regs[regno].ref_obj_id;
u32 id = regs[regno].id;
int i;
if (ref_obj_id && ref_obj_id == id && is_null)
/* regs[regno] is in the " == NULL" branch.
@ -9925,8 +9991,9 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
*/
WARN_ON_ONCE(release_reference_state(state, id));
for (i = 0; i <= vstate->curframe; i++)
__mark_ptr_or_null_regs(vstate->frame[i], id, is_null);
bpf_for_each_reg_in_vstate(vstate, state, reg, ({
mark_ptr_or_null_reg(state, reg, id, is_null);
}));
}
static bool try_match_pkt_pointers(const struct bpf_insn *insn,
@ -10039,23 +10106,11 @@ static void find_equal_scalars(struct bpf_verifier_state *vstate,
{
struct bpf_func_state *state;
struct bpf_reg_state *reg;
int i, j;
for (i = 0; i <= vstate->curframe; i++) {
state = vstate->frame[i];
for (j = 0; j < MAX_BPF_REG; j++) {
reg = &state->regs[j];
if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
*reg = *known_reg;
}
bpf_for_each_spilled_reg(j, state, reg) {
if (!reg)
continue;
if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
*reg = *known_reg;
}
}
bpf_for_each_reg_in_vstate(vstate, state, reg, ({
if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
*reg = *known_reg;
}));
}
static int check_cond_jmp_op(struct bpf_verifier_env *env,
@ -12654,6 +12709,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH_OF_MAPS:
case BPF_MAP_TYPE_RINGBUF:
case BPF_MAP_TYPE_USER_RINGBUF:
case BPF_MAP_TYPE_INODE_STORAGE:
case BPF_MAP_TYPE_SK_STORAGE:
case BPF_MAP_TYPE_TASK_STORAGE:
@ -13447,9 +13503,6 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn->code = BPF_LDX | BPF_PROBE_MEM |
BPF_SIZE((insn)->code);
env->prog->aux->num_exentries++;
} else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) {
verbose(env, "Writes through BTF pointers are not allowed\n");
return -EINVAL;
}
continue;
default:

View File

@ -1607,9 +1607,10 @@ int register_kprobe(struct kprobe *p)
struct kprobe *old_p;
struct module *probed_mod;
kprobe_opcode_t *addr;
bool on_func_entry;
/* Adjust probe address from symbol */
addr = kprobe_addr(p);
addr = _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
if (IS_ERR(addr))
return PTR_ERR(addr);
p->addr = addr;
@ -1629,6 +1630,9 @@ int register_kprobe(struct kprobe *p)
mutex_lock(&kprobe_mutex);
if (on_func_entry)
p->flags |= KPROBE_FLAG_ON_FUNC_ENTRY;
old_p = get_kprobe(p->addr);
if (old_p) {
/* Since this may unoptimize 'old_p', locking 'text_mutex'. */

View File

@ -51,6 +51,12 @@ config HAVE_DYNAMIC_FTRACE_WITH_ARGS
This allows for use of regs_get_kernel_argument() and
kernel_stack_pointer().
config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
bool
help
Select this if the architecture generates __patchable_function_entries
sections but does not want them included in the ftrace locations.
config HAVE_FTRACE_MCOUNT_RECORD
bool
help

View File

@ -20,6 +20,8 @@
#include <linux/fprobe.h>
#include <linux/bsearch.h>
#include <linux/sort.h>
#include <linux/key.h>
#include <linux/verification.h>
#include <net/bpf_sk_storage.h>
@ -1026,11 +1028,30 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_tracing = {
.arg1_type = ARG_PTR_TO_CTX,
};
#ifdef CONFIG_X86_KERNEL_IBT
static unsigned long get_entry_ip(unsigned long fentry_ip)
{
u32 instr;
/* Being extra safe in here in case entry ip is on the page-edge. */
if (get_kernel_nofault(instr, (u32 *) fentry_ip - 1))
return fentry_ip;
if (is_endbr(instr))
fentry_ip -= ENDBR_INSN_SIZE;
return fentry_ip;
}
#else
#define get_entry_ip(fentry_ip) fentry_ip
#endif
BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
{
struct kprobe *kp = kprobe_running();
return kp ? (uintptr_t)kp->addr : 0;
if (!kp || !(kp->flags & KPROBE_FLAG_ON_FUNC_ENTRY))
return 0;
return get_entry_ip((uintptr_t)kp->addr);
}
static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
@ -1181,6 +1202,184 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
.arg1_type = ARG_PTR_TO_CTX,
};
#ifdef CONFIG_KEYS
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"kfuncs which will be used in BPF programs");
/**
* bpf_lookup_user_key - lookup a key by its serial
* @serial: key handle serial number
* @flags: lookup-specific flags
*
* Search a key with a given *serial* and the provided *flags*.
* If found, increment the reference count of the key by one, and
* return it in the bpf_key structure.
*
* The bpf_key structure must be passed to bpf_key_put() when done
* with it, so that the key reference count is decremented and the
* bpf_key structure is freed.
*
* Permission checks are deferred to the time the key is used by
* one of the available key-specific kfuncs.
*
* Set *flags* with KEY_LOOKUP_CREATE, to attempt creating a requested
* special keyring (e.g. session keyring), if it doesn't yet exist.
* Set *flags* with KEY_LOOKUP_PARTIAL, to lookup a key without waiting
* for the key construction, and to retrieve uninstantiated keys (keys
* without data attached to them).
*
* Return: a bpf_key pointer with a valid key pointer if the key is found, a
* NULL pointer otherwise.
*/
struct bpf_key *bpf_lookup_user_key(u32 serial, u64 flags)
{
key_ref_t key_ref;
struct bpf_key *bkey;
if (flags & ~KEY_LOOKUP_ALL)
return NULL;
/*
* Permission check is deferred until the key is used, as the
* intent of the caller is unknown here.
*/
key_ref = lookup_user_key(serial, flags, KEY_DEFER_PERM_CHECK);
if (IS_ERR(key_ref))
return NULL;
bkey = kmalloc(sizeof(*bkey), GFP_KERNEL);
if (!bkey) {
key_put(key_ref_to_ptr(key_ref));
return NULL;
}
bkey->key = key_ref_to_ptr(key_ref);
bkey->has_ref = true;
return bkey;
}
/**
* bpf_lookup_system_key - lookup a key by a system-defined ID
* @id: key ID
*
* Obtain a bpf_key structure with a key pointer set to the passed key ID.
* The key pointer is marked as invalid, to prevent bpf_key_put() from
* attempting to decrement the key reference count on that pointer. The key
* pointer set in such way is currently understood only by
* verify_pkcs7_signature().
*
* Set *id* to one of the values defined in include/linux/verification.h:
* 0 for the primary keyring (immutable keyring of system keys);
* VERIFY_USE_SECONDARY_KEYRING for both the primary and secondary keyring
* (where keys can be added only if they are vouched for by existing keys
* in those keyrings); VERIFY_USE_PLATFORM_KEYRING for the platform
* keyring (primarily used by the integrity subsystem to verify a kexec'ed
* kernel image and, possibly, the initramfs signature).
*
* Return: a bpf_key pointer with an invalid key pointer set from the
* pre-determined ID on success, a NULL pointer otherwise
*/
struct bpf_key *bpf_lookup_system_key(u64 id)
{
struct bpf_key *bkey;
if (system_keyring_id_check(id) < 0)
return NULL;
bkey = kmalloc(sizeof(*bkey), GFP_ATOMIC);
if (!bkey)
return NULL;
bkey->key = (struct key *)(unsigned long)id;
bkey->has_ref = false;
return bkey;
}
/**
* bpf_key_put - decrement key reference count if key is valid and free bpf_key
* @bkey: bpf_key structure
*
* Decrement the reference count of the key inside *bkey*, if the pointer
* is valid, and free *bkey*.
*/
void bpf_key_put(struct bpf_key *bkey)
{
if (bkey->has_ref)
key_put(bkey->key);
kfree(bkey);
}
#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
/**
* bpf_verify_pkcs7_signature - verify a PKCS#7 signature
* @data_ptr: data to verify
* @sig_ptr: signature of the data
* @trusted_keyring: keyring with keys trusted for signature verification
*
* Verify the PKCS#7 signature *sig_ptr* against the supplied *data_ptr*
* with keys in a keyring referenced by *trusted_keyring*.
*
* Return: 0 on success, a negative value on error.
*/
int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr,
struct bpf_dynptr_kern *sig_ptr,
struct bpf_key *trusted_keyring)
{
int ret;
if (trusted_keyring->has_ref) {
/*
* Do the permission check deferred in bpf_lookup_user_key().
* See bpf_lookup_user_key() for more details.
*
* A call to key_task_permission() here would be redundant, as
* it is already done by keyring_search() called by
* find_asymmetric_key().
*/
ret = key_validate(trusted_keyring->key);
if (ret < 0)
return ret;
}
return verify_pkcs7_signature(data_ptr->data,
bpf_dynptr_get_size(data_ptr),
sig_ptr->data,
bpf_dynptr_get_size(sig_ptr),
trusted_keyring->key,
VERIFYING_UNSPECIFIED_SIGNATURE, NULL,
NULL);
}
#endif /* CONFIG_SYSTEM_DATA_VERIFICATION */
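A minimal sketch (not part of this series) of how the three kfuncs above are meant to be combined from a sleepable BPF program, along the lines of the selftests added later. The section name, global buffers, their sizes and the map-backed length variables are illustrative assumptions; only the kfunc prototypes mirror the definitions above, and it assumes struct bpf_key is visible via vmlinux.h.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* kfuncs defined above, declared as kernel symbols on the BPF side */
extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
extern void bpf_key_put(struct bpf_key *bkey) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

/* illustrative buffers, filled by user space before triggering the prog */
char data[4096];
char sig[1024];
__u32 data_len, sig_len, keyring_serial;

SEC("lsm.s/bpf")	/* sleepable, required by the KF_SLEEPABLE kfuncs */
int BPF_PROG(check_pkcs7, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_dynptr data_ptr, sig_ptr;
	struct bpf_key *trusted_keyring;
	int ret;

	/* keep the lengths bounded for the verifier */
	if (data_len > sizeof(data) || sig_len > sizeof(sig))
		return 0;

	if (bpf_dynptr_from_mem(data, data_len, 0, &data_ptr) ||
	    bpf_dynptr_from_mem(sig, sig_len, 0, &sig_ptr))
		return 0;

	trusted_keyring = bpf_lookup_user_key(keyring_serial, 0);
	if (!trusted_keyring)
		return 0;

	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, trusted_keyring);

	/* drop the reference taken by bpf_lookup_user_key() */
	bpf_key_put(trusted_keyring);
	return ret;
}

char _license[] SEC("license") = "GPL";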
__diag_pop();
BTF_SET8_START(key_sig_kfunc_set)
BTF_ID_FLAGS(func, bpf_lookup_user_key, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_lookup_system_key, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE)
#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE)
#endif
BTF_SET8_END(key_sig_kfunc_set)
static const struct btf_kfunc_id_set bpf_key_sig_kfunc_set = {
.owner = THIS_MODULE,
.set = &key_sig_kfunc_set,
};
static int __init bpf_key_sig_kfuncs_init(void)
{
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
&bpf_key_sig_kfunc_set);
}
late_initcall(bpf_key_sig_kfuncs_init);
#endif /* CONFIG_KEYS */
static const struct bpf_func_proto *
bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@ -2042,9 +2241,15 @@ static __always_inline
void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
{
cant_sleep();
if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
bpf_prog_inc_misses_counter(prog);
goto out;
}
rcu_read_lock();
(void) bpf_prog_run(prog, args);
rcu_read_unlock();
out:
this_cpu_dec(*(prog->active));
}
#define UNPACK(...) __VA_ARGS__
@ -2414,13 +2619,13 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
}
static void
kprobe_multi_link_handler(struct fprobe *fp, unsigned long entry_ip,
kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
struct pt_regs *regs)
{
struct bpf_kprobe_multi_link *link;
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
kprobe_multi_link_prog_run(link, entry_ip, regs);
kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
}
static int symbols_cmp_r(const void *a, const void *b, const void *priv)

View File

@ -8265,8 +8265,7 @@ static int kallsyms_callback(void *data, const char *name,
if (args->addrs[idx])
return 0;
addr = ftrace_location(addr);
if (!addr)
if (!ftrace_location(addr))
return 0;
args->addrs[idx] = addr;

View File

@ -606,6 +606,38 @@ noinline void bpf_kfunc_call_memb1_release(struct prog_test_member1 *p)
WARN_ON_ONCE(1);
}
static int *__bpf_kfunc_call_test_get_mem(struct prog_test_ref_kfunc *p, const int size)
{
if (size > 2 * sizeof(int))
return NULL;
return (int *)p;
}
noinline int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size)
{
return __bpf_kfunc_call_test_get_mem(p, rdwr_buf_size);
}
noinline int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size)
{
return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size);
}
/* the next 2 ones can't really be used for testing except to ensure
* that the verifier rejects the call.
* Acquire functions must return struct pointers, so these ones are
* failing.
*/
noinline int *bpf_kfunc_call_test_acq_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size)
{
return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size);
}
noinline void bpf_kfunc_call_int_mem_release(int *p)
{
}
noinline struct prog_test_ref_kfunc *
bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b)
{
@ -712,6 +744,10 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdwr_mem, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdonly_mem, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_acq_rdonly_mem, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_kfunc_call_int_mem_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1)
@ -1634,6 +1670,7 @@ static int __init bpf_prog_test_run_init(void)
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_prog_test_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_prog_test_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc,
ARRAY_SIZE(bpf_prog_test_dtor_kfunc),
THIS_MODULE);

View File

@ -18,6 +18,7 @@
*/
#include <linux/atomic.h>
#include <linux/bpf_verifier.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/mm.h>
@ -5101,6 +5102,59 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
return 0;
}
static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
int *optlen, bool getopt)
{
struct tcp_sock *tp;
int ret;
if (*optlen < 2)
return -EINVAL;
if (getopt) {
if (!inet_csk(sk)->icsk_ca_ops)
return -EINVAL;
/* BPF expects NULL-terminated tcp-cc string */
optval[--(*optlen)] = '\0';
return do_tcp_getsockopt(sk, SOL_TCP, TCP_CONGESTION,
KERNEL_SOCKPTR(optval),
KERNEL_SOCKPTR(optlen));
}
/* "cdg" is the only cc that allocates a ptr
* in the inet_csk_ca area. The bpf-tcp-cc may
* overwrite this ptr after switching to cdg.
*/
if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
return -ENOTSUPP;
/* This stops the following loop:
*
* .init => bpf_setsockopt(tcp_cc) => .init =>
* bpf_setsockopt(tcp_cc) => .init => ....
*
* The second bpf_setsockopt(tcp_cc) is not allowed
* in order to break the loop when both .init
* are the same bpf prog.
*
* This applies even if the second bpf_setsockopt(tcp_cc)
* does not cause a loop. It limits the fallback to one
* level: only the first '.init' can call
* bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc
* (e.g. the peer does not support ECN), and the second
* '.init' cannot fall back to yet another cc.
*/
tp = tcp_sk(sk);
if (tp->bpf_chg_cc_inprogress)
return -EBUSY;
tp->bpf_chg_cc_inprogress = 1;
ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
KERNEL_SOCKPTR(optval), *optlen);
tp->bpf_chg_cc_inprogress = 0;
return ret;
}
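The comment above describes the pattern this guards: a bpf tcp-cc whose .init itself calls bpf_setsockopt(TCP_CONGESTION). A minimal sketch of such an .init follows; the program name and fallback cc are illustrative, and the rest of the tcp_congestion_ops (ssthresh, cong_avoid, undo_cwnd, name and the SEC(".struct_ops") map) is omitted for brevity.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define SOL_TCP		6	/* from uapi headers, redefined for a self-contained sketch */
#define TCP_CONGESTION	13
#define TCP_CA_NAME_MAX	16

char fallback_cc[TCP_CA_NAME_MAX] = "cubic";	/* illustrative fallback */

SEC("struct_ops/sketch_init")
void BPF_PROG(sketch_init, struct sock *sk)
{
	/* e.g. the peer does not support ECN: switch to a fallback cc.
	 * If the fallback is itself a bpf tcp-cc whose .init calls
	 * bpf_setsockopt(TCP_CONGESTION) again, the nested call now
	 * fails with -EBUSY instead of recursing.
	 */
	bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
		       fallback_cc, sizeof(fallback_cc));
}

char _license[] SEC("license") = "GPL";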
static int sol_tcp_sockopt(struct sock *sk, int optname,
char *optval, int *optlen,
bool getopt)
@ -5124,9 +5178,7 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
return -EINVAL;
break;
case TCP_CONGESTION:
if (*optlen < 2)
return -EINVAL;
break;
return sol_tcp_sockopt_congestion(sk, optval, optlen, getopt);
case TCP_SAVED_SYN:
if (*optlen < 1)
return -EINVAL;
@ -5151,13 +5203,6 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
return 0;
}
if (optname == TCP_CONGESTION) {
if (!inet_csk(sk)->icsk_ca_ops)
return -EINVAL;
/* BPF expects NULL-terminated tcp-cc string */
optval[--(*optlen)] = '\0';
}
return do_tcp_getsockopt(sk, SOL_TCP, optname,
KERNEL_SOCKPTR(optval),
KERNEL_SOCKPTR(optlen));
@ -5284,12 +5329,6 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname,
BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level,
int, optname, char *, optval, int, optlen)
{
if (level == SOL_TCP && optname == TCP_CONGESTION) {
if (optlen >= sizeof("cdg") - 1 &&
!strncmp("cdg", optval, optlen))
return -ENOTSUPP;
}
return _bpf_setsockopt(sk, level, optname, optval, optlen);
}
@ -8605,6 +8644,36 @@ static bool tc_cls_act_is_valid_access(int off, int size,
return bpf_skb_is_valid_access(off, size, type, prog, info);
}
DEFINE_MUTEX(nf_conn_btf_access_lock);
EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock);
int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype, u32 *next_btf_id,
enum bpf_type_flag *flag);
EXPORT_SYMBOL_GPL(nfct_btf_struct_access);
static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
{
int ret = -EACCES;
if (atype == BPF_READ)
return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
flag);
mutex_lock(&nf_conn_btf_access_lock);
if (nfct_btf_struct_access)
ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
mutex_unlock(&nf_conn_btf_access_lock);
return ret;
}
static bool __is_valid_xdp_access(int off, int size)
{
if (off < 0 || off >= sizeof(struct xdp_md))
@ -8664,6 +8733,27 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
}
EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
static int xdp_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
{
int ret = -EACCES;
if (atype == BPF_READ)
return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
flag);
mutex_lock(&nf_conn_btf_access_lock);
if (nfct_btf_struct_access)
ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag);
mutex_unlock(&nf_conn_btf_access_lock);
return ret;
}
static bool sock_addr_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
@ -10558,6 +10648,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
.convert_ctx_access = tc_cls_act_convert_ctx_access,
.gen_prologue = tc_cls_act_prologue,
.gen_ld_abs = bpf_gen_ld_abs,
.btf_struct_access = tc_cls_act_btf_struct_access,
};
const struct bpf_prog_ops tc_cls_act_prog_ops = {
@ -10569,6 +10660,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = {
.is_valid_access = xdp_is_valid_access,
.convert_ctx_access = xdp_convert_ctx_access,
.gen_prologue = bpf_noop_prologue,
.btf_struct_access = xdp_btf_struct_access,
};
const struct bpf_prog_ops xdp_prog_ops = {

View File

@ -434,8 +434,10 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
if (copied + copy > len)
copy = len - copied;
copy = copy_page_to_iter(page, sge->offset, copy, iter);
if (!copy)
return copied ? copied : -EFAULT;
if (!copy) {
copied = copied ? copied : -EFAULT;
goto out;
}
copied += copy;
if (likely(!peek)) {
@ -455,7 +457,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
* didn't copy the entire length lets just break.
*/
if (copy != sge->length)
return copied;
goto out;
sk_msg_iter_var_next(i);
}
@ -477,7 +479,9 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
}
msg_rx = sk_psock_peek_msg(psock);
}
out:
if (psock->work_state.skb && copied > 0)
schedule_work(&psock->work);
return copied;
}
EXPORT_SYMBOL_GPL(sk_msg_recvmsg);

View File

@ -159,7 +159,8 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
*timeo_p = current_timeo;
}
out:
remove_wait_queue(sk_sleep(sk), &wait);
if (!sock_flag(sk, SOCK_DEAD))
remove_wait_queue(sk_sleep(sk), &wait);
return err;
do_error:

View File

@ -124,7 +124,7 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
return -EACCES;
}
return NOT_INIT;
return 0;
}
BPF_CALL_2(bpf_tcp_send_ack, struct tcp_sock *, tp, u32, rcv_nxt)

View File

@ -33,6 +33,7 @@
#include <linux/skbuff.h>
#include <linux/proc_fs.h>
#include <linux/export.h>
#include <linux/bpf-cgroup.h>
#include <net/sock.h>
#include <net/ping.h>
#include <net/udp.h>
@ -295,6 +296,19 @@ void ping_close(struct sock *sk, long timeout)
}
EXPORT_SYMBOL_GPL(ping_close);
static int ping_pre_connect(struct sock *sk, struct sockaddr *uaddr,
int addr_len)
{
/* This check is replicated from __ip4_datagram_connect() and
* intended to prevent BPF program called below from accessing bytes
* that are out of the bound specified by user in addr_len.
*/
if (addr_len < sizeof(struct sockaddr_in))
return -EINVAL;
return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr);
}
/* Checks the bind address and possibly modifies sk->sk_bound_dev_if. */
static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
struct sockaddr *uaddr, int addr_len)
@ -1009,6 +1023,7 @@ struct proto ping_prot = {
.owner = THIS_MODULE,
.init = ping_init_sock,
.close = ping_close,
.pre_connect = ping_pre_connect,
.connect = ip4_datagram_connect,
.disconnect = __udp_disconnect,
.setsockopt = ip_setsockopt,

View File

@ -561,6 +561,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
newtp->fastopen_req = NULL;
RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
newtp->bpf_chg_cc_inprogress = 0;
tcp_bpf_clone(sk, newsk);
__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);

View File

@ -20,6 +20,7 @@
#include <net/udp.h>
#include <net/transp_v6.h>
#include <linux/proc_fs.h>
#include <linux/bpf-cgroup.h>
#include <net/ping.h>
static void ping_v6_destroy(struct sock *sk)
@ -49,6 +50,20 @@ static int dummy_ipv6_chk_addr(struct net *net, const struct in6_addr *addr,
return 0;
}
static int ping_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr,
int addr_len)
{
/* This check is replicated from __ip6_datagram_connect() and
* intended to prevent BPF program called below from accessing
* bytes that are out of the bound specified by user in addr_len.
*/
if (addr_len < SIN6_LEN_RFC2133)
return -EINVAL;
return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr);
}
static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
struct inet_sock *inet = inet_sk(sk);
@ -191,6 +206,7 @@ struct proto pingv6_prot = {
.init = ping_init_sock,
.close = ping_close,
.destroy = ping_v6_destroy,
.pre_connect = ping_v6_pre_connect,
.connect = ip6_datagram_connect_v6_only,
.disconnect = __udp_disconnect,
.setsockopt = ipv6_setsockopt,

View File

@ -60,6 +60,12 @@ obj-$(CONFIG_NF_NAT) += nf_nat.o
nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o
nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o
ifeq ($(CONFIG_NF_NAT),m)
nf_nat-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_nat_bpf.o
else ifeq ($(CONFIG_NF_NAT),y)
nf_nat-$(CONFIG_DEBUG_INFO_BTF) += nf_nat_bpf.o
endif
# NAT helpers
obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o
obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o

View File

@ -6,12 +6,14 @@
* are exposed through to BPF programs is explicitly unstable.
*/
#include <linux/bpf_verifier.h>
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/filter.h>
#include <linux/mutex.h>
#include <linux/types.h>
#include <linux/btf_ids.h>
#include <linux/net_namespace.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_core.h>
@ -134,7 +136,6 @@ __bpf_nf_ct_alloc_entry(struct net *net, struct bpf_sock_tuple *bpf_tuple,
memset(&ct->proto, 0, sizeof(ct->proto));
__nf_ct_set_timeout(ct, timeout * HZ);
ct->status |= IPS_CONFIRMED;
out:
if (opts->netns_id >= 0)
@ -184,14 +185,58 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
return ct;
}
BTF_ID_LIST(btf_nf_conn_ids)
BTF_ID(struct, nf_conn)
BTF_ID(struct, nf_conn___init)
/* Check writes into `struct nf_conn` */
static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
const struct btf *btf,
const struct btf_type *t, int off,
int size, enum bpf_access_type atype,
u32 *next_btf_id,
enum bpf_type_flag *flag)
{
const struct btf_type *ncit;
const struct btf_type *nct;
size_t end;
ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
if (t != nct && t != ncit) {
bpf_log(log, "only read is supported\n");
return -EACCES;
}
/* `struct nf_conn` and `struct nf_conn___init` have the same layout
* so we are safe to simply merge offset checks here
*/
switch (off) {
#if defined(CONFIG_NF_CONNTRACK_MARK)
case offsetof(struct nf_conn, mark):
end = offsetofend(struct nf_conn, mark);
break;
#endif
default:
bpf_log(log, "no write support to nf_conn at off %d\n", off);
return -EACCES;
}
if (off + size > end) {
bpf_log(log,
"write access at off %d with size %d beyond the member of nf_conn ended at %zu\n",
off, size, end);
return -EACCES;
}
return 0;
}
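With this hook wired up through tc_cls_act_btf_struct_access()/xdp_btf_struct_access(), a tc or XDP program holding a PTR_TO_BTF_ID to nf_conn can update the mark directly. A rough sketch, assuming CONFIG_NF_CONNTRACK_MARK, the conntrack lookup kfuncs, and that struct nf_conn is visible via the program's vmlinux.h; the tuple and mark value are illustrative.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define TC_ACT_OK 0

struct bpf_ct_opts___local {		/* mirrors the kernel's struct bpf_ct_opts */
	__s32 netns_id;
	__s32 error;
	__u8 l4proto;
	__u8 dir;
	__u8 reserved[2];
};

extern struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *skb_ctx,
					 struct bpf_sock_tuple *bpf_tuple,
					 __u32 tuple__sz,
					 struct bpf_ct_opts___local *opts,
					 __u32 opts__sz) __ksym;
extern void bpf_ct_release(struct nf_conn *ct) __ksym;

SEC("tc")
int set_ct_mark(struct __sk_buff *skb)
{
	struct bpf_ct_opts___local opts = { .netns_id = -1, .l4proto = 6 /* TCP */ };
	struct bpf_sock_tuple tup = {};	/* a real program fills this from the packet */
	struct nf_conn *ct;

	ct = bpf_skb_ct_lookup(skb, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return TC_ACT_OK;

	ct->mark = 0x2a;	/* direct write, permitted by the hook above */

	bpf_ct_release(ct);
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";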
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in nf_conntrack BTF");
struct nf_conn___init {
struct nf_conn ct;
};
/* bpf_xdp_ct_alloc - Allocate a new CT entry
*
* Parameters:
@ -339,6 +384,7 @@ struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i)
struct nf_conn *nfct = (struct nf_conn *)nfct_i;
int err;
nfct->status |= IPS_CONFIRMED;
err = nf_conntrack_hash_check_insert(nfct);
if (err < 0) {
nf_conntrack_free(nfct);
@ -449,5 +495,19 @@ int register_nf_conntrack_bpf(void)
int ret;
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set);
return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
if (!ret) {
mutex_lock(&nf_conn_btf_access_lock);
nfct_btf_struct_access = _nf_conntrack_btf_struct_access;
mutex_unlock(&nf_conn_btf_access_lock);
}
return ret;
}
void cleanup_nf_conntrack_bpf(void)
{
mutex_lock(&nf_conn_btf_access_lock);
nfct_btf_struct_access = NULL;
mutex_unlock(&nf_conn_btf_access_lock);
}

View File

@ -2516,6 +2516,7 @@ static int kill_all(struct nf_conn *i, void *data)
void nf_conntrack_cleanup_start(void)
{
cleanup_nf_conntrack_bpf();
conntrack_gc_work.exiting = true;
}

View File

@ -0,0 +1,79 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Unstable NAT Helpers for XDP and TC-BPF hook
*
* These are called from the XDP and SCHED_CLS BPF programs. Note that it is
* allowed to break compatibility for these functions since the interface they
* are exposed through to BPF programs is explicitly unstable.
*/
#include <linux/bpf.h>
#include <linux/btf_ids.h>
#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_nat.h>
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in nf_nat BTF");
/* bpf_ct_set_nat_info - Set source or destination nat address
*
* Set source or destination nat address of the newly allocated
* nf_conn before insertion. This must be invoked for referenced
* PTR_TO_BTF_ID to nf_conn___init.
*
* Parameters:
* @nfct - Pointer to referenced nf_conn object, obtained using
* bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
* @addr - Nat source/destination address
* @port - Nat source/destination port. Non-positive values are
* interpreted as a request to select a random port.
* @manip - NF_NAT_MANIP_SRC or NF_NAT_MANIP_DST
*/
int bpf_ct_set_nat_info(struct nf_conn___init *nfct,
union nf_inet_addr *addr, int port,
enum nf_nat_manip_type manip)
{
struct nf_conn *ct = (struct nf_conn *)nfct;
u16 proto = nf_ct_l3num(ct);
struct nf_nat_range2 range;
if (proto != NFPROTO_IPV4 && proto != NFPROTO_IPV6)
return -EINVAL;
memset(&range, 0, sizeof(struct nf_nat_range2));
range.flags = NF_NAT_RANGE_MAP_IPS;
range.min_addr = *addr;
range.max_addr = range.min_addr;
if (port > 0) {
range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED;
range.min_proto.all = cpu_to_be16(port);
range.max_proto.all = range.min_proto.all;
}
return nf_nat_setup_info(ct, &range, manip) == NF_DROP ? -ENOMEM : 0;
}
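A condensed sketch of the intended call order from XDP, under the assumption that the conntrack kfuncs from earlier in this series and the netfilter types (nf_conn___init, union nf_inet_addr, enum nf_nat_manip_type) are visible to the program, e.g. via vmlinux.h; the tuple, address and program name are illustrative.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct bpf_ct_opts___local {		/* mirrors the kernel's struct bpf_ct_opts */
	__s32 netns_id;
	__s32 error;
	__u8 l4proto;
	__u8 dir;
	__u8 reserved[2];
};

extern struct nf_conn___init *
bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
		 __u32 tuple__sz, struct bpf_ct_opts___local *opts,
		 __u32 opts__sz) __ksym;
extern int bpf_ct_set_nat_info(struct nf_conn___init *nfct,
			       union nf_inet_addr *addr, int port,
			       enum nf_nat_manip_type manip) __ksym;
extern struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) __ksym;
extern void bpf_ct_release(struct nf_conn *ct) __ksym;

SEC("xdp")
int snat_new_flow(struct xdp_md *ctx)
{
	struct bpf_ct_opts___local opts = { .netns_id = -1, .l4proto = 6 /* TCP */ };
	struct bpf_sock_tuple tup = {};		/* a real program fills this from the packet */
	union nf_inet_addr saddr = {};
	struct nf_conn___init *ct;
	struct nf_conn *ins;

	saddr.ip = bpf_htonl(0xc0a80101);	/* 192.168.1.1, illustrative */

	ct = bpf_xdp_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return XDP_PASS;

	/* must be done on the not-yet-inserted entry (nf_conn___init) */
	bpf_ct_set_nat_info(ct, &saddr, -1 /* pick a random port */, NF_NAT_MANIP_SRC);

	ins = bpf_ct_insert_entry(ct);
	if (ins)
		bpf_ct_release(ins);
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";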
__diag_pop()
BTF_SET8_START(nf_nat_kfunc_set)
BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS)
BTF_SET8_END(nf_nat_kfunc_set)
static const struct btf_kfunc_id_set nf_bpf_nat_kfunc_set = {
.owner = THIS_MODULE,
.set = &nf_nat_kfunc_set,
};
int register_nf_nat_bpf(void)
{
int ret;
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
&nf_bpf_nat_kfunc_set);
if (ret)
return ret;
return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
&nf_bpf_nat_kfunc_set);
}

View File

@ -16,7 +16,7 @@
#include <linux/siphash.h>
#include <linux/rtnetlink.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_conntrack_helper.h>
#include <net/netfilter/nf_conntrack_seqadj.h>
@ -1152,7 +1152,7 @@ static int __init nf_nat_init(void)
WARN_ON(nf_nat_hook != NULL);
RCU_INIT_POINTER(nf_nat_hook, &nat_hook);
return 0;
return register_nf_nat_bpf();
}
static void __exit nf_nat_cleanup(void)

View File

@ -10,7 +10,7 @@ int bpf_prog1(struct pt_regs *ctx)
return 0;
}
SEC("kretprobe/blk_account_io_done")
SEC("kretprobe/__blk_account_io_done")
int bpf_prog2(struct pt_regs *ctx)
{
return 0;

View File

@ -348,7 +348,7 @@ int main(int argc, char **argv)
/* test two functions in the corresponding *_kern.c file */
CHECK_AND_RET(test_debug_fs_kprobe(0, "blk_mq_start_request",
BPF_FD_TYPE_KPROBE));
CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_done",
CHECK_AND_RET(test_debug_fs_kprobe(1, "__blk_account_io_done",
BPF_FD_TYPE_KRETPROBE));
/* test nondebug fs kprobe */

View File

@ -49,7 +49,7 @@ struct {
__uint(max_entries, SLOTS);
} lat_map SEC(".maps");
SEC("kprobe/blk_account_io_done")
SEC("kprobe/__blk_account_io_done")
int bpf_prog2(struct pt_regs *ctx)
{
long rq = PT_REGS_PARM1(ctx);

View File

@ -209,7 +209,7 @@ static void read_route(struct nlmsghdr *nh, int nll)
/* Rereading the route table to check if
* there is an entry with the same
* prefix but a different metric as the
* deleted enty.
* deleted entry.
*/
get_route_table(AF_INET);
} else if (prefix_key->data[0] ==

View File

@ -165,8 +165,6 @@ extern struct key *request_key_and_link(struct key_type *type,
extern bool lookup_user_key_possessed(const struct key *key,
const struct key_match_data *match_data);
#define KEY_LOOKUP_CREATE 0x01
#define KEY_LOOKUP_PARTIAL 0x02
extern long join_session_keyring(const char *name);
extern void key_change_session_keyring(struct callback_head *twork);

View File

@ -55,7 +55,7 @@ MAP COMMANDS
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
| | **task_storage** | **bloom_filter** }
| | **task_storage** | **bloom_filter** | **user_ringbuf** }
DESCRIPTION
===========

View File

@ -43,11 +43,6 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
[BTF_KIND_ENUM64] = "ENUM64",
};
struct btf_attach_point {
__u32 obj_id;
__u32 btf_id;
};
static const char *btf_int_enc_str(__u8 encoding)
{
switch (encoding) {
@ -640,10 +635,9 @@ static int do_dump(int argc, char **argv)
btf = btf__parse_split(*argv, base ?: base_btf);
err = libbpf_get_error(btf);
if (err) {
btf = NULL;
if (!btf) {
p_err("failed to load BTF from %s: %s",
*argv, strerror(err));
*argv, strerror(errno));
goto done;
}
NEXT_ARG();
@ -688,8 +682,8 @@ static int do_dump(int argc, char **argv)
btf = btf__load_from_kernel_by_id_split(btf_id, base_btf);
err = libbpf_get_error(btf);
if (err) {
p_err("get btf by id (%u): %s", btf_id, strerror(err));
if (!btf) {
p_err("get btf by id (%u): %s", btf_id, strerror(errno));
goto done;
}
}
@ -825,7 +819,7 @@ build_btf_type_table(struct hashmap *tab, enum bpf_obj_type type,
u32_as_hash_field(id));
if (err) {
p_err("failed to append entry to hashmap for BTF ID %u, object ID %u: %s",
btf_id, id, strerror(errno));
btf_id, id, strerror(-err));
goto err_free;
}
}

View File

@ -1594,14 +1594,14 @@ static int do_object(int argc, char **argv)
err = bpf_linker__add_file(linker, file, NULL);
if (err) {
p_err("failed to link '%s': %s (%d)", file, strerror(err), err);
p_err("failed to link '%s': %s (%d)", file, strerror(errno), errno);
goto out;
}
}
err = bpf_linker__finalize(linker);
if (err) {
p_err("failed to finalize ELF file: %s (%d)", strerror(err), err);
p_err("failed to finalize ELF file: %s (%d)", strerror(errno), errno);
goto out;
}

View File

@ -106,6 +106,13 @@ static const char *cgroup_order_string(__u32 order)
}
}
static bool is_iter_task_target(const char *target_name)
{
return strcmp(target_name, "task") == 0 ||
strcmp(target_name, "task_file") == 0 ||
strcmp(target_name, "task_vma") == 0;
}
static void show_iter_json(struct bpf_link_info *info, json_writer_t *wtr)
{
const char *target_name = u64_to_ptr(info->iter.target_name);
@ -114,6 +121,12 @@ static void show_iter_json(struct bpf_link_info *info, json_writer_t *wtr)
if (is_iter_map_target(target_name))
jsonw_uint_field(wtr, "map_id", info->iter.map.map_id);
else if (is_iter_task_target(target_name)) {
if (info->iter.task.tid)
jsonw_uint_field(wtr, "tid", info->iter.task.tid);
else if (info->iter.task.pid)
jsonw_uint_field(wtr, "pid", info->iter.task.pid);
}
if (is_iter_cgroup_target(target_name)) {
jsonw_lluint_field(wtr, "cgroup_id", info->iter.cgroup.cgroup_id);
@ -237,6 +250,12 @@ static void show_iter_plain(struct bpf_link_info *info)
if (is_iter_map_target(target_name))
printf("map_id %u ", info->iter.map.map_id);
else if (is_iter_task_target(target_name)) {
if (info->iter.task.tid)
printf("tid %u ", info->iter.task.tid);
else if (info->iter.task.pid)
printf("pid %u ", info->iter.task.pid);
}
if (is_iter_cgroup_target(target_name)) {
printf("cgroup_id %llu ", info->iter.cgroup.cgroup_id);

View File

@ -1459,7 +1459,7 @@ static int do_help(int argc, char **argv)
" devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
" queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
" task_storage | bloom_filter }\n"
" task_storage | bloom_filter | user_ringbuf }\n"
" " HELP_SPEC_OPTIONS " |\n"
" {-f|--bpffs} | {-n|--nomount} }\n"
"",

View File

@ -29,13 +29,6 @@
static volatile bool stop;
struct event_ring_info {
int fd;
int key;
unsigned int cpu;
void *mem;
};
struct perf_event_sample {
struct perf_event_header header;
__u64 time;
@ -195,10 +188,9 @@ int do_event_pipe(int argc, char **argv)
opts.map_keys = &ctx.idx;
pb = perf_buffer__new_raw(map_fd, MMAP_PAGE_CNT, &perf_attr,
print_bpf_output, &ctx, &opts);
err = libbpf_get_error(pb);
if (err) {
if (!pb) {
p_err("failed to create perf buffer: %s (%d)",
strerror(err), err);
strerror(errno), errno);
goto err_close_map;
}
@ -213,7 +205,7 @@ int do_event_pipe(int argc, char **argv)
err = perf_buffer__poll(pb, 200);
if (err < 0 && err != -EINTR) {
p_err("perf buffer polling failed: %s (%d)",
strerror(err), err);
strerror(errno), errno);
goto err_close_pb;
}
}

View File

@ -110,6 +110,12 @@ union bpf_iter_link_info {
__u32 cgroup_fd;
__u64 cgroup_id;
} cgroup;
/* Parameters of task iterators. */
struct {
__u32 tid;
__u32 pid;
__u32 pid_fd;
} task;
};
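From user space, the new task fields are passed at link-creation time to restrict a task/task_file/task_vma iterator to one process or thread. A minimal libbpf sketch; the helper and variable names are illustrative.

#include <linux/bpf.h>
#include <bpf/libbpf.h>

/* Attach an iterator program so that it only visits the process
 * identified by @pid (only one of tid/pid/pid_fd should be set).
 */
static struct bpf_link *attach_iter_for_pid(struct bpf_program *prog, __u32 pid)
{
	union bpf_iter_link_info linfo = {};
	LIBBPF_OPTS(bpf_iter_attach_opts, opts);

	linfo.task.pid = pid;
	opts.link_info = &linfo;
	opts.link_info_len = sizeof(linfo);

	return bpf_program__attach_iter(prog, &opts);
}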
/* BPF syscall commands, see bpf(2) man-page for more details. */
@ -928,6 +934,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
};
/* Note that tracing related programs such as
@ -4950,6 +4957,7 @@ union bpf_attr {
* Get address of the traced function (for tracing and kprobe programs).
* Return
* Address of the traced function.
* 0 for kprobes placed within the function (not at the entry).
*
* u64 bpf_get_attach_cookie(void *ctx)
* Description
@ -5079,12 +5087,12 @@ union bpf_attr {
*
* long bpf_get_func_arg(void *ctx, u32 n, u64 *value)
* Description
* Get **n**-th argument (zero based) of the traced function (for tracing programs)
* Get **n**-th argument register (zero based) of the traced function (for tracing programs)
* returned in **value**.
*
* Return
* 0 on success.
* **-EINVAL** if n >= arguments count of traced function.
* **-EINVAL** if n >= argument register count of traced function.
*
* long bpf_get_func_ret(void *ctx, u64 *value)
* Description
@ -5097,10 +5105,11 @@ union bpf_attr {
*
* long bpf_get_func_arg_cnt(void *ctx)
* Description
* Get number of arguments of the traced function (for tracing programs).
* Get the number of registers of the traced function (for tracing
* programs) in which the function arguments are stored.
*
* Return
* The number of arguments of the traced function.
* The number of argument registers of the traced function.
*
* int bpf_get_retval(void)
* Description
@ -5386,6 +5395,43 @@ union bpf_attr {
* Return
* Current *ktime*.
*
* long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags)
* Description
* Drain samples from the specified user ring buffer, and invoke
* the provided callback for each such sample:
*
* long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx);
*
* If **callback_fn** returns 0, the helper will continue to try
* and drain the next sample, up to a maximum of
* BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1,
* the helper will skip the rest of the samples and return. Other
* return values are not used now, and will be rejected by the
* verifier.
* Return
* The number of drained samples if no error was encountered while
* draining samples, or 0 if no samples were present in the ring
* buffer. If a user-space producer was epoll-waiting on this map,
* and at least one sample was drained, they will receive an event
* notification notifying them of available space in the ring
* buffer. If the BPF_RB_NO_WAKEUP flag is passed to this
* function, no wakeup notification will be sent. If the
* BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will
* be sent even if no sample was drained.
*
* On failure, the returned value is one of the following:
*
* **-EBUSY** if the ring buffer is contended, and another calling
* context was concurrently draining the ring buffer.
*
* **-EINVAL** if user-space is not properly tracking the ring
* buffer due to the producer position not being aligned to 8
* bytes, a sample not being aligned to 8 bytes, or the producer
* position not matching the advertised length of a sample.
*
* **-E2BIG** if user-space has tried to publish a sample which is
* larger than the size of the ring buffer, or which cannot fit
* within a struct bpf_dynptr.
*/
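On the BPF side, the helper documented above is driven by a callback that receives each sample as a local dynptr. A minimal consumer sketch follows; the map size, sample layout and attach point are illustrative assumptions, and the producer side is filled by the libbpf additions in this series (user_ring_buffer__reserve()/__submit()).

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_USER_RINGBUF);
	__uint(max_entries, 256 * 1024);
} user_ringbuf SEC(".maps");

struct user_sample {		/* layout agreed with the producer (illustrative) */
	__u64 value;
};

__u64 total;

static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
{
	struct user_sample *s;

	s = bpf_dynptr_data(dynptr, 0, sizeof(*s));
	if (!s)
		return 1;	/* stop draining */

	total += s->value;
	return 0;		/* keep draining, up to BPF_MAX_USER_RINGBUF_SAMPLES */
}

SEC("tracepoint/syscalls/sys_enter_getpgid")	/* illustrative attach point */
int drain_user_samples(void *ctx)
{
	bpf_user_ringbuf_drain(&user_ringbuf, handle_sample, NULL, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";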
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5597,6 +5643,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
FN(user_ringbuf_drain), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -6218,6 +6265,10 @@ struct bpf_link_info {
__u64 cgroup_id;
__u32 order;
} cgroup;
struct {
__u32 tid;
__u32 pid;
} task;
};
} iter;
struct {

View File

@ -131,7 +131,7 @@
/*
* Helper function to perform a tail call with a constant/immediate map slot.
*/
#if (!defined(__clang__) || __clang_major__ >= 8) && defined(__bpf__)
#if __clang_major__ >= 8 && defined(__bpf__)
static __always_inline void
bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
{
@ -139,8 +139,8 @@ bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
__bpf_unreachable();
/*
* Provide a hard guarantee that the compiler won't optimize setting r2
* (map pointer) and r3 (constant map index) from _different paths_ ending
* Provide a hard guarantee that LLVM won't optimize setting r2 (map
* pointer) and r3 (constant map index) from _different paths_ ending
* up at the _same_ call insn as otherwise we won't be able to use the
* jmpq/nopl retpoline-free patching by the x86-64 JIT in the kernel
* given they mismatch. See also d2e4c1e6c294 ("bpf: Constant map key
@ -148,37 +148,18 @@ bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
*
* Note on clobber list: we need to stay in-line with BPF calling
* convention, so even if we don't end up using r0, r4, r5, we need
* to mark them as clobber so that the compiler doesn't end up using
* them before / after the call.
* to mark them as clobber so that LLVM doesn't end up using them
* before / after the call.
*/
asm volatile(
#ifdef __clang__
"r1 = %[ctx]\n\t"
asm volatile("r1 = %[ctx]\n\t"
"r2 = %[map]\n\t"
"r3 = %[slot]\n\t"
#else
"mov %%r1,%[ctx]\n\t"
"mov %%r2,%[map]\n\t"
"mov %%r3,%[slot]\n\t"
#endif
"call 12"
:: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot)
: "r0", "r1", "r2", "r3", "r4", "r5");
}
#endif
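For reference, typical usage of bpf_tail_call_static() looks like the sketch below: the slot must be a compile-time constant so the x86-64 JIT can patch a direct jump, and user space is expected to populate the prog array. Program and map names are illustrative.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define TC_ACT_OK 0

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

SEC("tc")
int tail_target(struct __sk_buff *skb)
{
	return TC_ACT_OK;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	/* user space must put tail_target's fd into slot 0 of jmp_table */
	bpf_tail_call_static(skb, &jmp_table, 0);
	return TC_ACT_OK;	/* only reached if slot 0 is empty */
}

char _license[] SEC("license") = "GPL";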
/*
* Helper structure used by eBPF C program
* to describe BPF map attributes to libbpf loader
*/
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
unsigned int map_flags;
} __attribute__((deprecated("use BTF-defined maps in .maps section")));
enum libbpf_pin_type {
LIBBPF_PIN_NONE,
/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */

View File

@ -438,6 +438,113 @@ typeof(name(0)) name(unsigned long long *ctx) \
static __always_inline typeof(name(0)) \
____##name(unsigned long long *ctx, ##args)
#ifndef ___bpf_nth2
#define ___bpf_nth2(_, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, \
_14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, N, ...) N
#endif
#ifndef ___bpf_narg2
#define ___bpf_narg2(...) \
___bpf_nth2(_, ##__VA_ARGS__, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, \
6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0)
#endif
#define ___bpf_treg_cnt(t) \
__builtin_choose_expr(sizeof(t) == 1, 1, \
__builtin_choose_expr(sizeof(t) == 2, 1, \
__builtin_choose_expr(sizeof(t) == 4, 1, \
__builtin_choose_expr(sizeof(t) == 8, 1, \
__builtin_choose_expr(sizeof(t) == 16, 2, \
(void)0)))))
#define ___bpf_reg_cnt0() (0)
#define ___bpf_reg_cnt1(t, x) (___bpf_reg_cnt0() + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt2(t, x, args...) (___bpf_reg_cnt1(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt3(t, x, args...) (___bpf_reg_cnt2(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt4(t, x, args...) (___bpf_reg_cnt3(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt5(t, x, args...) (___bpf_reg_cnt4(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt6(t, x, args...) (___bpf_reg_cnt5(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt7(t, x, args...) (___bpf_reg_cnt6(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt8(t, x, args...) (___bpf_reg_cnt7(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt9(t, x, args...) (___bpf_reg_cnt8(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt10(t, x, args...) (___bpf_reg_cnt9(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt11(t, x, args...) (___bpf_reg_cnt10(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt12(t, x, args...) (___bpf_reg_cnt11(args) + ___bpf_treg_cnt(t))
#define ___bpf_reg_cnt(args...) ___bpf_apply(___bpf_reg_cnt, ___bpf_narg2(args))(args)
#define ___bpf_union_arg(t, x, n) \
__builtin_choose_expr(sizeof(t) == 1, ({ union { __u8 z[1]; t x; } ___t = { .z = {ctx[n]}}; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 2, ({ union { __u16 z[1]; t x; } ___t = { .z = {ctx[n]} }; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 4, ({ union { __u32 z[1]; t x; } ___t = { .z = {ctx[n]} }; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 8, ({ union { __u64 z[1]; t x; } ___t = {.z = {ctx[n]} }; ___t.x; }), \
__builtin_choose_expr(sizeof(t) == 16, ({ union { __u64 z[2]; t x; } ___t = {.z = {ctx[n], ctx[n + 1]} }; ___t.x; }), \
(void)0)))))
#define ___bpf_ctx_arg0(n, args...)
#define ___bpf_ctx_arg1(n, t, x) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt1(t, x))
#define ___bpf_ctx_arg2(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt2(t, x, args)) ___bpf_ctx_arg1(n, args)
#define ___bpf_ctx_arg3(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt3(t, x, args)) ___bpf_ctx_arg2(n, args)
#define ___bpf_ctx_arg4(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt4(t, x, args)) ___bpf_ctx_arg3(n, args)
#define ___bpf_ctx_arg5(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt5(t, x, args)) ___bpf_ctx_arg4(n, args)
#define ___bpf_ctx_arg6(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt6(t, x, args)) ___bpf_ctx_arg5(n, args)
#define ___bpf_ctx_arg7(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt7(t, x, args)) ___bpf_ctx_arg6(n, args)
#define ___bpf_ctx_arg8(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt8(t, x, args)) ___bpf_ctx_arg7(n, args)
#define ___bpf_ctx_arg9(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt9(t, x, args)) ___bpf_ctx_arg8(n, args)
#define ___bpf_ctx_arg10(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt10(t, x, args)) ___bpf_ctx_arg9(n, args)
#define ___bpf_ctx_arg11(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt11(t, x, args)) ___bpf_ctx_arg10(n, args)
#define ___bpf_ctx_arg12(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt12(t, x, args)) ___bpf_ctx_arg11(n, args)
#define ___bpf_ctx_arg(args...) ___bpf_apply(___bpf_ctx_arg, ___bpf_narg2(args))(___bpf_reg_cnt(args), args)
#define ___bpf_ctx_decl0()
#define ___bpf_ctx_decl1(t, x) , t x
#define ___bpf_ctx_decl2(t, x, args...) , t x ___bpf_ctx_decl1(args)
#define ___bpf_ctx_decl3(t, x, args...) , t x ___bpf_ctx_decl2(args)
#define ___bpf_ctx_decl4(t, x, args...) , t x ___bpf_ctx_decl3(args)
#define ___bpf_ctx_decl5(t, x, args...) , t x ___bpf_ctx_decl4(args)
#define ___bpf_ctx_decl6(t, x, args...) , t x ___bpf_ctx_decl5(args)
#define ___bpf_ctx_decl7(t, x, args...) , t x ___bpf_ctx_decl6(args)
#define ___bpf_ctx_decl8(t, x, args...) , t x ___bpf_ctx_decl7(args)
#define ___bpf_ctx_decl9(t, x, args...) , t x ___bpf_ctx_decl8(args)
#define ___bpf_ctx_decl10(t, x, args...) , t x ___bpf_ctx_decl9(args)
#define ___bpf_ctx_decl11(t, x, args...) , t x ___bpf_ctx_decl10(args)
#define ___bpf_ctx_decl12(t, x, args...) , t x ___bpf_ctx_decl11(args)
#define ___bpf_ctx_decl(args...) ___bpf_apply(___bpf_ctx_decl, ___bpf_narg2(args))(args)
/*
* BPF_PROG2 is an enhanced version of BPF_PROG in order to handle struct
* arguments. Since each struct argument might take one or two u64 values
* in the trampoline stack, argument type size is needed to place proper number
* of u64 values for each argument. Therefore, BPF_PROG2 has different
* syntax from BPF_PROG. For example, for the following BPF_PROG syntax:
*
* int BPF_PROG(test2, int a, int b) { ... }
*
* the corresponding BPF_PROG2 syntax is:
*
* int BPF_PROG2(test2, int, a, int, b) { ... }
*
* where type and the corresponding argument name are separated by comma.
*
* Use BPF_PROG2 macro if one of the arguments might be a struct/union larger
* than 8 bytes:
*
* int BPF_PROG2(test_struct_arg, struct bpf_testmod_struct_arg_1, a, int, b,
* int, c, int, d, struct bpf_testmod_struct_arg_2, e, int, ret)
* {
* // access a, b, c, d, e, and ret directly
* ...
* }
*/
#define BPF_PROG2(name, args...) \
name(unsigned long long *ctx); \
static __always_inline typeof(name(0)) \
____##name(unsigned long long *ctx ___bpf_ctx_decl(args)); \
typeof(name(0)) name(unsigned long long *ctx) \
{ \
return ____##name(ctx ___bpf_ctx_arg(args)); \
} \
static __always_inline typeof(name(0)) \
____##name(unsigned long long *ctx ___bpf_ctx_decl(args))
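
A minimal sketch of a program written with BPF_PROG2, assuming a hypothetical fentry target that takes a 16-byte struct by value (all names here are illustrative, not part of the patch):

struct example_arg {
	long a;
	long b;
};

__u64 result;

SEC("fentry/example_kernel_func")
int BPF_PROG2(trace_example, struct example_arg, arg, int, x)
{
	/* the 16-byte struct occupies two u64 slots on the trampoline stack;
	 * BPF_PROG2 reassembles them into 'arg' transparently
	 */
	result = arg.a + arg.b + x;
	return 0;
}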
struct pt_regs;
#define ___bpf_kprobe_args0() ctx


@ -4642,20 +4642,17 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
*/
struct btf *btf__load_vmlinux_btf(void)
{
struct {
const char *path_fmt;
bool raw_btf;
} locations[] = {
const char *locations[] = {
/* try canonical vmlinux BTF through sysfs first */
{ "/sys/kernel/btf/vmlinux", true /* raw BTF */ },
/* fall back to trying to find vmlinux ELF on disk otherwise */
{ "/boot/vmlinux-%1$s" },
{ "/lib/modules/%1$s/vmlinux-%1$s" },
{ "/lib/modules/%1$s/build/vmlinux" },
{ "/usr/lib/modules/%1$s/kernel/vmlinux" },
{ "/usr/lib/debug/boot/vmlinux-%1$s" },
{ "/usr/lib/debug/boot/vmlinux-%1$s.debug" },
{ "/usr/lib/debug/lib/modules/%1$s/vmlinux" },
"/sys/kernel/btf/vmlinux",
/* fall back to trying to find vmlinux on disk otherwise */
"/boot/vmlinux-%1$s",
"/lib/modules/%1$s/vmlinux-%1$s",
"/lib/modules/%1$s/build/vmlinux",
"/usr/lib/modules/%1$s/kernel/vmlinux",
"/usr/lib/debug/boot/vmlinux-%1$s",
"/usr/lib/debug/boot/vmlinux-%1$s.debug",
"/usr/lib/debug/lib/modules/%1$s/vmlinux",
};
char path[PATH_MAX + 1];
struct utsname buf;
@ -4665,15 +4662,12 @@ struct btf *btf__load_vmlinux_btf(void)
uname(&buf);
for (i = 0; i < ARRAY_SIZE(locations); i++) {
snprintf(path, PATH_MAX, locations[i].path_fmt, buf.release);
snprintf(path, PATH_MAX, locations[i], buf.release);
if (access(path, R_OK))
if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS))
continue;
if (locations[i].raw_btf)
btf = btf__parse_raw(path);
else
btf = btf__parse_elf(path, NULL);
btf = btf__parse(path, NULL);
err = libbpf_get_error(btf);
pr_debug("loading kernel BTF '%s': %d\n", path, err);
if (err)


@ -486,6 +486,8 @@ static inline struct btf_enum *btf_enum(const struct btf_type *t)
return (struct btf_enum *)(t + 1);
}
struct btf_enum64;
static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
{
return (struct btf_enum64 *)(t + 1);
@ -493,7 +495,28 @@ static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
static inline __u64 btf_enum64_value(const struct btf_enum64 *e)
{
return ((__u64)e->val_hi32 << 32) | e->val_lo32;
/* struct btf_enum64 is introduced in Linux 6.0, which is very
* bleeding-edge. Here we are avoiding relying on struct btf_enum64
* definition coming from kernel UAPI headers to support wider range
* of system-wide kernel headers.
*
* Given this header can be also included from C++ applications, that
* further restricts C tricks we can use (like using compatible
* anonymous struct). So just treat struct btf_enum64 as
* a three-element array of u32 and access second (lo32) and third
* (hi32) elements directly.
*
* For reference, here is a struct btf_enum64 definition:
*
* const struct btf_enum64 {
* __u32 name_off;
* __u32 val_lo32;
* __u32 val_hi32;
* };
*/
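/* e.g. val_lo32 = 1 with val_hi32 = 2 encodes the 64-bit value 0x200000001 */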
const __u32 *e64 = (const __u32 *)e;
return ((__u64)e64[2] << 32) | e64[1];
}
static inline struct btf_member *btf_members(const struct btf_type *t)


@ -2385,7 +2385,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->indent_lvl = OPTS_GET(opts, indent_level, 0);
/* default indent string is a tab */
if (!opts->indent_str)
if (!OPTS_GET(opts, indent_str, NULL))
d->typed_dump->indent_str[0] = '\t';
else
libbpf_strlcpy(d->typed_dump->indent_str, opts->indent_str,


@ -163,6 +163,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage",
[BPF_MAP_TYPE_TASK_STORAGE] = "task_storage",
[BPF_MAP_TYPE_BLOOM_FILTER] = "bloom_filter",
[BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
};
static const char * const prog_type_name[] = {
@ -883,7 +884,7 @@ __u32 get_kernel_version(void)
__u32 major, minor, patch;
struct utsname info;
if (access(ubuntu_kver_file, R_OK) == 0) {
if (faccessat(AT_FDCWD, ubuntu_kver_file, R_OK, AT_EACCESS) == 0) {
FILE *f;
f = fopen(ubuntu_kver_file, "r");
@ -2096,19 +2097,30 @@ static bool get_map_field_int(const char *map_name, const struct btf *btf,
return true;
}
static int pathname_concat(char *buf, size_t buf_sz, const char *path, const char *name)
{
int len;
len = snprintf(buf, buf_sz, "%s/%s", path, name);
if (len < 0)
return -EINVAL;
if (len >= buf_sz)
return -ENAMETOOLONG;
return 0;
}
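/* e.g. pathname_concat(buf, sizeof(buf), "/sys/fs/bpf", "my_map") fills buf
 * with "/sys/fs/bpf/my_map"; -ENAMETOOLONG is returned if the result would
 * not fit into buf.
 */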
static int build_map_pin_path(struct bpf_map *map, const char *path)
{
char buf[PATH_MAX];
int len;
int err;
if (!path)
path = "/sys/fs/bpf";
len = snprintf(buf, PATH_MAX, "%s/%s", path, bpf_map__name(map));
if (len < 0)
return -EINVAL;
else if (len >= PATH_MAX)
return -ENAMETOOLONG;
err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map));
if (err)
return err;
return bpf_map__set_pin_path(map, buf);
}
@ -2372,6 +2384,12 @@ static size_t adjust_ringbuf_sz(size_t sz)
return sz;
}
static bool map_is_ringbuf(const struct bpf_map *map)
{
return map->def.type == BPF_MAP_TYPE_RINGBUF ||
map->def.type == BPF_MAP_TYPE_USER_RINGBUF;
}
static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def)
{
map->def.type = def->map_type;
@ -2386,7 +2404,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
map->btf_value_type_id = def->value_type_id;
/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
if (map->def.type == BPF_MAP_TYPE_RINGBUF)
if (map_is_ringbuf(map))
map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
if (def->parts & MAP_DEF_MAP_TYPE)
@ -4369,7 +4387,7 @@ int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
map->def.max_entries = max_entries;
/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
if (map->def.type == BPF_MAP_TYPE_RINGBUF)
if (map_is_ringbuf(map))
map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
return 0;
@ -7961,17 +7979,9 @@ int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
continue;
if (path) {
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path,
bpf_map__name(map));
if (len < 0) {
err = -EINVAL;
err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map));
if (err)
goto err_unpin_maps;
} else if (len >= PATH_MAX) {
err = -ENAMETOOLONG;
goto err_unpin_maps;
}
sanitize_pin_path(buf);
pin_path = buf;
} else if (!map->pin_path) {
@ -8009,14 +8019,9 @@ int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
char buf[PATH_MAX];
if (path) {
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path,
bpf_map__name(map));
if (len < 0)
return libbpf_err(-EINVAL);
else if (len >= PATH_MAX)
return libbpf_err(-ENAMETOOLONG);
err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map));
if (err)
return libbpf_err(err);
sanitize_pin_path(buf);
pin_path = buf;
} else if (!map->pin_path) {
@ -8034,6 +8039,7 @@ int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
{
struct bpf_program *prog;
char buf[PATH_MAX];
int err;
if (!obj)
@ -8045,17 +8051,9 @@ int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
}
bpf_object__for_each_program(prog, obj) {
char buf[PATH_MAX];
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path, prog->name);
if (len < 0) {
err = -EINVAL;
err = pathname_concat(buf, sizeof(buf), path, prog->name);
if (err)
goto err_unpin_programs;
} else if (len >= PATH_MAX) {
err = -ENAMETOOLONG;
goto err_unpin_programs;
}
err = bpf_program__pin(prog, buf);
if (err)
@ -8066,13 +8064,7 @@ int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
err_unpin_programs:
while ((prog = bpf_object__prev_program(obj, prog))) {
char buf[PATH_MAX];
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path, prog->name);
if (len < 0)
continue;
else if (len >= PATH_MAX)
if (pathname_concat(buf, sizeof(buf), path, prog->name))
continue;
bpf_program__unpin(prog, buf);
@ -8091,13 +8083,10 @@ int bpf_object__unpin_programs(struct bpf_object *obj, const char *path)
bpf_object__for_each_program(prog, obj) {
char buf[PATH_MAX];
int len;
len = snprintf(buf, PATH_MAX, "%s/%s", path, prog->name);
if (len < 0)
return libbpf_err(-EINVAL);
else if (len >= PATH_MAX)
return libbpf_err(-ENAMETOOLONG);
err = pathname_concat(buf, sizeof(buf), path, prog->name);
if (err)
return libbpf_err(err);
err = bpf_program__unpin(prog, buf);
if (err)
@ -9084,11 +9073,15 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac
int err = 0;
/* BPF program's BTF ID */
if (attach_prog_fd) {
if (prog->type == BPF_PROG_TYPE_EXT || attach_prog_fd) {
if (!attach_prog_fd) {
pr_warn("prog '%s': attach program FD is not set\n", prog->name);
return -EINVAL;
}
err = libbpf_find_prog_btf_id(attach_name, attach_prog_fd);
if (err < 0) {
pr_warn("failed to find BPF program (FD %d) BTF ID for '%s': %d\n",
attach_prog_fd, attach_name, err);
pr_warn("prog '%s': failed to find BPF program (FD %d) BTF ID for '%s': %d\n",
prog->name, attach_prog_fd, attach_name, err);
return err;
}
*btf_obj_fd = 0;
@ -9105,7 +9098,8 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac
err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id);
}
if (err) {
pr_warn("failed to find kernel BTF type ID of '%s': %d\n", attach_name, err);
pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %d\n",
prog->name, attach_name, err);
return err;
}
return 0;
@ -9910,7 +9904,7 @@ static bool use_debugfs(void)
static int has_debugfs = -1;
if (has_debugfs < 0)
has_debugfs = access(DEBUGFS, F_OK) == 0;
has_debugfs = faccessat(AT_FDCWD, DEBUGFS, F_OK, AT_EACCESS) == 0;
return has_debugfs == 1;
}
@ -10727,7 +10721,7 @@ static int resolve_full_path(const char *file, char *result, size_t result_sz)
continue;
snprintf(result, result_sz, "%.*s/%s", seg_len, s, file);
/* ensure it has required permissions */
if (access(result, perm) < 0)
if (faccessat(AT_FDCWD, result, perm, AT_EACCESS) < 0)
continue;
pr_debug("resolved '%s' to '%s'\n", file, result);
return 0;


@ -118,7 +118,9 @@ struct bpf_object_open_opts {
* auto-pinned to that path on load; defaults to "/sys/fs/bpf".
*/
const char *pin_root_path;
long :0;
__u32 :32; /* stub out now removed attach_prog_fd */
/* Additional kernel config content that augments and overrides
* system Kconfig for CONFIG_xxx externs.
*/
@ -1011,6 +1013,7 @@ LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
/* Ring buffer APIs */
struct ring_buffer;
struct user_ring_buffer;
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
@ -1030,6 +1033,112 @@ LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
struct user_ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
};
#define user_ring_buffer_opts__last_field sz
/* @brief **user_ring_buffer__new()** creates a new instance of a user ring
* buffer.
*
* @param map_fd A file descriptor to a BPF_MAP_TYPE_USER_RINGBUF map.
* @param opts Options for how the ring buffer should be created.
* @return A user ring buffer on success; NULL and errno being set on a
* failure.
*/
LIBBPF_API struct user_ring_buffer *
user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts);
/* @brief **user_ring_buffer__reserve()** reserves a pointer to a sample in the
* user ring buffer.
* @param rb A pointer to a user ring buffer.
* @param size The size of the sample, in bytes.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize accessing
* this function if there are multiple producers. If a size is requested that
* is larger than the size of the entire ring buffer, errno will be set to
* E2BIG and NULL is returned. If the ring buffer could accommodate the size,
* but currently does not have enough space, errno is set to ENOSPC and NULL is
* returned.
*
* After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the kernel. Otherwise,
* the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
/* @brief **user_ring_buffer__reserve_blocking()** reserves a record in the
* ring buffer, possibly blocking for up to @timeout_ms until a sample becomes
* available.
* @param rb The user ring buffer.
* @param size The size of the sample, in bytes.
* @param timeout_ms The amount of time, in milliseconds, for which the caller
* should block when waiting for a sample. -1 causes the caller to block
* indefinitely.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize
* accessing this function if there are multiple producers.
*
* If **timeout_ms** is -1, the function will block indefinitely until a sample
* becomes available. Otherwise, **timeout_ms** must be non-negative, or errno
* is set to EINVAL, and NULL is returned. If **timeout_ms** is 0, no blocking
* will occur and the function will return immediately after attempting to
* reserve a sample.
*
* If **size** is larger than the size of the entire ring buffer, errno is set
* to E2BIG and NULL is returned. If the ring buffer could accommodate
* **size**, but currently does not have enough space, the caller will block
* until at most **timeout_ms** has elapsed. If insufficient space is available
* at that time, errno is set to ENOSPC, and NULL is returned.
*
* The kernel guarantees that it will wake up this thread to check if
* sufficient space is available in the ring buffer at least once per
* invocation of the **bpf_ringbuf_drain()** helper function, provided that at
* least one sample is consumed, and the BPF program did not invoke the
* function with BPF_RB_NO_WAKEUP. A wakeup may occur sooner than that, but the
* kernel does not guarantee this. If the helper function is invoked with
* BPF_RB_FORCE_WAKEUP, a wakeup event will be sent even if no sample is
* consumed.
*
* When a sample of size **size** is found within **timeout_ms**, a pointer to
* the sample is returned. After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the ring buffer.
* Otherwise, the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
__u32 size,
int timeout_ms);
/* @brief **user_ring_buffer__submit()** submits a previously reserved sample
* into the ring buffer.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
/* @brief **user_ring_buffer__discard()** discards a previously reserved sample.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
/* @brief **user_ring_buffer__free()** frees a ring buffer that was previously
* created with **user_ring_buffer__new()**.
* @param rb The user ring buffer being freed.
*/
LIBBPF_API void user_ring_buffer__free(struct user_ring_buffer *rb);
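
Putting the API together, a hedged user-space producer sketch (the sample layout, the way map_fd is obtained, and the 100ms timeout are all illustrative):

#include <errno.h>
#include <bpf/libbpf.h>

struct my_sample { long value; };	/* hypothetical sample layout */

static int produce_one(int map_fd, long value)
{
	struct user_ring_buffer *rb;
	struct my_sample *s;
	int err = 0;

	rb = user_ring_buffer__new(map_fd, NULL);
	if (!rb)
		return -errno;

	/* wait up to 100ms for the kernel consumer to free up space */
	s = user_ring_buffer__reserve_blocking(rb, sizeof(*s), 100);
	if (!s) {
		err = -errno;
	} else {
		s->value = value;
		user_ring_buffer__submit(rb, s);	/* posts the sample to the kernel */
	}

	user_ring_buffer__free(rb);
	return err;
}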
/* Perf buffer APIs */
struct perf_buffer;


@ -368,3 +368,13 @@ LIBBPF_1.0.0 {
libbpf_bpf_prog_type_str;
perf_buffer__buffer;
};
LIBBPF_1.1.0 {
global:
user_ring_buffer__discard;
user_ring_buffer__free;
user_ring_buffer__new;
user_ring_buffer__reserve;
user_ring_buffer__reserve_blocking;
user_ring_buffer__submit;
} LIBBPF_1.0.0;


@ -231,6 +231,7 @@ static int probe_map_create(enum bpf_map_type map_type)
return btf_fd;
break;
case BPF_MAP_TYPE_RINGBUF:
case BPF_MAP_TYPE_USER_RINGBUF:
key_size = 0;
value_size = 0;
max_entries = 4096;


@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 0
#define LIBBPF_MINOR_VERSION 1
#endif /* __LIBBPF_VERSION_H */


@ -32,7 +32,7 @@ static struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
static int nla_ok(const struct nlattr *nla, int remaining)
{
return remaining >= sizeof(*nla) &&
return remaining >= (int)sizeof(*nla) &&
nla->nla_len >= sizeof(*nla) &&
nla->nla_len <= remaining;
}


@ -16,6 +16,7 @@
#include <asm/barrier.h>
#include <sys/mman.h>
#include <sys/epoll.h>
#include <time.h>
#include "libbpf.h"
#include "libbpf_internal.h"
@ -39,6 +40,23 @@ struct ring_buffer {
int ring_cnt;
};
struct user_ring_buffer {
struct epoll_event event;
unsigned long *consumer_pos;
unsigned long *producer_pos;
void *data;
unsigned long mask;
size_t page_size;
int map_fd;
int epoll_fd;
};
/* 8-byte ring buffer header structure */
struct ringbuf_hdr {
__u32 len;
__u32 pad;
};
static void ringbuf_unmap_ring(struct ring_buffer *rb, struct ring *r)
{
if (r->consumer_pos) {
@ -300,3 +318,256 @@ int ring_buffer__epoll_fd(const struct ring_buffer *rb)
{
return rb->epoll_fd;
}
static void user_ringbuf_unmap_ring(struct user_ring_buffer *rb)
{
if (rb->consumer_pos) {
munmap(rb->consumer_pos, rb->page_size);
rb->consumer_pos = NULL;
}
if (rb->producer_pos) {
munmap(rb->producer_pos, rb->page_size + 2 * (rb->mask + 1));
rb->producer_pos = NULL;
}
}
void user_ring_buffer__free(struct user_ring_buffer *rb)
{
if (!rb)
return;
user_ringbuf_unmap_ring(rb);
if (rb->epoll_fd >= 0)
close(rb->epoll_fd);
free(rb);
}
static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd)
{
struct bpf_map_info info;
__u32 len = sizeof(info);
void *tmp;
struct epoll_event *rb_epoll;
int err;
memset(&info, 0, sizeof(info));
err = bpf_obj_get_info_by_fd(map_fd, &info, &len);
if (err) {
err = -errno;
pr_warn("user ringbuf: failed to get map info for fd=%d: %d\n", map_fd, err);
return err;
}
if (info.type != BPF_MAP_TYPE_USER_RINGBUF) {
pr_warn("user ringbuf: map fd=%d is not BPF_MAP_TYPE_USER_RINGBUF\n", map_fd);
return -EINVAL;
}
rb->map_fd = map_fd;
rb->mask = info.max_entries - 1;
/* Map read-only consumer page */
tmp = mmap(NULL, rb->page_size, PROT_READ, MAP_SHARED, map_fd, 0);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("user ringbuf: failed to mmap consumer page for map fd=%d: %d\n",
map_fd, err);
return err;
}
rb->consumer_pos = tmp;
/* Map read-write the producer page and data pages. We map the data
* region as twice the total size of the ring buffer to allow the
* simple reading and writing of samples that wrap around the end of
* the buffer. See the kernel implementation for details.
*/
tmp = mmap(NULL, rb->page_size + 2 * info.max_entries,
PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, rb->page_size);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("user ringbuf: failed to mmap data pages for map fd=%d: %d\n",
map_fd, err);
return err;
}
rb->producer_pos = tmp;
rb->data = tmp + rb->page_size;
rb_epoll = &rb->event;
rb_epoll->events = EPOLLOUT;
if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, rb_epoll) < 0) {
err = -errno;
pr_warn("user ringbuf: failed to epoll add map fd=%d: %d\n", map_fd, err);
return err;
}
return 0;
}
struct user_ring_buffer *
user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts)
{
struct user_ring_buffer *rb;
int err;
if (!OPTS_VALID(opts, user_ring_buffer_opts))
return errno = EINVAL, NULL;
rb = calloc(1, sizeof(*rb));
if (!rb)
return errno = ENOMEM, NULL;
rb->page_size = getpagesize();
rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (rb->epoll_fd < 0) {
err = -errno;
pr_warn("user ringbuf: failed to create epoll instance: %d\n", err);
goto err_out;
}
err = user_ringbuf_map(rb, map_fd);
if (err)
goto err_out;
return rb;
err_out:
user_ring_buffer__free(rb);
return errno = -err, NULL;
}
static void user_ringbuf_commit(struct user_ring_buffer *rb, void *sample, bool discard)
{
__u32 new_len;
struct ringbuf_hdr *hdr;
uintptr_t hdr_offset;
hdr_offset = rb->mask + 1 + (sample - rb->data) - BPF_RINGBUF_HDR_SZ;
hdr = rb->data + (hdr_offset & rb->mask);
new_len = hdr->len & ~BPF_RINGBUF_BUSY_BIT;
if (discard)
new_len |= BPF_RINGBUF_DISCARD_BIT;
/* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in
* the kernel.
*/
__atomic_exchange_n(&hdr->len, new_len, __ATOMIC_ACQ_REL);
}
void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample)
{
user_ringbuf_commit(rb, sample, true);
}
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample)
{
user_ringbuf_commit(rb, sample, false);
}
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
{
__u32 avail_size, total_size, max_size;
/* 64-bit to avoid overflow in case of extreme application behavior */
__u64 cons_pos, prod_pos;
struct ringbuf_hdr *hdr;
/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_peek() in
* the kernel.
*/
cons_pos = smp_load_acquire(rb->consumer_pos);
/* Synchronizes with smp_store_release() in user_ringbuf_commit() */
prod_pos = smp_load_acquire(rb->producer_pos);
max_size = rb->mask + 1;
avail_size = max_size - (prod_pos - cons_pos);
/* Round up total size to a multiple of 8. */
total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
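/* e.g. a 10-byte sample: 10 + 8 (header) = 18, rounded up to 24 bytes */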
if (total_size > max_size)
return errno = E2BIG, NULL;
if (avail_size < total_size)
return errno = ENOSPC, NULL;
hdr = rb->data + (prod_pos & rb->mask);
hdr->len = size | BPF_RINGBUF_BUSY_BIT;
hdr->pad = 0;
/* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in
* the kernel.
*/
smp_store_release(rb->producer_pos, prod_pos + total_size);
return (void *)rb->data + ((prod_pos + BPF_RINGBUF_HDR_SZ) & rb->mask);
}
static __u64 ns_elapsed_timespec(const struct timespec *start, const struct timespec *end)
{
__u64 start_ns, end_ns, ns_per_s = 1000000000;
start_ns = (__u64)start->tv_sec * ns_per_s + start->tv_nsec;
end_ns = (__u64)end->tv_sec * ns_per_s + end->tv_nsec;
return end_ns - start_ns;
}
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms)
{
void *sample;
int err, ms_remaining = timeout_ms;
struct timespec start;
if (timeout_ms < 0 && timeout_ms != -1)
return errno = EINVAL, NULL;
if (timeout_ms != -1) {
err = clock_gettime(CLOCK_MONOTONIC, &start);
if (err)
return NULL;
}
do {
int cnt, ms_elapsed;
struct timespec curr;
__u64 ns_per_ms = 1000000;
sample = user_ring_buffer__reserve(rb, size);
if (sample)
return sample;
else if (errno != ENOSPC)
return NULL;
/* The kernel guarantees at least one event notification
* delivery whenever at least one sample is drained from the
* ring buffer in an invocation to bpf_ringbuf_drain(). Other
* additional events may be delivered at any time, but only one
* event is guaranteed per bpf_ringbuf_drain() invocation,
* provided that a sample is drained, and the BPF program did
* not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain(). If
* BPF_RB_FORCE_WAKEUP is passed to bpf_ringbuf_drain(), a
* wakeup event will be delivered even if no samples are
* drained.
*/
cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining);
if (cnt < 0)
return NULL;
if (timeout_ms == -1)
continue;
err = clock_gettime(CLOCK_MONOTONIC, &curr);
if (err)
return NULL;
ms_elapsed = ns_elapsed_timespec(&start, &curr) / ns_per_ms;
ms_remaining = timeout_ms - ms_elapsed;
} while (ms_remaining > 0);
/* Try one more time to reserve a sample after the specified timeout has elapsed. */
return user_ring_buffer__reserve(rb, size);
}


@ -282,7 +282,7 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
* If this is not supported, USDTs with semaphores will not be supported.
* Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe")
*/
man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0;
man->has_sema_refcnt = faccessat(AT_FDCWD, ref_ctr_sysfs_path, F_OK, AT_EACCESS) == 0;
return man;
}


@ -4113,7 +4113,8 @@ static int validate_ibt(struct objtool_file *file)
!strcmp(sec->name, "__bug_table") ||
!strcmp(sec->name, "__ex_table") ||
!strcmp(sec->name, "__jump_table") ||
!strcmp(sec->name, "__mcount_loc"))
!strcmp(sec->name, "__mcount_loc") ||
strstr(sec->name, "__patchable_function_entries"))
continue;
list_for_each_entry(reloc, &sec->reloc->reloc_list, list)


@ -39,6 +39,8 @@ test_cpp
/tools
/runqslower
/bench
/veristat
/sign-file
*.ko
*.tmp
xskxceiver


@ -70,3 +70,8 @@ setget_sockopt # attach unexpected error: -524
cb_refs # expected error message unexpected error: -524 (trampoline)
cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc)
htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
tracing_struct # failed to auto-attach: -524 (trampoline)
user_ringbuf # failed to find kernel BTF type ID of '__s390x_sys_prctl': -3 (?)
lookup_key # JIT does not support calling kernel function (kfunc)
verify_pkcs7_sig # JIT does not support calling kernel function (kfunc)
kfunc_dynptr_param # JIT does not support calling kernel function (kfunc)


@ -14,6 +14,7 @@ BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
APIDIR := $(TOOLSINCDIR)/uapi
GENDIR := $(abspath ../../../../include/generated)
GENHDR := $(GENDIR)/autoconf.h
HOSTPKG_CONFIG := pkg-config
ifneq ($(wildcard $(GENHDR)),)
GENFLAGS := -DHAVE_GENHDR
@ -75,16 +76,17 @@ TEST_PROGS := test_kmod.sh \
test_xsk.sh
TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh ima_setup.sh \
with_tunnels.sh ima_setup.sh verify_sig_setup.sh \
test_xdp_vlan.sh test_bpftool.py
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
xskxceiver xdp_redirect_multi xdp_synproxy
xskxceiver xdp_redirect_multi xdp_synproxy veristat
TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file
TEST_GEN_FILES += liburandom_read.so
# Emit succinct information message describing current building step
# $1 - generic step name (e.g., CC, LINK, etc);
@ -189,6 +191,12 @@ $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_r
-fuse-ld=$(LLD) -Wl,-znoseparate-code \
-Wl,-rpath=. -Wl,--build-id=sha1 -o $@
$(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
$(call msg,SIGN-FILE,,$@)
$(Q)$(CC) $(shell $(HOSTPKG_CONFIG) --cflags libcrypto 2> /dev/null) \
$< -o $@ \
$(shell $(HOSTPKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto)
$(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
$(call msg,MOD,,$@)
$(Q)$(RM) bpf_testmod/bpf_testmod.ko # force re-compilation
@ -351,11 +359,12 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \
test_subskeleton.skel.h test_subskeleton_lib.skel.h \
test_usdt.skel.h
LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
LSKELS := fentry_test.c fexit_test.c fexit_sleep.c \
test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \
map_ptr_kern.c core_kern.c core_kern_overflow.c
# Generate both light skeleton and libbpf skeleton for these
LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c
LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \
kfunc_call_test_subprog.c
SKEL_BLACKLIST += $$(LSKELS)
test_static_linked.skel.h-deps := test_static_linked1.bpf.o test_static_linked2.bpf.o
@ -515,7 +524,8 @@ TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \
TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \
$(OUTPUT)/liburandom_read.so \
$(OUTPUT)/xdp_synproxy \
ima_setup.sh \
$(OUTPUT)/sign-file \
ima_setup.sh verify_sig_setup.sh \
$(wildcard progs/btf_dump_test_case_*.c)
TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) -DENABLE_ATOMICS_TESTS
@ -594,6 +604,11 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
$(OUTPUT)/veristat.o: $(BPFOBJ)
$(OUTPUT)/veristat: $(OUTPUT)/veristat.o
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
prog_tests/tests.h map_tests/tests.h verifier/tests.h \
feature bpftool \


@ -18,6 +18,46 @@ typedef int (*func_proto_typedef_nested1)(func_proto_typedef);
typedef int (*func_proto_typedef_nested2)(func_proto_typedef_nested1);
DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
long bpf_testmod_test_struct_arg_result;
struct bpf_testmod_struct_arg_1 {
int a;
};
struct bpf_testmod_struct_arg_2 {
long a;
long b;
};
noinline int
bpf_testmod_test_struct_arg_1(struct bpf_testmod_struct_arg_2 a, int b, int c) {
bpf_testmod_test_struct_arg_result = a.a + a.b + b + c;
return bpf_testmod_test_struct_arg_result;
}
noinline int
bpf_testmod_test_struct_arg_2(int a, struct bpf_testmod_struct_arg_2 b, int c) {
bpf_testmod_test_struct_arg_result = a + b.a + b.b + c;
return bpf_testmod_test_struct_arg_result;
}
noinline int
bpf_testmod_test_struct_arg_3(int a, int b, struct bpf_testmod_struct_arg_2 c) {
bpf_testmod_test_struct_arg_result = a + b + c.a + c.b;
return bpf_testmod_test_struct_arg_result;
}
noinline int
bpf_testmod_test_struct_arg_4(struct bpf_testmod_struct_arg_1 a, int b,
int c, int d, struct bpf_testmod_struct_arg_2 e) {
bpf_testmod_test_struct_arg_result = a.a + b + c + d + e.a + e.b;
return bpf_testmod_test_struct_arg_result;
}
noinline int
bpf_testmod_test_struct_arg_5(void) {
bpf_testmod_test_struct_arg_result = 1;
return bpf_testmod_test_struct_arg_result;
}
noinline void
bpf_testmod_test_mod_kfunc(int i)
@ -98,11 +138,19 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
.off = off,
.len = len,
};
struct bpf_testmod_struct_arg_1 struct_arg1 = {10};
struct bpf_testmod_struct_arg_2 struct_arg2 = {2, 3};
int i = 1;
while (bpf_testmod_return_ptr(i))
i++;
(void)bpf_testmod_test_struct_arg_1(struct_arg2, 1, 4);
(void)bpf_testmod_test_struct_arg_2(1, struct_arg2, 4);
(void)bpf_testmod_test_struct_arg_3(1, 4, struct_arg2);
(void)bpf_testmod_test_struct_arg_4(struct_arg1, 1, 2, 3, struct_arg2);
(void)bpf_testmod_test_struct_arg_5();
/* This is always true. Use the check to make sure the compiler
* doesn't remove bpf_testmod_loop_test.
*/


@ -7,9 +7,9 @@ CONFIG_BPF_LSM=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_BPF_SYSCALL=y
CONFIG_CGROUP_BPF=y
CONFIG_CRYPTO_HMAC=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FPROBE=y
CONFIG_FTRACE_SYSCALLS=y
@ -24,30 +24,36 @@ CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_RAW=y
CONFIG_IP_NF_TARGET_SYNPROXY=y
CONFIG_IPV6=y
CONFIG_IPV6_FOU=m
CONFIG_IPV6_FOU_TUNNEL=m
CONFIG_IPV6_FOU=y
CONFIG_IPV6_FOU_TUNNEL=y
CONFIG_IPV6_GRE=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT=y
CONFIG_IPV6_TUNNEL=y
CONFIG_KEYS=y
CONFIG_LIRC=y
CONFIG_LWTUNNEL=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_MPLS=y
CONFIG_MPLS_IPTUNNEL=m
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=y
CONFIG_MPLS_ROUTING=y
CONFIG_MPTCP=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_FOU=m
CONFIG_NET_CLS_FLOWER=y
CONFIG_NET_FOU=y
CONFIG_NET_FOU_IP_TUNNELS=y
CONFIG_NET_IPGRE=y
CONFIG_NET_IPGRE_DEMUX=y
CONFIG_NET_IPIP=y
CONFIG_NET_MPLS_GSO=m
CONFIG_NET_MPLS_GSO=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_SCHED=y
CONFIG_NETDEVSIM=m
CONFIG_NETDEVSIM=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_SYNPROXY=y
CONFIG_NETFILTER_XT_CONNMARK=y
@ -57,10 +63,11 @@ CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_DEFRAG_IPV4=y
CONFIG_NF_DEFRAG_IPV6=y
CONFIG_NF_NAT=y
CONFIG_RC_CORE=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_TEST_BPF=m
CONFIG_TEST_BPF=y
CONFIG_USERFAULTFD=y
CONFIG_VXLAN=y
CONFIG_XDP_SOCKETS=y


@ -47,7 +47,7 @@ CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPUSETS=y
CONFIG_CRC_T10DIF=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_DEV_VIRTIO=m
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_XXHASH=y
CONFIG_DCB=y
@ -145,11 +145,6 @@ CONFIG_MCORE2=y
CONFIG_MEMCG=y
CONFIG_MEMORY_FAILURE=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_NAMESPACES=y
CONFIG_NET=y
CONFIG_NET_9P=y


@ -3,6 +3,7 @@
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
@ -137,6 +138,7 @@ static void __test_map_lookup_and_update_batch(bool is_pcpu)
free(keys);
free(values);
free(visited);
close(map_fd);
}
static void array_map_batch_ops(void)


@ -3,6 +3,7 @@
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
@ -255,6 +256,7 @@ void __test_map_lookup_and_delete_batch(bool is_pcpu)
free(visited);
if (!is_pcpu)
free(values);
close(map_fd);
}
void htab_map_batch_ops(void)


@ -7,6 +7,7 @@
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
@ -150,4 +151,5 @@ void test_lpm_trie_map_batch_ops(void)
free(keys);
free(values);
free(visited);
close(map_fd);
}


@ -77,8 +77,12 @@ void test_task_storage_map_stress_lookup(void)
CHECK(err, "open_and_load", "error %d\n", err);
/* Only for a fully preemptible kernel */
if (!skel->kconfig->CONFIG_PREEMPT)
if (!skel->kconfig->CONFIG_PREEMPT) {
printf("%s SKIP (no CONFIG_PREEMPT)\n", __func__);
read_bpf_task_storage_busy__destroy(skel);
skips++;
return;
}
/* Save the old affinity setting */
sched_getaffinity(getpid(), sizeof(old), &old);
@ -119,4 +123,5 @@ out:
read_bpf_task_storage_busy__destroy(skel);
/* Restore affinity setting */
sched_setaffinity(getpid(), sizeof(old), &old);
printf("%s:PASS\n", __func__);
}


@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <test_progs.h>
#include <unistd.h>
#include <sys/syscall.h>
#include "bpf_iter_ipv6_route.skel.h"
#include "bpf_iter_netlink.skel.h"
#include "bpf_iter_bpf_map.skel.h"
@ -14,6 +16,7 @@
#include "bpf_iter_udp4.skel.h"
#include "bpf_iter_udp6.skel.h"
#include "bpf_iter_unix.skel.h"
#include "bpf_iter_vma_offset.skel.h"
#include "bpf_iter_test_kern1.skel.h"
#include "bpf_iter_test_kern2.skel.h"
#include "bpf_iter_test_kern3.skel.h"
@ -43,13 +46,13 @@ static void test_btf_id_or_null(void)
}
}
static void do_dummy_read(struct bpf_program *prog)
static void do_dummy_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
{
struct bpf_link *link;
char buf[16] = {};
int iter_fd, len;
link = bpf_program__attach_iter(prog, NULL);
link = bpf_program__attach_iter(prog, opts);
if (!ASSERT_OK_PTR(link, "attach_iter"))
return;
@ -68,6 +71,11 @@ free_link:
bpf_link__destroy(link);
}
static void do_dummy_read(struct bpf_program *prog)
{
do_dummy_read_opts(prog, NULL);
}
static void do_read_map_iter_fd(struct bpf_object_skeleton **skel, struct bpf_program *prog,
struct bpf_map *map)
{
@ -167,19 +175,140 @@ static void test_bpf_map(void)
bpf_iter_bpf_map__destroy(skel);
}
static void test_task(void)
static int pidfd_open(pid_t pid, unsigned int flags)
{
return syscall(SYS_pidfd_open, pid, flags);
}
static void check_bpf_link_info(const struct bpf_program *prog)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
union bpf_iter_link_info linfo;
struct bpf_link_info info = {};
struct bpf_link *link;
__u32 info_len;
int err;
memset(&linfo, 0, sizeof(linfo));
linfo.task.tid = getpid();
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
link = bpf_program__attach_iter(prog, &opts);
if (!ASSERT_OK_PTR(link, "attach_iter"))
return;
info_len = sizeof(info);
err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
ASSERT_OK(err, "bpf_obj_get_info_by_fd");
ASSERT_EQ(info.iter.task.tid, getpid(), "check_task_tid");
bpf_link__destroy(link);
}
static pthread_mutex_t do_nothing_mutex;
static void *do_nothing_wait(void *arg)
{
pthread_mutex_lock(&do_nothing_mutex);
pthread_mutex_unlock(&do_nothing_mutex);
pthread_exit(arg);
}
static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
int *num_unknown, int *num_known)
{
struct bpf_iter_task *skel;
pthread_t thread_id;
void *ret;
skel = bpf_iter_task__open_and_load();
if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
return;
do_dummy_read(skel->progs.dump_task);
ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
"pthread_create");
skel->bss->tid = getpid();
do_dummy_read_opts(skel->progs.dump_task, opts);
*num_unknown = skel->bss->num_unknown_tid;
*num_known = skel->bss->num_known_tid;
ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
"pthread_join");
bpf_iter_task__destroy(skel);
}
static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
{
int num_unknown_tid, num_known_tid;
test_task_common_nocheck(opts, &num_unknown_tid, &num_known_tid);
ASSERT_EQ(num_unknown_tid, num_unknown, "check_num_unknown_tid");
ASSERT_EQ(num_known_tid, num_known, "check_num_known_tid");
}
static void test_task_tid(void)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
union bpf_iter_link_info linfo;
int num_unknown_tid, num_known_tid;
memset(&linfo, 0, sizeof(linfo));
linfo.task.tid = getpid();
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
test_task_common(&opts, 0, 1);
linfo.task.tid = 0;
linfo.task.pid = getpid();
test_task_common(&opts, 1, 1);
test_task_common_nocheck(NULL, &num_unknown_tid, &num_known_tid);
ASSERT_GT(num_unknown_tid, 1, "check_num_unknown_tid");
ASSERT_EQ(num_known_tid, 1, "check_num_known_tid");
}
static void test_task_pid(void)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
union bpf_iter_link_info linfo;
memset(&linfo, 0, sizeof(linfo));
linfo.task.pid = getpid();
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
test_task_common(&opts, 1, 1);
}
static void test_task_pidfd(void)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
union bpf_iter_link_info linfo;
int pidfd;
pidfd = pidfd_open(getpid(), 0);
if (!ASSERT_GT(pidfd, 0, "pidfd_open"))
return;
memset(&linfo, 0, sizeof(linfo));
linfo.task.pid_fd = pidfd;
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
test_task_common(&opts, 1, 1);
close(pidfd);
}
static void test_task_sleepable(void)
{
struct bpf_iter_task *skel;
@ -212,14 +341,11 @@ static void test_task_stack(void)
bpf_iter_task_stack__destroy(skel);
}
static void *do_nothing(void *arg)
{
pthread_exit(arg);
}
static void test_task_file(void)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
struct bpf_iter_task_file *skel;
union bpf_iter_link_info linfo;
pthread_t thread_id;
void *ret;
@ -229,19 +355,36 @@ static void test_task_file(void)
skel->bss->tgid = getpid();
if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing, NULL),
"pthread_create"))
goto done;
ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
"pthread_create");
memset(&linfo, 0, sizeof(linfo));
linfo.task.tid = getpid();
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
do_dummy_read_opts(skel->progs.dump_task_file, &opts);
ASSERT_EQ(skel->bss->count, 0, "check_count");
ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
skel->bss->last_tgid = 0;
skel->bss->count = 0;
skel->bss->unique_tgid_count = 0;
do_dummy_read(skel->progs.dump_task_file);
if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
"pthread_join"))
goto done;
ASSERT_EQ(skel->bss->count, 0, "check_count");
ASSERT_GT(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
check_bpf_link_info(skel->progs.dump_task_file);
ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
ASSERT_OK(pthread_join(thread_id, &ret), "pthread_join");
ASSERT_NULL(ret, "pthread_join");
done:
bpf_iter_task_file__destroy(skel);
}
@ -1249,7 +1392,7 @@ static void str_strip_first_line(char *str)
*dst = '\0';
}
static void test_task_vma(void)
static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
{
int err, iter_fd = -1, proc_maps_fd = -1;
struct bpf_iter_task_vma *skel;
@ -1261,13 +1404,14 @@ static void test_task_vma(void)
return;
skel->bss->pid = getpid();
skel->bss->one_task = opts ? 1 : 0;
err = bpf_iter_task_vma__load(skel);
if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
goto out;
skel->links.proc_maps = bpf_program__attach_iter(
skel->progs.proc_maps, NULL);
skel->progs.proc_maps, opts);
if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
skel->links.proc_maps = NULL;
@ -1291,6 +1435,8 @@ static void test_task_vma(void)
goto out;
len += err;
}
if (opts)
ASSERT_EQ(skel->bss->one_task_error, 0, "unexpected task");
/* read CMP_BUFFER_SIZE (1kB) from /proc/pid/maps */
snprintf(maps_path, 64, "/proc/%u/maps", skel->bss->pid);
@ -1306,6 +1452,9 @@ static void test_task_vma(void)
str_strip_first_line(proc_maps_output);
ASSERT_STREQ(task_vma_output, proc_maps_output, "compare_output");
check_bpf_link_info(skel->progs.proc_maps);
out:
close(proc_maps_fd);
close(iter_fd);
@ -1325,8 +1474,93 @@ void test_bpf_sockmap_map_iter_fd(void)
bpf_iter_sockmap__destroy(skel);
}
static void test_task_vma(void)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
union bpf_iter_link_info linfo;
memset(&linfo, 0, sizeof(linfo));
linfo.task.tid = getpid();
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
test_task_vma_common(&opts);
test_task_vma_common(NULL);
}
/* uprobe attach point */
static noinline int trigger_func(int arg)
{
asm volatile ("");
return arg + 1;
}
static void test_task_vma_offset_common(struct bpf_iter_attach_opts *opts, bool one_proc)
{
struct bpf_iter_vma_offset *skel;
struct bpf_link *link;
char buf[16] = {};
int iter_fd, len;
int pgsz, shift;
skel = bpf_iter_vma_offset__open_and_load();
if (!ASSERT_OK_PTR(skel, "bpf_iter_vma_offset__open_and_load"))
return;
skel->bss->pid = getpid();
skel->bss->address = (uintptr_t)trigger_func;
for (pgsz = getpagesize(), shift = 0; pgsz > 1; pgsz >>= 1, shift++)
;
skel->bss->page_shift = shift;
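/* e.g. with 4096-byte pages the loop above leaves shift == 12 (log2 of the page size) */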
link = bpf_program__attach_iter(skel->progs.get_vma_offset, opts);
if (!ASSERT_OK_PTR(link, "attach_iter"))
return;
iter_fd = bpf_iter_create(bpf_link__fd(link));
if (!ASSERT_GT(iter_fd, 0, "create_iter"))
goto exit;
while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
;
buf[15] = 0;
ASSERT_EQ(strcmp(buf, "OK\n"), 0, "strcmp");
ASSERT_EQ(skel->bss->offset, get_uprobe_offset(trigger_func), "offset");
if (one_proc)
ASSERT_EQ(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
else
ASSERT_GT(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
close(iter_fd);
exit:
bpf_link__destroy(link);
}
static void test_task_vma_offset(void)
{
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
union bpf_iter_link_info linfo;
memset(&linfo, 0, sizeof(linfo));
linfo.task.pid = getpid();
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
test_task_vma_offset_common(&opts, true);
linfo.task.pid = 0;
linfo.task.tid = getpid();
test_task_vma_offset_common(&opts, true);
test_task_vma_offset_common(NULL, false);
}
void test_bpf_iter(void)
{
ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init");
if (test__start_subtest("btf_id_or_null"))
test_btf_id_or_null();
if (test__start_subtest("ipv6_route"))
@ -1335,8 +1569,12 @@ void test_bpf_iter(void)
test_netlink();
if (test__start_subtest("bpf_map"))
test_bpf_map();
if (test__start_subtest("task"))
test_task();
if (test__start_subtest("task_tid"))
test_task_tid();
if (test__start_subtest("task_pid"))
test_task_pid();
if (test__start_subtest("task_pidfd"))
test_task_pidfd();
if (test__start_subtest("task_sleepable"))
test_task_sleepable();
if (test__start_subtest("task_stack"))
@ -1397,4 +1635,6 @@ void test_bpf_iter(void)
test_ksym_iter();
if (test__start_subtest("bpf_sockmap_map_iter_fd"))
test_bpf_sockmap_map_iter_fd();
if (test__start_subtest("vma_offset"))
test_task_vma_offset();
}


@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
#include <linux/netfilter/nf_conntrack_common.h>
#include "test_bpf_nf.skel.h"
#include "test_bpf_nf_fail.skel.h"
@ -17,6 +18,7 @@ struct {
{ "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" },
{ "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" },
{ "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" },
{ "write_not_allowlisted_field", "no write support to nf_conn at off" },
};
enum {
@ -24,7 +26,10 @@ enum {
TEST_TC_BPF,
};
#define TIMEOUT_MS 3000
#define IPS_STATUS_MASK (IPS_CONFIRMED | IPS_SEEN_REPLY | \
IPS_SRC_NAT_DONE | IPS_DST_NAT_DONE | \
IPS_SRC_NAT | IPS_DST_NAT)
static int connect_to_server(int srv_fd)
{
@ -111,10 +116,12 @@ static void test_bpf_nf_ct(int mode)
/* allow some tolerance for test_delta_timeout value to avoid races. */
ASSERT_GT(skel->bss->test_delta_timeout, 8, "Test for min ct timeout update");
ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update");
/* expected status is IPS_SEEN_REPLY */
ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update ");
ASSERT_EQ(skel->bss->test_insert_lookup_mark, 77, "Test for insert and lookup mark value");
ASSERT_EQ(skel->bss->test_status, IPS_STATUS_MASK, "Test for ct status update ");
ASSERT_EQ(skel->data->test_exist_lookup, 0, "Test existing connection lookup");
ASSERT_EQ(skel->bss->test_exist_lookup_mark, 43, "Test existing connection lookup ctmark");
ASSERT_EQ(skel->data->test_snat_addr, 0, "Test for source natting");
ASSERT_EQ(skel->data->test_dnat_addr, 0, "Test for destination natting");
end:
if (srv_client_fd != -1)
close(srv_client_fd);


@ -290,6 +290,10 @@ static void test_dctcp_fallback(void)
goto done;
ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res");
ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res");
/* All setsockopt(TCP_CONGESTION) in the recurred
* bpf_dctcp->init() should fail with -EBUSY.
*/
ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 3, "ebusy_cnt");
err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len);
if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))

View File

@ -764,7 +764,7 @@ static void test_btf_dump_struct_data(struct btf *btf, struct btf_dump *d,
/* union with nested struct */
TEST_BTF_DUMP_DATA(btf, d, "union", str, union bpf_iter_link_info, BTF_F_COMPACT,
"(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.cgroup = (struct){.order = (enum bpf_cgroup_iter_order)BPF_CGROUP_ITER_SELF_ONLY,.cgroup_fd = (__u32)1,},}",
"(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.cgroup = (struct){.order = (enum bpf_cgroup_iter_order)BPF_CGROUP_ITER_SELF_ONLY,.cgroup_fd = (__u32)1,},.task = (struct){.tid = (__u32)1,.pid = (__u32)1,},}",
{ .cgroup = { .order = 1, .cgroup_fd = 1, }});
/* struct skb with nested structs/unions; because type output is so


@ -22,26 +22,6 @@ static __u32 duration;
#define PROG_PIN_FILE "/sys/fs/bpf/btf_skc_cls_ingress"
static int write_sysctl(const char *sysctl, const char *value)
{
int fd, err, len;
fd = open(sysctl, O_WRONLY);
if (CHECK(fd == -1, "open sysctl", "open(%s): %s (%d)\n",
sysctl, strerror(errno), errno))
return -1;
len = strlen(value);
err = write(fd, value, len);
close(fd);
if (CHECK(err != len, "write sysctl",
"write(%s, %s, %d): err:%d %s (%d)\n",
sysctl, value, len, err, strerror(errno), errno))
return -1;
return 0;
}
static int prepare_netns(void)
{
if (CHECK(unshare(CLONE_NEWNET), "create netns",


@ -1,6 +1,22 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Functions to manage eBPF programs attached to cgroup subsystems
* This test makes sure BPF stats collection using rstat works correctly.
* The test uses 3 BPF progs:
* (a) counter: This BPF prog is invoked every time we attach a process to a
* cgroup and locklessly increments a percpu counter.
* The program then calls cgroup_rstat_updated() to inform rstat
* of an update on the (cpu, cgroup) pair.
*
* (b) flusher: This BPF prog is invoked when an rstat flush is ongoing, it
* aggregates all percpu counters to a total counter, and also
* propagates the changes to the ancestor cgroups.
*
* (c) dumper: This BPF prog is a cgroup_iter. It is used to output the total
* counter of a cgroup through reading a file in userspace.
*
* The test sets up a cgroup hierarchy, and the above programs. It spawns a few
* processes in the leaf cgroups and makes sure all the counters are aggregated
* correctly.
*
* Copyright 2022 Google LLC.
*/
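
A hedged sketch of the "dumper" program described in (c) above (the map lookup, program and map names are illustrative and not necessarily identical to the test's actual implementation):

SEC("iter.s/cgroup")
int BPF_PROG(dumper, struct bpf_iter_meta *meta, struct cgroup *cgrp)
{
	struct seq_file *seq = meta->seq;
	__u64 cg_id = cgrp ? cgrp->kn->id : 0;
	__u64 *total;

	if (!cg_id)
		return 1;

	/* attach_counters is a hypothetical hash map keyed by cgroup id */
	total = bpf_map_lookup_elem(&attach_counters, &cg_id);
	BPF_SEQ_PRINTF(seq, "cg_id: %llu, attach_counter: %llu\n",
		       cg_id, total ? *total : 0);
	return 0;
}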
@ -21,8 +37,10 @@
#define PAGE_SIZE 4096
#define MB(x) (x << 20)
#define PROCESSES_PER_CGROUP 3
#define BPFFS_ROOT "/sys/fs/bpf/"
#define BPFFS_VMSCAN BPFFS_ROOT"vmscan/"
#define BPFFS_ATTACH_COUNTERS BPFFS_ROOT "attach_counters/"
#define CG_ROOT_NAME "root"
#define CG_ROOT_ID 1
@ -79,7 +97,7 @@ static int setup_bpffs(void)
return err;
/* Create a directory to contain stat files in bpffs */
err = mkdir(BPFFS_VMSCAN, 0755);
err = mkdir(BPFFS_ATTACH_COUNTERS, 0755);
if (!ASSERT_OK(err, "mkdir"))
return err;
@ -89,7 +107,7 @@ static int setup_bpffs(void)
static void cleanup_bpffs(void)
{
/* Remove created directory in bpffs */
ASSERT_OK(rmdir(BPFFS_VMSCAN), "rmdir "BPFFS_VMSCAN);
ASSERT_OK(rmdir(BPFFS_ATTACH_COUNTERS), "rmdir "BPFFS_ATTACH_COUNTERS);
/* Unmount bpffs, if it wasn't already mounted when we started */
if (mounted_bpffs)
@ -118,18 +136,6 @@ static int setup_cgroups(void)
cgroups[i].fd = fd;
cgroups[i].id = get_cgroup_id(cgroups[i].path);
/*
* Enable memcg controller for the entire hierarchy.
* Note that stats are collected for all cgroups in a hierarchy
* with memcg enabled anyway, but are only exposed for cgroups
* that have memcg enabled.
*/
if (i < N_NON_LEAF_CGROUPS) {
err = enable_controllers(cgroups[i].path, "memory");
if (!ASSERT_OK(err, "enable_controllers"))
return err;
}
}
return 0;
}
@ -154,109 +160,85 @@ static void destroy_hierarchy(void)
cleanup_bpffs();
}
static int reclaimer(const char *cgroup_path, size_t size)
static int attach_processes(void)
{
static char size_buf[128];
char *buf, *ptr;
int err;
int i, j, status;
/* Join cgroup in the parent process workdir */
if (join_parent_cgroup(cgroup_path))
return EACCES;
/* Allocate memory */
buf = malloc(size);
if (!buf)
return ENOMEM;
/* Write to memory to make sure it's actually allocated */
for (ptr = buf; ptr < buf + size; ptr += PAGE_SIZE)
*ptr = 1;
/* Try to reclaim memory */
snprintf(size_buf, 128, "%lu", size);
err = write_cgroup_file_parent(cgroup_path, "memory.reclaim", size_buf);
free(buf);
/* memory.reclaim returns EAGAIN if the amount is not fully reclaimed */
if (err && errno != EAGAIN)
return errno;
return 0;
}
static int induce_vmscan(void)
{
int i, status;
/*
* In every leaf cgroup, run a child process that allocates some memory
* and attempts to reclaim some of it.
*/
/* In every leaf cgroup, attach 3 processes */
for (i = N_NON_LEAF_CGROUPS; i < N_CGROUPS; i++) {
pid_t pid;
for (j = 0; j < PROCESSES_PER_CGROUP; j++) {
pid_t pid;
/* Create reclaimer child */
pid = fork();
if (pid == 0) {
status = reclaimer(cgroups[i].path, MB(5));
exit(status);
/* Create child and attach to cgroup */
pid = fork();
if (pid == 0) {
if (join_parent_cgroup(cgroups[i].path))
exit(EACCES);
exit(0);
}
/* Cleanup child */
waitpid(pid, &status, 0);
if (!ASSERT_TRUE(WIFEXITED(status), "child process exited"))
return 1;
if (!ASSERT_EQ(WEXITSTATUS(status), 0,
"child process exit code"))
return 1;
}
/* Cleanup reclaimer child */
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status), "reclaimer exited");
ASSERT_EQ(WEXITSTATUS(status), 0, "reclaim exit code");
}
return 0;
}
static unsigned long long
get_cgroup_vmscan_delay(unsigned long long cgroup_id, const char *file_name)
get_attach_counter(unsigned long long cgroup_id, const char *file_name)
{
unsigned long long vmscan = 0, id = 0;
unsigned long long attach_counter = 0, id = 0;
static char buf[128], path[128];
/* For every cgroup, read the file generated by cgroup_iter */
snprintf(path, 128, "%s%s", BPFFS_VMSCAN, file_name);
snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS, file_name);
if (!ASSERT_OK(read_from_file(path, buf, 128), "read cgroup_iter"))
return 0;
/* Check the output file formatting */
ASSERT_EQ(sscanf(buf, "cg_id: %llu, total_vmscan_delay: %llu\n",
&id, &vmscan), 2, "output format");
ASSERT_EQ(sscanf(buf, "cg_id: %llu, attach_counter: %llu\n",
&id, &attach_counter), 2, "output format");
/* Check that the cgroup_id is displayed correctly */
ASSERT_EQ(id, cgroup_id, "cgroup_id");
/* Check that the vmscan reading is non-zero */
ASSERT_GT(vmscan, 0, "vmscan_reading");
return vmscan;
/* Check that the counter is non-zero */
ASSERT_GT(attach_counter, 0, "attach counter non-zero");
return attach_counter;
}
static void check_vmscan_stats(void)
static void check_attach_counters(void)
{
unsigned long long vmscan_readings[N_CGROUPS], vmscan_root;
unsigned long long attach_counters[N_CGROUPS], root_attach_counter;
int i;
for (i = 0; i < N_CGROUPS; i++) {
vmscan_readings[i] = get_cgroup_vmscan_delay(cgroups[i].id,
cgroups[i].name);
}
for (i = 0; i < N_CGROUPS; i++)
attach_counters[i] = get_attach_counter(cgroups[i].id,
cgroups[i].name);
/* Read stats for root too */
vmscan_root = get_cgroup_vmscan_delay(CG_ROOT_ID, CG_ROOT_NAME);
root_attach_counter = get_attach_counter(CG_ROOT_ID, CG_ROOT_NAME);
/* Check that all leaf cgroups have an attach counter of 3 */
for (i = N_NON_LEAF_CGROUPS; i < N_CGROUPS; i++)
ASSERT_EQ(attach_counters[i], PROCESSES_PER_CGROUP,
"leaf cgroup attach counter");
/* Check that child1 == child1_1 + child1_2 */
ASSERT_EQ(vmscan_readings[1], vmscan_readings[3] + vmscan_readings[4],
"child1_vmscan");
ASSERT_EQ(attach_counters[1], attach_counters[3] + attach_counters[4],
"child1_counter");
/* Check that child2 == child2_1 + child2_2 */
ASSERT_EQ(vmscan_readings[2], vmscan_readings[5] + vmscan_readings[6],
"child2_vmscan");
ASSERT_EQ(attach_counters[2], attach_counters[5] + attach_counters[6],
"child2_counter");
/* Check that test == child1 + child2 */
ASSERT_EQ(vmscan_readings[0], vmscan_readings[1] + vmscan_readings[2],
"test_vmscan");
ASSERT_EQ(attach_counters[0], attach_counters[1] + attach_counters[2],
"test_counter");
/* Check that root >= test */
ASSERT_GE(vmscan_root, vmscan_readings[1], "root_vmscan");
ASSERT_GE(root_attach_counter, attach_counters[1], "root_counter");
}
/* Creates iter link and pins in bpffs, returns 0 on success, -errno on failure.
@@ -278,12 +260,12 @@ static int setup_cgroup_iter(struct cgroup_hierarchical_stats *obj,
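/* SELF_ONLY: the iterator visits only the cgroup the link is parameterized
 * with, not its descendants or ancestors.
 */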
linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);
link = bpf_program__attach_iter(obj->progs.dump_vmscan, &opts);
link = bpf_program__attach_iter(obj->progs.dumper, &opts);
if (!ASSERT_OK_PTR(link, "attach_iter"))
return -EFAULT;
/* Pin the link to a bpffs file */
snprintf(path, 128, "%s%s", BPFFS_VMSCAN, file_name);
snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS, file_name);
err = bpf_link__pin(link, path);
ASSERT_OK(err, "pin cgroup_iter");
@@ -313,7 +295,7 @@ static int setup_progs(struct cgroup_hierarchical_stats **skel)
if (!ASSERT_OK(err, "setup_cgroup_iter"))
return err;
bpf_program__set_autoattach((*skel)->progs.dump_vmscan, false);
bpf_program__set_autoattach((*skel)->progs.dumper, false);
err = cgroup_hierarchical_stats__attach(*skel);
if (!ASSERT_OK(err, "attach"))
return err;
@@ -328,13 +310,13 @@ static void destroy_progs(struct cgroup_hierarchical_stats *skel)
for (i = 0; i < N_CGROUPS; i++) {
/* Delete files in bpffs that cgroup_iters are pinned in */
snprintf(path, 128, "%s%s", BPFFS_VMSCAN,
snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS,
cgroups[i].name);
ASSERT_OK(remove(path), "remove cgroup_iter pin");
}
/* Delete root file in bpffs */
snprintf(path, 128, "%s%s", BPFFS_VMSCAN, CG_ROOT_NAME);
snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS, CG_ROOT_NAME);
ASSERT_OK(remove(path), "remove cgroup_iter root pin");
cgroup_hierarchical_stats__destroy(skel);
}
@@ -347,9 +329,9 @@ void test_cgroup_hierarchical_stats(void)
goto hierarchy_cleanup;
if (setup_progs(&skel))
goto cleanup;
if (induce_vmscan())
if (attach_processes())
goto cleanup;
check_vmscan_stats();
check_attach_counters();
cleanup:
destroy_progs(skel);
hierarchy_cleanup:
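The BPF-side dumper program that produces the "cg_id: ..., attach_counter: ..." lines parsed by get_attach_counter() is not part of this excerpt. A minimal sketch of such a cgroup_iter dump program, assuming a vmlinux.h generated from a kernel with cgroup_iter support and a hypothetical attach_counters hash map keyed by cgroup id (neither is the selftest's exact source), could look like:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 16);
	__type(key, __u64);	/* cgroup id */
	__type(value, __u64);	/* processes attached so far */
} attach_counters SEC(".maps");

SEC("iter/cgroup")
int dumper(struct bpf_iter__cgroup *ctx)
{
	static const char fmt[] = "cg_id: %llu, attach_counter: %llu\n";
	struct seq_file *seq = ctx->meta->seq;
	struct cgroup *cgrp = ctx->cgroup;
	__u64 vals[2], *counter;

	/* The final invocation of an iterator passes a NULL object */
	if (!cgrp)
		return 0;

	vals[0] = cgrp->kn->id;	/* kernfs node id doubles as the cgroup id */
	counter = bpf_map_lookup_elem(&attach_counters, &vals[0]);
	vals[1] = counter ? *counter : 0;

	bpf_seq_printf(seq, fmt, sizeof(fmt), vals, sizeof(vals));
	return 0;
}

char _license[] SEC("license") = "GPL";

Since each link is pinned with SELF_ONLY ordering, reading a pinned file yields exactly one such line, which is what the sscanf() format check above relies on.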


@@ -0,0 +1,178 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2022 Google LLC.
*/
#define _GNU_SOURCE
#include <sys/mount.h>
#include "test_progs.h"
#include "cgroup_helpers.h"
#include "network_helpers.h"
#include "connect_ping.skel.h"
/* 2001:db8::1 */
#define BINDADDR_V6 { { { 0x20,0x01,0x0d,0xb8,0,0,0,0,0,0,0,0,0,0,0,1 } } }
static const struct in6_addr bindaddr_v6 = BINDADDR_V6;
static void subtest(int cgroup_fd, struct connect_ping *skel,
int family, int do_bind)
{
struct sockaddr_in sa4 = {
.sin_family = AF_INET,
.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
};
struct sockaddr_in6 sa6 = {
.sin6_family = AF_INET6,
.sin6_addr = IN6ADDR_LOOPBACK_INIT,
};
struct sockaddr *sa;
socklen_t sa_len;
int protocol;
int sock_fd;
switch (family) {
case AF_INET:
sa = (struct sockaddr *)&sa4;
sa_len = sizeof(sa4);
protocol = IPPROTO_ICMP;
break;
case AF_INET6:
sa = (struct sockaddr *)&sa6;
sa_len = sizeof(sa6);
protocol = IPPROTO_ICMPV6;
break;
}
memset(skel->bss, 0, sizeof(*skel->bss));
skel->bss->do_bind = do_bind;
sock_fd = socket(family, SOCK_DGRAM, protocol);
if (!ASSERT_GE(sock_fd, 0, "sock-create"))
return;
if (!ASSERT_OK(connect(sock_fd, sa, sa_len), "connect"))
goto close_sock;
if (!ASSERT_EQ(skel->bss->invocations_v4, family == AF_INET ? 1 : 0,
"invocations_v4"))
goto close_sock;
if (!ASSERT_EQ(skel->bss->invocations_v6, family == AF_INET6 ? 1 : 0,
"invocations_v6"))
goto close_sock;
if (!ASSERT_EQ(skel->bss->has_error, 0, "has_error"))
goto close_sock;
if (!ASSERT_OK(getsockname(sock_fd, sa, &sa_len),
"getsockname"))
goto close_sock;
switch (family) {
case AF_INET:
if (!ASSERT_EQ(sa4.sin_family, family, "sin_family"))
goto close_sock;
if (!ASSERT_EQ(sa4.sin_addr.s_addr,
htonl(do_bind ? 0x01010101 : INADDR_LOOPBACK),
"sin_addr"))
goto close_sock;
break;
case AF_INET6:
if (!ASSERT_EQ(sa6.sin6_family, AF_INET6, "sin6_family"))
goto close_sock;
if (!ASSERT_EQ(memcmp(&sa6.sin6_addr,
do_bind ? &bindaddr_v6 : &in6addr_loopback,
sizeof(sa6.sin6_addr)),
0, "sin6_addr"))
goto close_sock;
break;
}
close_sock:
close(sock_fd);
}
void test_connect_ping(void)
{
struct connect_ping *skel;
int cgroup_fd;
if (!ASSERT_OK(unshare(CLONE_NEWNET | CLONE_NEWNS), "unshare"))
return;
/* Make the original sysfs mount private before overmounting it, so the
* overmount does not propagate to other mount namespaces.
*/
if (!ASSERT_OK(mount("none", "/sys", NULL, MS_PRIVATE, NULL),
"remount-private-sys"))
return;
if (!ASSERT_OK(mount("sysfs", "/sys", "sysfs", 0, NULL),
"mount-sys"))
return;
if (!ASSERT_OK(mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL),
"mount-bpf"))
goto clean_mount;
if (!ASSERT_OK(system("ip link set dev lo up"), "lo-up"))
goto clean_mount;
if (!ASSERT_OK(system("ip addr add 1.1.1.1 dev lo"), "lo-addr-v4"))
goto clean_mount;
if (!ASSERT_OK(system("ip -6 addr add 2001:db8::1 dev lo"), "lo-addr-v6"))
goto clean_mount;
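/* Allow processes with GID 0 to create unprivileged ICMP (ping) sockets */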
if (write_sysctl("/proc/sys/net/ipv4/ping_group_range", "0 0"))
goto clean_mount;
cgroup_fd = test__join_cgroup("/connect_ping");
if (!ASSERT_GE(cgroup_fd, 0, "cg-create"))
goto clean_mount;
skel = connect_ping__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel-load"))
goto close_cgroup;
skel->links.connect_v4_prog =
bpf_program__attach_cgroup(skel->progs.connect_v4_prog, cgroup_fd);
if (!ASSERT_OK_PTR(skel->links.connect_v4_prog, "cg-attach-v4"))
goto skel_destroy;
skel->links.connect_v6_prog =
bpf_program__attach_cgroup(skel->progs.connect_v6_prog, cgroup_fd);
if (!ASSERT_OK_PTR(skel->links.connect_v6_prog, "cg-attach-v6"))
goto skel_destroy;
/* Connect a v4 ping socket to localhost, assert that only v4 is called,
* and called exactly once, and that the socket's bound address is
* the original loopback address.
*/
if (test__start_subtest("ipv4"))
subtest(cgroup_fd, skel, AF_INET, 0);
/* Connect a v4 ping socket to localhost, assert that only v4 is called,
* and called exactly once, and that the socket's bound address is
* the address we explicitly bound.
*/
if (test__start_subtest("ipv4-bind"))
subtest(cgroup_fd, skel, AF_INET, 1);
/* Connect a v6 ping socket to localhost, assert that only v6 is called,
* and called exactly once, and that the socket's bound address is
* the original loopback address.
*/
if (test__start_subtest("ipv6"))
subtest(cgroup_fd, skel, AF_INET6, 0);
/* Connect a v6 ping socket to localhost, assert that only v6 is called,
* and called exactly once, and that the socket's bound address is
* the address we explicitly bound.
*/
if (test__start_subtest("ipv6-bind"))
subtest(cgroup_fd, skel, AF_INET6, 1);
skel_destroy:
connect_ping__destroy(skel);
close_cgroup:
close(cgroup_fd);
clean_mount:
umount2("/sys", MNT_DETACH);
}
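The BPF object behind connect_ping.skel.h is not included in this excerpt. As a rough illustration only (the global variable names mirror the skeleton fields used above, but the body is an assumption rather than the actual connect_ping BPF program), the v4 half of a cgroup sock_addr program invoked when a ping socket calls connect() could look like:

#include <linux/stddef.h>
#include <linux/bpf.h>
#include <linux/in.h>
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

__u32 invocations_v4;
__u32 invocations_v6;
__u32 has_error;
__u32 do_bind;

SEC("cgroup/connect4")
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
	struct sockaddr_in sa = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = bpf_htonl(0x01010101),	/* 1.1.1.1, added to lo above */
	};

	__sync_fetch_and_add(&invocations_v4, 1);

	/* Optionally re-bind the ping socket to the address configured on lo */
	if (do_bind && bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)))
		has_error = 1;

	return 1;	/* let the connect() proceed */
}

char _license[] SEC("license") = "GPL";

A v6 counterpart on cgroup/connect6 would bump invocations_v6 and bind to 2001:db8::1, matching the sin6_addr check in the subtest.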


@@ -30,7 +30,7 @@ static struct {
{"invalid_helper2", "Expected an initialized dynptr as arg #3"},
{"invalid_write1", "Expected an initialized dynptr as arg #1"},
{"invalid_write2", "Expected an initialized dynptr as arg #3"},
{"invalid_write3", "Expected an initialized ringbuf dynptr as arg #1"},
{"invalid_write3", "Expected an initialized dynptr as arg #1"},
{"invalid_write4", "arg 1 is an unacquired reference"},
{"invalid_read1", "invalid read from stack"},
{"invalid_read2", "cannot pass in dynptr at an offset"},


@@ -2,7 +2,7 @@
#include <test_progs.h>
#include "get_func_ip_test.skel.h"
void test_get_func_ip_test(void)
static void test_function_entry(void)
{
struct get_func_ip_test *skel = NULL;
int err, prog_fd;
@@ -12,14 +12,6 @@ void test_get_func_ip_test(void)
if (!ASSERT_OK_PTR(skel, "get_func_ip_test__open"))
return;
/* test6 is x86_64 specifc because of the instruction
* offset, disabling it for all other archs
*/
#ifndef __x86_64__
bpf_program__set_autoload(skel->progs.test6, false);
bpf_program__set_autoload(skel->progs.test7, false);
#endif
err = get_func_ip_test__load(skel);
if (!ASSERT_OK(err, "get_func_ip_test__load"))
goto cleanup;
@@ -43,11 +35,56 @@ void test_get_func_ip_test(void)
ASSERT_EQ(skel->bss->test3_result, 1, "test3_result");
ASSERT_EQ(skel->bss->test4_result, 1, "test4_result");
ASSERT_EQ(skel->bss->test5_result, 1, "test5_result");
#ifdef __x86_64__
ASSERT_EQ(skel->bss->test6_result, 1, "test6_result");
ASSERT_EQ(skel->bss->test7_result, 1, "test7_result");
#endif
cleanup:
get_func_ip_test__destroy(skel);
}
/* test6 is x86_64 specific because of the instruction
* offset, disabling it for all other archs
*/
#ifdef __x86_64__
static void test_function_body(void)
{
struct get_func_ip_test *skel = NULL;
LIBBPF_OPTS(bpf_test_run_opts, topts);
LIBBPF_OPTS(bpf_kprobe_opts, kopts);
struct bpf_link *link6 = NULL;
int err, prog_fd;
skel = get_func_ip_test__open();
if (!ASSERT_OK_PTR(skel, "get_func_ip_test__open"))
return;
bpf_program__set_autoload(skel->progs.test6, true);
err = get_func_ip_test__load(skel);
if (!ASSERT_OK(err, "get_func_ip_test__load"))
goto cleanup;
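/* With IBT enabled the function begins with a 4-byte endbr instruction, so
 * the probed instruction sits 4 bytes further in: offset 9 instead of 5.
 */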
kopts.offset = skel->kconfig->CONFIG_X86_KERNEL_IBT ? 9 : 5;
link6 = bpf_program__attach_kprobe_opts(skel->progs.test6, "bpf_fentry_test6", &kopts);
if (!ASSERT_OK_PTR(link6, "link6"))
goto cleanup;
prog_fd = bpf_program__fd(skel->progs.test1);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(topts.retval, 0, "test_run");
ASSERT_EQ(skel->bss->test6_result, 1, "test6_result");
cleanup:
bpf_link__destroy(link6);
get_func_ip_test__destroy(skel);
}
#else
#define test_function_body()
#endif
void test_get_func_ip_test(void)
{
test_function_entry();
test_function_body();
}


@@ -2,6 +2,8 @@
/* Copyright (c) 2021 Facebook */
#include <test_progs.h>
#include <network_helpers.h>
#include "kfunc_call_fail.skel.h"
#include "kfunc_call_test.skel.h"
#include "kfunc_call_test.lskel.h"
#include "kfunc_call_test_subprog.skel.h"
#include "kfunc_call_test_subprog.lskel.h"
@@ -9,36 +11,220 @@
#include "cap_helpers.h"
static void test_main(void)
{
struct kfunc_call_test_lskel *skel;
int prog_fd, err;
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = &pkt_v4,
.data_size_in = sizeof(pkt_v4),
.repeat = 1,
);
static size_t log_buf_sz = 1048576; /* 1 MB */
static char obj_log_buf[1048576];
skel = kfunc_call_test_lskel__open_and_load();
enum kfunc_test_type {
tc_test = 0,
syscall_test,
syscall_null_ctx_test,
};
struct kfunc_test_params {
const char *prog_name;
unsigned long lskel_prog_desc_offset;
int retval;
enum kfunc_test_type test_type;
const char *expected_err_msg;
};
#define __BPF_TEST_SUCCESS(name, __retval, type) \
{ \
.prog_name = #name, \
.lskel_prog_desc_offset = offsetof(struct kfunc_call_test_lskel, progs.name), \
.retval = __retval, \
.test_type = type, \
.expected_err_msg = NULL, \
}
#define __BPF_TEST_FAIL(name, __retval, type, error_msg) \
{ \
.prog_name = #name, \
.lskel_prog_desc_offset = 0 /* unused when test is failing */, \
.retval = __retval, \
.test_type = type, \
.expected_err_msg = error_msg, \
}
#define TC_TEST(name, retval) __BPF_TEST_SUCCESS(name, retval, tc_test)
#define SYSCALL_TEST(name, retval) __BPF_TEST_SUCCESS(name, retval, syscall_test)
#define SYSCALL_NULL_CTX_TEST(name, retval) __BPF_TEST_SUCCESS(name, retval, syscall_null_ctx_test)
#define TC_FAIL(name, retval, error_msg) __BPF_TEST_FAIL(name, retval, tc_test, error_msg)
#define SYSCALL_NULL_CTX_FAIL(name, retval, error_msg) \
__BPF_TEST_FAIL(name, retval, syscall_null_ctx_test, error_msg)
static struct kfunc_test_params kfunc_tests[] = {
/* failure cases:
* if retval is 0 -> the program will fail to load and the error message is an error
* if retval is not 0 -> the program can be loaded but running it will give the
* provided return value. The error message is thus the one
* from a successful load
*/
SYSCALL_NULL_CTX_FAIL(kfunc_syscall_test_fail, -EINVAL, "processed 4 insns"),
SYSCALL_NULL_CTX_FAIL(kfunc_syscall_test_null_fail, -EINVAL, "processed 4 insns"),
TC_FAIL(kfunc_call_test_get_mem_fail_rdonly, 0, "R0 cannot write into rdonly_mem"),
TC_FAIL(kfunc_call_test_get_mem_fail_use_after_free, 0, "invalid mem access 'scalar'"),
TC_FAIL(kfunc_call_test_get_mem_fail_oob, 0, "min value is outside of the allowed memory range"),
TC_FAIL(kfunc_call_test_get_mem_fail_not_const, 0, "is not a const"),
TC_FAIL(kfunc_call_test_mem_acquire_fail, 0, "acquire kernel function does not return PTR_TO_BTF_ID"),
/* success cases */
TC_TEST(kfunc_call_test1, 12),
TC_TEST(kfunc_call_test2, 3),
TC_TEST(kfunc_call_test_ref_btf_id, 0),
TC_TEST(kfunc_call_test_get_mem, 42),
SYSCALL_TEST(kfunc_syscall_test, 0),
SYSCALL_NULL_CTX_TEST(kfunc_syscall_test_null, 0),
};
struct syscall_test_args {
__u8 data[16];
size_t size;
};
static void verify_success(struct kfunc_test_params *param)
{
struct kfunc_call_test_lskel *lskel = NULL;
LIBBPF_OPTS(bpf_test_run_opts, topts);
struct bpf_prog_desc *lskel_prog;
struct kfunc_call_test *skel;
struct bpf_program *prog;
int prog_fd, err;
struct syscall_test_args args = {
.size = 10,
};
switch (param->test_type) {
case syscall_test:
topts.ctx_in = &args;
topts.ctx_size_in = sizeof(args);
/* fallthrough */
case syscall_null_ctx_test:
break;
case tc_test:
topts.data_in = &pkt_v4;
topts.data_size_in = sizeof(pkt_v4);
topts.repeat = 1;
break;
}
/* first test with normal libbpf */
skel = kfunc_call_test__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel"))
return;
prog_fd = skel->progs.kfunc_call_test1.prog_fd;
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "bpf_prog_test_run(test1)");
ASSERT_EQ(topts.retval, 12, "test1-retval");
prog = bpf_object__find_program_by_name(skel->obj, param->prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
prog_fd = skel->progs.kfunc_call_test2.prog_fd;
prog_fd = bpf_program__fd(prog);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "bpf_prog_test_run(test2)");
ASSERT_EQ(topts.retval, 3, "test2-retval");
if (!ASSERT_OK(err, param->prog_name))
goto cleanup;
prog_fd = skel->progs.kfunc_call_test_ref_btf_id.prog_fd;
if (!ASSERT_EQ(topts.retval, param->retval, "retval"))
goto cleanup;
/* second test with light skeletons */
lskel = kfunc_call_test_lskel__open_and_load();
if (!ASSERT_OK_PTR(lskel, "lskel"))
goto cleanup;
lskel_prog = (struct bpf_prog_desc *)((char *)lskel + param->lskel_prog_desc_offset);
prog_fd = lskel_prog->prog_fd;
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "bpf_prog_test_run(test_ref_btf_id)");
ASSERT_EQ(topts.retval, 0, "test_ref_btf_id-retval");
if (!ASSERT_OK(err, param->prog_name))
goto cleanup;
kfunc_call_test_lskel__destroy(skel);
ASSERT_EQ(topts.retval, param->retval, "retval");
cleanup:
kfunc_call_test__destroy(skel);
if (lskel)
kfunc_call_test_lskel__destroy(lskel);
}
static void verify_fail(struct kfunc_test_params *param)
{
LIBBPF_OPTS(bpf_object_open_opts, opts);
LIBBPF_OPTS(bpf_test_run_opts, topts);
struct bpf_program *prog;
struct kfunc_call_fail *skel;
int prog_fd, err;
struct syscall_test_args args = {
.size = 10,
};
opts.kernel_log_buf = obj_log_buf;
opts.kernel_log_size = log_buf_sz;
opts.kernel_log_level = 1;
switch (param->test_type) {
case syscall_test:
topts.ctx_in = &args;
topts.ctx_size_in = sizeof(args);
/* fallthrough */
case syscall_null_ctx_test:
break;
case tc_test:
topts.data_in = &pkt_v4;
topts.data_size_in = sizeof(pkt_v4);
topts.repeat = 1;
break;
}
skel = kfunc_call_fail__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "kfunc_call_fail__open_opts"))
goto cleanup;
prog = bpf_object__find_program_by_name(skel->obj, param->prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
bpf_program__set_autoload(prog, true);
err = kfunc_call_fail__load(skel);
if (!param->retval) {
/* the verifier is expected to complain and refuse to load */
if (!ASSERT_ERR(err, "unexpected load success"))
goto out_err;
} else {
/* the program is loaded but must dynamically fail */
if (!ASSERT_OK(err, "unexpected load error"))
goto out_err;
prog_fd = bpf_program__fd(prog);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_EQ(err, param->retval, param->prog_name))
goto out_err;
}
out_err:
if (!ASSERT_OK_PTR(strstr(obj_log_buf, param->expected_err_msg), "expected_err_msg")) {
fprintf(stderr, "Expected err_msg: %s\n", param->expected_err_msg);
fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
}
cleanup:
kfunc_call_fail__destroy(skel);
}
static void test_main(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(kfunc_tests); i++) {
if (!test__start_subtest(kfunc_tests[i].prog_name))
continue;
if (!kfunc_tests[i].expected_err_msg)
verify_success(&kfunc_tests[i]);
else
verify_fail(&kfunc_tests[i]);
}
}
static void test_subprog(void)
@@ -121,8 +307,7 @@ static void test_destructive(void)
void test_kfunc_call(void)
{
if (test__start_subtest("main"))
test_main();
test_main();
if (test__start_subtest("subprog"))
test_subprog();


@@ -0,0 +1,164 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2022 Facebook
* Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
*
* Author: Roberto Sassu <roberto.sassu@huawei.com>
*/
#include <test_progs.h>
#include "test_kfunc_dynptr_param.skel.h"
static size_t log_buf_sz = 1048576; /* 1 MB */
static char obj_log_buf[1048576];
static struct {
const char *prog_name;
const char *expected_verifier_err_msg;
int expected_runtime_err;
} kfunc_dynptr_tests[] = {
{"dynptr_type_not_supp",
"arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0},
{"not_valid_dynptr",
"arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0},
{"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0},
{"dynptr_data_null", NULL, -EBADMSG},
};
static bool kfunc_not_supported;
static int libbpf_print_cb(enum libbpf_print_level level, const char *fmt,
va_list args)
{
if (strcmp(fmt, "libbpf: extern (func ksym) '%s': not found in kernel or module BTFs\n"))
return 0;
if (strcmp(va_arg(args, char *), "bpf_verify_pkcs7_signature"))
return 0;
kfunc_not_supported = true;
return 0;
}
static void verify_fail(const char *prog_name, const char *expected_err_msg)
{
struct test_kfunc_dynptr_param *skel;
LIBBPF_OPTS(bpf_object_open_opts, opts);
libbpf_print_fn_t old_print_cb;
struct bpf_program *prog;
int err;
opts.kernel_log_buf = obj_log_buf;
opts.kernel_log_size = log_buf_sz;
opts.kernel_log_level = 1;
skel = test_kfunc_dynptr_param__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "test_kfunc_dynptr_param__open_opts"))
goto cleanup;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
bpf_program__set_autoload(prog, true);
bpf_map__set_max_entries(skel->maps.ringbuf, getpagesize());
kfunc_not_supported = false;
old_print_cb = libbpf_set_print(libbpf_print_cb);
err = test_kfunc_dynptr_param__load(skel);
libbpf_set_print(old_print_cb);
if (err < 0 && kfunc_not_supported) {
fprintf(stderr,
"%s:SKIP:bpf_verify_pkcs7_signature() kfunc not supported\n",
__func__);
test__skip();
goto cleanup;
}
if (!ASSERT_ERR(err, "unexpected load success"))
goto cleanup;
if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
}
cleanup:
test_kfunc_dynptr_param__destroy(skel);
}
static void verify_success(const char *prog_name, int expected_runtime_err)
{
struct test_kfunc_dynptr_param *skel;
libbpf_print_fn_t old_print_cb;
struct bpf_program *prog;
struct bpf_link *link;
__u32 next_id;
int err;
skel = test_kfunc_dynptr_param__open();
if (!ASSERT_OK_PTR(skel, "test_kfunc_dynptr_param__open"))
return;
skel->bss->pid = getpid();
bpf_map__set_max_entries(skel->maps.ringbuf, getpagesize());
kfunc_not_supported = false;
old_print_cb = libbpf_set_print(libbpf_print_cb);
err = test_kfunc_dynptr_param__load(skel);
libbpf_set_print(old_print_cb);
if (err < 0 && kfunc_not_supported) {
fprintf(stderr,
"%s:SKIP:bpf_verify_pkcs7_signature() kfunc not supported\n",
__func__);
test__skip();
goto cleanup;
}
if (!ASSERT_OK(err, "test_kfunc_dynptr_param__load"))
goto cleanup;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto cleanup;
link = bpf_program__attach(prog);
if (!ASSERT_OK_PTR(link, "bpf_program__attach"))
goto cleanup;
err = bpf_prog_get_next_id(0, &next_id);
bpf_link__destroy(link);
if (!ASSERT_OK(err, "bpf_prog_get_next_id"))
goto cleanup;
ASSERT_EQ(skel->bss->err, expected_runtime_err, "err");
cleanup:
test_kfunc_dynptr_param__destroy(skel);
}
void test_kfunc_dynptr_param(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(kfunc_dynptr_tests); i++) {
if (!test__start_subtest(kfunc_dynptr_tests[i].prog_name))
continue;
if (kfunc_dynptr_tests[i].expected_verifier_err_msg)
verify_fail(kfunc_dynptr_tests[i].prog_name,
kfunc_dynptr_tests[i].expected_verifier_err_msg);
else
verify_success(kfunc_dynptr_tests[i].prog_name,
kfunc_dynptr_tests[i].expected_runtime_err);
}
}
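The programs listed in kfunc_dynptr_tests are not shown in this excerpt. Purely as a hedged illustration of how the new signature-verification kfunc is consumed from BPF (the lsm.s/bpf hook, buffer sizes, keyring id, map name and error plumbing below are assumptions, not the selftest's source), a program that passes two ring-buffer dynptrs might be structured like this:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct bpf_key;

extern struct bpf_key *bpf_lookup_system_key(__u64 id) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 4096);	/* the test resizes this from userspace */
} ringbuf SEC(".maps");

int err;

SEC("lsm.s/bpf")
int BPF_PROG(check_sig, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_dynptr data_ptr, sig_ptr;
	struct bpf_key *keyring;
	int ret;

	/* Carve the data and signature buffers out of the ring buffer */
	ret = bpf_ringbuf_reserve_dynptr(&ringbuf, 512, 0, &data_ptr);
	if (ret)
		goto out_data;
	ret = bpf_ringbuf_reserve_dynptr(&ringbuf, 256, 0, &sig_ptr);
	if (ret)
		goto out_sig;

	keyring = bpf_lookup_system_key(0);	/* assumption: 0 = builtin trusted keyring */
	if (!keyring) {
		ret = -1;
		goto out_sig;
	}

	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, keyring);
	bpf_key_put(keyring);
out_sig:
	/* Reserved dynptrs must be released even when the reservation failed */
	bpf_ringbuf_discard_dynptr(&sig_ptr, 0);
out_data:
	bpf_ringbuf_discard_dynptr(&data_ptr, 0);
	err = ret;
	return 0;
}

char _license[] SEC("license") = "GPL";

The verifier errors listed in the table above correspond to breaking one of these rules, e.g. handing the kfunc an uninitialized or non-stack dynptr.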


@@ -0,0 +1,112 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
*
* Author: Roberto Sassu <roberto.sassu@huawei.com>
*/
#include <linux/keyctl.h>
#include <test_progs.h>
#include "test_lookup_key.skel.h"
#define KEY_LOOKUP_CREATE 0x01
#define KEY_LOOKUP_PARTIAL 0x02
static bool kfunc_not_supported;
static int libbpf_print_cb(enum libbpf_print_level level, const char *fmt,
va_list args)
{
char *func;
if (strcmp(fmt, "libbpf: extern (func ksym) '%s': not found in kernel or module BTFs\n"))
return 0;
func = va_arg(args, char *);
if (strcmp(func, "bpf_lookup_user_key") && strcmp(func, "bpf_key_put") &&
strcmp(func, "bpf_lookup_system_key"))
return 0;
kfunc_not_supported = true;
return 0;
}
void test_lookup_key(void)
{
libbpf_print_fn_t old_print_cb;
struct test_lookup_key *skel;
__u32 next_id;
int ret;
skel = test_lookup_key__open();
if (!ASSERT_OK_PTR(skel, "test_lookup_key__open"))
return;
old_print_cb = libbpf_set_print(libbpf_print_cb);
ret = test_lookup_key__load(skel);
libbpf_set_print(old_print_cb);
if (ret < 0 && kfunc_not_supported) {
printf("%s:SKIP:bpf_lookup_*_key(), bpf_key_put() kfuncs not supported\n",
__func__);
test__skip();
goto close_prog;
}
if (!ASSERT_OK(ret, "test_lookup_key__load"))
goto close_prog;
ret = test_lookup_key__attach(skel);
if (!ASSERT_OK(ret, "test_lookup_key__attach"))
goto close_prog;
skel->bss->monitored_pid = getpid();
skel->bss->key_serial = KEY_SPEC_THREAD_KEYRING;
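/* Each bpf_prog_get_next_id() below issues a bpf() syscall, which triggers
 * the LSM program attached by the skeleton.
 */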
/* The thread-specific keyring does not exist, so this lookup is expected to fail. */
skel->bss->flags = 0;
ret = bpf_prog_get_next_id(0, &next_id);
if (!ASSERT_LT(ret, 0, "bpf_prog_get_next_id"))
goto close_prog;
/* Force creation of the thread-specific keyring, so this lookup succeeds. */
skel->bss->flags = KEY_LOOKUP_CREATE;
ret = bpf_prog_get_next_id(0, &next_id);
if (!ASSERT_OK(ret, "bpf_prog_get_next_id"))
goto close_prog;
/* Pass both lookup flags for parameter validation. */
skel->bss->flags = KEY_LOOKUP_CREATE | KEY_LOOKUP_PARTIAL;
ret = bpf_prog_get_next_id(0, &next_id);
if (!ASSERT_OK(ret, "bpf_prog_get_next_id"))
goto close_prog;
/* Pass invalid flags. */
skel->bss->flags = UINT64_MAX;
ret = bpf_prog_get_next_id(0, &next_id);
if (!ASSERT_LT(ret, 0, "bpf_prog_get_next_id"))
goto close_prog;
skel->bss->key_serial = 0;
skel->bss->key_id = 1;
ret = bpf_prog_get_next_id(0, &next_id);
if (!ASSERT_OK(ret, "bpf_prog_get_next_id"))
goto close_prog;
skel->bss->key_id = UINT32_MAX;
ret = bpf_prog_get_next_id(0, &next_id);
ASSERT_LT(ret, 0, "bpf_prog_get_next_id");
close_prog:
skel->bss->monitored_pid = 0;
test_lookup_key__destroy(skel);
}
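The BPF program attached by test_lookup_key__attach() is likewise not part of this excerpt. A minimal sketch of an LSM program exercising the key-lookup kfuncs (the lsm.s/bpf hook and the exact control flow are assumptions; the global names mirror the bss fields used above) could be:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct bpf_key;

extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
extern struct bpf_key *bpf_lookup_system_key(__u64 id) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;

__u32 monitored_pid;
__s32 key_serial;
__u32 key_id;
__u64 flags;

SEC("lsm.s/bpf")
int BPF_PROG(on_bpf, int cmd, union bpf_attr *attr, unsigned int size)
{
	__u32 pid = bpf_get_current_pid_tgid() >> 32;
	struct bpf_key *bkey;

	if (pid != monitored_pid)
		return 0;

	/* A non-zero key_serial selects a user keyring lookup, otherwise
	 * key_id selects a system keyring.
	 */
	if (key_serial)
		bkey = bpf_lookup_user_key(key_serial, flags);
	else
		bkey = bpf_lookup_system_key(key_id);

	if (!bkey)
		return -1;	/* deny the bpf() call so userspace sees an error */

	bpf_key_put(bkey);
	return 0;
}

char _license[] SEC("license") = "GPL";

This matches the pattern above: lookups that are expected to fail make bpf_prog_get_next_id() return a negative value, while KEY_LOOKUP_CREATE or a valid system keyring id lets the syscall succeed.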
