Alexei Starovoitov says:

====================
pull-request: bpf-next 2022-03-21 v2

We've added 137 non-merge commits during the last 17 day(s) which contain
a total of 143 files changed, 7123 insertions(+), 1092 deletions(-).

The main changes are:

1) Custom SEC() handling in libbpf, from Andrii.

2) Subskeleton support, from Delyan.

3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao.

4) Fix net.core.bpf_jit_harden race, from Hou.

5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub.

6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami.
The arch specific bits will come later.

7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri.

8) Enable non-atomic allocations in local storage, from Joanne.

9) Various var_off ptr_to_btf_id fixes, from Kumar.

10) bpf_ima_file_hash helper, from Roberto.

11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits)
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  bpftool: Fix a bug in subskeleton code generation
  bpf: Fix bpf_prog_pack when PMU_SIZE is not defined
  bpf: Fix bpf_prog_pack for multi-node setup
  bpf: Fix warning for cast from restricted gfp_t in verifier
  bpf, arm: Fix various typos in comments
  libbpf: Close fd in bpf_object__reuse_map
  bpftool: Fix print error when show bpf map
  bpf: Fix kprobe_multi return probe backtrace
  Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"
  bpf: Simplify check in btf_parse_hdr()
  selftests/bpf/test_lirc_mode2.sh: Exit with proper code
  bpf: Check for NULL return from bpf_get_btf_vmlinux
  selftests/bpf: Test skipping stacktrace
  bpf: Adjust BPF stack helper functions to accommodate skip > 0
  bpf: Select proper size for bpf_prog_pack
  ...
====================

Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 0db8640df5 by Jakub Kicinski, 2022-03-22 10:36:56 -07:00
143 changed files with 7139 additions and 1108 deletions


@ -0,0 +1,117 @@
.. SPDX-License-Identifier: GPL-2.0
===================================
Running BPF programs from userspace
===================================
This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.
.. contents::
   :local:
   :depth: 2
Overview
--------
The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects. The
command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
to be defined in the UAPI header, aliased to the same value.
The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:
- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``
When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode; in particular, packets
will not actually be redirected or dropped; the program's return code will simply be
returned to userspace. A separate mode for live execution of XDP programs is
provided, documented separately below.
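As an illustration, here is a minimal userspace sketch (not part of the kernel
sources) that runs an already-loaded program once on a test packet via libbpf's
``bpf_prog_test_run_opts()`` wrapper; the ``prog_fd`` argument, the packet
buffer, and the output buffer size are assumptions of the example:

.. code-block:: c

  #include <stddef.h>
  #include <bpf/bpf.h>

  /* Run an already-loaded program once on a caller-supplied test packet. */
  int run_once(int prog_fd, void *pkt, size_t pkt_len)
  {
          char out[1500];
          LIBBPF_OPTS(bpf_test_run_opts, opts,
                  .data_in = pkt,
                  .data_size_in = pkt_len,
                  .data_out = out,
                  .data_size_out = sizeof(out),
                  .repeat = 1,
          );
          int err = bpf_prog_test_run_opts(prog_fd, &opts);

          if (err)
                  return err; /* negative errno-style error from libbpf */

          /* opts.retval holds the program's return code (e.g. XDP_PASS). */
          return opts.retval;
  }
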
Running XDP programs in "live frame mode"
-----------------------------------------
The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in a way where packets will actually
be processed by the kernel after the execution of the XDP program as if they
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.
The live packet mode is optimised for high performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as the regular test
run mode (a usage sketch follows the list below). Specifically:
- When executing an XDP program in live frame mode, the result of the execution
will not be returned to userspace; instead, the kernel will perform the
operation indicated by the program's return code (drop the packet, redirect
it, etc). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
in the syscall parameters when running in this mode will be rejected. In
addition, not all failures will be reported back to userspace directly;
specifically, only fatal errors in setup or during execution (like memory
allocation errors) will halt execution and return an error. If an error occurs
in packet processing, like a failure to redirect to a given interface,
execution will continue with the next repetition; these errors can be detected
via the same trace points as for regular XDP programs.
- Userspace can supply an ifindex as part of the context object, just like in
the regular (non-live) mode. The XDP program will be executed as though the
packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
object will point to that interface. Furthermore, if the XDP program returns
``XDP_PASS``, the packet will be injected into the kernel networking stack as
though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
will be transmitted *out* of that same interface. Do note, though, that
because the program execution is not happening in driver context, an
``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
that same interface (i.e., it will only work if the driver has support for the
``ndo_xdp_xmit`` driver op).
- When running the program with multiple repetitions, the execution will happen
in batches. The batch size defaults to 64 packets (which is the same as the
maximum NAPI receive batch size), but can be specified by userspace through
the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
the kernel executes the XDP program repeatedly, each invocation getting a
separate copy of the packet data. For each repetition, if the program drops
the packet, the data page is immediately recycled (see below). Otherwise, the
packet is buffered until the end of the batch, at which point all packets
buffered this way during the batch are transmitted at once.
- When setting up the test run, the kernel will initialise a pool of memory
pages, one page per packet in a batch. Each memory page will be initialised
with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
invocation. When possible, the pages will be recycled on future program
invocations, to improve performance. Pages will generally be recycled a full
batch at a time, except when a packet is dropped (by return code or because
of, say, a redirection error), in which case that page will be recycled
immediately. If a packet ends up being passed to the regular networking stack
(because the XDP program returns ``XDP_PASS``, or because it ends up being
redirected to an interface that injects it into the stack), the page will be
released and a new one will be allocated when the pool is empty.
When recycling, the page content is not rewritten; only the packet boundary
pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
be reset to the original values. This means that if a program rewrites the
packet contents, it has to be prepared to see either the original content or
the modified version on subsequent invocations.
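A sketch of driving the live frame mode from userspace follows; the repetition
count is an arbitrary illustration, while ``batch_size`` and
``BPF_F_TEST_XDP_LIVE_FRAMES`` are the parameter and flag described above:

.. code-block:: c

  #include <stddef.h>
  #include <bpf/bpf.h>
  #include <linux/bpf.h>

  /* Inject the supplied packet one million times through an XDP program. */
  int run_live(int xdp_prog_fd, void *pkt, size_t pkt_len)
  {
          LIBBPF_OPTS(bpf_test_run_opts, opts,
                  .data_in = pkt,
                  .data_size_in = pkt_len,
                  .repeat = 1000000,
                  .batch_size = 64, /* the default; up to 256 is allowed */
                  .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
          );

          /* data_out/ctx_out must be left unset in live frame mode. */
          return bpf_prog_test_run_opts(xdp_prog_fd, &opts);
  }
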


@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture.
helpers
programs
maps
bpf_prog_run
classic_vs_extended.rst
bpf_licensing
test_debug


@ -0,0 +1,174 @@
.. SPDX-License-Identifier: GPL-2.0
==================================
Fprobe - Function entry/exit probe
==================================
.. Author: Masami Hiramatsu <mhiramat@kernel.org>
Introduction
============
Fprobe is a function entry/exit probe mechanism based on ftrace.
Instead of using the full ftrace feature set, if you only want to attach callbacks
at function entry and exit, similar to kprobes and kretprobes, you can
use fprobe. Compared with kprobes and kretprobes, fprobe provides faster
instrumentation of multiple functions with a single handler. This document
describes how to use fprobe.
The usage of fprobe
===================
Fprobe is a wrapper around ftrace (plus a kretprobe-like return callback) that
attaches callbacks to multiple function entries and exits. The user needs to set
up a `struct fprobe` and pass it to `register_fprobe()`.
Typically, `fprobe` data structure is initialized with the `entry_handler`
and/or `exit_handler` as below.
.. code-block:: c

  struct fprobe fp = {
          .entry_handler = my_entry_callback,
          .exit_handler = my_exit_callback,
  };
To enable the fprobe, call one of register_fprobe(), register_fprobe_ips(), or
register_fprobe_syms(). These functions register the fprobe with different types
of parameters.
The register_fprobe() enables a fprobe by function-name filters.
E.g. this enables @fp on functions matching "func*()" except "func2()"::

  register_fprobe(&fp, "func*", "func2");
The register_fprobe_ips() enables a fprobe by ftrace-location addresses.
E.g.
.. code-block:: c

  unsigned long ips[] = { 0x.... };

  register_fprobe_ips(&fp, ips, ARRAY_SIZE(ips));
And the register_fprobe_syms() enables a fprobe by symbol names.
E.g.
.. code-block:: c

  const char *syms[] = {"func1", "func2", "func3"};

  register_fprobe_syms(&fp, syms, ARRAY_SIZE(syms));
To disable (remove from functions) this fprobe, call::

  unregister_fprobe(&fp);

You can temporarily (soft) disable the fprobe by::

  disable_fprobe(&fp);

and resume by::

  enable_fprobe(&fp);

The above is defined by including the header::

  #include <linux/fprobe.h>
As with ftrace, the registered callbacks will start being called some time
after register_fprobe() is called and before it returns. See
:file:`Documentation/trace/ftrace.rst`.

Also, unregister_fprobe() guarantees that both the entry and exit handlers
are no longer being called once it returns, the same as
unregister_ftrace_function().
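Putting the above together, here is a minimal module sketch; the probed
pattern "vfs_*" and the pr_info() messages are illustrative assumptions:

.. code-block:: c

  #include <linux/fprobe.h>
  #include <linux/module.h>

  static void my_entry_callback(struct fprobe *fp, unsigned long entry_ip,
                                struct pt_regs *regs)
  {
          pr_info("entered %pS\n", (void *)entry_ip);
  }

  static void my_exit_callback(struct fprobe *fp, unsigned long entry_ip,
                               struct pt_regs *regs)
  {
          pr_info("leaving function entered at %pS\n", (void *)entry_ip);
  }

  static struct fprobe fp = {
          .entry_handler = my_entry_callback,
          .exit_handler = my_exit_callback,
  };

  static int __init fprobe_sample_init(void)
  {
          /* attach to all "vfs_*" functions except "vfs_write" */
          return register_fprobe(&fp, "vfs_*", "vfs_write");
  }

  static void __exit fprobe_sample_exit(void)
  {
          unregister_fprobe(&fp);
  }

  module_init(fprobe_sample_init);
  module_exit(fprobe_sample_exit);
  MODULE_LICENSE("GPL");
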
The fprobe entry/exit handler
=============================
The prototype of the entry/exit callback function is as follows:
.. code-block:: c

  void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);

Note that both entry and exit callbacks have the same prototype. The @entry_ip is
saved at function entry and passed to the exit handler.
@fp
        This is the address of the `fprobe` data structure related to this handler.
        You can embed the `fprobe` in your own data structure and retrieve it
        from @fp with the container_of() macro (see the sketch after this list).
        The @fp must not be NULL.

@entry_ip
        This is the ftrace address of the traced function (for both entry and
        exit). Note that this may not be the actual entry address of the
        function, but the address where ftrace is instrumented.

@regs
        This is the `pt_regs` data structure at entry and exit. Note that the
        instruction pointer of @regs may differ from @entry_ip in the
        entry_handler. If you need the traced instruction pointer, use
        @entry_ip. In the exit_handler, the instruction pointer of @regs is
        set to the correct return address.
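For example, a handler can recover its enclosing structure with
container_of(); the ``my_data`` structure and its ``hits`` counter below are
assumptions of this sketch:

.. code-block:: c

  struct my_data {
          unsigned long hits;
          struct fprobe fp;
  };

  static void my_entry_handler(struct fprobe *fp, unsigned long entry_ip,
                               struct pt_regs *regs)
  {
          struct my_data *data = container_of(fp, struct my_data, fp);

          data->hits++; /* not atomic; for illustration only */
  }
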
Share the callbacks with kprobes
================================
Since the recursion safety of fprobe (and ftrace) differs somewhat from that of
kprobes, this may cause an issue if a user wants to run the same
code from both fprobe and kprobes.
Kprobes has a per-CPU 'current_kprobe' variable which protects the kprobe
handler from recursion in all cases. On the other hand, fprobe uses
only ftrace_test_recursion_trylock(). This allows interrupt context to
call another (or the same) fprobe while an fprobe user handler is running.
This is not a problem if the common callback code has its own recursion
detection, or if it can handle recursion in the different contexts
(normal/interrupt/NMI).
But if it relies on the 'current_kprobe' recursion lock, it has to check
kprobe_running() and use the kprobe_busy_*() APIs.
Fprobe has the FPROBE_FL_KPROBE_SHARED flag for this purpose. If your common
callback code will be shared with kprobes, set FPROBE_FL_KPROBE_SHARED
*before* registering the fprobe, like:

.. code-block:: c

  fprobe.flags = FPROBE_FL_KPROBE_SHARED;

  register_fprobe(&fprobe, "func*", NULL);
This will protect your common callback from nested calls.
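A sketch of such a shared handler guarding a 'current_kprobe'-based section
(``common_callback()`` stands in for hypothetical code shared with a kprobe
handler):

.. code-block:: c

  static void shared_entry_handler(struct fprobe *fp, unsigned long entry_ip,
                                   struct pt_regs *regs)
  {
          if (kprobe_running())
                  return; /* a kprobe handler is already active on this CPU */

          kprobe_busy_begin();
          common_callback(entry_ip, regs); /* code shared with kprobes */
          kprobe_busy_end();
  }
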
The missed counter
==================
The `fprobe` data structure has an `fprobe::nmissed` counter field, just like
kprobes.

This counter is incremented when:
- fprobe fails to take the ftrace recursion lock. This usually means that a
function which is traced by another ftrace user is called from the
entry_handler.

- fprobe fails to set up the function exit because of a shortage of rethook
(the shadow stack for hooking the function return).
The `fprobe::nmissed` field is incremented in both cases. The former skips
both the entry and exit callbacks, while the latter skips only the exit
callback, but in both cases the counter will increase by 1.
Note that if you set FTRACE_OPS_FL_RECURSION and/or FTRACE_OPS_FL_RCU in
`fprobe::ops::flags` (ftrace_ops::flags) when registering the fprobe, this
counter may not work correctly, because ftrace skips the fprobe function that
increments the counter.
Functions and structures
========================
.. kernel-doc:: include/linux/fprobe.h
.. kernel-doc:: kernel/trace/fprobe.c


@ -9,6 +9,7 @@ Linux Tracing Technologies
tracepoint-analysis
ftrace
ftrace-uses
fprobe
kprobes
kprobetrace
uprobetracer


@ -1864,7 +1864,7 @@ static int build_body(struct jit_ctx *ctx)
if (ctx->target == NULL)
ctx->offsets[i] = ctx->idx;
/* If unsuccesfull, return with error code */
/* If unsuccessful, return with error code */
if (ret)
return ret;
}
@ -1973,7 +1973,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
* for jit, although it can decrease the size of the image.
*
* As each arm instruction is of length 32bit, we are translating
* number of JITed intructions into the size required to store these
* number of JITed instructions into the size required to store these
* JITed code.
*/
image_size = sizeof(u32) * ctx.idx;


@ -2335,7 +2335,13 @@ out_image:
sizeof(rw_header->size));
bpf_jit_binary_pack_free(header, rw_header);
}
/* Fall back to interpreter mode */
prog = orig_prog;
if (extra_pass) {
prog->bpf_func = NULL;
prog->jited = 0;
prog->jited_len = 0;
}
goto out_addrs;
}
if (image) {
@ -2384,8 +2390,9 @@ out_image:
* Both cases are serious bugs and justify WARN_ON.
*/
if (WARN_ON(bpf_jit_binary_pack_finalize(prog, header, rw_header))) {
prog = orig_prog;
goto out_addrs;
/* header has been freed */
header = NULL;
goto out_image;
}
bpf_tail_call_direct_fixup(prog);


@ -433,21 +433,6 @@ static void veth_set_multicast_list(struct net_device *dev)
{
}
static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
int buflen)
{
struct sk_buff *skb;
skb = build_skb(head, buflen);
if (!skb)
return NULL;
skb_reserve(skb, headroom);
skb_put(skb, len);
return skb;
}
static int veth_select_rxq(struct net_device *dev)
{
return smp_processor_id() % dev->real_num_rx_queues;
@ -494,7 +479,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
struct xdp_frame *frame = frames[i];
void *ptr = veth_xdp_to_ptr(frame);
if (unlikely(frame->len > max_len ||
if (unlikely(xdp_get_frame_len(frame) > max_len ||
__ptr_ring_produce(&rq->xdp_ring, ptr)))
break;
nxmit++;
@ -695,16 +680,130 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
}
}
static void veth_xdp_get(struct xdp_buff *xdp)
{
struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
int i;
get_page(virt_to_page(xdp->data));
if (likely(!xdp_buff_has_frags(xdp)))
return;
for (i = 0; i < sinfo->nr_frags; i++)
__skb_frag_ref(&sinfo->frags[i]);
}
static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
struct xdp_buff *xdp,
struct sk_buff **pskb)
{
struct sk_buff *skb = *pskb;
u32 frame_sz;
if (skb_shared(skb) || skb_head_is_locked(skb) ||
skb_shinfo(skb)->nr_frags) {
u32 size, len, max_head_size, off;
struct sk_buff *nskb;
struct page *page;
int i, head_off;
/* We need a private copy of the skb and data buffers since
* the ebpf program can modify it. We segment the original skb
* into order-0 pages without linearize it.
*
* Make sure we have enough space for linear and paged area
*/
max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE -
VETH_XDP_HEADROOM);
if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
goto drop;
/* Allocate skb head */
page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
if (!page)
goto drop;
nskb = build_skb(page_address(page), PAGE_SIZE);
if (!nskb) {
put_page(page);
goto drop;
}
skb_reserve(nskb, VETH_XDP_HEADROOM);
size = min_t(u32, skb->len, max_head_size);
if (skb_copy_bits(skb, 0, nskb->data, size)) {
consume_skb(nskb);
goto drop;
}
skb_put(nskb, size);
skb_copy_header(nskb, skb);
head_off = skb_headroom(nskb) - skb_headroom(skb);
skb_headers_offset_update(nskb, head_off);
/* Allocate paged area of new skb */
off = size;
len = skb->len - off;
for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
if (!page) {
consume_skb(nskb);
goto drop;
}
size = min_t(u32, len, PAGE_SIZE);
skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
if (skb_copy_bits(skb, off, page_address(page),
size)) {
consume_skb(nskb);
goto drop;
}
len -= size;
off += size;
}
consume_skb(skb);
skb = nskb;
} else if (skb_headroom(skb) < XDP_PACKET_HEADROOM &&
pskb_expand_head(skb, VETH_XDP_HEADROOM, 0, GFP_ATOMIC)) {
goto drop;
}
/* SKB "head" area always have tailroom for skb_shared_info */
frame_sz = skb_end_pointer(skb) - skb->head;
frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
xdp_prepare_buff(xdp, skb->head, skb_headroom(skb),
skb_headlen(skb), true);
if (skb_is_nonlinear(skb)) {
skb_shinfo(skb)->xdp_frags_size = skb->data_len;
xdp_buff_set_frags_flag(xdp);
} else {
xdp_buff_clear_frags_flag(xdp);
}
*pskb = skb;
return 0;
drop:
consume_skb(skb);
*pskb = NULL;
return -ENOMEM;
}
static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
struct sk_buff *skb,
struct veth_xdp_tx_bq *bq,
struct veth_stats *stats)
{
u32 pktlen, headroom, act, metalen, frame_sz;
void *orig_data, *orig_data_end;
struct bpf_prog *xdp_prog;
int mac_len, delta, off;
struct xdp_buff xdp;
u32 act, metalen;
int off;
skb_prepare_for_gro(skb);
@ -715,52 +814,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
goto out;
}
mac_len = skb->data - skb_mac_header(skb);
pktlen = skb->len + mac_len;
headroom = skb_headroom(skb) - mac_len;
if (skb_shared(skb) || skb_head_is_locked(skb) ||
skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) {
struct sk_buff *nskb;
int size, head_off;
void *head, *start;
struct page *page;
size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
if (size > PAGE_SIZE)
goto drop;
page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
if (!page)
goto drop;
head = page_address(page);
start = head + VETH_XDP_HEADROOM;
if (skb_copy_bits(skb, -mac_len, start, pktlen)) {
page_frag_free(head);
goto drop;
}
nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len,
skb->len, PAGE_SIZE);
if (!nskb) {
page_frag_free(head);
goto drop;
}
skb_copy_header(nskb, skb);
head_off = skb_headroom(nskb) - skb_headroom(skb);
skb_headers_offset_update(nskb, head_off);
consume_skb(skb);
skb = nskb;
}
/* SKB "head" area always have tailroom for skb_shared_info */
frame_sz = skb_end_pointer(skb) - skb->head;
frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
xdp_prepare_buff(&xdp, skb->head, skb->mac_header, pktlen, true);
__skb_push(skb, skb->data - skb_mac_header(skb));
if (veth_convert_skb_to_xdp_buff(rq, &xdp, &skb))
goto drop;
orig_data = xdp.data;
orig_data_end = xdp.data_end;
@ -771,7 +827,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
case XDP_PASS:
break;
case XDP_TX:
get_page(virt_to_page(xdp.data));
veth_xdp_get(&xdp);
consume_skb(skb);
xdp.rxq->mem = rq->xdp_mem;
if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
@ -783,7 +839,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
rcu_read_unlock();
goto xdp_xmit;
case XDP_REDIRECT:
get_page(virt_to_page(xdp.data));
veth_xdp_get(&xdp);
consume_skb(skb);
xdp.rxq->mem = rq->xdp_mem;
if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
@ -806,18 +862,27 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
rcu_read_unlock();
/* check if bpf_xdp_adjust_head was used */
delta = orig_data - xdp.data;
off = mac_len + delta;
off = orig_data - xdp.data;
if (off > 0)
__skb_push(skb, off);
else if (off < 0)
__skb_pull(skb, -off);
skb->mac_header -= delta;
skb_reset_mac_header(skb);
/* check if bpf_xdp_adjust_tail was used */
off = xdp.data_end - orig_data_end;
if (off != 0)
__skb_put(skb, off); /* positive on grow, negative on shrink */
/* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers
* (e.g. bpf_xdp_adjust_tail), we need to update data_len here.
*/
if (xdp_buff_has_frags(&xdp))
skb->data_len = skb_shinfo(skb)->xdp_frags_size;
else
skb->data_len = 0;
skb->protocol = eth_type_trans(skb, rq->dev);
metalen = xdp.data - xdp.data_meta;
@ -833,7 +898,7 @@ xdp_drop:
return NULL;
err_xdp:
rcu_read_unlock();
page_frag_free(xdp.data);
xdp_return_buff(&xdp);
xdp_xmit:
return NULL;
}
@ -855,7 +920,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
/* ndo_xdp_xmit */
struct xdp_frame *frame = veth_ptr_to_xdp(ptr);
stats->xdp_bytes += frame->len;
stats->xdp_bytes += xdp_get_frame_len(frame);
frame = veth_xdp_rcv_one(rq, frame, bq, stats);
if (frame) {
/* XDP_PASS */
@ -1463,9 +1528,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
goto err;
}
max_mtu = PAGE_SIZE - VETH_XDP_HEADROOM -
peer->hard_header_len -
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
max_mtu = SKB_WITH_OVERHEAD(PAGE_SIZE - VETH_XDP_HEADROOM) -
peer->hard_header_len;
/* Allow increasing the max_mtu if the program supports
* XDP fragments.
*/
if (prog->aux->xdp_has_frags)
max_mtu += PAGE_SIZE * MAX_SKB_FRAGS;
if (peer->mtu > max_mtu) {
NL_SET_ERR_MSG_MOD(extack, "Peer MTU is too large to set XDP");
err = -ERANGE;


@ -334,7 +334,15 @@ enum bpf_type_flag {
/* MEM is in user address space. */
MEM_USER = BIT(3 + BPF_BASE_TYPE_BITS),
__BPF_TYPE_LAST_FLAG = MEM_USER,
/* MEM is a percpu memory. MEM_PERCPU tags PTR_TO_BTF_ID. When tagged
* with MEM_PERCPU, PTR_TO_BTF_ID _cannot_ be directly accessed. In
* order to drop this tag, it must be passed into bpf_per_cpu_ptr()
* or bpf_this_cpu_ptr(), which will return the pointer corresponding
* to the specified cpu.
*/
MEM_PERCPU = BIT(4 + BPF_BASE_TYPE_BITS),
__BPF_TYPE_LAST_FLAG = MEM_PERCPU,
};
/* Max number of base types. */
@ -516,7 +524,6 @@ enum bpf_reg_type {
*/
PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_PERCPU_BTF_ID, /* reg points to a percpu kernel variable */
PTR_TO_FUNC, /* reg points to a bpf program function */
__BPF_REG_TYPE_MAX,


@ -154,16 +154,17 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
bool charge_mem);
bool charge_mem, gfp_t gfp_flags);
int
bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *first_selem);
struct bpf_local_storage_elem *first_selem,
gfp_t gfp_flags);
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags);
void *value, u64 map_flags, gfp_t gfp_flags);
void bpf_local_storage_free_rcu(struct rcu_head *rcu);


@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
#ifdef CONFIG_PERF_EVENTS
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)


@ -521,6 +521,10 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt);
int check_ptr_off_reg(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno);
int check_func_arg_reg_off(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno,
enum bpf_arg_type arg_type,
bool is_release_func);
int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,


@ -68,3 +68,28 @@
#define __nocfi __attribute__((__no_sanitize__("cfi")))
#define __cficanonical __attribute__((__cfi_canonical_jump_table__))
/*
* Turn individual warnings and errors on and off locally, depending
* on version.
*/
#define __diag_clang(version, severity, s) \
__diag_clang_ ## version(__diag_clang_ ## severity s)
/* Severity used in pragma directives */
#define __diag_clang_ignore ignored
#define __diag_clang_warn warning
#define __diag_clang_error error
#define __diag_str1(s) #s
#define __diag_str(s) __diag_str1(s)
#define __diag(s) _Pragma(__diag_str(clang diagnostic s))
#if CONFIG_CLANG_VERSION >= 110000
#define __diag_clang_11(s) __diag(s)
#else
#define __diag_clang_11(s)
#endif
#define __diag_ignore_all(option, comment) \
__diag_clang(11, ignore, option)


@ -151,6 +151,9 @@
#define __diag_GCC_8(s)
#endif
#define __diag_ignore_all(option, comment) \
__diag_GCC(8, ignore, option)
/*
* Prior to 9.1, -Wno-alloc-size-larger-than (and therefore the "alloc_size"
* attribute) do not work, and must be disabled.


@ -4,6 +4,13 @@
#ifndef __ASSEMBLY__
#if defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \
__has_attribute(btf_type_tag)
# define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
#else
# define BTF_TYPE_TAG(value) /* nothing */
#endif
#ifdef __CHECKER__
/* address spaces */
# define __kernel __attribute__((address_space(0)))
@ -31,14 +38,11 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
# define __kernel
# ifdef STRUCTLEAK_PLUGIN
# define __user __attribute__((user))
# elif defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \
__has_attribute(btf_type_tag)
# define __user __attribute__((btf_type_tag("user")))
# else
# define __user
# define __user BTF_TYPE_TAG(user)
# endif
# define __iomem
# define __percpu
# define __percpu BTF_TYPE_TAG(percpu)
# define __rcu
# define __chk_user_ptr(x) (void)0
# define __chk_io_ptr(x) (void)0
@ -371,4 +375,8 @@ struct ftrace_likely_data {
#define __diag_error(compiler, version, option, comment) \
__diag_ ## compiler(version, error, option)
#ifndef __diag_ignore_all
#define __diag_ignore_all(option, comment)
#endif
#endif /* __LINUX_COMPILER_TYPES_H */


@ -566,6 +566,7 @@ struct bpf_prog {
gpl_compatible:1, /* Is filter GPL compatible? */
cb_access:1, /* Is control block accessed? */
dst_needed:1, /* Do we need dst entry? */
blinding_requested:1, /* needs constant blinding */
blinded:1, /* Was blinded */
is_func:1, /* program is a bpf function */
kprobe_override:1, /* Do we override a kprobe? */
@ -573,7 +574,7 @@ struct bpf_prog {
enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
call_get_func_ip:1, /* Do we call get_func_ip() */
delivery_time_access:1; /* Accessed __sk_buff->delivery_time_type */
tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
enum bpf_prog_type type; /* Type of BPF program */
enum bpf_attach_type expected_attach_type; /* For some prog types */
u32 len; /* Number of filter blocks */

include/linux/fprobe.h (new file, 105 lines)

@ -0,0 +1,105 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Simple ftrace probe wrapper */
#ifndef _LINUX_FPROBE_H
#define _LINUX_FPROBE_H
#include <linux/compiler.h>
#include <linux/ftrace.h>
#include <linux/rethook.h>
/**
* struct fprobe - ftrace based probe.
* @ops: The ftrace_ops.
* @nmissed: The counter for missing events.
* @flags: The status flag.
* @rethook: The rethook data structure. (internal data)
* @entry_handler: The callback function for function entry.
* @exit_handler: The callback function for function exit.
*/
struct fprobe {
#ifdef CONFIG_FUNCTION_TRACER
/*
* If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too.
* But user of fprobe may keep embedding the struct fprobe on their own
* code. To avoid build error, this will keep the fprobe data structure
* defined here, but remove ftrace_ops data structure.
*/
struct ftrace_ops ops;
#endif
unsigned long nmissed;
unsigned int flags;
struct rethook *rethook;
void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
};
/* This fprobe is soft-disabled. */
#define FPROBE_FL_DISABLED 1
/*
* This fprobe handler will be shared with kprobes.
* This flag must be set before registering.
*/
#define FPROBE_FL_KPROBE_SHARED 2
static inline bool fprobe_disabled(struct fprobe *fp)
{
return (fp) ? fp->flags & FPROBE_FL_DISABLED : false;
}
static inline bool fprobe_shared_with_kprobes(struct fprobe *fp)
{
return (fp) ? fp->flags & FPROBE_FL_KPROBE_SHARED : false;
}
#ifdef CONFIG_FPROBE
int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter);
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
int unregister_fprobe(struct fprobe *fp);
#else
static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
{
return -EOPNOTSUPP;
}
static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
{
return -EOPNOTSUPP;
}
static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
{
return -EOPNOTSUPP;
}
static inline int unregister_fprobe(struct fprobe *fp)
{
return -EOPNOTSUPP;
}
#endif
/**
* disable_fprobe() - Disable fprobe
* @fp: The fprobe to be disabled.
*
* This will soft-disable @fp. Note that this doesn't remove the ftrace
* hooks from the function entry.
*/
static inline void disable_fprobe(struct fprobe *fp)
{
if (fp)
fp->flags |= FPROBE_FL_DISABLED;
}
/**
* enable_fprobe() - Enable fprobe
* @fp: The fprobe to be enabled.
*
* This will soft-enable @fp.
*/
static inline void enable_fprobe(struct fprobe *fp)
{
if (fp)
fp->flags &= ~FPROBE_FL_DISABLED;
}
#endif


@ -512,6 +512,8 @@ struct dyn_ftrace {
int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset);
int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
unsigned int cnt, int remove, int reset);
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
int len, int reset);
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip)
#define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
#define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
#define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_free_filter(ops) do { } while (0)


@ -427,6 +427,9 @@ static inline struct kprobe *kprobe_running(void)
{
return NULL;
}
#define kprobe_busy_begin() do {} while (0)
#define kprobe_busy_end() do {} while (0)
static inline int register_kprobe(struct kprobe *p)
{
return -EOPNOTSUPP;

include/linux/rethook.h (new file, 100 lines)

@ -0,0 +1,100 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Return hooking with list-based shadow stack.
*/
#ifndef _LINUX_RETHOOK_H
#define _LINUX_RETHOOK_H
#include <linux/compiler.h>
#include <linux/freelist.h>
#include <linux/kallsyms.h>
#include <linux/llist.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
struct rethook_node;
typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
/**
* struct rethook - The rethook management data structure.
* @data: The user-defined data storage.
* @handler: The user-defined return hook handler.
* @pool: The pool of struct rethook_node.
* @ref: The reference counter.
* @rcu: The rcu_head for deferred freeing.
*
* Don't embed to another data structure, because this is a self-destructive
* data structure when all rethook_node are freed.
*/
struct rethook {
void *data;
rethook_handler_t handler;
struct freelist_head pool;
refcount_t ref;
struct rcu_head rcu;
};
/**
* struct rethook_node - The rethook shadow-stack entry node.
* @freelist: The freelist, linked to struct rethook::pool.
* @rcu: The rcu_head for deferred freeing.
* @llist: The llist, linked to a struct task_struct::rethooks.
* @rethook: The pointer to the struct rethook.
* @ret_addr: The storage for the real return address.
* @frame: The storage for the frame pointer.
*
* You can embed this to your extended data structure to store any data
* on each entry of the shadow stack.
*/
struct rethook_node {
union {
struct freelist_node freelist;
struct rcu_head rcu;
};
struct llist_node llist;
struct rethook *rethook;
unsigned long ret_addr;
unsigned long frame;
};
struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
void rethook_free(struct rethook *rh);
void rethook_add_node(struct rethook *rh, struct rethook_node *node);
struct rethook_node *rethook_try_get(struct rethook *rh);
void rethook_recycle(struct rethook_node *node);
void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool mcount);
unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
struct llist_node **cur);
/* Arch dependent code must implement arch_* and trampoline code */
void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs, bool mcount);
void arch_rethook_trampoline(void);
/**
* is_rethook_trampoline() - Check whether the address is rethook trampoline
* @addr: The address to be checked
*
* Return true if the @addr is the rethook trampoline address.
*/
static inline bool is_rethook_trampoline(unsigned long addr)
{
return addr == (unsigned long)dereference_symbol_descriptor(arch_rethook_trampoline);
}
/* If the architecture needs to fixup the return address, implement it. */
void arch_rethook_fixup_return(struct pt_regs *regs,
unsigned long correct_ret_addr);
/* Generic trampoline handler, arch code must prepare asm stub */
unsigned long rethook_trampoline_handler(struct pt_regs *regs,
unsigned long frame);
#ifdef CONFIG_RETHOOK
void rethook_flush_task(struct task_struct *tk);
#else
#define rethook_flush_task(tsk) do { } while (0)
#endif
#endif


@ -1481,6 +1481,9 @@ struct task_struct {
#ifdef CONFIG_KRETPROBES
struct llist_head kretprobe_instances;
#endif
#ifdef CONFIG_RETHOOK
struct llist_head rethooks;
#endif
#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
/*


@ -992,10 +992,10 @@ struct sk_buff {
__u8 csum_complete_sw:1;
__u8 csum_level:2;
__u8 dst_pending_confirm:1;
__u8 mono_delivery_time:1;
__u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
#ifdef CONFIG_NET_CLS_ACT
__u8 tc_skip_classify:1;
__u8 tc_at_ingress:1;
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
#endif
#ifdef CONFIG_IPV6_NDISC_NODETYPE
__u8 ndisc_nodetype:2;
@ -1094,7 +1094,9 @@ struct sk_buff {
#endif
#define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
/* if you move pkt_vlan_present around you also must adapt these constants */
/* if you move pkt_vlan_present, tc_at_ingress, or mono_delivery_time
* around, you also must adapt these constants.
*/
#ifdef __BIG_ENDIAN_BITFIELD
#define PKT_VLAN_PRESENT_BIT 7
#define TC_AT_INGRESS_MASK (1 << 0)
@ -1105,8 +1107,6 @@ struct sk_buff {
#define SKB_MONO_DELIVERY_TIME_MASK (1 << 5)
#endif
#define PKT_VLAN_PRESENT_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
#define TC_AT_INGRESS_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
#define SKB_MONO_DELIVERY_TIME_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
#ifdef __KERNEL__
/*


@ -304,21 +304,16 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
kfree_skb(skb);
}
static inline void drop_sk_msg(struct sk_psock *psock, struct sk_msg *msg)
{
if (msg->skb)
sock_drop(psock->sk, msg->skb);
kfree(msg);
}
static inline void sk_psock_queue_msg(struct sk_psock *psock,
struct sk_msg *msg)
{
spin_lock_bh(&psock->ingress_lock);
if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
list_add_tail(&msg->list, &psock->ingress_msg);
else
drop_sk_msg(psock, msg);
else {
sk_msg_free(psock->sk, msg);
kfree(msg);
}
spin_unlock_bh(&psock->ingress_lock);
}


@ -6,7 +6,7 @@
void sort_r(void *base, size_t num, size_t size,
cmp_r_func_t cmp_func,
swap_func_t swap_func,
swap_r_func_t swap_func,
const void *priv);
void sort(void *base, size_t num, size_t size,


@ -15,6 +15,7 @@ struct array_buffer;
struct tracer;
struct dentry;
struct bpf_prog;
union bpf_attr;
const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
unsigned long flags,
@ -738,6 +739,7 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
u32 *fd_type, const char **buf,
u64 *probe_offset, u64 *probe_addr);
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
#else
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
{
@ -779,6 +781,11 @@ static inline int bpf_get_perf_event_info(const struct perf_event *event,
{
return -EOPNOTSUPP;
}
static inline int
bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
#endif
enum {


@ -226,6 +226,7 @@ struct callback_head {
typedef void (*rcu_callback_t)(struct rcu_head *head);
typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func);
typedef void (*swap_r_func_t)(void *a, void *b, int size, const void *priv);
typedef void (*swap_func_t)(void *a, void *b, int size);
typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);


@ -343,6 +343,20 @@ out:
__xdp_release_frame(xdpf->data, mem);
}
static __always_inline unsigned int xdp_get_frame_len(struct xdp_frame *xdpf)
{
struct skb_shared_info *sinfo;
unsigned int len = xdpf->len;
if (likely(!xdp_frame_has_frags(xdpf)))
goto out;
sinfo = xdp_get_shared_info_from_frame(xdpf);
len += sinfo->xdp_frags_size;
out:
return len;
}
int __xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
struct net_device *dev, u32 queue_index,
unsigned int napi_id, u32 frag_size);


@ -997,6 +997,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
BPF_TRACE_KPROBE_MULTI,
__MAX_BPF_ATTACH_TYPE
};
@ -1011,6 +1012,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
BPF_LINK_TYPE_KPROBE_MULTI = 8,
MAX_BPF_LINK_TYPE,
};
@ -1118,6 +1120,11 @@ enum bpf_link_type {
*/
#define BPF_F_XDP_HAS_FRAGS (1U << 5)
/* link_create.kprobe_multi.flags used in LINK_CREATE command for
* BPF_TRACE_KPROBE_MULTI attach type to create return probe.
*/
#define BPF_F_KPROBE_MULTI_RETURN (1U << 0)
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@ -1232,6 +1239,8 @@ enum {
/* If set, run the test on the cpu specified by bpf_attr.test.cpu */
#define BPF_F_TEST_RUN_ON_CPU (1U << 0)
/* If set, XDP frames will be transmitted after processing */
#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1)
/* type for BPF_ENABLE_STATS */
enum bpf_stats_type {
@ -1393,6 +1402,7 @@ union bpf_attr {
__aligned_u64 ctx_out;
__u32 flags;
__u32 cpu;
__u32 batch_size;
} test;
struct { /* anonymous struct used by BPF_*_GET_*_ID */
@ -1472,6 +1482,13 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
struct {
__u32 flags;
__u32 cnt;
__aligned_u64 syms;
__aligned_u64 addrs;
__aligned_u64 cookies;
} kprobe_multi;
};
} link_create;
@ -2299,8 +2316,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
* * 0, if current task belongs to the cgroup2.
* * 1, if current task does not belong to the cgroup2.
* * 1, if current task belongs to the cgroup2.
* * 0, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
@ -2992,8 +3009,8 @@ union bpf_attr {
*
* # sysctl kernel.perf_event_max_stack=<new value>
* Return
* A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure.
* The non-negative copied *buf* length equal to or less than
* *size* on success, or a negative error in case of failure.
*
* long bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to, u32 len, u32 start_header)
* Description
@ -4299,8 +4316,8 @@ union bpf_attr {
*
* # sysctl kernel.perf_event_max_stack=<new value>
* Return
* A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure.
* The non-negative copied *buf* length equal to or less than
* *size* on success, or a negative error in case of failure.
*
* long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
* Description
@ -5087,23 +5104,22 @@ union bpf_attr {
* 0 on success, or a negative error in case of failure. On error
* *dst* buffer is zeroed out.
*
* long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
* long bpf_skb_set_tstamp(struct sk_buff *skb, u64 tstamp, u32 tstamp_type)
* Description
* Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
* change the __sk_buff->delivery_time_type to *dtime_type*.
* Change the __sk_buff->tstamp_type to *tstamp_type*
* and set *tstamp* to the __sk_buff->tstamp together.
*
* When setting a delivery time (non zero *dtime*) to
* __sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
* is supported. It is the only delivery_time_type that will be
* kept after bpf_redirect_*().
*
* If there is no need to change the __sk_buff->delivery_time_type,
* the delivery time can be directly written to __sk_buff->tstamp
* If there is no need to change the __sk_buff->tstamp_type,
* the tstamp value can be directly written to __sk_buff->tstamp
* instead.
*
* *dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
* can be used to clear any delivery time stored in
* __sk_buff->tstamp.
* BPF_SKB_TSTAMP_DELIVERY_MONO is the only tstamp that
* will be kept during bpf_redirect_*(). A non zero
* *tstamp* must be used with the BPF_SKB_TSTAMP_DELIVERY_MONO
* *tstamp_type*.
*
* A BPF_SKB_TSTAMP_UNSPEC *tstamp_type* can only be used
* with a zero *tstamp*.
*
* Only IPv4 and IPv6 skb->protocol are supported.
*
@ -5116,7 +5132,17 @@ union bpf_attr {
* Return
* 0 on success.
* **-EINVAL** for invalid input
* **-EOPNOTSUPP** for unsupported delivery_time_type and protocol
* **-EOPNOTSUPP** for unsupported protocol
*
* long bpf_ima_file_hash(struct file *file, void *dst, u32 size)
* Description
* Returns a calculated IMA hash of the *file*.
* If the hash is larger than *size*, then only *size*
* bytes will be copied to *dst*
* Return
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5311,7 +5337,8 @@ union bpf_attr {
FN(xdp_load_bytes), \
FN(xdp_store_bytes), \
FN(copy_from_user_task), \
FN(skb_set_delivery_time), \
FN(skb_set_tstamp), \
FN(ima_file_hash), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -5502,9 +5529,12 @@ union { \
} __attribute__((aligned(8)))
enum {
BPF_SKB_DELIVERY_TIME_NONE,
BPF_SKB_DELIVERY_TIME_UNSPEC,
BPF_SKB_DELIVERY_TIME_MONO,
BPF_SKB_TSTAMP_UNSPEC,
BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
* the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
* and try to deduce it by ingress, egress or skb->sk->sk_clockid.
*/
};
/* user accessible mirror of in-kernel sk_buff.
@ -5547,7 +5577,7 @@ struct __sk_buff {
__u32 gso_segs;
__bpf_md_ptr(struct bpf_sock *, sk);
__u32 gso_size;
__u8 delivery_time_type;
__u8 tstamp_type;
__u32 :24; /* Padding, future use. */
__u64 hwtstamp;
};


@ -30,6 +30,7 @@ config BPF_SYSCALL
select TASKS_TRACE_RCU
select BINARY_PRINTF
select NET_SOCK_MSG if NET
select PAGE_POOL if NET
default n
help
Enable the bpf() system call that allows to manipulate BPF programs


@ -136,7 +136,7 @@ static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
sdata = bpf_local_storage_update(f->f_inode,
(struct bpf_local_storage_map *)map,
value, map_flags);
value, map_flags, GFP_ATOMIC);
fput(f);
return PTR_ERR_OR_ZERO(sdata);
}
@ -169,8 +169,9 @@ static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
return err;
}
BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
void *, value, u64, flags)
/* *gfp_flags* is a hidden argument provided by the verifier */
BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
@ -196,7 +197,7 @@ BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
sdata = bpf_local_storage_update(
inode, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST);
BPF_NOEXIST, gfp_flags);
return IS_ERR(sdata) ? (unsigned long)NULL :
(unsigned long)sdata->data;
}


@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
void *value, bool charge_mem)
void *value, bool charge_mem, gfp_t gfp_flags)
{
struct bpf_local_storage_elem *selem;
@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return NULL;
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
GFP_ATOMIC | __GFP_NOWARN);
gfp_flags | __GFP_NOWARN);
if (selem) {
if (value)
memcpy(SDATA(selem)->data, value, smap->map.value_size);
@ -282,7 +282,8 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata,
int bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *first_selem)
struct bpf_local_storage_elem *first_selem,
gfp_t gfp_flags)
{
struct bpf_local_storage *prev_storage, *storage;
struct bpf_local_storage **owner_storage_ptr;
@ -293,7 +294,7 @@ int bpf_local_storage_alloc(void *owner,
return err;
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
GFP_ATOMIC | __GFP_NOWARN);
gfp_flags | __GFP_NOWARN);
if (!storage) {
err = -ENOMEM;
goto uncharge;
@ -350,10 +351,10 @@ uncharge:
*/
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags)
void *value, u64 map_flags, gfp_t gfp_flags)
{
struct bpf_local_storage_data *old_sdata = NULL;
struct bpf_local_storage_elem *selem;
struct bpf_local_storage_elem *selem = NULL;
struct bpf_local_storage *local_storage;
unsigned long flags;
int err;
@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
!map_value_has_spin_lock(&smap->map)))
return ERR_PTR(-EINVAL);
if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
return ERR_PTR(-EINVAL);
local_storage = rcu_dereference_check(*owner_storage(smap, owner),
bpf_rcu_lock_held());
if (!local_storage || hlist_empty(&local_storage->list)) {
@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (err)
return ERR_PTR(err);
selem = bpf_selem_alloc(smap, owner, value, true);
selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
if (!selem)
return ERR_PTR(-ENOMEM);
err = bpf_local_storage_alloc(owner, smap, selem);
err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
if (err) {
kfree(selem);
mem_uncharge(smap, owner, smap->elem_size);
@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
}
}
if (gfp_flags == GFP_KERNEL) {
selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
if (!selem)
return ERR_PTR(-ENOMEM);
}
raw_spin_lock_irqsave(&local_storage->lock, flags);
/* Recheck local_storage->list under local_storage->lock */
@ -429,19 +439,21 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
goto unlock;
}
/* local_storage->lock is held. Hence, we are sure
* we can unlink and uncharge the old_sdata successfully
* later. Hence, instead of charging the new selem now
* and then uncharge the old selem later (which may cause
* a potential but unnecessary charge failure), avoid taking
* a charge at all here (the "!old_sdata" check) and the
* old_sdata will not be uncharged later during
* bpf_selem_unlink_storage_nolock().
*/
selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
if (!selem) {
err = -ENOMEM;
goto unlock_err;
if (gfp_flags != GFP_KERNEL) {
/* local_storage->lock is held. Hence, we are sure
* we can unlink and uncharge the old_sdata successfully
* later. Hence, instead of charging the new selem now
* and then uncharge the old selem later (which may cause
* a potential but unnecessary charge failure), avoid taking
* a charge at all here (the "!old_sdata" check) and the
* old_sdata will not be uncharged later during
* bpf_selem_unlink_storage_nolock().
*/
selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags);
if (!selem) {
err = -ENOMEM;
goto unlock_err;
}
}
/* First, link the new selem to the map */
@ -463,6 +475,10 @@ unlock:
unlock_err:
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
if (selem) {
mem_uncharge(smap, owner, smap->elem_size);
kfree(selem);
}
return ERR_PTR(err);
}


@ -99,6 +99,24 @@ static const struct bpf_func_proto bpf_ima_inode_hash_proto = {
.allowed = bpf_ima_inode_hash_allowed,
};
BPF_CALL_3(bpf_ima_file_hash, struct file *, file, void *, dst, u32, size)
{
return ima_file_hash(file, dst, size);
}
BTF_ID_LIST_SINGLE(bpf_ima_file_hash_btf_ids, struct, file)
static const struct bpf_func_proto bpf_ima_file_hash_proto = {
.func = bpf_ima_file_hash,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_BTF_ID,
.arg1_btf_id = &bpf_ima_file_hash_btf_ids[0],
.arg2_type = ARG_PTR_TO_UNINIT_MEM,
.arg3_type = ARG_CONST_SIZE,
.allowed = bpf_ima_inode_hash_allowed,
};
static const struct bpf_func_proto *
bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@ -121,6 +139,8 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_bprm_opts_set_proto;
case BPF_FUNC_ima_inode_hash:
return prog->aux->sleepable ? &bpf_ima_inode_hash_proto : NULL;
case BPF_FUNC_ima_file_hash:
return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL;
default:
return tracing_prog_func_proto(func_id, prog);
}
@ -167,6 +187,7 @@ BTF_ID(func, bpf_lsm_inode_setxattr)
BTF_ID(func, bpf_lsm_inode_symlink)
BTF_ID(func, bpf_lsm_inode_unlink)
BTF_ID(func, bpf_lsm_kernel_module_request)
BTF_ID(func, bpf_lsm_kernel_read_file)
BTF_ID(func, bpf_lsm_kernfs_init_security)
#ifdef CONFIG_KEYS


@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
bpf_task_storage_lock();
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value, map_flags);
task, (struct bpf_local_storage_map *)map, value, map_flags,
GFP_ATOMIC);
bpf_task_storage_unlock();
err = PTR_ERR_OR_ZERO(sdata);
@ -226,8 +227,9 @@ out:
return err;
}
BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
task, void *, value, u64, flags)
/* *gfp_flags* is a hidden argument provided by the verifier */
BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
task, void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
(flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST);
BPF_NOEXIST, gfp_flags);
unlock:
bpf_task_storage_unlock();


@ -525,6 +525,50 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
return -ENOENT;
}
static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
{
struct btf *btf;
s32 ret;
int id;
btf = bpf_get_btf_vmlinux();
if (IS_ERR(btf))
return PTR_ERR(btf);
if (!btf)
return -EINVAL;
ret = btf_find_by_name_kind(btf, name, kind);
/* ret is never zero, since btf_find_by_name_kind returns
* positive btf_id or negative error.
*/
if (ret > 0) {
btf_get(btf);
*btf_p = btf;
return ret;
}
/* If name is not found in vmlinux's BTF then search in module's BTFs */
spin_lock_bh(&btf_idr_lock);
idr_for_each_entry(&btf_idr, btf, id) {
if (!btf_is_module(btf))
continue;
/* linear search could be slow hence unlock/lock
* the IDR to avoiding holding it for too long
*/
btf_get(btf);
spin_unlock_bh(&btf_idr_lock);
ret = btf_find_by_name_kind(btf, name, kind);
if (ret > 0) {
*btf_p = btf;
return ret;
}
spin_lock_bh(&btf_idr_lock);
btf_put(btf);
}
spin_unlock_bh(&btf_idr_lock);
return ret;
}
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
u32 id, u32 *res_id)
{
@ -4438,8 +4482,7 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
btf = env->btf;
btf_data_size = btf->data_size;
if (btf_data_size <
offsetof(struct btf_header, hdr_len) + sizeof(hdr->hdr_len)) {
if (btf_data_size < offsetofend(struct btf_header, hdr_len)) {
btf_verifier_log(env, "hdr_len not found");
return -EINVAL;
}
@ -5057,6 +5100,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
tag_value = __btf_name_by_offset(btf, t->name_off);
if (strcmp(tag_value, "user") == 0)
info->reg_type |= MEM_USER;
if (strcmp(tag_value, "percpu") == 0)
info->reg_type |= MEM_PERCPU;
}
/* skip modifiers */
@ -5285,12 +5330,16 @@ error:
return -EACCES;
}
/* check __user tag */
/* check type tag */
t = btf_type_by_id(btf, mtype->type);
if (btf_type_is_type_tag(t)) {
tag_value = __btf_name_by_offset(btf, t->name_off);
/* check __user tag */
if (strcmp(tag_value, "user") == 0)
tmp_flag = MEM_USER;
/* check __percpu tag */
if (strcmp(tag_value, "percpu") == 0)
tmp_flag = MEM_PERCPU;
}
stype = btf_type_skip_modifiers(btf, mtype->type, &id);
@ -5726,7 +5775,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
const char *func_name, *ref_tname;
const struct btf_type *t, *ref_t;
const struct btf_param *args;
int ref_regno = 0;
int ref_regno = 0, ret;
bool rel = false;
t = btf_type_by_id(btf, func_id);
@ -5753,6 +5802,10 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
return -EINVAL;
}
/* Only kfunc can be release func */
if (is_kfunc)
rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
BTF_KFUNC_TYPE_RELEASE, func_id);
/* check that BTF function arguments match actual types that the
* verifier sees.
*/
@ -5776,6 +5829,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
ref_tname = btf_name_by_offset(btf, ref_t->name_off);
ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
if (ret < 0)
return ret;
if (btf_get_prog_ctx_type(log, btf, t,
env->prog->type, i)) {
/* If function expects ctx type in BTF check that caller
@ -5787,8 +5845,6 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
i, btf_type_str(t));
return -EINVAL;
}
if (check_ptr_off_reg(env, reg, regno))
return -EINVAL;
} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
(reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
const struct btf_type *reg_ref_t;
@ -5806,7 +5862,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
if (reg->type == PTR_TO_BTF_ID) {
reg_btf = reg->btf;
reg_ref_id = reg->btf_id;
/* Ensure only one argument is referenced PTR_TO_BTF_ID */
/* Ensure only one argument is referenced
* PTR_TO_BTF_ID, check_func_arg_reg_off relies
* on only one referenced register being allowed
* for kfuncs.
*/
if (reg->ref_obj_id) {
if (ref_obj_id) {
bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
@ -5888,18 +5948,15 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
/* Either both are set, or neither */
WARN_ON_ONCE((ref_obj_id && !ref_regno) || (!ref_obj_id && ref_regno));
if (is_kfunc) {
rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
BTF_KFUNC_TYPE_RELEASE, func_id);
/* We already made sure ref_obj_id is set only for one argument */
if (rel && !ref_obj_id) {
bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
func_name);
return -EINVAL;
}
/* Allow (!rel && ref_obj_id), so that passing such referenced PTR_TO_BTF_ID to
* other kfuncs works
*/
/* We already made sure ref_obj_id is set only for one argument. We do
* allow (!rel && ref_obj_id), so that passing such referenced
* PTR_TO_BTF_ID to other kfuncs works. Note that rel is only true when
* is_kfunc is true.
*/
if (rel && !ref_obj_id) {
bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
func_name);
return -EINVAL;
}
/* returns argument register number > 0 in case of reference release kfunc */
return rel ? ref_regno : 0;
@ -6516,20 +6573,23 @@ struct module *btf_try_get_module(const struct btf *btf)
return res;
}
/* Returns struct btf corresponding to the struct module
*
* This function can return NULL or ERR_PTR. Note that caller must
* release reference for struct btf iff btf_is_module is true.
/* Returns struct btf corresponding to the struct module.
* This function can return NULL or ERR_PTR.
*/
static struct btf *btf_get_module_btf(const struct module *module)
{
struct btf *btf = NULL;
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
struct btf_module *btf_mod, *tmp;
#endif
struct btf *btf = NULL;
if (!module) {
btf = bpf_get_btf_vmlinux();
if (!IS_ERR_OR_NULL(btf))
btf_get(btf);
return btf;
}
if (!module)
return bpf_get_btf_vmlinux();
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
mutex_lock(&btf_module_mutex);
list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
@ -6548,7 +6608,8 @@ static struct btf *btf_get_module_btf(const struct module *module)
BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
{
struct btf *btf;
struct btf *btf = NULL;
int btf_obj_fd = 0;
long ret;
if (flags)
@ -6557,44 +6618,17 @@ BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int
if (name_sz <= 1 || name[name_sz - 1])
return -EINVAL;
btf = bpf_get_btf_vmlinux();
if (IS_ERR(btf))
return PTR_ERR(btf);
ret = btf_find_by_name_kind(btf, name, kind);
/* ret is never zero, since btf_find_by_name_kind returns
* positive btf_id or negative error.
*/
if (ret < 0) {
struct btf *mod_btf;
int id;
/* If name is not found in vmlinux's BTF then search in module's BTFs */
spin_lock_bh(&btf_idr_lock);
idr_for_each_entry(&btf_idr, mod_btf, id) {
if (!btf_is_module(mod_btf))
continue;
/* linear search could be slow hence unlock/lock
* the IDR to avoid holding it for too long
*/
btf_get(mod_btf);
spin_unlock_bh(&btf_idr_lock);
ret = btf_find_by_name_kind(mod_btf, name, kind);
if (ret > 0) {
int btf_obj_fd;
btf_obj_fd = __btf_new_fd(mod_btf);
if (btf_obj_fd < 0) {
btf_put(mod_btf);
return btf_obj_fd;
}
return ret | (((u64)btf_obj_fd) << 32);
}
spin_lock_bh(&btf_idr_lock);
btf_put(mod_btf);
ret = bpf_find_btf_id(name, kind, &btf);
if (ret > 0 && btf_is_module(btf)) {
btf_obj_fd = __btf_new_fd(btf);
if (btf_obj_fd < 0) {
btf_put(btf);
return btf_obj_fd;
}
spin_unlock_bh(&btf_idr_lock);
return ret | (((u64)btf_obj_fd) << 32);
}
if (ret > 0)
btf_put(btf);
return ret;
}
@ -6793,9 +6827,7 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
hook = bpf_prog_type_to_kfunc_hook(prog_type);
ret = btf_populate_kfunc_set(btf, hook, kset);
/* reference is only taken for module BTF */
if (btf_is_module(btf))
btf_put(btf);
btf_put(btf);
return ret;
}
EXPORT_SYMBOL_GPL(register_btf_kfunc_id_set);
@ -7149,6 +7181,8 @@ bpf_core_find_cands(struct bpf_core_ctx *ctx, u32 local_type_id)
main_btf = bpf_get_btf_vmlinux();
if (IS_ERR(main_btf))
return ERR_CAST(main_btf);
if (!main_btf)
return ERR_PTR(-EINVAL);
local_type = btf_type_by_id(local_btf, local_type_id);
if (!local_type)

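For reference, a successful lookup through the path above returns the BTF type id in the lower 32 bits and, for module BTF only, a newly created fd in the upper 32 bits. A hedged caller-side sketch of decoding that packed value (variable names hypothetical):

	u64 packed = ret;                            /* value returned by bpf_btf_find_by_name_kind() */
	int btf_id = (int)(packed & 0xffffffffULL);  /* positive BTF type id */
	int btf_obj_fd = (int)(packed >> 32);        /* 0 when the type lives in vmlinux BTF */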
kernel/bpf/core.c

@ -33,6 +33,7 @@
#include <linux/extable.h>
#include <linux/log2.h>
#include <linux/bpf_verifier.h>
#include <linux/nodemask.h>
#include <asm/barrier.h>
#include <asm/unaligned.h>
@ -105,6 +106,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
fp->aux = aux;
fp->aux->prog = fp;
fp->jit_requested = ebpf_jit_enabled();
fp->blinding_requested = bpf_jit_blinding_enabled(fp);
INIT_LIST_HEAD_RCU(&fp->aux->ksym.lnode);
mutex_init(&fp->aux->used_maps_mutex);
@ -814,15 +816,9 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
* allocator. The prog_pack allocator uses HPAGE_PMD_SIZE page (2MB on x86)
* to host BPF programs.
*/
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define BPF_PROG_PACK_SIZE HPAGE_PMD_SIZE
#else
#define BPF_PROG_PACK_SIZE PAGE_SIZE
#endif
#define BPF_PROG_CHUNK_SHIFT 6
#define BPF_PROG_CHUNK_SIZE (1 << BPF_PROG_CHUNK_SHIFT)
#define BPF_PROG_CHUNK_MASK (~(BPF_PROG_CHUNK_SIZE - 1))
#define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE)
struct bpf_prog_pack {
struct list_head list;
@ -830,30 +826,72 @@ struct bpf_prog_pack {
unsigned long bitmap[];
};
#define BPF_PROG_MAX_PACK_PROG_SIZE BPF_PROG_PACK_SIZE
#define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)
static size_t bpf_prog_pack_size = -1;
static size_t bpf_prog_pack_mask = -1;
static int bpf_prog_chunk_count(void)
{
WARN_ON_ONCE(bpf_prog_pack_size == -1);
return bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE;
}
static DEFINE_MUTEX(pack_mutex);
static LIST_HEAD(pack_list);
/* PMD_SIZE is not available in some special configs, e.g. ARCH=arm with
* CONFIG_MMU=n. Use PAGE_SIZE in these cases.
*/
#ifdef PMD_SIZE
#define BPF_HPAGE_SIZE PMD_SIZE
#define BPF_HPAGE_MASK PMD_MASK
#else
#define BPF_HPAGE_SIZE PAGE_SIZE
#define BPF_HPAGE_MASK PAGE_MASK
#endif
static size_t select_bpf_prog_pack_size(void)
{
size_t size;
void *ptr;
size = BPF_HPAGE_SIZE * num_online_nodes();
ptr = module_alloc(size);
/* Test whether we can get huge pages. If not, just use PAGE_SIZE
* packs.
*/
if (!ptr || !is_vm_area_hugepages(ptr)) {
size = PAGE_SIZE;
bpf_prog_pack_mask = PAGE_MASK;
} else {
bpf_prog_pack_mask = BPF_HPAGE_MASK;
}
vfree(ptr);
return size;
}
static struct bpf_prog_pack *alloc_new_pack(void)
{
struct bpf_prog_pack *pack;
pack = kzalloc(sizeof(*pack) + BITS_TO_BYTES(BPF_PROG_CHUNK_COUNT), GFP_KERNEL);
pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(bpf_prog_chunk_count())),
GFP_KERNEL);
if (!pack)
return NULL;
pack->ptr = module_alloc(BPF_PROG_PACK_SIZE);
pack->ptr = module_alloc(bpf_prog_pack_size);
if (!pack->ptr) {
kfree(pack);
return NULL;
}
bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE);
bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
list_add_tail(&pack->list, &pack_list);
set_vm_flush_reset_perms(pack->ptr);
set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
return pack;
}
@ -864,7 +902,11 @@ static void *bpf_prog_pack_alloc(u32 size)
unsigned long pos;
void *ptr = NULL;
if (size > BPF_PROG_MAX_PACK_PROG_SIZE) {
mutex_lock(&pack_mutex);
if (bpf_prog_pack_size == -1)
bpf_prog_pack_size = select_bpf_prog_pack_size();
if (size > bpf_prog_pack_size) {
size = round_up(size, PAGE_SIZE);
ptr = module_alloc(size);
if (ptr) {
@ -872,13 +914,12 @@ static void *bpf_prog_pack_alloc(u32 size)
set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
}
return ptr;
goto out;
}
mutex_lock(&pack_mutex);
list_for_each_entry(pack, &pack_list, list) {
pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
pos = bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
nbits, 0);
if (pos < BPF_PROG_CHUNK_COUNT)
if (pos < bpf_prog_chunk_count())
goto found_free_area;
}
@ -904,13 +945,13 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
unsigned long pos;
void *pack_ptr;
if (hdr->size > BPF_PROG_MAX_PACK_PROG_SIZE) {
mutex_lock(&pack_mutex);
if (hdr->size > bpf_prog_pack_size) {
module_memfree(hdr);
return;
goto out;
}
pack_ptr = (void *)((unsigned long)hdr & ~(BPF_PROG_PACK_SIZE - 1));
mutex_lock(&pack_mutex);
pack_ptr = (void *)((unsigned long)hdr & bpf_prog_pack_mask);
list_for_each_entry(tmp, &pack_list, list) {
if (tmp->ptr == pack_ptr) {
@ -926,8 +967,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT;
bitmap_clear(pack->bitmap, pos, nbits);
if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
BPF_PROG_CHUNK_COUNT, 0) == 0) {
if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
bpf_prog_chunk_count(), 0) == 0) {
list_del(&pack->list);
module_memfree(pack->ptr);
kfree(pack);
@ -1382,7 +1423,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
struct bpf_insn *insn;
int i, rewritten;
if (!bpf_jit_blinding_enabled(prog) || prog->blinded)
if (!prog->blinding_requested || prog->blinded)
return prog;
clone = bpf_prog_clone_create(prog, GFP_USER);

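To make the size selection above concrete, a sketch with illustrative numbers only (assuming x86 with a 2 MiB PMD_SIZE and two online NUMA nodes; not part of the patch):

	size_t pack_size = (2UL << 20) * 2;              /* BPF_HPAGE_SIZE * num_online_nodes() = 4 MiB */
	int chunks = pack_size >> BPF_PROG_CHUNK_SHIFT;  /* 4 MiB / 64 B = 65536 chunks per pack */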
kernel/bpf/helpers.c

@ -225,13 +225,8 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
if (unlikely(!task))
goto err_clear;
strncpy(buf, task->comm, size);
/* Verifier guarantees that size > 0. For task->comm exceeding
* size, guarantee that buf is %NUL-terminated. Unconditionally
* done here to save the size test.
*/
buf[size - 1] = 0;
/* Verifier guarantees that size > 0 */
strscpy(buf, task->comm, size);
return 0;
err_clear:
memset(buf, 0, size);

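With strscpy() the helper itself guarantees NUL termination, so a typical BPF-side call stays minimal; a hedged sketch (not part of this patch):

	char comm[16];                             /* TASK_COMM_LEN */
	bpf_get_current_comm(comm, sizeof(comm));  /* buf is zeroed by the helper on error */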
kernel/bpf/preload/Makefile

@ -1,8 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
LIBBPF_SRCS = $(srctree)/tools/lib/bpf/
LIBBPF_INCLUDE = $(LIBBPF_SRCS)/..
LIBBPF_INCLUDE = $(srctree)/tools/lib
obj-$(CONFIG_BPF_PRELOAD_UMD) += bpf_preload.o
CFLAGS_bpf_preload_kern.o += -I $(LIBBPF_INCLUDE)
CFLAGS_bpf_preload_kern.o += -I$(LIBBPF_INCLUDE)
bpf_preload-objs += bpf_preload_kern.o

kernel/bpf/stackmap.c

@ -176,7 +176,7 @@ build_id_valid:
}
static struct perf_callchain_entry *
get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
{
#ifdef CONFIG_STACKTRACE
struct perf_callchain_entry *entry;
@ -187,9 +187,8 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
if (!entry)
return NULL;
entry->nr = init_nr +
stack_trace_save_tsk(task, (unsigned long *)(entry->ip + init_nr),
sysctl_perf_event_max_stack - init_nr, 0);
entry->nr = stack_trace_save_tsk(task, (unsigned long *)entry->ip,
max_depth, 0);
/* stack_trace_save_tsk() works on unsigned long array, while
* perf_callchain_entry uses u64 array. For 32-bit systems, it is
@ -201,7 +200,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
int i;
/* copy data from the end to avoid using extra buffer */
for (i = entry->nr - 1; i >= (int)init_nr; i--)
for (i = entry->nr - 1; i >= 0; i--)
to[i] = (u64)(from[i]);
}
@ -218,27 +217,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
u32 max_depth = map->value_size / stack_map_data_size(map);
/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
u32 init_nr = sysctl_perf_event_max_stack - max_depth;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
u32 hash, id, trace_nr, trace_len;
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
/* get_perf_callchain() guarantees that trace->nr >= init_nr
* and trace->nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth
*/
trace_nr = trace->nr - init_nr;
if (trace_nr <= skip)
if (trace->nr <= skip)
/* skipping more than usable stack trace */
return -EFAULT;
trace_nr -= skip;
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
ips = trace->ip + skip + init_nr;
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
bucket = READ_ONCE(smap->buckets[id]);
@ -295,8 +286,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
u32 max_depth = map->value_size / stack_map_data_size(map);
/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
u32 init_nr = sysctl_perf_event_max_stack - max_depth;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
@ -305,8 +295,12 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;
trace = get_perf_callchain(regs, init_nr, kernel, user,
sysctl_perf_event_max_stack, false, false);
max_depth += skip;
if (max_depth > sysctl_perf_event_max_stack)
max_depth = sysctl_perf_event_max_stack;
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
if (unlikely(!trace))
/* couldn't fetch the stack trace */
@ -397,7 +391,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags)
{
u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
bool user = flags & BPF_F_USER_STACK;
@ -422,30 +416,28 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto err_fault;
num_elem = size / elem_size;
if (sysctl_perf_event_max_stack < num_elem)
init_nr = 0;
else
init_nr = sysctl_perf_event_max_stack - num_elem;
max_depth = num_elem + skip;
if (sysctl_perf_event_max_stack < max_depth)
max_depth = sysctl_perf_event_max_stack;
if (trace_in)
trace = trace_in;
else if (kernel && task)
trace = get_callchain_entry_for_task(task, init_nr);
trace = get_callchain_entry_for_task(task, max_depth);
else
trace = get_perf_callchain(regs, init_nr, kernel, user,
sysctl_perf_event_max_stack,
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
if (unlikely(!trace))
goto err_fault;
trace_nr = trace->nr - init_nr;
if (trace_nr < skip)
if (trace->nr < skip)
goto err_fault;
trace_nr -= skip;
trace_nr = trace->nr - skip;
trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;
ips = trace->ip + skip + init_nr;
ips = trace->ip + skip;
if (user && user_build_id)
stack_map_get_build_id_offset(buf, ips, trace_nr, user);
else

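After this rework the skip count no longer eats into the number of entries that can be collected: max_depth is raised by skip (capped at sysctl_perf_event_max_stack) before the unwinder runs. A hedged BPF-side sketch of the flag encoding:

	__u64 ips[8];
	/* the low 8 bits of flags (BPF_F_SKIP_FIELD_MASK) hold the skip count */
	long n = bpf_get_stack(ctx, ips, sizeof(ips), 2 /* skip two frames */);
	/* with skip == 2, up to 8 entries can now still be returned */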
kernel/bpf/syscall.c

@ -32,6 +32,7 @@
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
#include <linux/trace_events.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@ -3022,6 +3023,11 @@ out_put_file:
fput(perf_file);
return err;
}
#else
static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
#endif /* CONFIG_PERF_EVENTS */
#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
@ -3336,7 +3342,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
}
}
#define BPF_PROG_TEST_RUN_LAST_FIELD test.cpu
#define BPF_PROG_TEST_RUN_LAST_FIELD test.batch_size
static int bpf_prog_test_run(const union bpf_attr *attr,
union bpf_attr __user *uattr)
@ -4255,7 +4261,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
return -EINVAL;
}
#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
#define BPF_LINK_CREATE_LAST_FIELD link_create.kprobe_multi.cookies
static int link_create(union bpf_attr *attr, bpfptr_t uattr)
{
enum bpf_prog_type ptype;
@ -4279,7 +4285,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = tracing_bpf_link_attach(attr, uattr, prog);
goto out;
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_KPROBE:
case BPF_PROG_TYPE_TRACEPOINT:
if (attr->link_create.attach_type != BPF_PERF_EVENT) {
ret = -EINVAL;
@ -4287,6 +4292,14 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
}
ptype = prog->type;
break;
case BPF_PROG_TYPE_KPROBE:
if (attr->link_create.attach_type != BPF_PERF_EVENT &&
attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI) {
ret = -EINVAL;
goto out;
}
ptype = prog->type;
break;
default:
ptype = attach_type_to_prog_type(attr->link_create.attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@ -4318,13 +4331,16 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = bpf_xdp_link_attach(attr, prog);
break;
#endif
#ifdef CONFIG_PERF_EVENTS
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
case BPF_PROG_TYPE_KPROBE:
ret = bpf_perf_link_attach(attr, prog);
break;
#endif
case BPF_PROG_TYPE_KPROBE:
if (attr->link_create.attach_type == BPF_PERF_EVENT)
ret = bpf_perf_link_attach(attr, prog);
else
ret = bpf_kprobe_multi_link_attach(attr, prog);
break;
default:
ret = -EINVAL;
}

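A minimal userspace sketch of the new attach path (prog_fd, syms and cnt are assumed to be set up by the caller; error handling omitted):

	union bpf_attr attr = {};

	attr.link_create.prog_fd = prog_fd;
	attr.link_create.attach_type = BPF_TRACE_KPROBE_MULTI;
	attr.link_create.kprobe_multi.syms = (__u64)(unsigned long)syms;  /* or .addrs */
	attr.link_create.kprobe_multi.cnt = cnt;
	/* optionally: .kprobe_multi.cookies, read back via bpf_get_attach_cookie() */

	int link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));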
kernel/bpf/verifier.c

@ -554,7 +554,6 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
[PTR_TO_TP_BUFFER] = "tp_buffer",
[PTR_TO_XDP_SOCK] = "xdp_sock",
[PTR_TO_BTF_ID] = "ptr_",
[PTR_TO_PERCPU_BTF_ID] = "percpu_ptr_",
[PTR_TO_MEM] = "mem",
[PTR_TO_BUF] = "buf",
[PTR_TO_FUNC] = "func",
@ -562,8 +561,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
};
if (type & PTR_MAYBE_NULL) {
if (base_type(type) == PTR_TO_BTF_ID ||
base_type(type) == PTR_TO_PERCPU_BTF_ID)
if (base_type(type) == PTR_TO_BTF_ID)
strncpy(postfix, "or_null_", 16);
else
strncpy(postfix, "_or_null", 16);
@ -575,6 +573,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
strncpy(prefix, "alloc_", 32);
if (type & MEM_USER)
strncpy(prefix, "user_", 32);
if (type & MEM_PERCPU)
strncpy(prefix, "percpu_", 32);
snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
prefix, str[base_type(type)], postfix);
@ -697,8 +697,7 @@ static void print_verifier_state(struct bpf_verifier_env *env,
const char *sep = "";
verbose(env, "%s", reg_type_str(env, t));
if (base_type(t) == PTR_TO_BTF_ID ||
base_type(t) == PTR_TO_PERCPU_BTF_ID)
if (base_type(t) == PTR_TO_BTF_ID)
verbose(env, "%s", kernel_type_name(reg->btf, reg->btf_id));
verbose(env, "(");
/*
@ -2783,7 +2782,6 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
case PTR_TO_XDP_SOCK:
case PTR_TO_BTF_ID:
case PTR_TO_BUF:
case PTR_TO_PERCPU_BTF_ID:
case PTR_TO_MEM:
case PTR_TO_FUNC:
case PTR_TO_MAP_KEY:
@ -3990,6 +3988,12 @@ static int __check_ptr_off_reg(struct bpf_verifier_env *env,
* is only allowed in its original, unmodified form.
*/
if (reg->off < 0) {
verbose(env, "negative offset %s ptr R%d off=%d disallowed\n",
reg_type_str(env, reg->type), regno, reg->off);
return -EACCES;
}
if (!fixed_off_ok && reg->off) {
verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
reg_type_str(env, reg->type), regno, reg->off);
@ -4058,9 +4062,9 @@ static int check_buffer_access(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg,
int regno, int off, int size,
bool zero_size_allowed,
const char *buf_info,
u32 *max_access)
{
const char *buf_info = type_is_rdonly_mem(reg->type) ? "rdonly" : "rdwr";
int err;
err = __check_buffer_access(env, buf_info, reg, regno, off, size);
@ -4197,6 +4201,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
return -EACCES;
}
if (reg->type & MEM_PERCPU) {
verbose(env,
"R%d is ptr_%s access percpu memory: off=%d\n",
regno, tname, off);
return -EACCES;
}
if (env->ops->btf_struct_access) {
ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
off, size, atype, &btf_id, &flag);
@ -4556,7 +4567,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
err = check_tp_buffer_access(env, reg, regno, off, size);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown(env, regs, value_regno);
} else if (reg->type == PTR_TO_BTF_ID) {
} else if (base_type(reg->type) == PTR_TO_BTF_ID &&
!type_may_be_null(reg->type)) {
err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
value_regno);
} else if (reg->type == CONST_PTR_TO_MAP) {
@ -4564,7 +4576,6 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
value_regno);
} else if (base_type(reg->type) == PTR_TO_BUF) {
bool rdonly_mem = type_is_rdonly_mem(reg->type);
const char *buf_info;
u32 *max_access;
if (rdonly_mem) {
@ -4573,15 +4584,13 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
regno, reg_type_str(env, reg->type));
return -EACCES;
}
buf_info = "rdonly";
max_access = &env->prog->aux->max_rdonly_access;
} else {
buf_info = "rdwr";
max_access = &env->prog->aux->max_rdwr_access;
}
err = check_buffer_access(env, reg, regno, off, size, false,
buf_info, max_access);
max_access);
if (!err && value_regno >= 0 && (rdonly_mem || t == BPF_READ))
mark_reg_unknown(env, regs, value_regno);
@ -4802,7 +4811,7 @@ static int check_stack_range_initialized(
}
if (is_spilled_reg(&state->stack[spi]) &&
state->stack[spi].spilled_ptr.type == PTR_TO_BTF_ID)
base_type(state->stack[spi].spilled_ptr.type) == PTR_TO_BTF_ID)
goto mark;
if (is_spilled_reg(&state->stack[spi]) &&
@ -4844,7 +4853,6 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
struct bpf_call_arg_meta *meta)
{
struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
const char *buf_info;
u32 *max_access;
switch (base_type(reg->type)) {
@ -4871,15 +4879,13 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
if (meta && meta->raw_mode)
return -EACCES;
buf_info = "rdonly";
max_access = &env->prog->aux->max_rdonly_access;
} else {
buf_info = "rdwr";
max_access = &env->prog->aux->max_rdwr_access;
}
return check_buffer_access(env, reg, regno, reg->off,
access_size, zero_size_allowed,
buf_info, max_access);
max_access);
case PTR_TO_STACK:
return check_stack_range_initialized(
env,
@ -5258,7 +5264,7 @@ static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | ME
static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_PERCPU_BTF_ID } };
static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
@ -5359,6 +5365,60 @@ found:
return 0;
}
int check_func_arg_reg_off(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno,
enum bpf_arg_type arg_type,
bool is_release_func)
{
bool fixed_off_ok = false, release_reg;
enum bpf_reg_type type = reg->type;
switch ((u32)type) {
case SCALAR_VALUE:
/* Pointer types where reg offset is explicitly allowed: */
case PTR_TO_PACKET:
case PTR_TO_PACKET_META:
case PTR_TO_MAP_KEY:
case PTR_TO_MAP_VALUE:
case PTR_TO_MEM:
case PTR_TO_MEM | MEM_RDONLY:
case PTR_TO_MEM | MEM_ALLOC:
case PTR_TO_BUF:
case PTR_TO_BUF | MEM_RDONLY:
case PTR_TO_STACK:
/* Some of the argument types nevertheless require a
* zero register offset.
*/
if (arg_type != ARG_PTR_TO_ALLOC_MEM)
return 0;
break;
/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
* fixed offset.
*/
case PTR_TO_BTF_ID:
/* When referenced PTR_TO_BTF_ID is passed to release function,
* its fixed offset must be 0. We rely on the property that
* only one referenced register can be passed to BPF helpers and
* kfuncs. In the other cases, fixed offset can be non-zero.
*/
release_reg = is_release_func && reg->ref_obj_id;
if (release_reg && reg->off) {
verbose(env, "R%d must have zero offset when passed to release func\n",
regno);
return -EINVAL;
}
/* For release_reg == true, fixed_off_ok must be false, but we
* already checked and rejected reg->off != 0 above, so set to
* true to allow fixed offset for all other cases.
*/
fixed_off_ok = true;
break;
default:
break;
}
return __check_ptr_off_reg(env, reg, regno, fixed_off_ok);
}
static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
struct bpf_call_arg_meta *meta,
const struct bpf_func_proto *fn)
@ -5408,36 +5468,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
if (err)
return err;
switch ((u32)type) {
case SCALAR_VALUE:
/* Pointer types where reg offset is explicitly allowed: */
case PTR_TO_PACKET:
case PTR_TO_PACKET_META:
case PTR_TO_MAP_KEY:
case PTR_TO_MAP_VALUE:
case PTR_TO_MEM:
case PTR_TO_MEM | MEM_RDONLY:
case PTR_TO_MEM | MEM_ALLOC:
case PTR_TO_BUF:
case PTR_TO_BUF | MEM_RDONLY:
case PTR_TO_STACK:
/* Some of the argument types nevertheless require a
* zero register offset.
*/
if (arg_type == ARG_PTR_TO_ALLOC_MEM)
goto force_off_check;
break;
/* All the rest must be rejected: */
default:
force_off_check:
err = __check_ptr_off_reg(env, reg, regno,
type == PTR_TO_BTF_ID);
if (err < 0)
return err;
break;
}
err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
if (err)
return err;
skip_type_check:
/* check_func_arg_reg_off relies on only one referenced register being
* allowed for BPF helpers.
*/
if (reg->ref_obj_id) {
if (meta->ref_obj_id) {
verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
@ -9638,7 +9676,6 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
dst_reg->mem_size = aux->btf_var.mem_size;
break;
case PTR_TO_BTF_ID:
case PTR_TO_PERCPU_BTF_ID:
dst_reg->btf = aux->btf_var.btf;
dst_reg->btf_id = aux->btf_var.btf_id;
break;
@ -10363,8 +10400,7 @@ static void adjust_btf_func(struct bpf_verifier_env *env)
aux->func_info[i].insn_off = env->subprog_info[i].start;
}
#define MIN_BPF_LINEINFO_SIZE (offsetof(struct bpf_line_info, line_col) + \
sizeof(((struct bpf_line_info *)(0))->line_col))
#define MIN_BPF_LINEINFO_SIZE offsetofend(struct bpf_line_info, line_col)
#define MAX_LINEINFO_REC_SIZE MAX_FUNCINFO_REC_SIZE
static int check_btf_line(struct bpf_verifier_env *env,
@ -11838,7 +11874,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
type = t->type;
t = btf_type_skip_modifiers(btf, type, NULL);
if (percpu) {
aux->btf_var.reg_type = PTR_TO_PERCPU_BTF_ID;
aux->btf_var.reg_type = PTR_TO_BTF_ID | MEM_PERCPU;
aux->btf_var.btf = btf;
aux->btf_var.btf_id = type;
} else if (!btf_type_is_struct(t)) {
@ -12987,6 +13023,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
func[i]->aux->name[0] = 'F';
func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
func[i]->jit_requested = 1;
func[i]->blinding_requested = prog->blinding_requested;
func[i]->aux->kfunc_tab = prog->aux->kfunc_tab;
func[i]->aux->kfunc_btf_tab = prog->aux->kfunc_btf_tab;
func[i]->aux->linfo = prog->aux->linfo;
@ -13110,6 +13147,7 @@ out_free:
out_undo_insn:
/* cleanup main prog to be interpreted */
prog->jit_requested = 0;
prog->blinding_requested = 0;
for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
if (!bpf_pseudo_call(insn))
continue;
@ -13203,7 +13241,6 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
{
struct bpf_prog *prog = env->prog;
enum bpf_attach_type eatype = prog->expected_attach_type;
bool expect_blinding = bpf_jit_blinding_enabled(prog);
enum bpf_prog_type prog_type = resolve_prog_type(prog);
struct bpf_insn *insn = prog->insnsi;
const struct bpf_func_proto *fn;
@ -13367,7 +13404,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
insn->code = BPF_JMP | BPF_TAIL_CALL;
aux = &env->insn_aux_data[i + delta];
if (env->bpf_capable && !expect_blinding &&
if (env->bpf_capable && !prog->blinding_requested &&
prog->jit_requested &&
!bpf_map_key_poisoned(aux) &&
!bpf_map_ptr_poisoned(aux) &&
@ -13455,6 +13492,26 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
goto patch_call_imm;
}
if (insn->imm == BPF_FUNC_task_storage_get ||
insn->imm == BPF_FUNC_sk_storage_get ||
insn->imm == BPF_FUNC_inode_storage_get) {
if (env->prog->aux->sleepable)
insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
else
insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
insn_buf[1] = *insn;
cnt = 2;
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
delta += cnt - 1;
env->prog = prog = new_prog;
insn = new_prog->insnsi + i + delta;
goto patch_call_imm;
}
/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
* and other inlining handlers are currently limited to 64 bit
* only.

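The MEM_PERCPU handling above means a pointer whose BTF carries the "percpu" type tag can no longer be dereferenced directly; it has to be turned into a plain pointer through the per-CPU helpers first. A hedged BPF-side sketch (the tagged field is hypothetical):

	struct foo __percpu *pptr;   /* hypothetical pointer carrying the "percpu" type tag */
	struct foo *p;

	p = bpf_this_cpu_ptr(pptr);  /* accepted: the argument is PTR_TO_BTF_ID | MEM_PERCPU */
	/* a direct read such as pptr->field would now be rejected with -EACCES */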
kernel/exit.c

@ -64,6 +64,7 @@
#include <linux/compat.h>
#include <linux/io_uring.h>
#include <linux/kprobes.h>
#include <linux/rethook.h>
#include <linux/uaccess.h>
#include <asm/unistd.h>
@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
kprobe_flush_task(tsk);
rethook_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
put_task_struct(tsk);

kernel/fork.c

@ -2255,6 +2255,9 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_KRETPROBES
p->kretprobe_instances.first = NULL;
#endif
#ifdef CONFIG_RETHOOK
p->rethooks.first = NULL;
#endif
/*
* Ensure that the cgroup subsystem policies allow the new process to be

kernel/kallsyms.c

@ -212,6 +212,10 @@ unsigned long kallsyms_lookup_name(const char *name)
unsigned long i;
unsigned int off;
/* Skip the search for empty string. */
if (!*name)
return 0;
for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));

kernel/trace/Kconfig

@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
config NOP_TRACER
bool
config HAVE_RETHOOK
bool
config RETHOOK
bool
depends on HAVE_RETHOOK
help
Enable the generic return hooking feature. This is an internal
API which will be used by other function-entry hooking
features like fprobe and kprobes.
config HAVE_FUNCTION_TRACER
bool
help
@ -236,6 +247,21 @@ config DYNAMIC_FTRACE_WITH_ARGS
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS
config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS
depends on HAVE_RETHOOK
select RETHOOK
default n
help
This option enables the kernel function probe (fprobe) based on ftrace.
The fprobe is similar to kprobes, but probes only kernel function
entries and exits. A single fprobe can also probe multiple functions.
If unsure, say N.
config FUNCTION_PROFILER
bool "Kernel function profiler"
depends on FUNCTION_TRACER

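Putting the dependencies above together, a hedged .config fragment that enables fprobe (HAVE_RETHOOK must be provided by the architecture; RETHOOK is selected automatically):

	CONFIG_FUNCTION_TRACER=y
	CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
	CONFIG_FPROBE=y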
kernel/trace/Makefile

@ -97,6 +97,8 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBE) += fprobe.o
obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

kernel/trace/bpf_trace.c

@ -17,6 +17,9 @@
#include <linux/error-injection.h>
#include <linux/btf_ids.h>
#include <linux/bpf_lsm.h>
#include <linux/fprobe.h>
#include <linux/bsearch.h>
#include <linux/sort.h>
#include <net/bpf_sk_storage.h>
@ -77,6 +80,8 @@ u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
static int bpf_btf_printf_prepare(struct btf_ptr *ptr, u32 btf_ptr_size,
u64 flags, const struct btf **btf,
s32 *btf_id);
static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx);
static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx);
/**
* trace_call_bpf - invoke BPF program
@ -1036,6 +1041,30 @@ static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
.arg1_type = ARG_PTR_TO_CTX,
};
BPF_CALL_1(bpf_get_func_ip_kprobe_multi, struct pt_regs *, regs)
{
return bpf_kprobe_multi_entry_ip(current->bpf_ctx);
}
static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe_multi = {
.func = bpf_get_func_ip_kprobe_multi,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
};
BPF_CALL_1(bpf_get_attach_cookie_kprobe_multi, struct pt_regs *, regs)
{
return bpf_kprobe_multi_cookie(current->bpf_ctx);
}
static const struct bpf_func_proto bpf_get_attach_cookie_proto_kmulti = {
.func = bpf_get_attach_cookie_kprobe_multi,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
};
BPF_CALL_1(bpf_get_attach_cookie_trace, void *, ctx)
{
struct bpf_trace_run_ctx *run_ctx;
@ -1279,9 +1308,13 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_override_return_proto;
#endif
case BPF_FUNC_get_func_ip:
return &bpf_get_func_ip_proto_kprobe;
return prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI ?
&bpf_get_func_ip_proto_kprobe_multi :
&bpf_get_func_ip_proto_kprobe;
case BPF_FUNC_get_attach_cookie:
return &bpf_get_attach_cookie_proto_trace;
return prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI ?
&bpf_get_attach_cookie_proto_kmulti :
&bpf_get_attach_cookie_proto_trace;
default:
return bpf_tracing_func_proto(func_id, prog);
}
@ -2181,3 +2214,314 @@ static int __init bpf_event_init(void)
fs_initcall(bpf_event_init);
#endif /* CONFIG_MODULES */
#ifdef CONFIG_FPROBE
struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
unsigned long *addrs;
u64 *cookies;
u32 cnt;
};
struct bpf_kprobe_multi_run_ctx {
struct bpf_run_ctx run_ctx;
struct bpf_kprobe_multi_link *link;
unsigned long entry_ip;
};
static void bpf_kprobe_multi_link_release(struct bpf_link *link)
{
struct bpf_kprobe_multi_link *kmulti_link;
kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link);
unregister_fprobe(&kmulti_link->fp);
}
static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link)
{
struct bpf_kprobe_multi_link *kmulti_link;
kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link);
kvfree(kmulti_link->addrs);
kvfree(kmulti_link->cookies);
kfree(kmulti_link);
}
static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
.release = bpf_kprobe_multi_link_release,
.dealloc = bpf_kprobe_multi_link_dealloc,
};
static void bpf_kprobe_multi_cookie_swap(void *a, void *b, int size, const void *priv)
{
const struct bpf_kprobe_multi_link *link = priv;
unsigned long *addr_a = a, *addr_b = b;
u64 *cookie_a, *cookie_b;
unsigned long tmp1;
u64 tmp2;
cookie_a = link->cookies + (addr_a - link->addrs);
cookie_b = link->cookies + (addr_b - link->addrs);
/* swap addr_a/addr_b and cookie_a/cookie_b values */
tmp1 = *addr_a; *addr_a = *addr_b; *addr_b = tmp1;
tmp2 = *cookie_a; *cookie_a = *cookie_b; *cookie_b = tmp2;
}
static int __bpf_kprobe_multi_cookie_cmp(const void *a, const void *b)
{
const unsigned long *addr_a = a, *addr_b = b;
if (*addr_a == *addr_b)
return 0;
return *addr_a < *addr_b ? -1 : 1;
}
static int bpf_kprobe_multi_cookie_cmp(const void *a, const void *b, const void *priv)
{
return __bpf_kprobe_multi_cookie_cmp(a, b);
}
static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx)
{
struct bpf_kprobe_multi_run_ctx *run_ctx;
struct bpf_kprobe_multi_link *link;
u64 *cookie, entry_ip;
unsigned long *addr;
if (WARN_ON_ONCE(!ctx))
return 0;
run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx);
link = run_ctx->link;
if (!link->cookies)
return 0;
entry_ip = run_ctx->entry_ip;
addr = bsearch(&entry_ip, link->addrs, link->cnt, sizeof(entry_ip),
__bpf_kprobe_multi_cookie_cmp);
if (!addr)
return 0;
cookie = link->cookies + (addr - link->addrs);
return *cookie;
}
static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
{
struct bpf_kprobe_multi_run_ctx *run_ctx;
run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx);
return run_ctx->entry_ip;
}
static int
kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
unsigned long entry_ip, struct pt_regs *regs)
{
struct bpf_kprobe_multi_run_ctx run_ctx = {
.link = link,
.entry_ip = entry_ip,
};
struct bpf_run_ctx *old_run_ctx;
int err;
if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
err = 0;
goto out;
}
migrate_disable();
rcu_read_lock();
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
err = bpf_prog_run(link->link.prog, regs);
bpf_reset_run_ctx(old_run_ctx);
rcu_read_unlock();
migrate_enable();
out:
__this_cpu_dec(bpf_prog_active);
return err;
}
static void
kprobe_multi_link_handler(struct fprobe *fp, unsigned long entry_ip,
struct pt_regs *regs)
{
struct bpf_kprobe_multi_link *link;
link = container_of(fp, struct bpf_kprobe_multi_link, fp);
kprobe_multi_link_prog_run(link, entry_ip, regs);
}
static int
kprobe_multi_resolve_syms(const void *usyms, u32 cnt,
unsigned long *addrs)
{
unsigned long addr, size;
const char **syms;
int err = -ENOMEM;
unsigned int i;
char *func;
size = cnt * sizeof(*syms);
syms = kvzalloc(size, GFP_KERNEL);
if (!syms)
return -ENOMEM;
func = kmalloc(KSYM_NAME_LEN, GFP_KERNEL);
if (!func)
goto error;
if (copy_from_user(syms, usyms, size)) {
err = -EFAULT;
goto error;
}
for (i = 0; i < cnt; i++) {
err = strncpy_from_user(func, syms[i], KSYM_NAME_LEN);
if (err == KSYM_NAME_LEN)
err = -E2BIG;
if (err < 0)
goto error;
err = -EINVAL;
addr = kallsyms_lookup_name(func);
if (!addr)
goto error;
if (!kallsyms_lookup_size_offset(addr, &size, NULL))
goto error;
addr = ftrace_location_range(addr, addr + size - 1);
if (!addr)
goto error;
addrs[i] = addr;
}
err = 0;
error:
kvfree(syms);
kfree(func);
return err;
}
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
struct bpf_kprobe_multi_link *link = NULL;
struct bpf_link_primer link_primer;
void __user *ucookies;
unsigned long *addrs;
u32 flags, cnt, size;
void __user *uaddrs;
u64 *cookies = NULL;
void __user *usyms;
int err;
/* no support for 32bit archs yet */
if (sizeof(u64) != sizeof(void *))
return -EOPNOTSUPP;
if (prog->expected_attach_type != BPF_TRACE_KPROBE_MULTI)
return -EINVAL;
flags = attr->link_create.kprobe_multi.flags;
if (flags & ~BPF_F_KPROBE_MULTI_RETURN)
return -EINVAL;
uaddrs = u64_to_user_ptr(attr->link_create.kprobe_multi.addrs);
usyms = u64_to_user_ptr(attr->link_create.kprobe_multi.syms);
if (!!uaddrs == !!usyms)
return -EINVAL;
cnt = attr->link_create.kprobe_multi.cnt;
if (!cnt)
return -EINVAL;
size = cnt * sizeof(*addrs);
addrs = kvmalloc(size, GFP_KERNEL);
if (!addrs)
return -ENOMEM;
if (uaddrs) {
if (copy_from_user(addrs, uaddrs, size)) {
err = -EFAULT;
goto error;
}
} else {
err = kprobe_multi_resolve_syms(usyms, cnt, addrs);
if (err)
goto error;
}
ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies);
if (ucookies) {
cookies = kvmalloc(size, GFP_KERNEL);
if (!cookies) {
err = -ENOMEM;
goto error;
}
if (copy_from_user(cookies, ucookies, size)) {
err = -EFAULT;
goto error;
}
}
link = kzalloc(sizeof(*link), GFP_KERNEL);
if (!link) {
err = -ENOMEM;
goto error;
}
bpf_link_init(&link->link, BPF_LINK_TYPE_KPROBE_MULTI,
&bpf_kprobe_multi_link_lops, prog);
err = bpf_link_prime(&link->link, &link_primer);
if (err)
goto error;
if (flags & BPF_F_KPROBE_MULTI_RETURN)
link->fp.exit_handler = kprobe_multi_link_handler;
else
link->fp.entry_handler = kprobe_multi_link_handler;
link->addrs = addrs;
link->cookies = cookies;
link->cnt = cnt;
if (cookies) {
/*
* Sorting addresses will trigger sorting cookies as well
* (check bpf_kprobe_multi_cookie_swap). This way we can
* find the cookie based on the address in the
* bpf_get_attach_cookie helper.
*/
sort_r(addrs, cnt, sizeof(*addrs),
bpf_kprobe_multi_cookie_cmp,
bpf_kprobe_multi_cookie_swap,
link);
}
err = register_fprobe_ips(&link->fp, addrs, cnt);
if (err) {
bpf_link_cleanup(&link_primer);
return err;
}
return bpf_link_settle(&link_primer);
error:
kfree(link);
kvfree(addrs);
kvfree(cookies);
return err;
}
#else /* !CONFIG_FPROBE */
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx)
{
return 0;
}
static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
{
return 0;
}
#endif
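On the BPF side, the two context-sensitive helpers above resolve to the multi-kprobe variants whenever expected_attach_type is BPF_TRACE_KPROBE_MULTI. A hedged program sketch (the SEC() name follows the libbpf support added elsewhere in this series):

	SEC("kprobe.multi/tcp_send*")
	int hit(struct pt_regs *ctx)
	{
		__u64 ip = bpf_get_func_ip(ctx);            /* entry ip of the hit function */
		__u64 cookie = bpf_get_attach_cookie(ctx);  /* per-address cookie, if any */

		bpf_printk("hit %llx cookie %llu", ip, cookie);
		return 0;
	}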

kernel/trace/fprobe.c (new file, 332 lines)

@ -0,0 +1,332 @@
// SPDX-License-Identifier: GPL-2.0
/*
* fprobe - Simple ftrace probe wrapper for function entry.
*/
#define pr_fmt(fmt) "fprobe: " fmt
#include <linux/err.h>
#include <linux/fprobe.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/rethook.h>
#include <linux/slab.h>
#include <linux/sort.h>
#include "trace.h"
struct fprobe_rethook_node {
struct rethook_node node;
unsigned long entry_ip;
};
static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
struct fprobe_rethook_node *fpr;
struct rethook_node *rh;
struct fprobe *fp;
int bit;
fp = container_of(ops, struct fprobe, ops);
if (fprobe_disabled(fp))
return;
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0) {
fp->nmissed++;
return;
}
if (fp->entry_handler)
fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
if (fp->exit_handler) {
rh = rethook_try_get(fp->rethook);
if (!rh) {
fp->nmissed++;
goto out;
}
fpr = container_of(rh, struct fprobe_rethook_node, node);
fpr->entry_ip = ip;
rethook_hook(rh, ftrace_get_regs(fregs), true);
}
out:
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_handler);
static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
struct fprobe *fp = container_of(ops, struct fprobe, ops);
if (unlikely(kprobe_running())) {
fp->nmissed++;
return;
}
kprobe_busy_begin();
fprobe_handler(ip, parent_ip, ops, fregs);
kprobe_busy_end();
}
static void fprobe_exit_handler(struct rethook_node *rh, void *data,
struct pt_regs *regs)
{
struct fprobe *fp = (struct fprobe *)data;
struct fprobe_rethook_node *fpr;
if (!fp || fprobe_disabled(fp))
return;
fpr = container_of(rh, struct fprobe_rethook_node, node);
fp->exit_handler(fp, fpr->entry_ip, regs);
}
NOKPROBE_SYMBOL(fprobe_exit_handler);
/* Convert symbols to ftrace location addresses */
static unsigned long *get_ftrace_locations(const char **syms, int num)
{
unsigned long addr, size;
unsigned long *addrs;
int i;
/* Convert symbols to symbol addresses */
addrs = kcalloc(num, sizeof(*addrs), GFP_KERNEL);
if (!addrs)
return ERR_PTR(-ENOMEM);
for (i = 0; i < num; i++) {
addr = kallsyms_lookup_name(syms[i]);
if (!addr) /* Maybe wrong symbol */
goto error;
/* Convert symbol address to ftrace location. */
if (!kallsyms_lookup_size_offset(addr, &size, NULL) || !size)
goto error;
addr = ftrace_location_range(addr, addr + size - 1);
if (!addr) /* No dynamic ftrace there. */
goto error;
addrs[i] = addr;
}
return addrs;
error:
kfree(addrs);
return ERR_PTR(-ENOENT);
}
static void fprobe_init(struct fprobe *fp)
{
fp->nmissed = 0;
if (fprobe_shared_with_kprobes(fp))
fp->ops.func = fprobe_kprobe_handler;
else
fp->ops.func = fprobe_handler;
fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
}
static int fprobe_init_rethook(struct fprobe *fp, int num)
{
int i, size;
if (num < 0)
return -EINVAL;
if (!fp->exit_handler) {
fp->rethook = NULL;
return 0;
}
/* Initialize rethook if needed */
size = num * num_possible_cpus() * 2;
if (size < 0)
return -E2BIG;
fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);
for (i = 0; i < size; i++) {
struct rethook_node *node;
node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL);
if (!node) {
rethook_free(fp->rethook);
fp->rethook = NULL;
return -ENOMEM;
}
rethook_add_node(fp->rethook, node);
}
return 0;
}
static void fprobe_fail_cleanup(struct fprobe *fp)
{
if (fp->rethook) {
/* Don't need to cleanup rethook->handler because this is not used. */
rethook_free(fp->rethook);
fp->rethook = NULL;
}
ftrace_free_filter(&fp->ops);
}
/**
* register_fprobe() - Register fprobe to ftrace by pattern.
* @fp: A fprobe data structure to be registered.
* @filter: A wildcard pattern of probed symbols.
* @notfilter: A wildcard pattern of NOT probed symbols.
*
* Register @fp to ftrace for enabling the probe on the symbols matched to @filter.
* If @notfilter is not NULL, the symbols that match @notfilter are not probed.
*
* Return 0 if @fp is registered successfully, -errno if not.
*/
int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
{
struct ftrace_hash *hash;
unsigned char *str;
int ret, len;
if (!fp || !filter)
return -EINVAL;
fprobe_init(fp);
len = strlen(filter);
str = kstrdup(filter, GFP_KERNEL);
ret = ftrace_set_filter(&fp->ops, str, len, 0);
kfree(str);
if (ret)
return ret;
if (notfilter) {
len = strlen(notfilter);
str = kstrdup(notfilter, GFP_KERNEL);
ret = ftrace_set_notrace(&fp->ops, str, len, 0);
kfree(str);
if (ret)
goto out;
}
/* TODO:
* correctly calculate the total number of filtered symbols
* from both filter and notfilter.
*/
hash = fp->ops.local_hash.filter_hash;
if (WARN_ON_ONCE(!hash))
goto out;
ret = fprobe_init_rethook(fp, (int)hash->count);
if (!ret)
ret = register_ftrace_function(&fp->ops);
out:
if (ret)
fprobe_fail_cleanup(fp);
return ret;
}
EXPORT_SYMBOL_GPL(register_fprobe);
/**
* register_fprobe_ips() - Register fprobe to ftrace by address.
* @fp: A fprobe data structure to be registered.
* @addrs: An array of target ftrace location addresses.
* @num: The number of entries of @addrs.
*
* Register @fp to ftrace for enabling the probe on the address given by @addrs.
* The @addrs must be ftrace location addresses, which may be
* the symbol address + an arch-dependent offset.
* If you are unsure what this means, please use the other registration functions.
*
* Return 0 if @fp is registered successfully, -errno if not.
*/
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
{
int ret;
if (!fp || !addrs || num <= 0)
return -EINVAL;
fprobe_init(fp);
ret = ftrace_set_filter_ips(&fp->ops, addrs, num, 0, 0);
if (ret)
return ret;
ret = fprobe_init_rethook(fp, num);
if (!ret)
ret = register_ftrace_function(&fp->ops);
if (ret)
fprobe_fail_cleanup(fp);
return ret;
}
EXPORT_SYMBOL_GPL(register_fprobe_ips);
/**
* register_fprobe_syms() - Register fprobe to ftrace by symbols.
* @fp: A fprobe data structure to be registered.
* @syms: An array of target symbols.
* @num: The number of entries of @syms.
*
* Register @fp to the symbols given by the @syms array. This is useful if
* you are sure the symbols exist in the kernel.
*
* Return 0 if @fp is registered successfully, -errno if not.
*/
int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
{
unsigned long *addrs;
int ret;
if (!fp || !syms || num <= 0)
return -EINVAL;
addrs = get_ftrace_locations(syms, num);
if (IS_ERR(addrs))
return PTR_ERR(addrs);
ret = register_fprobe_ips(fp, addrs, num);
kfree(addrs);
return ret;
}
EXPORT_SYMBOL_GPL(register_fprobe_syms);
/**
* unregister_fprobe() - Unregister fprobe from ftrace
* @fp: A fprobe data structure to be unregistered.
*
* Unregister fprobe (and remove ftrace hooks from the function entries).
*
* Return 0 if @fp is unregistered successfully, -errno if not.
*/
int unregister_fprobe(struct fprobe *fp)
{
int ret;
if (!fp || fp->ops.func != fprobe_handler)
return -EINVAL;
/*
* rethook_free() starts disabling the rethook, but the rethook handlers
* may be running on other processors at this point. To make sure that all
* currently running handlers are finished, call unregister_ftrace_function()
* after this.
*/
if (fp->rethook)
rethook_free(fp->rethook);
ret = unregister_ftrace_function(&fp->ops);
if (ret < 0)
return ret;
ftrace_free_filter(&fp->ops);
return ret;
}
EXPORT_SYMBOL_GPL(unregister_fprobe);
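A hedged module-side sketch of the registration API above (handler name and filter patterns are hypothetical):

	static void my_entry_handler(struct fprobe *fp, unsigned long ip,
				     struct pt_regs *regs)
	{
		pr_info("entered %pS\n", (void *)ip);
	}

	static struct fprobe my_fp = {
		.entry_handler = my_entry_handler,
	};

	/* from module init: probe "vfs_*" except symbols matching "vfs_read*" */
	err = register_fprobe(&my_fp, "vfs_*", "vfs_read*");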

kernel/trace/ftrace.c

@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
}
static int
ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
{
struct ftrace_func_entry *entry;
@ -4976,9 +4976,30 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
return add_hash_entry(hash, ip);
}
static int
ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
unsigned int cnt, int remove)
{
unsigned int i;
int err;
for (i = 0; i < cnt; i++) {
err = __ftrace_match_addr(hash, ips[i], remove);
if (err) {
/*
* This expects the @hash is a temporary hash and if this
* fails the caller must free the @hash.
*/
return err;
}
}
return 0;
}
static int
ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
unsigned long ip, int remove, int reset, int enable)
unsigned long *ips, unsigned int cnt,
int remove, int reset, int enable)
{
struct ftrace_hash **orig_hash;
struct ftrace_hash *hash;
@ -5008,8 +5029,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
ret = -EINVAL;
goto out_regex_unlock;
}
if (ip) {
ret = ftrace_match_addr(hash, ip, remove);
if (ips) {
ret = ftrace_match_addr(hash, ips, cnt, remove);
if (ret < 0)
goto out_regex_unlock;
}
@ -5026,10 +5047,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
}
static int
ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
int reset, int enable)
ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
int remove, int reset, int enable)
{
return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
}
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
@ -5634,10 +5655,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset)
{
ftrace_ops_init(ops);
return ftrace_set_addr(ops, ip, remove, reset, 1);
return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
}
EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
/**
* ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
* @ops - the ops to set the filter with
* @ips - the array of addresses to add to or remove from the filter.
* @cnt - the number of addresses in @ips
* @remove - non zero to remove ips from the filter
* @reset - non zero to reset all filters before applying this filter.
*
* Filters denote which functions should be enabled when tracing is enabled.
* If the @ips array or any ip specified within is NULL, it fails to update the filter.
*/
int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
unsigned int cnt, int remove, int reset)
{
ftrace_ops_init(ops);
return ftrace_set_addr(ops, ips, cnt, remove, reset, 1);
}
EXPORT_SYMBOL_GPL(ftrace_set_filter_ips);
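A short caller-side sketch of the new batched API (addresses are assumed to already be ftrace locations, e.g. from ftrace_location_range(); names are hypothetical):

	unsigned long ips[2] = { addr1, addr2 };

	/* add both addresses to the filter, resetting any existing entries */
	err = ftrace_set_filter_ips(&my_ops, ips, ARRAY_SIZE(ips), 0, 1);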
/**
* ftrace_ops_set_global_filter - setup ops to use global filters
* @ops - the ops which will use the global filters
@ -5659,7 +5699,7 @@ static int
ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
int reset, int enable)
{
return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable);
return ftrace_set_hash(ops, buf, len, NULL, 0, 0, reset, enable);
}
/**

kernel/trace/rethook.c (new file, 317 lines)

@ -0,0 +1,317 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) "rethook: " fmt
#include <linux/bug.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/preempt.h>
#include <linux/rethook.h>
#include <linux/slab.h>
#include <linux/sort.h>
/* Return hook list (shadow stack by list) */
/*
* This function is called from delayed_put_task_struct() when a task is
* dead and cleaned up, to recycle any rethook instances associated with
* this task. These leftover instances represent probed functions that
* have been called but will never return.
*/
void rethook_flush_task(struct task_struct *tk)
{
struct rethook_node *rhn;
struct llist_node *node;
node = __llist_del_all(&tk->rethooks);
while (node) {
rhn = container_of(node, struct rethook_node, llist);
node = node->next;
preempt_disable();
rethook_recycle(rhn);
preempt_enable();
}
}
static void rethook_free_rcu(struct rcu_head *head)
{
struct rethook *rh = container_of(head, struct rethook, rcu);
struct rethook_node *rhn;
struct freelist_node *node;
int count = 1;
node = rh->pool.head;
while (node) {
rhn = container_of(node, struct rethook_node, freelist);
node = node->next;
kfree(rhn);
count++;
}
/* The rh->ref is the number of pooled nodes + 1 */
if (refcount_sub_and_test(count, &rh->ref))
kfree(rh);
}
/**
* rethook_free() - Free struct rethook.
* @rh: the struct rethook to be freed.
*
* Free the rethook. Before calling this function, the user must ensure that
* @rh::data is cleaned up if needed (otherwise the handler may still access
* it after this call). This function marks @rh to be freed after all
* rethook_node instances are freed (i.e. not immediately), and the caller
* must not touch @rh after calling this.
*/
void rethook_free(struct rethook *rh)
{
rcu_assign_pointer(rh->handler, NULL);
call_rcu(&rh->rcu, rethook_free_rcu);
}
/**
* rethook_alloc() - Allocate struct rethook.
* @data: data passed to @handler when the return is hooked.
* @handler: the return hook callback function.
*
* Allocate and initialize a new rethook with @data and @handler.
* Return NULL if memory allocation fails or @handler is NULL.
* Note that @handler == NULL means this rethook is going to be freed.
*/
struct rethook *rethook_alloc(void *data, rethook_handler_t handler)
{
struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
if (!rh || !handler)
return NULL;
rh->data = data;
rh->handler = handler;
rh->pool.head = NULL;
refcount_set(&rh->ref, 1);
return rh;
}
/**
* rethook_add_node() - Add a new node to the rethook.
* @rh: the struct rethook.
* @node: the struct rethook_node to be added.
*
* Add @node to @rh. The user must allocate @node (as part of the user's
* data structure). The @node fields are initialized in this function.
*/
void rethook_add_node(struct rethook *rh, struct rethook_node *node)
{
node->rethook = rh;
freelist_add(&node->freelist, &rh->pool);
refcount_inc(&rh->ref);
}
static void free_rethook_node_rcu(struct rcu_head *head)
{
struct rethook_node *node = container_of(head, struct rethook_node, rcu);
if (refcount_dec_and_test(&node->rethook->ref))
kfree(node->rethook);
kfree(node);
}
/**
* rethook_recycle() - return the node to rethook.
* @node: The struct rethook_node to be returned.
*
* Return the @node to @node::rethook. If the @node::rethook is already
* marked as freed, this will free the @node.
*/
void rethook_recycle(struct rethook_node *node)
{
lockdep_assert_preemption_disabled();
if (likely(READ_ONCE(node->rethook->handler)))
freelist_add(&node->freelist, &node->rethook->pool);
else
call_rcu(&node->rcu, free_rethook_node_rcu);
}
NOKPROBE_SYMBOL(rethook_recycle);
/**
* rethook_try_get() - get an unused rethook node.
* @rh: The struct rethook which pools the nodes.
*
* Get an unused rethook node from @rh. If the node pool is empty, this
* will return NULL. Caller must disable preemption.
*/
struct rethook_node *rethook_try_get(struct rethook *rh)
{
rethook_handler_t handler = READ_ONCE(rh->handler);
struct freelist_node *fn;
lockdep_assert_preemption_disabled();
/* Check whether @rh is going to be freed. */
if (unlikely(!handler))
return NULL;
fn = freelist_try_get(&rh->pool);
if (!fn)
return NULL;
return container_of(fn, struct rethook_node, freelist);
}
NOKPROBE_SYMBOL(rethook_try_get);
/**
* rethook_hook() - Hook the current function return.
* @node: The struct rethook node to hook the function return.
* @regs: The struct pt_regs for the function entry.
* @mcount: True if this is called from mcount(ftrace) context.
*
* Hook the return of the currently running function. This must be called at
* function entry (or at least @regs must hold the registers of the function
* entry). @mcount identifies the context: if this is called from the ftrace
* (mcount) callback, @mcount must be set true; if it is called from the
* real function entry (e.g. kprobes), @mcount must be set false. This is
* because the way to hook the function return depends on the context.
*/
void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool mcount)
{
arch_rethook_prepare(node, regs, mcount);
__llist_add(&node->llist, &current->rethooks);
}
NOKPROBE_SYMBOL(rethook_hook);
/* This assumes the 'tsk' is the current task or is not running. */
static unsigned long __rethook_find_ret_addr(struct task_struct *tsk,
struct llist_node **cur)
{
struct rethook_node *rh = NULL;
struct llist_node *node = *cur;
if (!node)
node = tsk->rethooks.first;
else
node = node->next;
while (node) {
rh = container_of(node, struct rethook_node, llist);
if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) {
*cur = node;
return rh->ret_addr;
}
node = node->next;
}
return 0;
}
NOKPROBE_SYMBOL(__rethook_find_ret_addr);
/**
* rethook_find_ret_addr - Find the correct return address modified by rethook
* @tsk: Target task
* @frame: A frame pointer
* @cur: a storage of the loop cursor llist_node pointer for next call
*
* Find the correct return address modified by a rethook on @tsk, as an
* unsigned long value.
* @tsk must be 'current' or a task which is not running. @frame is a hint
* for finding the correct return address - it is compared against the
* rethook_node::frame field. @cur is a loop cursor for searching the
* rethook return addresses on @tsk. '*@cur' should be NULL on the
* first call, but @cur itself must not be NULL.
*
* Returns found address value or zero if not found.
*/
unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
struct llist_node **cur)
{
struct rethook_node *rhn = NULL;
unsigned long ret;
if (WARN_ON_ONCE(!cur))
return 0;
if (WARN_ON_ONCE(tsk != current && task_is_running(tsk)))
return 0;
do {
ret = __rethook_find_ret_addr(tsk, cur);
if (!ret)
break;
rhn = container_of(*cur, struct rethook_node, llist);
} while (rhn->frame != frame);
return ret;
}
NOKPROBE_SYMBOL(rethook_find_ret_addr);
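An arch unwinder would typically drive this with a cursor while walking the stack frames; a minimal sketch (function and variable names are illustrative, not from this file):

/* Translate a return address possibly replaced by the rethook trampoline
 * back into the real one. 'cur' keeps the search position across frames;
 * it must point at a cursor initialized to NULL before the first frame.
 */
static unsigned long unwind_fixup_ret_addr(struct task_struct *tsk,
					   unsigned long ret_addr,
					   unsigned long frame,
					   struct llist_node **cur)
{
	if (ret_addr != (unsigned long)arch_rethook_trampoline)
		return ret_addr;
	return rethook_find_ret_addr(tsk, frame, cur);
}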
void __weak arch_rethook_fixup_return(struct pt_regs *regs,
unsigned long correct_ret_addr)
{
/*
* Do nothing by default. An architecture that uses a frame pointer to
* record the real return address on the stack should implement this
* function to fix up the return address, so that stacktrace works from
* the rethook handler.
*/
}
/* This function will be called from each arch-defined trampoline. */
unsigned long rethook_trampoline_handler(struct pt_regs *regs,
unsigned long frame)
{
struct llist_node *first, *node = NULL;
unsigned long correct_ret_addr;
rethook_handler_t handler;
struct rethook_node *rhn;
correct_ret_addr = __rethook_find_ret_addr(current, &node);
if (!correct_ret_addr) {
pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n");
BUG_ON(1);
}
instruction_pointer_set(regs, correct_ret_addr);
/*
* These loops must be protected from rethook_free_rcu() because they
* access 'rhn->rethook'.
*/
preempt_disable();
/*
* Run the handler on the shadow stack. Do not unlink the list here because
* stackdump inside the handlers needs to decode it.
*/
first = current->rethooks.first;
while (first) {
rhn = container_of(first, struct rethook_node, llist);
if (WARN_ON_ONCE(rhn->frame != frame))
break;
handler = READ_ONCE(rhn->rethook->handler);
if (handler)
handler(rhn, rhn->rethook->data, regs);
if (first == node)
break;
first = first->next;
}
/* Fixup registers for returning to correct address. */
arch_rethook_fixup_return(regs, correct_ret_addr);
/* Unlink used shadow stack */
first = current->rethooks.first;
current->rethooks.first = node->next;
node->next = NULL;
while (first) {
rhn = container_of(first, struct rethook_node, llist);
first = first->next;
rethook_recycle(rhn);
}
preempt_enable();
return correct_ret_addr;
}
NOKPROBE_SYMBOL(rethook_trampoline_handler);
@ -2118,6 +2118,18 @@ config KPROBES_SANITY_TEST
Say N if you are unsure.
config FPROBE_SANITY_TEST
bool "Self test for fprobe"
depends on DEBUG_KERNEL
depends on FPROBE
depends on KUNIT=y
help
This option enables a self test of fprobe when the system boots.
A series of tests is run to verify that fprobe is functioning
properly.
Say N if you are unsure.
config BACKTRACE_SELF_TEST
tristate "Self test for the backtrace code"
depends on DEBUG_KERNEL
@ -103,6 +103,8 @@ obj-$(CONFIG_TEST_HMM) += test_hmm.o
obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
#
# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
@ -122,16 +122,27 @@ static void swap_bytes(void *a, void *b, size_t n)
* a pointer, but small integers make for the smallest compare
* instructions.
*/
#define SWAP_WORDS_64 (swap_func_t)0
#define SWAP_WORDS_32 (swap_func_t)1
#define SWAP_BYTES (swap_func_t)2
#define SWAP_WORDS_64 (swap_r_func_t)0
#define SWAP_WORDS_32 (swap_r_func_t)1
#define SWAP_BYTES (swap_r_func_t)2
#define SWAP_WRAPPER (swap_r_func_t)3
struct wrapper {
cmp_func_t cmp;
swap_func_t swap;
};
/*
* The function pointer is last to make tail calls most efficient if the
* compiler decides not to inline this function.
*/
static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
static void do_swap(void *a, void *b, size_t size, swap_r_func_t swap_func, const void *priv)
{
if (swap_func == SWAP_WRAPPER) {
((const struct wrapper *)priv)->swap(a, b, (int)size);
return;
}
if (swap_func == SWAP_WORDS_64)
swap_words_64(a, b, size);
else if (swap_func == SWAP_WORDS_32)
@ -139,7 +150,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
else if (swap_func == SWAP_BYTES)
swap_bytes(a, b, size);
else
swap_func(a, b, (int)size);
swap_func(a, b, (int)size, priv);
}
#define _CMP_WRAPPER ((cmp_r_func_t)0L)
@ -147,7 +158,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func)
static int do_cmp(const void *a, const void *b, cmp_r_func_t cmp, const void *priv)
{
if (cmp == _CMP_WRAPPER)
return ((cmp_func_t)(priv))(a, b);
return ((const struct wrapper *)priv)->cmp(a, b);
return cmp(a, b, priv);
}
@ -198,7 +209,7 @@ static size_t parent(size_t i, unsigned int lsbit, size_t size)
*/
void sort_r(void *base, size_t num, size_t size,
cmp_r_func_t cmp_func,
swap_func_t swap_func,
swap_r_func_t swap_func,
const void *priv)
{
/* pre-scale counters for performance */
@ -208,6 +219,10 @@ void sort_r(void *base, size_t num, size_t size,
if (!a) /* num < 2 || size == 0 */
return;
/* called from 'sort' without swap function, let's pick the default */
if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap)
swap_func = NULL;
if (!swap_func) {
if (is_aligned(base, size, 8))
swap_func = SWAP_WORDS_64;
@ -230,7 +245,7 @@ void sort_r(void *base, size_t num, size_t size,
if (a) /* Building heap: sift down --a */
a -= size;
else if (n -= size) /* Sorting: Extract root to --n */
do_swap(base, base + n, size, swap_func);
do_swap(base, base + n, size, swap_func, priv);
else /* Sort complete */
break;
@ -257,7 +272,7 @@ void sort_r(void *base, size_t num, size_t size,
c = b; /* Where "a" belongs */
while (b != a) { /* Shift it into place */
b = parent(b, lsbit, size);
do_swap(base + b, base + c, size, swap_func);
do_swap(base + b, base + c, size, swap_func, priv);
}
}
}
@ -267,6 +282,11 @@ void sort(void *base, size_t num, size_t size,
cmp_func_t cmp_func,
swap_func_t swap_func)
{
return sort_r(base, num, size, _CMP_WRAPPER, swap_func, cmp_func);
struct wrapper w = {
.cmp = cmp_func,
.swap = swap_func,
};
return sort_r(base, num, size, _CMP_WRAPPER, SWAP_WRAPPER, &w);
}
EXPORT_SYMBOL(sort);
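The point of threading 'priv' through the swap callback is that a context-aware swap can keep a second array in step with the sorted one. A hedged sketch of such a caller (names are illustrative, not from this patch):

#include <linux/sort.h>
#include <linux/kernel.h>	/* for the swap() macro */

/* Keys plus a companion array that must stay index-aligned with them. */
struct perm_ctx {
	int *keys;
	int *companion;
};

static int cmp_keys(const void *a, const void *b, const void *priv)
{
	int x = *(const int *)a, y = *(const int *)b;

	return x < y ? -1 : x > y ? 1 : 0;
}

static void swap_with_companion(void *a, void *b, int size, const void *priv)
{
	const struct perm_ctx *ctx = priv;
	size_t i = (int *)a - ctx->keys;
	size_t j = (int *)b - ctx->keys;

	swap(ctx->keys[i], ctx->keys[j]);
	swap(ctx->companion[i], ctx->companion[j]);	/* mirror the swap */
}

/* sort_r(ctx.keys, n, sizeof(int), cmp_keys, swap_with_companion, &ctx); */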
lib/test_fprobe.c (new file)
@ -0,0 +1,174 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* test_fprobe.c - simple sanity test for fprobe
*/
#include <linux/kernel.h>
#include <linux/fprobe.h>
#include <linux/random.h>
#include <kunit/test.h>
#define div_factor 3
static struct kunit *current_test;
static u32 rand1, entry_val, exit_val;
/* Use indirect calls to avoid inlining the target functions */
static u32 (*target)(u32 value);
static u32 (*target2)(u32 value);
static unsigned long target_ip;
static unsigned long target2_ip;
static noinline u32 fprobe_selftest_target(u32 value)
{
return (value / div_factor);
}
static noinline u32 fprobe_selftest_target2(u32 value)
{
return (value / div_factor) + 1;
}
static notrace void fp_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
/* This can be called on the fprobe_selftest_target and the fprobe_selftest_target2 */
if (ip != target_ip)
KUNIT_EXPECT_EQ(current_test, ip, target2_ip);
entry_val = (rand1 / div_factor);
}
static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
{
unsigned long ret = regs_return_value(regs);
KUNIT_EXPECT_FALSE(current_test, preemptible());
if (ip != target_ip) {
KUNIT_EXPECT_EQ(current_test, ip, target2_ip);
KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor) + 1);
} else
KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor));
KUNIT_EXPECT_EQ(current_test, entry_val, (rand1 / div_factor));
exit_val = entry_val + div_factor;
}
/* Test entry only (no rethook) */
static void test_fprobe_entry(struct kunit *test)
{
struct fprobe fp_entry = {
.entry_handler = fp_entry_handler,
};
current_test = test;
/* Before registering, unregister should fail. */
KUNIT_EXPECT_NE(test, 0, unregister_fprobe(&fp_entry));
KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp_entry, "fprobe_selftest_target*", NULL));
entry_val = 0;
exit_val = 0;
target(rand1);
KUNIT_EXPECT_NE(test, 0, entry_val);
KUNIT_EXPECT_EQ(test, 0, exit_val);
entry_val = 0;
exit_val = 0;
target2(rand1);
KUNIT_EXPECT_NE(test, 0, entry_val);
KUNIT_EXPECT_EQ(test, 0, exit_val);
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp_entry));
}
static void test_fprobe(struct kunit *test)
{
struct fprobe fp = {
.entry_handler = fp_entry_handler,
.exit_handler = fp_exit_handler,
};
current_test = test;
KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp, "fprobe_selftest_target*", NULL));
entry_val = 0;
exit_val = 0;
target(rand1);
KUNIT_EXPECT_NE(test, 0, entry_val);
KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val);
entry_val = 0;
exit_val = 0;
target2(rand1);
KUNIT_EXPECT_NE(test, 0, entry_val);
KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val);
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
}
static void test_fprobe_syms(struct kunit *test)
{
static const char *syms[] = {"fprobe_selftest_target", "fprobe_selftest_target2"};
struct fprobe fp = {
.entry_handler = fp_entry_handler,
.exit_handler = fp_exit_handler,
};
current_test = test;
KUNIT_EXPECT_EQ(test, 0, register_fprobe_syms(&fp, syms, 2));
entry_val = 0;
exit_val = 0;
target(rand1);
KUNIT_EXPECT_NE(test, 0, entry_val);
KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val);
entry_val = 0;
exit_val = 0;
target2(rand1);
KUNIT_EXPECT_NE(test, 0, entry_val);
KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val);
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
}
static unsigned long get_ftrace_location(void *func)
{
unsigned long size, addr = (unsigned long)func;
if (!kallsyms_lookup_size_offset(addr, &size, NULL) || !size)
return 0;
return ftrace_location_range(addr, addr + size - 1);
}
static int fprobe_test_init(struct kunit *test)
{
do {
rand1 = prandom_u32();
} while (rand1 <= div_factor);
target = fprobe_selftest_target;
target2 = fprobe_selftest_target2;
target_ip = get_ftrace_location(target);
target2_ip = get_ftrace_location(target2);
return 0;
}
static struct kunit_case fprobe_testcases[] = {
KUNIT_CASE(test_fprobe_entry),
KUNIT_CASE(test_fprobe),
KUNIT_CASE(test_fprobe_syms),
{}
};
static struct kunit_suite fprobe_test_suite = {
.name = "fprobe_test",
.init = fprobe_test_init,
.test_cases = fprobe_testcases,
};
kunit_test_suites(&fprobe_test_suite);
MODULE_LICENSE("GPL");
@ -15,6 +15,7 @@
#include <net/sock.h>
#include <net/tcp.h>
#include <net/net_namespace.h>
#include <net/page_pool.h>
#include <linux/error-injection.h>
#include <linux/smp.h>
#include <linux/sock_diag.h>
@ -53,10 +54,11 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t)
rcu_read_unlock();
}
static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *err, u32 *duration)
static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations,
u32 repeat, int *err, u32 *duration)
__must_hold(rcu)
{
t->i++;
t->i += iterations;
if (t->i >= repeat) {
/* We're done. */
t->time_spent += ktime_get_ns() - t->time_start;
@ -88,6 +90,284 @@ reset:
return false;
}
/* We put this struct at the head of each page with a context and frame
* initialised when the page is allocated, so we don't have to do this on each
* repetition of the test run.
*/
struct xdp_page_head {
struct xdp_buff orig_ctx;
struct xdp_buff ctx;
struct xdp_frame frm;
u8 data[];
};
struct xdp_test_data {
struct xdp_buff *orig_ctx;
struct xdp_rxq_info rxq;
struct net_device *dev;
struct page_pool *pp;
struct xdp_frame **frames;
struct sk_buff **skbs;
u32 batch_size;
u32 frame_cnt;
};
#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head))
#define TEST_XDP_MAX_BATCH 256
static void xdp_test_run_init_page(struct page *page, void *arg)
{
struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
struct xdp_buff *new_ctx, *orig_ctx;
u32 headroom = XDP_PACKET_HEADROOM;
struct xdp_test_data *xdp = arg;
size_t frm_len, meta_len;
struct xdp_frame *frm;
void *data;
orig_ctx = xdp->orig_ctx;
frm_len = orig_ctx->data_end - orig_ctx->data_meta;
meta_len = orig_ctx->data - orig_ctx->data_meta;
headroom -= meta_len;
new_ctx = &head->ctx;
frm = &head->frm;
data = &head->data;
memcpy(data + headroom, orig_ctx->data_meta, frm_len);
xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq);
xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
new_ctx->data = new_ctx->data_meta + meta_len;
xdp_update_frame_from_buff(new_ctx, frm);
frm->mem = new_ctx->rxq->mem;
memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
}
static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx)
{
struct xdp_mem_info mem = {};
struct page_pool *pp;
int err = -ENOMEM;
struct page_pool_params pp_params = {
.order = 0,
.flags = 0,
.pool_size = xdp->batch_size,
.nid = NUMA_NO_NODE,
.init_callback = xdp_test_run_init_page,
.init_arg = xdp,
};
xdp->frames = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL);
if (!xdp->frames)
return -ENOMEM;
xdp->skbs = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL);
if (!xdp->skbs)
goto err_skbs;
pp = page_pool_create(&pp_params);
if (IS_ERR(pp)) {
err = PTR_ERR(pp);
goto err_pp;
}
/* will copy 'mem.id' into pp->xdp_mem_id */
err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp);
if (err)
goto err_mmodel;
xdp->pp = pp;
/* We create a 'fake' RXQ referencing the original dev, but with an
* xdp_mem_info pointing to our page_pool
*/
xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
xdp->rxq.mem.id = pp->xdp_mem_id;
xdp->dev = orig_ctx->rxq->dev;
xdp->orig_ctx = orig_ctx;
return 0;
err_mmodel:
page_pool_destroy(pp);
err_pp:
kvfree(xdp->skbs);
err_skbs:
kvfree(xdp->frames);
return err;
}
static void xdp_test_run_teardown(struct xdp_test_data *xdp)
{
page_pool_destroy(xdp->pp);
kvfree(xdp->frames);
kvfree(xdp->skbs);
}
static bool ctx_was_changed(struct xdp_page_head *head)
{
return head->orig_ctx.data != head->ctx.data ||
head->orig_ctx.data_meta != head->ctx.data_meta ||
head->orig_ctx.data_end != head->ctx.data_end;
}
static void reset_ctx(struct xdp_page_head *head)
{
if (likely(!ctx_was_changed(head)))
return;
head->ctx.data = head->orig_ctx.data;
head->ctx.data_meta = head->orig_ctx.data_meta;
head->ctx.data_end = head->orig_ctx.data_end;
xdp_update_frame_from_buff(&head->ctx, &head->frm);
}
static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
struct sk_buff **skbs,
struct net_device *dev)
{
gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
int i, n;
LIST_HEAD(list);
n = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, (void **)skbs);
if (unlikely(n == 0)) {
for (i = 0; i < nframes; i++)
xdp_return_frame(frames[i]);
return -ENOMEM;
}
for (i = 0; i < nframes; i++) {
struct xdp_frame *xdpf = frames[i];
struct sk_buff *skb = skbs[i];
skb = __xdp_build_skb_from_frame(xdpf, skb, dev);
if (!skb) {
xdp_return_frame(xdpf);
continue;
}
list_add_tail(&skb->list, &list);
}
netif_receive_skb_list(&list);
return 0;
}
static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
u32 repeat)
{
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
int err = 0, act, ret, i, nframes = 0, batch_sz;
struct xdp_frame **frames = xdp->frames;
struct xdp_page_head *head;
struct xdp_frame *frm;
bool redirect = false;
struct xdp_buff *ctx;
struct page *page;
batch_sz = min_t(u32, repeat, xdp->batch_size);
local_bh_disable();
xdp_set_return_frame_no_direct();
for (i = 0; i < batch_sz; i++) {
page = page_pool_dev_alloc_pages(xdp->pp);
if (!page) {
err = -ENOMEM;
goto out;
}
head = phys_to_virt(page_to_phys(page));
reset_ctx(head);
ctx = &head->ctx;
frm = &head->frm;
xdp->frame_cnt++;
act = bpf_prog_run_xdp(prog, ctx);
/* if program changed pkt bounds we need to update the xdp_frame */
if (unlikely(ctx_was_changed(head))) {
ret = xdp_update_frame_from_buff(ctx, frm);
if (ret) {
xdp_return_buff(ctx);
continue;
}
}
switch (act) {
case XDP_TX:
/* we can't do a real XDP_TX since we're not in the
* driver, so turn it into a REDIRECT back to the same
* index
*/
ri->tgt_index = xdp->dev->ifindex;
ri->map_id = INT_MAX;
ri->map_type = BPF_MAP_TYPE_UNSPEC;
fallthrough;
case XDP_REDIRECT:
redirect = true;
ret = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog);
if (ret)
xdp_return_buff(ctx);
break;
case XDP_PASS:
frames[nframes++] = frm;
break;
default:
bpf_warn_invalid_xdp_action(NULL, prog, act);
fallthrough;
case XDP_DROP:
xdp_return_buff(ctx);
break;
}
}
out:
if (redirect)
xdp_do_flush();
if (nframes) {
ret = xdp_recv_frames(frames, nframes, xdp->skbs, xdp->dev);
if (ret)
err = ret;
}
xdp_clear_return_frame_no_direct();
local_bh_enable();
return err;
}
static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx,
u32 repeat, u32 batch_size, u32 *time)
{
struct xdp_test_data xdp = { .batch_size = batch_size };
struct bpf_test_timer t = { .mode = NO_MIGRATE };
int ret;
if (!repeat)
repeat = 1;
ret = xdp_test_run_setup(&xdp, ctx);
if (ret)
return ret;
bpf_test_timer_enter(&t);
do {
xdp.frame_cnt = 0;
ret = xdp_test_run_batch(&xdp, prog, repeat - t.i);
if (unlikely(ret < 0))
break;
} while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time));
bpf_test_timer_leave(&t);
xdp_test_run_teardown(&xdp);
return ret;
}
static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
u32 *retval, u32 *time, bool xdp)
{
@ -119,7 +399,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
*retval = bpf_prog_run_xdp(prog, ctx);
else
*retval = bpf_prog_run(prog, ctx);
} while (bpf_test_timer_continue(&t, repeat, &ret, time));
} while (bpf_test_timer_continue(&t, 1, repeat, &ret, time));
bpf_reset_run_ctx(old_ctx);
bpf_test_timer_leave(&t);
@ -201,8 +481,8 @@ out:
* future.
*/
__diag_push();
__diag_ignore(GCC, 8, "-Wmissing-prototypes",
"Global functions as their definitions will be in vmlinux BTF");
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in vmlinux BTF");
int noinline bpf_fentry_test1(int a)
{
return a + 1;
@ -270,9 +550,14 @@ struct sock * noinline bpf_kfunc_call_test3(struct sock *sk)
return sk;
}
struct prog_test_member {
u64 c;
};
struct prog_test_ref_kfunc {
int a;
int b;
struct prog_test_member memb;
struct prog_test_ref_kfunc *next;
};
@ -295,6 +580,10 @@ noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
{
}
noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p)
{
}
struct prog_test_pass1 {
int x0;
struct {
@ -379,6 +668,7 @@ BTF_ID(func, bpf_kfunc_call_test2)
BTF_ID(func, bpf_kfunc_call_test3)
BTF_ID(func, bpf_kfunc_call_test_acquire)
BTF_ID(func, bpf_kfunc_call_test_release)
BTF_ID(func, bpf_kfunc_call_memb_release)
BTF_ID(func, bpf_kfunc_call_test_pass_ctx)
BTF_ID(func, bpf_kfunc_call_test_pass1)
BTF_ID(func, bpf_kfunc_call_test_pass2)
@ -396,6 +686,7 @@ BTF_SET_END(test_sk_acquire_kfunc_ids)
BTF_SET_START(test_sk_release_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_release)
BTF_ID(func, bpf_kfunc_call_memb_release)
BTF_SET_END(test_sk_release_kfunc_ids)
BTF_SET_START(test_sk_ret_null_kfunc_ids)
@ -435,7 +726,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
int b = 2, err = -EFAULT;
u32 retval = 0;
if (kattr->test.flags || kattr->test.cpu)
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
return -EINVAL;
switch (prog->expected_attach_type) {
@ -499,7 +790,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
/* doesn't support data_in/out, ctx_out, duration, or repeat */
if (kattr->test.data_in || kattr->test.data_out ||
kattr->test.ctx_out || kattr->test.duration ||
kattr->test.repeat)
kattr->test.repeat || kattr->test.batch_size)
return -EINVAL;
if (ctx_size_in < prog->aux->max_ctx_offset ||
@ -730,7 +1021,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
void *data;
int ret;
if (kattr->test.flags || kattr->test.cpu)
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
return -EINVAL;
data = bpf_test_init(kattr, kattr->test.data_size_in,
@ -911,10 +1202,12 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr)
{
bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES);
u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
u32 batch_size = kattr->test.batch_size;
u32 retval = 0, duration, max_data_sz;
u32 size = kattr->test.data_size_in;
u32 headroom = XDP_PACKET_HEADROOM;
u32 retval, duration, max_data_sz;
u32 repeat = kattr->test.repeat;
struct netdev_rx_queue *rxqueue;
struct skb_shared_info *sinfo;
@ -927,6 +1220,20 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
prog->expected_attach_type == BPF_XDP_CPUMAP)
return -EINVAL;
if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES)
return -EINVAL;
if (do_live) {
if (!batch_size)
batch_size = NAPI_POLL_WEIGHT;
else if (batch_size > TEST_XDP_MAX_BATCH)
return -E2BIG;
headroom += sizeof(struct xdp_page_head);
} else if (batch_size) {
return -EINVAL;
}
ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
if (IS_ERR(ctx))
return PTR_ERR(ctx);
@ -935,14 +1242,20 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
/* There can't be user provided data before the meta data */
if (ctx->data_meta || ctx->data_end != size ||
ctx->data > ctx->data_end ||
unlikely(xdp_metalen_invalid(ctx->data)))
unlikely(xdp_metalen_invalid(ctx->data)) ||
(do_live && (kattr->test.data_out || kattr->test.ctx_out)))
goto free_ctx;
/* Meta data is allocated from the headroom */
headroom -= ctx->data;
}
max_data_sz = 4096 - headroom - tailroom;
size = min_t(u32, size, max_data_sz);
if (size > max_data_sz) {
/* disallow live data mode for jumbo frames */
if (do_live)
goto free_ctx;
size = max_data_sz;
}
data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom);
if (IS_ERR(data)) {
@ -1000,7 +1313,10 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
if (repeat > 1)
bpf_prog_change_xdp(NULL, prog);
ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
if (do_live)
ret = bpf_test_run_xdp_live(prog, &xdp, repeat, batch_size, &duration);
else
ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
/* We convert the xdp_buff back to an xdp_md before checking the return
* code so the reference count of any held netdevice will be decremented
* even if the test run failed.
@ -1062,7 +1378,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
return -EINVAL;
if (kattr->test.flags || kattr->test.cpu)
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
return -EINVAL;
if (size < ETH_HLEN)
@ -1097,7 +1413,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
do {
retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
size, flags);
} while (bpf_test_timer_continue(&t, repeat, &ret, &duration));
} while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration));
bpf_test_timer_leave(&t);
if (ret < 0)
@ -1129,7 +1445,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
if (prog->type != BPF_PROG_TYPE_SK_LOOKUP)
return -EINVAL;
if (kattr->test.flags || kattr->test.cpu)
if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size)
return -EINVAL;
if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out ||
@ -1192,7 +1508,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
do {
ctx.selected_sk = NULL;
retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run);
} while (bpf_test_timer_continue(&t, repeat, &ret, &duration));
} while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration));
bpf_test_timer_leave(&t);
if (ret < 0)
@ -1231,7 +1547,8 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
/* doesn't support data_in/out, ctx_out, duration, repeat, or flags */
if (kattr->test.data_in || kattr->test.data_out ||
kattr->test.ctx_out || kattr->test.duration ||
kattr->test.repeat || kattr->test.flags)
kattr->test.repeat || kattr->test.flags ||
kattr->test.batch_size)
return -EINVAL;
if (ctx_size_in < prog->aux->max_ctx_offset ||
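From userspace, the live mode is driven through the same BPF_PROG_RUN command; a minimal libbpf-based sketch (prog_fd, the packet contents, and the batch size are placeholders):

#include <bpf/bpf.h>
#include <linux/bpf.h>

/* Sketch: run a loaded BPF_PROG_TYPE_XDP program over 'pkt' with live
 * frame processing, so XDP_TX and XDP_REDIRECT are actually executed.
 */
static int run_live(int prog_fd, void *pkt, size_t pkt_len)
{
	LIBBPF_OPTS(bpf_test_run_opts, opts,
		.data_in = pkt,
		.data_size_in = pkt_len,
		.repeat = 1 << 20,	/* process ~1M copies of the frame */
		.flags = BPF_F_TEST_XDP_LIVE_FRAMES,
		.batch_size = 64,	/* 0 means NAPI_POLL_WEIGHT */
	);

	return bpf_prog_test_run_opts(prog_fd, &opts);
}

Each batch draws its pages from the page_pool set up above, which is why batch_size is capped at TEST_XDP_MAX_BATCH (256) and why data_out/ctx_out are rejected in this mode.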
@ -141,7 +141,7 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
if (sock) {
sdata = bpf_local_storage_update(
sock->sk, (struct bpf_local_storage_map *)map, value,
map_flags);
map_flags, GFP_ATOMIC);
sockfd_put(sock);
return PTR_ERR_OR_ZERO(sdata);
}
@ -172,7 +172,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
{
struct bpf_local_storage_elem *copy_selem;
copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC);
if (!copy_selem)
return NULL;
@ -230,7 +230,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
bpf_selem_link_map(smap, copy_selem);
bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
} else {
ret = bpf_local_storage_alloc(newsk, smap, copy_selem);
ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);
if (ret) {
kfree(copy_selem);
atomic_sub(smap->elem_size,
@ -255,8 +255,9 @@ out:
return ret;
}
BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags)
/* *gfp_flags* is a hidden argument provided by the verifier */
BPF_CALL_5(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
@ -277,7 +278,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
refcount_inc_not_zero(&sk->sk_refcnt)) {
sdata = bpf_local_storage_update(
sk, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST);
BPF_NOEXIST, gfp_flags);
/* sk must be a fullsock (guaranteed by verifier),
* so sock_gen_put() is unnecessary.
*/
@ -405,6 +406,8 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
btf_vmlinux = bpf_get_btf_vmlinux();
if (IS_ERR_OR_NULL(btf_vmlinux))
return false;
btf_id = prog->aux->attach_btf_id;
t = btf_type_by_id(btf_vmlinux, btf_id);
tname = btf_name_by_offset(btf_vmlinux, t->name_off);
@ -417,14 +420,16 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
return false;
}
BPF_CALL_4(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags)
/* *gfp_flags* is a hidden argument provided by the verifier */
BPF_CALL_5(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags, gfp_t, gfp_flags)
{
WARN_ON_ONCE(!bpf_rcu_lock_held());
if (in_hardirq() || in_nmi())
return (unsigned long)NULL;
return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags);
return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags,
gfp_flags);
}
BPF_CALL_2(bpf_sk_storage_delete_tracing, struct bpf_map *, map,
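On the BPF side nothing changes syntactically; the verifier picks the hidden gfp argument from the program's sleepable-ness (GFP_KERNEL for sleepable programs, GFP_ATOMIC otherwise). A hedged sketch of a sleepable LSM program that relies on this (section and map names are illustrative):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, __u64);
} sk_stg SEC(".maps");

SEC("lsm.s/socket_post_create")
int BPF_PROG(sock_create, struct socket *sock)
{
	__u64 *val;

	/* Sleepable program: the element may be created with GFP_KERNEL. */
	val = bpf_sk_storage_get(&sk_stg, sock->sk, 0,
				 BPF_SK_STORAGE_GET_F_CREATE);
	if (val)
		*val += 1;
	return 0;
}

char LICENSE[] SEC("license") = "GPL";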
@ -7388,36 +7388,36 @@ static const struct bpf_func_proto bpf_sock_ops_reserve_hdr_opt_proto = {
.arg3_type = ARG_ANYTHING,
};
BPF_CALL_3(bpf_skb_set_delivery_time, struct sk_buff *, skb,
u64, dtime, u32, dtime_type)
BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
u64, tstamp, u32, tstamp_type)
{
/* skb_clear_delivery_time() is done for inet protocol */
if (skb->protocol != htons(ETH_P_IP) &&
skb->protocol != htons(ETH_P_IPV6))
return -EOPNOTSUPP;
switch (dtime_type) {
case BPF_SKB_DELIVERY_TIME_MONO:
if (!dtime)
switch (tstamp_type) {
case BPF_SKB_TSTAMP_DELIVERY_MONO:
if (!tstamp)
return -EINVAL;
skb->tstamp = dtime;
skb->tstamp = tstamp;
skb->mono_delivery_time = 1;
break;
case BPF_SKB_DELIVERY_TIME_NONE:
if (dtime)
case BPF_SKB_TSTAMP_UNSPEC:
if (tstamp)
return -EINVAL;
skb->tstamp = 0;
skb->mono_delivery_time = 0;
break;
default:
return -EOPNOTSUPP;
return -EINVAL;
}
return 0;
}
static const struct bpf_func_proto bpf_skb_set_delivery_time_proto = {
.func = bpf_skb_set_delivery_time,
static const struct bpf_func_proto bpf_skb_set_tstamp_proto = {
.func = bpf_skb_set_tstamp,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
@ -7786,8 +7786,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_tcp_gen_syncookie_proto;
case BPF_FUNC_sk_assign:
return &bpf_sk_assign_proto;
case BPF_FUNC_skb_set_delivery_time:
return &bpf_skb_set_delivery_time_proto;
case BPF_FUNC_skb_set_tstamp:
return &bpf_skb_set_tstamp_proto;
#endif
default:
return bpf_sk_base_func_proto(func_id);
@ -8127,9 +8127,9 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
return false;
info->reg_type = PTR_TO_SOCK_COMMON_OR_NULL;
break;
case offsetof(struct __sk_buff, delivery_time_type):
case offsetof(struct __sk_buff, tstamp_type):
return false;
case offsetofend(struct __sk_buff, delivery_time_type) ... offsetof(struct __sk_buff, hwtstamp) - 1:
case offsetofend(struct __sk_buff, tstamp_type) ... offsetof(struct __sk_buff, hwtstamp) - 1:
/* Explicitly prohibit access to padding in __sk_buff. */
return false;
default:
@ -8484,14 +8484,14 @@ static bool tc_cls_act_is_valid_access(int off, int size,
break;
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
case offsetof(struct __sk_buff, delivery_time_type):
case offsetof(struct __sk_buff, tstamp_type):
/* The convert_ctx_access() on reading and writing
* __sk_buff->tstamp depends on whether the bpf prog
* has used __sk_buff->delivery_time_type or not.
* Thus, we need to set prog->delivery_time_access
* has used __sk_buff->tstamp_type or not.
* Thus, we need to set prog->tstamp_type_access
* earlier during is_valid_access() here.
*/
((struct bpf_prog *)prog)->delivery_time_access = 1;
((struct bpf_prog *)prog)->tstamp_type_access = 1;
return size == sizeof(__u8);
}
@ -8888,42 +8888,22 @@ static u32 flow_dissector_convert_ctx_access(enum bpf_access_type type,
return insn - insn_buf;
}
static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
struct bpf_insn *insn)
{
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
/* AX is needed because src_reg and dst_reg could be the same */
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
PKT_VLAN_PRESENT_OFFSET);
*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
SKB_MONO_DELIVERY_TIME_MASK, 2);
/* value_reg = BPF_SKB_TSTAMP_UNSPEC */
*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
*insn++ = BPF_JMP_A(1);
/* value_reg = BPF_SKB_TSTAMP_DELIVERY_MONO */
*insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
return insn;
}
@ -8956,21 +8936,22 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
__u8 skb_reg = si->src_reg;
#ifdef CONFIG_NET_CLS_ACT
/* If the tstamp_type is read,
 * the bpf prog is aware the tstamp could have delivery time.
 * Thus, read skb->tstamp as is if tstamp_type_access is true.
 */
if (!prog->tstamp_type_access) {
/* AX is needed because src_reg and dst_reg could be the same */
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, PKT_VLAN_PRESENT_OFFSET);
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
/* skb->tc_at_ingress && skb->mono_delivery_time,
 * read 0 as the (rcv) timestamp.
 */
*insn++ = BPF_MOV64_IMM(value_reg, 0);
*insn++ = BPF_JMP_A(1);
}
@ -8989,25 +8970,27 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
__u8 skb_reg = si->dst_reg;
#ifdef CONFIG_NET_CLS_ACT
/* If the tstamp_type is read,
 * the bpf prog is aware the tstamp could have delivery time.
 * Thus, write skb->tstamp as is if tstamp_type_access is true.
 * Otherwise, writing at ingress will have to clear the
 * mono_delivery_time bit also.
 */
if (!prog->tstamp_type_access) {
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, PKT_VLAN_PRESENT_OFFSET);
/* Writing __sk_buff->tstamp as ingress, goto <clear> */
*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
/* goto <store> */
*insn++ = BPF_JMP_A(2);
/* <clear>: mono_delivery_time */
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, PKT_VLAN_PRESENT_OFFSET);
}
#endif
/* <store>: skb->tstamp = tstamp */
*insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg,
offsetof(struct sk_buff, tstamp));
return insn;
@ -9326,8 +9309,8 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
insn = bpf_convert_tstamp_read(prog, si, insn);
break;
case offsetof(struct __sk_buff, delivery_time_type):
insn = bpf_convert_dtime_type_read(si, insn);
case offsetof(struct __sk_buff, tstamp_type):
insn = bpf_convert_tstamp_type_read(si, insn);
break;
case offsetof(struct __sk_buff, gso_segs):
@ -11006,13 +10989,24 @@ static bool sk_lookup_is_valid_access(int off, int size,
case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
case offsetof(struct bpf_sk_lookup, remote_port) ...
offsetof(struct bpf_sk_lookup, local_ip4) - 1:
case bpf_ctx_range(struct bpf_sk_lookup, local_port):
case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex):
bpf_ctx_record_field_size(info, sizeof(__u32));
return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
/* Allow 4-byte access to 2-byte field for backward compatibility */
if (size == sizeof(__u32))
return true;
bpf_ctx_record_field_size(info, sizeof(__be16));
return bpf_ctx_narrow_access_ok(off, size, sizeof(__be16));
case offsetofend(struct bpf_sk_lookup, remote_port) ...
offsetof(struct bpf_sk_lookup, local_ip4) - 1:
/* Allow access to zero padding for backward compatibility */
bpf_ctx_record_field_size(info, sizeof(__u16));
return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
default:
return false;
}
@ -11094,6 +11088,11 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
sport, 2, target_size));
break;
case offsetofend(struct bpf_sk_lookup, remote_port):
*target_size = 2;
*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
break;
case offsetof(struct bpf_sk_lookup, local_port):
*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
bpf_target_off(struct bpf_sk_lookup_kern,
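With the padding now explicitly accessible, a program written against the new layout reads remote_port as a 2-byte value, which behaves the same on big- and little-endian hosts. A hedged sketch (port number and policy are illustrative):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("sk_lookup")
int port_policy(struct bpf_sk_lookup *ctx)
{
	/* remote_port is __be16 followed by zero padding; a 2-byte load
	 * is the natural access, a 4-byte load stays allowed for
	 * backward compatibility.
	 */
	if (ctx->remote_port == bpf_htons(12345))
		return SK_DROP;	/* illustrative policy */
	return SK_PASS;
}

char _license[] SEC("license") = "GPL";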
@ -27,6 +27,7 @@ int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len,
int elem_first_coalesce)
{
struct page_frag *pfrag = sk_page_frag(sk);
u32 osize = msg->sg.size;
int ret = 0;
len -= msg->sg.size;
@ -35,13 +36,17 @@ int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len,
u32 orig_offset;
int use, i;
if (!sk_page_frag_refill(sk, pfrag))
return -ENOMEM;
if (!sk_page_frag_refill(sk, pfrag)) {
ret = -ENOMEM;
goto msg_trim;
}
orig_offset = pfrag->offset;
use = min_t(int, len, pfrag->size - orig_offset);
if (!sk_wmem_schedule(sk, use))
return -ENOMEM;
if (!sk_wmem_schedule(sk, use)) {
ret = -ENOMEM;
goto msg_trim;
}
i = msg->sg.end;
sk_msg_iter_var_prev(i);
@ -71,6 +76,10 @@ int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len,
}
return ret;
msg_trim:
sk_msg_trim(sk, msg, osize);
return ret;
}
EXPORT_SYMBOL_GPL(sk_msg_alloc);
@ -529,6 +529,7 @@ void xdp_return_buff(struct xdp_buff *xdp)
out:
__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
}
EXPORT_SYMBOL_GPL(xdp_return_buff);
/* Only called for MEM_TYPE_PAGE_POOL see xdp.h */
void __xdp_release_frame(void *data, struct xdp_mem_info *mem)
@ -138,10 +138,9 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
struct sk_psock *psock = sk_psock_get(sk);
int ret;
if (unlikely(!psock)) {
sk_msg_free(sk, msg);
return 0;
}
if (unlikely(!psock))
return -EPIPE;
ret = ingress ? bpf_tcp_ingress(sk, psock, msg, bytes, flags) :
tcp_bpf_push_locked(sk, msg, bytes, flags, false);
sk_psock_put(sk, psock);
@ -335,7 +334,7 @@ more_data:
cork = true;
psock->cork = NULL;
}
sk_msg_return(sk, msg, tosend);
sk_msg_return(sk, msg, msg->sg.size);
release_sock(sk);
ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
@ -375,8 +374,11 @@ more_data:
}
if (msg &&
msg->sg.data[msg->sg.start].page_link &&
msg->sg.data[msg->sg.start].length)
msg->sg.data[msg->sg.start].length) {
if (eval == __SK_REDIRECT)
sk_mem_charge(sk, msg->sg.size);
goto more_data;
}
}
return ret;
}
@ -12,6 +12,7 @@
#include <linux/btf_ids.h>
#include <linux/net_namespace.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_core.h>
/* bpf_ct_opts - Options for CT lookup helpers
@ -102,8 +103,8 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
}
__diag_push();
__diag_ignore(GCC, 8, "-Wmissing-prototypes",
"Global functions as their definitions will be in nf_conntrack BTF");
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in nf_conntrack BTF");
/* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a
* reference to it
@ -73,6 +73,13 @@ config SAMPLE_HW_BREAKPOINT
help
This builds kernel hardware breakpoint example modules.
config SAMPLE_FPROBE
tristate "Build fprobe examples -- loadable modules only"
depends on FPROBE && m
help
This builds an fprobe example module. The module has a 'symbol'
option with which you can specify the probed symbol, or multiple
symbols separated by ','.
config SAMPLE_KFIFO
tristate "Build kfifo examples -- loadable modules only"
depends on m
@ -33,3 +33,4 @@ subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog
subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/
obj-$(CONFIG_SAMPLE_CORESIGHT_SYSCFG) += coresight/
obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
@ -1984,15 +1984,15 @@ int main(int argc, char **argv)
setlocale(LC_ALL, "");
if (!opt_quiet) {
ret = pthread_create(&pt, NULL, poller, NULL);
if (ret)
exit_with_error(ret);
}
prev_time = get_nsecs();
start_time = prev_time;
/* Configure sched priority for better wake-up accuracy */
memset(&schparam, 0, sizeof(schparam));
schparam.sched_priority = opt_schprio;
samples/fprobe/Makefile (new file)
@ -0,0 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_SAMPLE_FPROBE) += fprobe_example.o
samples/fprobe/fprobe_example.c (new file)
@ -0,0 +1,120 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Here's a sample kernel module showing the use of fprobe to dump a
* stack trace and selected registers when kernel_clone() is called.
*
* For more information on theory of operation of kprobes, see
* Documentation/trace/kprobes.rst
*
* You will see the trace data in /var/log/messages and on the console
* whenever kernel_clone() is invoked to create a new process.
*/
#define pr_fmt(fmt) "%s: " fmt, __func__
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fprobe.h>
#include <linux/sched/debug.h>
#include <linux/slab.h>
#define BACKTRACE_DEPTH 16
#define MAX_SYMBOL_LEN 4096
struct fprobe sample_probe;
static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
module_param_string(symbol, symbol, sizeof(symbol), 0644);
static char nosymbol[MAX_SYMBOL_LEN] = "";
module_param_string(nosymbol, nosymbol, sizeof(nosymbol), 0644);
static bool stackdump = true;
module_param(stackdump, bool, 0644);
static void show_backtrace(void)
{
unsigned long stacks[BACKTRACE_DEPTH];
unsigned int len;
len = stack_trace_save(stacks, BACKTRACE_DEPTH, 2);
stack_trace_print(stacks, len, 24);
}
static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
{
pr_info("Enter <%pS> ip = 0x%p\n", (void *)ip, (void *)ip);
if (stackdump)
show_backtrace();
}
static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
{
unsigned long rip = instruction_pointer(regs);
pr_info("Return from <%pS> ip = 0x%p to rip = 0x%p (%pS)\n",
(void *)ip, (void *)ip, (void *)rip, (void *)rip);
if (stackdump)
show_backtrace();
}
static int __init fprobe_init(void)
{
char *p, *symbuf = NULL;
const char **syms;
int ret, count, i;
sample_probe.entry_handler = sample_entry_handler;
sample_probe.exit_handler = sample_exit_handler;
if (strchr(symbol, '*')) {
/* filter based fprobe */
ret = register_fprobe(&sample_probe, symbol,
nosymbol[0] == '\0' ? NULL : nosymbol);
goto out;
} else if (!strchr(symbol, ',')) {
symbuf = symbol;
ret = register_fprobe_syms(&sample_probe, (const char **)&symbuf, 1);
goto out;
}
/* Comma separated symbols */
symbuf = kstrdup(symbol, GFP_KERNEL);
if (!symbuf)
return -ENOMEM;
p = symbuf;
count = 1;
while ((p = strchr(++p, ',')) != NULL)
count++;
pr_info("%d symbols found\n", count);
syms = kcalloc(count, sizeof(char *), GFP_KERNEL);
if (!syms) {
kfree(symbuf);
return -ENOMEM;
}
p = symbuf;
for (i = 0; i < count; i++)
syms[i] = strsep(&p, ",");
ret = register_fprobe_syms(&sample_probe, syms, count);
kfree(syms);
kfree(symbuf);
out:
if (ret < 0)
pr_err("register_fprobe failed, returned %d\n", ret);
else
pr_info("Planted fprobe at %s\n", symbol);
return ret;
}
static void __exit fprobe_exit(void)
{
unregister_fprobe(&sample_probe);
pr_info("fprobe at %s unregistered\n", symbol);
}
module_init(fprobe_init)
module_exit(fprobe_exit)
MODULE_LICENSE("GPL");
@ -418,6 +418,7 @@ int ima_file_mmap(struct file *file, unsigned long prot)
/**
* ima_file_mprotect - based on policy, limit mprotect change
* @vma: vm_area_struct protection is set to
* @prot: contains the protection that will be applied by the kernel.
*
* Files can be mmap'ed read/write and later changed to execute to circumvent
@ -519,20 +520,38 @@ int ima_file_check(struct file *file, int mask)
}
EXPORT_SYMBOL_GPL(ima_file_check);
static int __ima_inode_hash(struct inode *inode, struct file *file, char *buf,
size_t buf_size)
{
struct integrity_iint_cache *iint = NULL, tmp_iint;
int rc, hash_algo;
if (ima_policy_flag) {
iint = integrity_iint_find(inode);
if (iint)
mutex_lock(&iint->mutex);
}
if ((!iint || !(iint->flags & IMA_COLLECTED)) && file) {
if (iint)
mutex_unlock(&iint->mutex);
memset(&tmp_iint, 0, sizeof(tmp_iint));
tmp_iint.inode = inode;
mutex_init(&tmp_iint.mutex);
rc = ima_collect_measurement(&tmp_iint, file, NULL, 0,
ima_hash_algo, NULL);
if (rc < 0)
return -EOPNOTSUPP;
iint = &tmp_iint;
mutex_lock(&iint->mutex);
}
if (!iint)
return -EOPNOTSUPP;
/*
* ima_file_hash can be called before ima_collect_measurement has been
* called, so we might not always have a hash.
@ -551,12 +570,14 @@ static int __ima_inode_hash(struct inode *inode, char *buf, size_t buf_size)
hash_algo = iint->ima_hash->algo;
mutex_unlock(&iint->mutex);
if (iint == &tmp_iint)
kfree(iint->ima_hash);
return hash_algo;
}
/**
* ima_file_hash - return the stored measurement if a file has been hashed and
* is in the iint cache.
* ima_file_hash - return a measurement of the file
* @file: pointer to the file
* @buf: buffer in which to store the hash
* @buf_size: length of the buffer
@ -569,7 +590,7 @@ static int __ima_inode_hash(struct inode *inode, char *buf, size_t buf_size)
* The file hash returned is based on the entire file, including the appended
* signature.
*
* If IMA is disabled or if no measurement is available, return -EOPNOTSUPP.
* If the measurement cannot be performed, return -EOPNOTSUPP.
* If the parameters are incorrect, return -EINVAL.
*/
int ima_file_hash(struct file *file, char *buf, size_t buf_size)
@ -577,7 +598,7 @@ int ima_file_hash(struct file *file, char *buf, size_t buf_size)
if (!file)
return -EINVAL;
return __ima_inode_hash(file_inode(file), buf, buf_size);
return __ima_inode_hash(file_inode(file), file, buf, buf_size);
}
EXPORT_SYMBOL_GPL(ima_file_hash);
@ -604,14 +625,14 @@ int ima_inode_hash(struct inode *inode, char *buf, size_t buf_size)
if (!inode)
return -EINVAL;
return __ima_inode_hash(inode, buf, buf_size);
return __ima_inode_hash(inode, NULL, buf, buf_size);
}
EXPORT_SYMBOL_GPL(ima_inode_hash);
/**
* ima_post_create_tmpfile - mark newly created tmpfile as new
* @mnt_userns: user namespace of the mount the inode was found from
* @inode: inode of the newly created tmpfile
*
* No measuring, appraising or auditing of newly created tmpfiles is needed.
* Skip calling process_measurement(), but indicate which newly, created
@ -643,7 +664,7 @@ void ima_post_create_tmpfile(struct user_namespace *mnt_userns,
/**
* ima_post_path_mknod - mark as a new inode
* @mnt_userns: user namespace of the mount the inode was found from
* @dentry: newly created dentry
*
* Mark files created via the mknodat syscall as new, so that the
@ -814,8 +835,8 @@ int ima_load_data(enum kernel_load_data_id id, bool contents)
* ima_post_load_data - appraise decision based on policy
* @buf: pointer to in memory file contents
* @size: size of in memory file contents
* @load_id: kernel load data caller identifier
* @description: @load_id-specific description of contents
*
* Measure/appraise/audit in memory buffer based on policy. Policy rules
* are written in terms of a policy identifier.
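The BPF side of this series consumes the new on-demand collection path through the bpf_ima_file_hash() helper; a hedged sleepable-LSM sketch (hook choice and buffer size are illustrative):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("lsm.s/file_open")
int BPF_PROG(measure_file_open, struct file *file)
{
	u8 digest[64];	/* large enough for any IMA hash algorithm */
	long ret;

	ret = bpf_ima_file_hash(file, digest, sizeof(digest));
	if (ret < 0)
		return 0;	/* hashing failed or IMA disabled; allow */
	/* ret is the hash algorithm; digest holds the file hash */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";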
@ -25,6 +25,7 @@ GEN COMMANDS
| **bpftool** **gen object** *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...]
| **bpftool** **gen skeleton** *FILE* [**name** *OBJECT_NAME*]
| **bpftool** **gen subskeleton** *FILE* [**name** *OBJECT_NAME*]
| **bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...]
| **bpftool** **gen help**
@ -150,6 +151,30 @@ DESCRIPTION
(non-read-only) data from userspace, with same simplicity
as for BPF side.
**bpftool gen subskeleton** *FILE*
Generate BPF subskeleton C header file for a given *FILE*.
Subskeletons are similar to skeletons, except they do not own
the corresponding maps, programs, or global variables. They
require that the object file used to generate them is already
loaded into a *bpf_object* by some other means.
This functionality is useful when a library is included in a
larger BPF program. A subskeleton for the library would have
access to all objects and globals defined in it, without
having to know about the larger program.
Consequently, there are only two functions defined
for subskeletons:
- **example__open(bpf_object\*)**
Instantiates a subskeleton from an already opened (but not
necessarily loaded) **bpf_object**.
- **example__destroy()**
Frees the storage for the subskeleton but *does not* unload
any BPF programs or maps.
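A typical calling sequence, with *example* standing in for the actual
object name and error handling omitted, looks like::

  struct example *sub;

  /* 'obj' is a struct bpf_object opened (and possibly loaded) elsewhere */
  sub = example__open(obj);
  /* access sub->maps, sub->progs, sub->bss as with a full skeleton */
  example__destroy(sub);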
**bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...]
Generate a minimum BTF file as *OUTPUT*, derived from a given
*INPUT* BTF file, containing all needed BTF types so one, or
@ -20,7 +20,8 @@ SYNOPSIS
**bpftool** **version**
*OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** | **feature** }
*OBJECT* := { **map** | **program** | **link** | **cgroup** | **perf** | **net** | **feature** |
**btf** | **gen** | **struct_ops** | **iter** }
*OPTIONS* := { { **-V** | **--version** } | |COMMON_OPTIONS| }
@ -31,6 +32,8 @@ SYNOPSIS
*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin** |
**load** | **attach** | **detach** | **help** }
*LINK-COMMANDS* := { **show** | **list** | **pin** | **detach** | **help** }
*CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }
*PERF-COMMANDS* := { **show** | **list** | **help** }
@ -39,6 +42,14 @@ SYNOPSIS
*FEATURE-COMMANDS* := { **probe** | **help** }
*BTF-COMMANDS* := { **show** | **list** | **dump** | **help** }
*GEN-COMMANDS* := { **object** | **skeleton** | **min_core_btf** | **help** }
*STRUCT-OPS-COMMANDS* := { **show** | **list** | **dump** | **register** | **unregister** | **help** }
*ITER-COMMANDS* := { **pin** | **help** }
DESCRIPTION
===========
*bpftool* allows for inspection and simple modification of BPF objects
@ -1003,13 +1003,25 @@ _bpftool()
;;
esac
;;
subskeleton)
case $prev in
$command)
_filedir
return 0
;;
*)
_bpftool_once_attr 'name'
return 0
;;
esac
;;
min_core_btf)
_filedir
return 0
;;
*)
[[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'object skeleton help min_core_btf' -- "$cur" ) )
COMPREPLY=( $( compgen -W 'object skeleton subskeleton help min_core_btf' -- "$cur" ) )
;;
esac
;;
@ -56,7 +56,6 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
[BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser",
[BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict",
[BPF_SK_SKB_VERDICT] = "sk_skb_verdict",
@ -76,6 +75,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
[BPF_SK_REUSEPORT_SELECT] = "sk_skb_reuseport_select",
[BPF_SK_REUSEPORT_SELECT_OR_MIGRATE] = "sk_skb_reuseport_select_or_migrate",
[BPF_PERF_EVENT] = "perf_event",
[BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi",
};
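Userspace creates links of the new BPF_TRACE_KPROBE_MULTI type through libbpf's kprobe.multi support, added in this same series; a hedged sketch (the pattern and program are placeholders):

/* Attach a SEC("kprobe.multi/...") program to every function matching
 * a glob pattern. Error handling is abbreviated.
 */
LIBBPF_OPTS(bpf_kprobe_multi_opts, opts,
	.retprobe = false,	/* true would hook function returns instead */
);
struct bpf_link *link;

link = bpf_program__attach_kprobe_multi_opts(prog, "fprobe_selftest_*",
					     &opts);
if (libbpf_get_error(link))
	link = NULL;	/* attach failed */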
void p_err(const char *fmt, ...)
@ -3,6 +3,7 @@
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
@ -45,6 +46,11 @@ static bool run_as_unprivileged;
/* Miscellaneous utility functions */
static bool grep(const char *buffer, const char *pattern)
{
return !!strstr(buffer, pattern);
}
static bool check_procfs(void)
{
struct statfs st_fs;
@ -135,6 +141,32 @@ static void print_end_section(void)
/* Probing functions */
static int get_vendor_id(int ifindex)
{
char ifname[IF_NAMESIZE], path[64], buf[8];
ssize_t len;
int fd;
if (!if_indextoname(ifindex, ifname))
return -1;
snprintf(path, sizeof(path), "/sys/class/net/%s/device/vendor", ifname);
fd = open(path, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return -1;
len = read(fd, buf, sizeof(buf));
close(fd);
if (len < 0)
return -1;
if (len >= (ssize_t)sizeof(buf))
return -1;
buf[len] = '\0';
return strtol(buf, NULL, 0);
}
static int read_procfs(const char *path)
{
char *endptr, *line = NULL;
@ -478,6 +510,40 @@ static bool probe_bpf_syscall(const char *define_prefix)
return res;
}
static bool
probe_prog_load_ifindex(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, size_t insns_cnt,
char *log_buf, size_t log_buf_sz,
__u32 ifindex)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = log_buf,
.log_size = log_buf_sz,
.log_level = log_buf ? 1 : 0,
.prog_ifindex = ifindex,
);
int fd;
errno = 0;
fd = bpf_prog_load(prog_type, NULL, "GPL", insns, insns_cnt, &opts);
if (fd >= 0)
close(fd);
return fd >= 0 && errno != EINVAL && errno != EOPNOTSUPP;
}
static bool probe_prog_type_ifindex(enum bpf_prog_type prog_type, __u32 ifindex)
{
/* nfp returns -EINVAL on exit(0) with TC offload */
struct bpf_insn insns[2] = {
BPF_MOV64_IMM(BPF_REG_0, 2),
BPF_EXIT_INSN()
};
return probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns),
NULL, 0, ifindex);
}
static void
probe_prog_type(enum bpf_prog_type prog_type, bool *supported_types,
const char *define_prefix, __u32 ifindex)
@ -488,11 +554,19 @@ probe_prog_type(enum bpf_prog_type prog_type, bool *supported_types,
bool res;
if (ifindex) {
p_info("BPF offload feature probing is not supported");
return;
switch (prog_type) {
case BPF_PROG_TYPE_SCHED_CLS:
case BPF_PROG_TYPE_XDP:
break;
default:
return;
}
res = probe_prog_type_ifindex(prog_type, ifindex);
} else {
res = libbpf_probe_bpf_prog_type(prog_type, NULL);
}
res = libbpf_probe_bpf_prog_type(prog_type, NULL);
#ifdef USE_LIBCAP
/* Probe may succeed even if program load fails, for unprivileged users
* check that we did not fail because of insufficient permissions
@ -521,6 +595,26 @@ probe_prog_type(enum bpf_prog_type prog_type, bool *supported_types,
define_prefix);
}
static bool probe_map_type_ifindex(enum bpf_map_type map_type, __u32 ifindex)
{
LIBBPF_OPTS(bpf_map_create_opts, opts);
int key_size, value_size, max_entries;
int fd;
opts.map_ifindex = ifindex;
key_size = sizeof(__u32);
value_size = sizeof(__u32);
max_entries = 1;
fd = bpf_map_create(map_type, NULL, key_size, value_size, max_entries,
&opts);
if (fd >= 0)
close(fd);
return fd >= 0;
}
static void
probe_map_type(enum bpf_map_type map_type, const char *define_prefix,
__u32 ifindex)
@ -531,11 +625,18 @@ probe_map_type(enum bpf_map_type map_type, const char *define_prefix,
bool res;
if (ifindex) {
p_info("BPF offload feature probing is not supported");
return;
}
switch (map_type) {
case BPF_MAP_TYPE_HASH:
case BPF_MAP_TYPE_ARRAY:
break;
default:
return;
}
res = libbpf_probe_bpf_map_type(map_type, NULL);
res = probe_map_type_ifindex(map_type, ifindex);
} else {
res = libbpf_probe_bpf_map_type(map_type, NULL);
}
/* Probe result depends on the success of map creation, no additional
* check required for unprivileged users
@ -559,6 +660,33 @@ probe_map_type(enum bpf_map_type map_type, const char *define_prefix,
define_prefix);
}
static bool
probe_helper_ifindex(enum bpf_func_id id, enum bpf_prog_type prog_type,
__u32 ifindex)
{
struct bpf_insn insns[2] = {
BPF_EMIT_CALL(id),
BPF_EXIT_INSN()
};
char buf[4096] = {};
bool res;
probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns), buf,
sizeof(buf), ifindex);
res = !grep(buf, "invalid func ") && !grep(buf, "unknown func ");
switch (get_vendor_id(ifindex)) {
case 0x19ee: /* Netronome specific */
res = res && !grep(buf, "not supported by FW") &&
!grep(buf, "unsupported function id");
break;
default:
break;
}
return res;
}
static void
probe_helper_for_progtype(enum bpf_prog_type prog_type, bool supported_type,
const char *define_prefix, unsigned int id,
@ -567,12 +695,10 @@ probe_helper_for_progtype(enum bpf_prog_type prog_type, bool supported_type,
bool res = false;
if (supported_type) {
if (ifindex) {
p_info("BPF offload feature probing is not supported");
return;
}
res = libbpf_probe_bpf_helper(prog_type, id, NULL);
if (ifindex)
res = probe_helper_ifindex(id, prog_type, ifindex);
else
res = libbpf_probe_bpf_helper(prog_type, id, NULL);
#ifdef USE_LIBCAP
/* Probe may succeed even if program load fails, for
* unprivileged users check that we did not fail because of


@ -64,11 +64,11 @@ static void get_obj_name(char *name, const char *file)
sanitize_identifier(name);
}
static void get_header_guard(char *guard, const char *obj_name)
static void get_header_guard(char *guard, const char *obj_name, const char *suffix)
{
int i;
sprintf(guard, "__%s_SKEL_H__", obj_name);
sprintf(guard, "__%s_%s__", obj_name, suffix);
for (i = 0; guard[i]; i++)
guard[i] = toupper(guard[i]);
}
@ -231,6 +231,17 @@ static const struct btf_type *find_type_for_map(struct btf *btf, const char *map
return NULL;
}
static bool is_internal_mmapable_map(const struct bpf_map *map, char *buf, size_t sz)
{
if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
return false;
if (!get_map_ident(map, buf, sz))
return false;
return true;
}
static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
{
struct btf *btf = bpf_object__btf(obj);
@ -247,12 +258,7 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name)
bpf_object__for_each_map(map, obj) {
/* only generate definitions for memory-mapped internal maps */
if (!bpf_map__is_internal(map))
continue;
if (!(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
continue;
if (!get_map_ident(map, map_ident, sizeof(map_ident)))
if (!is_internal_mmapable_map(map, map_ident, sizeof(map_ident)))
continue;
sec = find_type_for_map(btf, map_ident);
@ -280,6 +286,96 @@ out:
return err;
}
static bool btf_is_ptr_to_func_proto(const struct btf *btf,
const struct btf_type *v)
{
return btf_is_ptr(v) && btf_is_func_proto(btf__type_by_id(btf, v->type));
}
static int codegen_subskel_datasecs(struct bpf_object *obj, const char *obj_name)
{
struct btf *btf = bpf_object__btf(obj);
struct btf_dump *d;
struct bpf_map *map;
const struct btf_type *sec, *var;
const struct btf_var_secinfo *sec_var;
int i, err = 0, vlen;
char map_ident[256], sec_ident[256];
bool strip_mods = false, needs_typeof = false;
const char *sec_name, *var_name;
__u32 var_type_id;
d = btf_dump__new(btf, codegen_btf_dump_printf, NULL, NULL);
if (!d)
return -errno;
bpf_object__for_each_map(map, obj) {
/* only generate definitions for memory-mapped internal maps */
if (!is_internal_mmapable_map(map, map_ident, sizeof(map_ident)))
continue;
sec = find_type_for_map(btf, map_ident);
if (!sec)
continue;
sec_name = btf__name_by_offset(btf, sec->name_off);
if (!get_datasec_ident(sec_name, sec_ident, sizeof(sec_ident)))
continue;
strip_mods = strcmp(sec_name, ".kconfig") != 0;
printf(" struct %s__%s {\n", obj_name, sec_ident);
sec_var = btf_var_secinfos(sec);
vlen = btf_vlen(sec);
for (i = 0; i < vlen; i++, sec_var++) {
DECLARE_LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts,
.indent_level = 2,
.strip_mods = strip_mods,
/* we'll print the name separately */
.field_name = "",
);
var = btf__type_by_id(btf, sec_var->type);
var_name = btf__name_by_offset(btf, var->name_off);
var_type_id = var->type;
/* static variables are not exposed through BPF skeleton */
if (btf_var(var)->linkage == BTF_VAR_STATIC)
continue;
/* The datasec member has KIND_VAR but we want the
* underlying type of the variable (e.g. KIND_INT).
*/
var = skip_mods_and_typedefs(btf, var->type, NULL);
printf("\t\t");
/* Func and array members require special handling.
* Instead of producing `typename *var`, they produce
* `typeof(typename) *var`. This allows us to keep a
* similar syntax where the identifier is just prefixed
* by *, allowing us to ignore C declaration minutiae.
*/
needs_typeof = btf_is_array(var) || btf_is_ptr_to_func_proto(btf, var);
if (needs_typeof)
printf("typeof(");
err = btf_dump__emit_type_decl(d, var_type_id, &opts);
if (err)
goto out;
if (needs_typeof)
printf(")");
printf(" *%s;\n", var_name);
}
printf(" } %s;\n", sec_ident);
}
out:
btf_dump__free(d);
return err;
}
static void codegen(const char *template, ...)
{
const char *src, *end;
@ -389,11 +485,7 @@ static void codegen_asserts(struct bpf_object *obj, const char *obj_name)
", obj_name);
bpf_object__for_each_map(map, obj) {
if (!bpf_map__is_internal(map))
continue;
if (!(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
continue;
if (!get_map_ident(map, map_ident, sizeof(map_ident)))
if (!is_internal_mmapable_map(map, map_ident, sizeof(map_ident)))
continue;
sec = find_type_for_map(btf, map_ident);
@ -608,11 +700,7 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h
const void *mmap_data = NULL;
size_t mmap_size = 0;
if (!get_map_ident(map, ident, sizeof(ident)))
continue;
if (!bpf_map__is_internal(map) ||
!(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
if (!is_internal_mmapable_map(map, ident, sizeof(ident)))
continue;
codegen("\
@ -671,11 +759,7 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h
bpf_object__for_each_map(map, obj) {
const char *mmap_flags;
if (!get_map_ident(map, ident, sizeof(ident)))
continue;
if (!bpf_map__is_internal(map) ||
!(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
if (!is_internal_mmapable_map(map, ident, sizeof(ident)))
continue;
if (bpf_map__map_flags(map) & BPF_F_RDONLY_PROG)
@ -727,10 +811,95 @@ out:
return err;
}
static void
codegen_maps_skeleton(struct bpf_object *obj, size_t map_cnt, bool mmaped)
{
struct bpf_map *map;
char ident[256];
size_t i;
if (!map_cnt)
return;
codegen("\
\n\
\n\
/* maps */ \n\
s->map_cnt = %zu; \n\
s->map_skel_sz = sizeof(*s->maps); \n\
s->maps = (struct bpf_map_skeleton *)calloc(s->map_cnt, s->map_skel_sz);\n\
if (!s->maps) \n\
goto err; \n\
",
map_cnt
);
i = 0;
bpf_object__for_each_map(map, obj) {
if (!get_map_ident(map, ident, sizeof(ident)))
continue;
codegen("\
\n\
\n\
s->maps[%zu].name = \"%s\"; \n\
s->maps[%zu].map = &obj->maps.%s; \n\
",
i, bpf_map__name(map), i, ident);
/* memory-mapped internal maps */
if (mmaped && is_internal_mmapable_map(map, ident, sizeof(ident))) {
printf("\ts->maps[%zu].mmaped = (void **)&obj->%s;\n",
i, ident);
}
i++;
}
}
static void
codegen_progs_skeleton(struct bpf_object *obj, size_t prog_cnt, bool populate_links)
{
struct bpf_program *prog;
int i;
if (!prog_cnt)
return;
codegen("\
\n\
\n\
/* programs */ \n\
s->prog_cnt = %zu; \n\
s->prog_skel_sz = sizeof(*s->progs); \n\
s->progs = (struct bpf_prog_skeleton *)calloc(s->prog_cnt, s->prog_skel_sz);\n\
if (!s->progs) \n\
goto err; \n\
",
prog_cnt
);
i = 0;
bpf_object__for_each_program(prog, obj) {
codegen("\
\n\
\n\
s->progs[%1$zu].name = \"%2$s\"; \n\
s->progs[%1$zu].prog = &obj->progs.%2$s;\n\
",
i, bpf_program__name(prog));
if (populate_links) {
codegen("\
\n\
s->progs[%1$zu].link = &obj->links.%2$s;\n\
",
i, bpf_program__name(prog));
}
i++;
}
}
static int do_skeleton(int argc, char **argv)
{
char header_guard[MAX_OBJ_NAME_LEN + sizeof("__SKEL_H__")];
size_t i, map_cnt = 0, prog_cnt = 0, file_sz, mmap_sz;
size_t map_cnt = 0, prog_cnt = 0, file_sz, mmap_sz;
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts);
char obj_name[MAX_OBJ_NAME_LEN] = "", *obj_data;
struct bpf_object *obj = NULL;
@ -821,7 +990,7 @@ static int do_skeleton(int argc, char **argv)
prog_cnt++;
}
get_header_guard(header_guard, obj_name);
get_header_guard(header_guard, obj_name, "SKEL_H");
if (use_loader) {
codegen("\
\n\
@ -1024,66 +1193,10 @@ static int do_skeleton(int argc, char **argv)
",
obj_name
);
if (map_cnt) {
codegen("\
\n\
\n\
/* maps */ \n\
s->map_cnt = %zu; \n\
s->map_skel_sz = sizeof(*s->maps); \n\
s->maps = (struct bpf_map_skeleton *)calloc(s->map_cnt, s->map_skel_sz);\n\
if (!s->maps) \n\
goto err; \n\
",
map_cnt
);
i = 0;
bpf_object__for_each_map(map, obj) {
if (!get_map_ident(map, ident, sizeof(ident)))
continue;
codegen("\
\n\
\n\
s->maps[%zu].name = \"%s\"; \n\
s->maps[%zu].map = &obj->maps.%s; \n\
",
i, bpf_map__name(map), i, ident);
/* memory-mapped internal maps */
if (bpf_map__is_internal(map) &&
(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) {
printf("\ts->maps[%zu].mmaped = (void **)&obj->%s;\n",
i, ident);
}
i++;
}
}
if (prog_cnt) {
codegen("\
\n\
\n\
/* programs */ \n\
s->prog_cnt = %zu; \n\
s->prog_skel_sz = sizeof(*s->progs); \n\
s->progs = (struct bpf_prog_skeleton *)calloc(s->prog_cnt, s->prog_skel_sz);\n\
if (!s->progs) \n\
goto err; \n\
",
prog_cnt
);
i = 0;
bpf_object__for_each_program(prog, obj) {
codegen("\
\n\
\n\
s->progs[%1$zu].name = \"%2$s\"; \n\
s->progs[%1$zu].prog = &obj->progs.%2$s;\n\
s->progs[%1$zu].link = &obj->links.%2$s;\n\
",
i, bpf_program__name(prog));
i++;
}
}
codegen_maps_skeleton(obj, map_cnt, true /*mmaped*/);
codegen_progs_skeleton(obj, prog_cnt, true /*populate_links*/);
codegen("\
\n\
\n\
@ -1141,6 +1254,310 @@ out:
return err;
}
/* Subskeletons are like skeletons, except they don't own the bpf_object,
* associated maps, links, etc. Instead, they know about the existence of
* variables, maps, programs and are able to find their locations
* _at runtime_ from an already loaded bpf_object.
*
* This allows for library-like BPF objects to have userspace counterparts
* with access to their own items without having to know anything about the
* final BPF object that the library was linked into.
*/
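/* Illustrative sketch (not part of this change): how a generated
 * subskeleton is consumed. All names are hypothetical: "final" is the
 * skeleton of the final linked object, "lib" is the subskeleton created
 * with `bpftool gen subskeleton lib.bpf.o name lib > lib.subskel.h`,
 * and lib_counter is a global in lib's .data section.
 */
#include "final.skel.h"
#include "lib.subskel.h"

static int use_lib(void)
{
	struct final *skel = final__open_and_load();
	struct lib *sub;

	if (!skel)
		return -1;
	/* resolve lib's vars/maps/progs inside the loaded object */
	sub = lib__open(skel->obj);
	if (!sub) {
		final__destroy(skel);
		return -1;
	}
	/* .data variables are exposed as pointers */
	*sub->data.lib_counter += 1;
	lib__destroy(sub);
	final__destroy(skel);
	return 0;
}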
static int do_subskeleton(int argc, char **argv)
{
char header_guard[MAX_OBJ_NAME_LEN + sizeof("__SUBSKEL_H__")];
size_t i, len, file_sz, map_cnt = 0, prog_cnt = 0, mmap_sz, var_cnt = 0, var_idx = 0;
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts);
char obj_name[MAX_OBJ_NAME_LEN] = "", *obj_data;
struct bpf_object *obj = NULL;
const char *file, *var_name;
char ident[256];
int fd, err = -1, map_type_id;
const struct bpf_map *map;
struct bpf_program *prog;
struct btf *btf;
const struct btf_type *map_type, *var_type;
const struct btf_var_secinfo *var;
struct stat st;
if (!REQ_ARGS(1)) {
usage();
return -1;
}
file = GET_ARG();
while (argc) {
if (!REQ_ARGS(2))
return -1;
if (is_prefix(*argv, "name")) {
NEXT_ARG();
if (obj_name[0] != '\0') {
p_err("object name already specified");
return -1;
}
strncpy(obj_name, *argv, MAX_OBJ_NAME_LEN - 1);
obj_name[MAX_OBJ_NAME_LEN - 1] = '\0';
} else {
p_err("unknown arg %s", *argv);
return -1;
}
NEXT_ARG();
}
if (argc) {
p_err("extra unknown arguments");
return -1;
}
if (use_loader) {
p_err("cannot use loader for subskeletons");
return -1;
}
if (stat(file, &st)) {
p_err("failed to stat() %s: %s", file, strerror(errno));
return -1;
}
file_sz = st.st_size;
mmap_sz = roundup(file_sz, sysconf(_SC_PAGE_SIZE));
fd = open(file, O_RDONLY);
if (fd < 0) {
p_err("failed to open() %s: %s", file, strerror(errno));
return -1;
}
obj_data = mmap(NULL, mmap_sz, PROT_READ, MAP_PRIVATE, fd, 0);
if (obj_data == MAP_FAILED) {
obj_data = NULL;
p_err("failed to mmap() %s: %s", file, strerror(errno));
goto out;
}
if (obj_name[0] == '\0')
get_obj_name(obj_name, file);
/* The empty object name allows us to use bpf_map__name and produce
* ELF section names out of it. (".data" instead of "obj.data")
*/
opts.object_name = "";
obj = bpf_object__open_mem(obj_data, file_sz, &opts);
if (!obj) {
char err_buf[256];
libbpf_strerror(errno, err_buf, sizeof(err_buf));
p_err("failed to open BPF object file: %s", err_buf);
obj = NULL;
goto out;
}
btf = bpf_object__btf(obj);
if (!btf) {
err = -1;
p_err("need btf type information for %s", obj_name);
goto out;
}
bpf_object__for_each_program(prog, obj) {
prog_cnt++;
}
/* First, count how many variables we have to find.
* We need this in advance so the subskel can allocate the right
* amount of storage.
*/
bpf_object__for_each_map(map, obj) {
if (!get_map_ident(map, ident, sizeof(ident)))
continue;
/* Also count all maps that have a name */
map_cnt++;
if (!is_internal_mmapable_map(map, ident, sizeof(ident)))
continue;
map_type_id = bpf_map__btf_value_type_id(map);
if (map_type_id <= 0) {
err = map_type_id;
goto out;
}
map_type = btf__type_by_id(btf, map_type_id);
var = btf_var_secinfos(map_type);
len = btf_vlen(map_type);
for (i = 0; i < len; i++, var++) {
var_type = btf__type_by_id(btf, var->type);
if (btf_var(var_type)->linkage == BTF_VAR_STATIC)
continue;
var_cnt++;
}
}
get_header_guard(header_guard, obj_name, "SUBSKEL_H");
codegen("\
\n\
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ \n\
\n\
/* THIS FILE IS AUTOGENERATED! */ \n\
#ifndef %2$s \n\
#define %2$s \n\
\n\
#include <errno.h> \n\
#include <stdlib.h> \n\
#include <bpf/libbpf.h> \n\
\n\
struct %1$s { \n\
struct bpf_object *obj; \n\
struct bpf_object_subskeleton *subskel; \n\
", obj_name, header_guard);
if (map_cnt) {
printf("\tstruct {\n");
bpf_object__for_each_map(map, obj) {
if (!get_map_ident(map, ident, sizeof(ident)))
continue;
printf("\t\tstruct bpf_map *%s;\n", ident);
}
printf("\t} maps;\n");
}
if (prog_cnt) {
printf("\tstruct {\n");
bpf_object__for_each_program(prog, obj) {
printf("\t\tstruct bpf_program *%s;\n",
bpf_program__name(prog));
}
printf("\t} progs;\n");
}
err = codegen_subskel_datasecs(obj, obj_name);
if (err)
goto out;
/* emit code that will allocate enough storage for all symbols */
codegen("\
\n\
\n\
#ifdef __cplusplus \n\
static inline struct %1$s *open(const struct bpf_object *src);\n\
static inline void destroy(struct %1$s *skel); \n\
#endif /* __cplusplus */ \n\
}; \n\
\n\
static inline void \n\
%1$s__destroy(struct %1$s *skel) \n\
{ \n\
if (!skel) \n\
return; \n\
if (skel->subskel) \n\
bpf_object__destroy_subskeleton(skel->subskel);\n\
free(skel); \n\
} \n\
\n\
static inline struct %1$s * \n\
%1$s__open(const struct bpf_object *src) \n\
{ \n\
struct %1$s *obj; \n\
struct bpf_object_subskeleton *s; \n\
int err; \n\
\n\
obj = (struct %1$s *)calloc(1, sizeof(*obj)); \n\
if (!obj) { \n\
errno = ENOMEM; \n\
goto err; \n\
} \n\
s = (struct bpf_object_subskeleton *)calloc(1, sizeof(*s));\n\
if (!s) { \n\
errno = ENOMEM; \n\
goto err; \n\
} \n\
s->sz = sizeof(*s); \n\
s->obj = src; \n\
s->var_skel_sz = sizeof(*s->vars); \n\
obj->subskel = s; \n\
\n\
/* vars */ \n\
s->var_cnt = %2$d; \n\
s->vars = (struct bpf_var_skeleton *)calloc(%2$d, sizeof(*s->vars));\n\
if (!s->vars) { \n\
errno = ENOMEM; \n\
goto err; \n\
} \n\
",
obj_name, var_cnt
);
/* walk through each symbol and emit the runtime representation */
bpf_object__for_each_map(map, obj) {
if (!is_internal_mmapable_map(map, ident, sizeof(ident)))
continue;
map_type_id = bpf_map__btf_value_type_id(map);
if (map_type_id <= 0)
/* skip over internal maps with no type */
continue;
map_type = btf__type_by_id(btf, map_type_id);
var = btf_var_secinfos(map_type);
len = btf_vlen(map_type);
for (i = 0; i < len; i++, var++) {
var_type = btf__type_by_id(btf, var->type);
var_name = btf__name_by_offset(btf, var_type->name_off);
if (btf_var(var_type)->linkage == BTF_VAR_STATIC)
continue;
/* Note that we use the dot prefix in .data as the
* field access operator i.e. maps%s becomes maps.data
*/
codegen("\
\n\
\n\
s->vars[%3$d].name = \"%1$s\"; \n\
s->vars[%3$d].map = &obj->maps.%2$s; \n\
s->vars[%3$d].addr = (void **) &obj->%2$s.%1$s;\n\
", var_name, ident, var_idx);
var_idx++;
}
}
codegen_maps_skeleton(obj, map_cnt, false /*mmaped*/);
codegen_progs_skeleton(obj, prog_cnt, false /*links*/);
codegen("\
\n\
\n\
err = bpf_object__open_subskeleton(s); \n\
if (err) \n\
goto err; \n\
\n\
return obj; \n\
err: \n\
%1$s__destroy(obj); \n\
return NULL; \n\
} \n\
\n\
#ifdef __cplusplus \n\
struct %1$s *%1$s::open(const struct bpf_object *src) { return %1$s__open(src); }\n\
void %1$s::destroy(struct %1$s *skel) { %1$s__destroy(skel); }\n\
#endif /* __cplusplus */ \n\
\n\
#endif /* %2$s */ \n\
",
obj_name, header_guard);
err = 0;
out:
bpf_object__close(obj);
if (obj_data)
munmap(obj_data, mmap_sz);
close(fd);
return err;
}
static int do_object(int argc, char **argv)
{
struct bpf_linker *linker;
@ -1192,6 +1609,7 @@ static int do_help(int argc, char **argv)
fprintf(stderr,
"Usage: %1$s %2$s object OUTPUT_FILE INPUT_FILE [INPUT_FILE...]\n"
" %1$s %2$s skeleton FILE [name OBJECT_NAME]\n"
" %1$s %2$s subskeleton FILE [name OBJECT_NAME]\n"
" %1$s %2$s min_core_btf INPUT OUTPUT OBJECT [OBJECT...]\n"
" %1$s %2$s help\n"
"\n"
@ -1788,6 +2206,7 @@ static int do_min_core_btf(int argc, char **argv)
static const struct cmd cmds[] = {
{ "object", do_object },
{ "skeleton", do_skeleton },
{ "subskeleton", do_subskeleton },
{ "min_core_btf", do_min_core_btf},
{ "help", do_help },
{ 0 }


@ -113,7 +113,9 @@ struct obj_ref {
struct obj_refs {
int ref_cnt;
bool has_bpf_cookie;
struct obj_ref *refs;
__u64 bpf_cookie;
};
struct btf;


@ -504,7 +504,7 @@ static int show_map_close_json(int fd, struct bpf_map_info *info)
jsonw_uint_field(json_wtr, "max_entries", info->max_entries);
if (memlock)
jsonw_int_field(json_wtr, "bytes_memlock", atoi(memlock));
jsonw_int_field(json_wtr, "bytes_memlock", atoll(memlock));
free(memlock);
if (info->type == BPF_MAP_TYPE_PROG_ARRAY) {
@ -620,17 +620,14 @@ static int show_map_close_plain(int fd, struct bpf_map_info *info)
u32_as_hash_field(info->id))
printf("\n\tpinned %s", (char *)entry->value);
}
printf("\n");
if (frozen_str) {
frozen = atoi(frozen_str);
free(frozen_str);
}
if (!info->btf_id && !frozen)
return 0;
printf("\t");
if (info->btf_id || frozen)
printf("\n\t");
if (info->btf_id)
printf("btf_id %d", info->btf_id);


@ -78,6 +78,8 @@ static void add_ref(struct hashmap *map, struct pid_iter_entry *e)
ref->pid = e->pid;
memcpy(ref->comm, e->comm, sizeof(ref->comm));
refs->ref_cnt = 1;
refs->has_bpf_cookie = e->has_bpf_cookie;
refs->bpf_cookie = e->bpf_cookie;
err = hashmap__append(map, u32_as_hash_field(e->id), refs);
if (err)
@ -205,6 +207,9 @@ void emit_obj_refs_json(struct hashmap *map, __u32 id,
if (refs->ref_cnt == 0)
break;
if (refs->has_bpf_cookie)
jsonw_lluint_field(json_writer, "bpf_cookie", refs->bpf_cookie);
jsonw_name(json_writer, "pids");
jsonw_start_array(json_writer);
for (i = 0; i < refs->ref_cnt; i++) {
@ -234,6 +239,9 @@ void emit_obj_refs_plain(struct hashmap *map, __u32 id, const char *prefix)
if (refs->ref_cnt == 0)
break;
if (refs->has_bpf_cookie)
printf("\n\tbpf_cookie %llu", (unsigned long long) refs->bpf_cookie);
printf("%s", prefix);
for (i = 0; i < refs->ref_cnt; i++) {
struct obj_ref *ref = &refs->refs[i];


@ -485,7 +485,7 @@ static void print_prog_json(struct bpf_prog_info *info, int fd)
memlock = get_fdinfo(fd, "memlock");
if (memlock)
jsonw_int_field(json_wtr, "bytes_memlock", atoi(memlock));
jsonw_int_field(json_wtr, "bytes_memlock", atoll(memlock));
free(memlock);
if (info->nr_map_ids)


@ -38,6 +38,17 @@ static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type)
}
}
/* can only be used with BPF_LINK_TYPE_PERF_EVENT links */
static __u64 get_bpf_cookie(struct bpf_link *link)
{
struct bpf_perf_link *perf_link;
struct perf_event *event;
perf_link = container_of(link, struct bpf_perf_link, link);
event = BPF_CORE_READ(perf_link, perf_file, private_data);
return BPF_CORE_READ(event, bpf_cookie);
}
SEC("iter/task_file")
int iter(struct bpf_iter__task_file *ctx)
{
@ -69,8 +80,19 @@ int iter(struct bpf_iter__task_file *ctx)
if (file->f_op != fops)
return 0;
__builtin_memset(&e, 0, sizeof(e));
e.pid = task->tgid;
e.id = get_obj_id(file->private_data, obj_type);
if (obj_type == BPF_OBJ_LINK) {
struct bpf_link *link = (struct bpf_link *) file->private_data;
if (BPF_CORE_READ(link, type) == BPF_LINK_TYPE_PERF_EVENT) {
e.has_bpf_cookie = true;
e.bpf_cookie = get_bpf_cookie(link);
}
}
bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm),
task->group_leader->comm);
bpf_seq_write(ctx->meta->seq, &e, sizeof(e));


@ -6,6 +6,8 @@
struct pid_iter_entry {
__u32 id;
int pid;
__u64 bpf_cookie;
bool has_bpf_cookie;
char comm[16];
};


@ -997,6 +997,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
BPF_TRACE_KPROBE_MULTI,
__MAX_BPF_ATTACH_TYPE
};
@ -1011,6 +1012,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
BPF_LINK_TYPE_KPROBE_MULTI = 8,
MAX_BPF_LINK_TYPE,
};
@ -1118,6 +1120,11 @@ enum bpf_link_type {
*/
#define BPF_F_XDP_HAS_FRAGS (1U << 5)
/* link_create.kprobe_multi.flags used in LINK_CREATE command for
* BPF_TRACE_KPROBE_MULTI attach type to create a return probe.
*/
#define BPF_F_KPROBE_MULTI_RETURN (1U << 0)
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@ -1232,6 +1239,8 @@ enum {
/* If set, run the test on the cpu specified by bpf_attr.test.cpu */
#define BPF_F_TEST_RUN_ON_CPU (1U << 0)
/* If set, XDP frames will be transmitted after processing */
#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1)
/* type for BPF_ENABLE_STATS */
enum bpf_stats_type {
@ -1393,6 +1402,7 @@ union bpf_attr {
__aligned_u64 ctx_out;
__u32 flags;
__u32 cpu;
__u32 batch_size;
} test;
struct { /* anonymous struct used by BPF_*_GET_*_ID */
@ -1472,6 +1482,13 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
struct {
__u32 flags;
__u32 cnt;
__aligned_u64 syms;
__aligned_u64 addrs;
__aligned_u64 cookies;
} kprobe_multi;
};
} link_create;
@ -2299,8 +2316,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
* * 0, if current task belongs to the cgroup2.
* * 1, if current task does not belong to the cgroup2.
* * 1, if current task belongs to the cgroup2.
* * 0, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
@ -5087,23 +5104,22 @@ union bpf_attr {
* 0 on success, or a negative error in case of failure. On error
* *dst* buffer is zeroed out.
*
* long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
* long bpf_skb_set_tstamp(struct sk_buff *skb, u64 tstamp, u32 tstamp_type)
* Description
* Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
* change the __sk_buff->delivery_time_type to *dtime_type*.
* Change the __sk_buff->tstamp_type to *tstamp_type*
* and set *tstamp* to the __sk_buff->tstamp together.
*
* When setting a delivery time (non zero *dtime*) to
* __sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
* is supported. It is the only delivery_time_type that will be
* kept after bpf_redirect_*().
*
* If there is no need to change the __sk_buff->delivery_time_type,
* the delivery time can be directly written to __sk_buff->tstamp
* If there is no need to change the __sk_buff->tstamp_type,
* the tstamp value can be directly written to __sk_buff->tstamp
* instead.
*
* *dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
* can be used to clear any delivery time stored in
* __sk_buff->tstamp.
* BPF_SKB_TSTAMP_DELIVERY_MONO is the only tstamp that
* will be kept during bpf_redirect_*(). A non zero
* *tstamp* must be used with the BPF_SKB_TSTAMP_DELIVERY_MONO
* *tstamp_type*.
*
* A BPF_SKB_TSTAMP_UNSPEC *tstamp_type* can only be used
* with a zero *tstamp*.
*
* Only IPv4 and IPv6 skb->protocol are supported.
*
@ -5116,7 +5132,17 @@ union bpf_attr {
* Return
* 0 on success.
* **-EINVAL** for invalid input
* **-EOPNOTSUPP** for unsupported delivery_time_type and protocol
* **-EOPNOTSUPP** for unsupported protocol
*
* long bpf_ima_file_hash(struct file *file, void *dst, u32 size)
* Description
* Returns a calculated IMA hash of the *file*.
* If the hash is larger than *size*, then only *size*
* bytes will be copied to *dst*.
* Return
* The **hash_algo** is returned on success,
* **-EOPNOTSUPP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
*/
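/* A minimal BPF-side sketch of the new helper (an illustration, not part
 * of this header). Assumption: since computing an IMA hash may block, the
 * helper is called from a sleepable LSM program; the hook and buffer size
 * are illustrative.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

u8 digest[64];	/* large enough for common hash_algo outputs */
long hash_ret;	/* hash_algo on success, negative error otherwise */

SEC("lsm.s/bprm_committed_creds")
int BPF_PROG(measure_binary, struct linux_binprm *bprm)
{
	hash_ret = bpf_ima_file_hash(bprm->file, digest, sizeof(digest));
	return 0;
}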
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@ -5311,7 +5337,8 @@ union bpf_attr {
FN(xdp_load_bytes), \
FN(xdp_store_bytes), \
FN(copy_from_user_task), \
FN(skb_set_delivery_time), \
FN(skb_set_tstamp), \
FN(ima_file_hash), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@ -5502,9 +5529,12 @@ union { \
} __attribute__((aligned(8)))
enum {
BPF_SKB_DELIVERY_TIME_NONE,
BPF_SKB_DELIVERY_TIME_UNSPEC,
BPF_SKB_DELIVERY_TIME_MONO,
BPF_SKB_TSTAMP_UNSPEC,
BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
* the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
* and try to deduce it by ingress, egress or skb->sk->sk_clockid.
*/
};
/* user accessible mirror of in-kernel sk_buff.
@ -5547,7 +5577,7 @@ struct __sk_buff {
__u32 gso_segs;
__bpf_md_ptr(struct bpf_sock *, sk);
__u32 gso_size;
__u8 delivery_time_type;
__u8 tstamp_type;
__u32 :24; /* Padding, future use. */
__u64 hwtstamp;
};


@ -29,6 +29,7 @@
#include <errno.h>
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/kernel.h>
#include <limits.h>
#include <sys/resource.h>
#include "bpf.h"
@ -111,7 +112,7 @@ int probe_memcg_account(void)
BPF_EMIT_CALL(BPF_FUNC_ktime_get_coarse_ns),
BPF_EXIT_INSN(),
};
size_t insn_cnt = sizeof(insns) / sizeof(insns[0]);
size_t insn_cnt = ARRAY_SIZE(insns);
union bpf_attr attr;
int prog_fd;
@ -853,6 +854,15 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, perf_event))
return libbpf_err(-EINVAL);
break;
case BPF_TRACE_KPROBE_MULTI:
attr.link_create.kprobe_multi.flags = OPTS_GET(opts, kprobe_multi.flags, 0);
attr.link_create.kprobe_multi.cnt = OPTS_GET(opts, kprobe_multi.cnt, 0);
attr.link_create.kprobe_multi.syms = ptr_to_u64(OPTS_GET(opts, kprobe_multi.syms, 0));
attr.link_create.kprobe_multi.addrs = ptr_to_u64(OPTS_GET(opts, kprobe_multi.addrs, 0));
attr.link_create.kprobe_multi.cookies = ptr_to_u64(OPTS_GET(opts, kprobe_multi.cookies, 0));
if (!OPTS_ZEROED(opts, kprobe_multi))
return libbpf_err(-EINVAL);
break;
default:
if (!OPTS_ZEROED(opts, flags))
return libbpf_err(-EINVAL);
@ -994,6 +1004,7 @@ int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts)
memset(&attr, 0, sizeof(attr));
attr.test.prog_fd = prog_fd;
attr.test.batch_size = OPTS_GET(opts, batch_size, 0);
attr.test.cpu = OPTS_GET(opts, cpu, 0);
attr.test.flags = OPTS_GET(opts, flags, 0);
attr.test.repeat = OPTS_GET(opts, repeat, 0);


@ -413,10 +413,17 @@ struct bpf_link_create_opts {
struct {
__u64 bpf_cookie;
} perf_event;
struct {
__u32 flags;
__u32 cnt;
const char **syms;
const unsigned long *addrs;
const __u64 *cookies;
} kprobe_multi;
};
size_t :0;
};
#define bpf_link_create_opts__last_field perf_event
#define bpf_link_create_opts__last_field kprobe_multi.cookies
LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
enum bpf_attach_type attach_type,
@ -512,8 +519,9 @@ struct bpf_test_run_opts {
__u32 duration; /* out: average per repetition in ns */
__u32 flags;
__u32 cpu;
__u32 batch_size;
};
#define bpf_test_run_opts__last_field cpu
#define bpf_test_run_opts__last_field batch_size
LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
struct bpf_test_run_opts *opts);
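/* Sketch of how the new batch_size field pairs with XDP live-frames mode
 * (an illustration, not part of this header). prog_fd and the packet
 * buffer are placeholders supplied by the caller.
 */
#include <bpf/bpf.h>
#include <linux/bpf.h>

static int run_xdp_live(int prog_fd, void *pkt, __u32 pkt_sz)
{
	LIBBPF_OPTS(bpf_test_run_opts, opts,
		.data_in = pkt,
		.data_size_in = pkt_sz,
		.flags = BPF_F_TEST_XDP_LIVE_FRAMES, /* actually transmit frames */
		.repeat = 64,		/* run the program 64 times */
		.batch_size = 32,	/* frames per syscall batch (0 = default) */
	);

	return bpf_prog_test_run_opts(prog_fd, &opts);
}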

File diff suppressed because it is too large.


@ -425,6 +425,29 @@ bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
const char *func_name,
const struct bpf_kprobe_opts *opts);
struct bpf_kprobe_multi_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* array of function symbols to attach */
const char **syms;
/* array of function addresses to attach */
const unsigned long *addrs;
/* array of user-provided values fetchable through bpf_get_attach_cookie */
const __u64 *cookies;
/* number of elements in syms/addrs/cookies arrays */
size_t cnt;
/* create return kprobes */
bool retprobe;
size_t :0;
};
#define bpf_kprobe_multi_opts__last_field retprobe
LIBBPF_API struct bpf_link *
bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
const char *pattern,
const struct bpf_kprobe_multi_opts *opts);
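/* Sketch (not part of this header): pattern-based attach, where syms,
 * addrs, and cookies stay NULL and every kernel function matching the
 * glob is instrumented. The program is a placeholder supplied by the
 * caller.
 */
#include <bpf/libbpf.h>

static struct bpf_link *attach_fentry_tests(struct bpf_program *prog)
{
	LIBBPF_OPTS(bpf_kprobe_multi_opts, opts,
		.retprobe = false,	/* true would create return probes */
	);

	return bpf_program__attach_kprobe_multi_opts(prog, "bpf_fentry_test*",
						     &opts);
}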
struct bpf_uprobe_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
@ -1289,6 +1312,35 @@ LIBBPF_API int bpf_object__attach_skeleton(struct bpf_object_skeleton *s);
LIBBPF_API void bpf_object__detach_skeleton(struct bpf_object_skeleton *s);
LIBBPF_API void bpf_object__destroy_skeleton(struct bpf_object_skeleton *s);
struct bpf_var_skeleton {
const char *name;
struct bpf_map **map;
void **addr;
};
struct bpf_object_subskeleton {
size_t sz; /* size of this struct, for forward/backward compatibility */
const struct bpf_object *obj;
int map_cnt;
int map_skel_sz; /* sizeof(struct bpf_map_skeleton) */
struct bpf_map_skeleton *maps;
int prog_cnt;
int prog_skel_sz; /* sizeof(struct bpf_prog_skeleton) */
struct bpf_prog_skeleton *progs;
int var_cnt;
int var_skel_sz; /* sizeof(struct bpf_var_skeleton) */
struct bpf_var_skeleton *vars;
};
LIBBPF_API int
bpf_object__open_subskeleton(struct bpf_object_subskeleton *s);
LIBBPF_API void
bpf_object__destroy_subskeleton(struct bpf_object_subskeleton *s);
struct gen_loader_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
const char *data;
@ -1328,6 +1380,115 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker,
LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
/*
* Custom handling of BPF program's SEC() definitions
*/
struct bpf_prog_load_opts; /* defined in bpf.h */
/* Called during bpf_object__open() for each recognized BPF program. Callback
* can use various bpf_program__set_*() setters to adjust whatever properties
* are necessary.
*/
typedef int (*libbpf_prog_setup_fn_t)(struct bpf_program *prog, long cookie);
/* Called right before libbpf performs bpf_prog_load() to load BPF program
* into the kernel. Callback can adjust opts as necessary.
*/
typedef int (*libbpf_prog_prepare_load_fn_t)(struct bpf_program *prog,
struct bpf_prog_load_opts *opts, long cookie);
/* Called during skeleton attach or through bpf_program__attach(). If
* auto-attach is not supported, callback should return 0 and set link to
* NULL (it's not considered an error during skeleton attach, but it will be
* an error for bpf_program__attach() calls). On error, error should be
* returned directly and link set to NULL. On success, return 0 and set link
* to a valid struct bpf_link.
*/
typedef int (*libbpf_prog_attach_fn_t)(const struct bpf_program *prog, long cookie,
struct bpf_link **link);
struct libbpf_prog_handler_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
/* User-provided value that is passed to prog_setup_fn,
* prog_prepare_load_fn, and prog_attach_fn callbacks. Allows user to
* register one set of callbacks for multiple SEC() definitions and
* still be able to distinguish them, if necessary. For example,
* libbpf itself is using this to pass necessary flags (e.g.,
* sleepable flag) to a common internal SEC() handler.
*/
long cookie;
/* BPF program initialization callback (see libbpf_prog_setup_fn_t).
* Callback is optional, pass NULL if it's not necessary.
*/
libbpf_prog_setup_fn_t prog_setup_fn;
/* BPF program loading callback (see libbpf_prog_prepare_load_fn_t).
* Callback is optional, pass NULL if it's not necessary.
*/
libbpf_prog_prepare_load_fn_t prog_prepare_load_fn;
/* BPF program attach callback (see libbpf_prog_attach_fn_t).
* Callback is optional, pass NULL if it's not necessary.
*/
libbpf_prog_attach_fn_t prog_attach_fn;
};
#define libbpf_prog_handler_opts__last_field prog_attach_fn
/**
* @brief **libbpf_register_prog_handler()** registers a custom BPF program
* SEC() handler.
* @param sec section prefix for which custom handler is registered
* @param prog_type BPF program type associated with specified section
* @param exp_attach_type Expected BPF attach type associated with specified section
* @param opts optional cookie, callbacks, and other extra options
* @return Non-negative handler ID is returned on success. This handler ID has
* to be passed to *libbpf_unregister_prog_handler()* to unregister such
* custom handler. Negative error code is returned on error.
*
* *sec* defines which SEC() definitions are handled by this custom handler
* registration. *sec* can have a few different forms:
* - if *sec* is just a plain string (e.g., "abc"), it will match only
* SEC("abc"). If BPF program specifies SEC("abc/whatever") it will result
* in an error;
* - if *sec* is of the form "abc/", proper SEC() form is
* SEC("abc/something"), where acceptable "something" should be checked by
* *prog_init_fn* callback, if there are additional restrictions;
* - if *sec* is of the form "abc+", it will successfully match both
* SEC("abc") and SEC("abc/whatever") forms;
* - if *sec* is NULL, custom handler is registered for any BPF program that
* doesn't match any of the registered (custom or libbpf's own) SEC()
* handlers. Only one such generic custom handler can be registered at
* any given time.
*
* All custom handlers (except the one with *sec* == NULL) are processed
* before libbpf's own SEC() handlers. It is allowed to "override" libbpf's
* SEC() handlers by registering custom ones for the same section prefix
* (i.e., it's possible to have custom SEC("perf_event/LLC-load-misses")
* handler).
*
* Note: like most global libbpf APIs (e.g., libbpf_set_print(),
* libbpf_set_strict_mode(), etc.), these APIs are not thread-safe. The user
* needs to ensure synchronization if there is a risk of running this API from
* multiple threads simultaneously.
*/
LIBBPF_API int libbpf_register_prog_handler(const char *sec,
enum bpf_prog_type prog_type,
enum bpf_attach_type exp_attach_type,
const struct libbpf_prog_handler_opts *opts);
/**
* @brief *libbpf_unregister_prog_handler()* unregisters previously registered
* custom BPF program SEC() handler.
* @param handler_id handler ID returned by *libbpf_register_prog_handler()*
* after successful registration
* @return 0 on success, negative error code if handler isn't found
*
* Note: like most global libbpf APIs (e.g., libbpf_set_print(),
* libbpf_set_strict_mode(), etc.), these APIs are not thread-safe. The user
* needs to ensure synchronization if there is a risk of running this API from
* multiple threads simultaneously.
*/
LIBBPF_API int libbpf_unregister_prog_handler(int handler_id);
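/* Sketch (illustrative, not part of this header): register a custom
 * handler so that SEC("myapp/<something>") programs load as kprobes.
 * All "myapp" names are hypothetical.
 */
#include <bpf/libbpf.h>

static int myapp_setup(struct bpf_program *prog, long cookie)
{
	/* e.g., parse bpf_program__section_name(prog) past the "myapp/"
	 * prefix and adjust program properties accordingly
	 */
	return 0;
}

static int register_myapp_handler(void)
{
	LIBBPF_OPTS(libbpf_prog_handler_opts, opts,
		.prog_setup_fn = myapp_setup,
	);

	return libbpf_register_prog_handler("myapp/", BPF_PROG_TYPE_KPROBE,
					    0, &opts);
}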
#ifdef __cplusplus
} /* extern "C" */
#endif


@ -439,3 +439,12 @@ LIBBPF_0.7.0 {
libbpf_probe_bpf_prog_type;
libbpf_set_memlock_rlim_max;
} LIBBPF_0.6.0;
LIBBPF_0.8.0 {
global:
bpf_object__destroy_subskeleton;
bpf_object__open_subskeleton;
libbpf_register_prog_handler;
libbpf_unregister_prog_handler;
bpf_program__attach_kprobe_multi_opts;
} LIBBPF_0.7.0;


@ -449,6 +449,11 @@ __s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
extern enum libbpf_strict_mode libbpf_mode;
typedef int (*kallsyms_cb_t)(unsigned long long sym_addr, char sym_type,
const char *sym_name, void *ctx);
int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *arg);
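/* Sketch of the callback shape (illustrative; this API is internal to
 * libbpf). Assumption: a non-zero return from the callback stops the
 * kallsyms walk, so returning 0 visits every symbol.
 */
#include <string.h>

struct sym_query {
	const char *name;
	unsigned long long addr;
};

static int find_sym_cb(unsigned long long sym_addr, char sym_type,
		       const char *sym_name, void *ctx)
{
	struct sym_query *q = ctx;

	if (strcmp(sym_name, q->name) == 0)
		q->addr = sym_addr;
	return 0;
}

/* usage: struct sym_query q = { .name = "bpf_fentry_test1" };
 *        libbpf_kallsyms_parse(find_sym_cb, &q);
 */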
/* handle direct returned errors */
static inline int libbpf_err(int ret)
{


@ -54,6 +54,10 @@ enum libbpf_strict_mode {
*
* Note, in this mode the program pin path will be based on the
* function name instead of section name.
*
* Additionally, routines in the .text section are always considered
* sub-programs. Legacy behavior allows for a single routine in .text
* to be a program.
*/
LIBBPF_STRICT_SEC_NAME = 0x04,
/*


@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 0
#define LIBBPF_MINOR_VERSION 7
#define LIBBPF_MINOR_VERSION 8
#endif /* __LIBBPF_VERSION_H */


@ -481,8 +481,8 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
BPF_EXIT_INSN(),
};
size_t insns_cnt[] = {sizeof(prog) / sizeof(struct bpf_insn),
sizeof(prog_redirect_flags) / sizeof(struct bpf_insn),
size_t insns_cnt[] = {ARRAY_SIZE(prog),
ARRAY_SIZE(prog_redirect_flags),
};
struct bpf_insn *progs[] = {prog, prog_redirect_flags};
enum xsk_prog option = get_xsk_prog();
@ -1193,12 +1193,23 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
int xsk_umem__delete(struct xsk_umem *umem)
{
struct xdp_mmap_offsets off;
int err;
if (!umem)
return 0;
if (umem->refcount)
return -EBUSY;
err = xsk_get_mmap_offsets(umem->fd, &off);
if (!err && umem->fill_save && umem->comp_save) {
munmap(umem->fill_save->ring - off.fr.desc,
off.fr.desc + umem->config.fill_size * sizeof(__u64));
munmap(umem->comp_save->ring - off.cr.desc,
off.cr.desc + umem->config.comp_size * sizeof(__u64));
}
close(umem->fd);
free(umem);


@ -89,6 +89,9 @@ ifeq ($(CC_NO_CLANG), 1)
EXTRA_WARNINGS += -Wstrict-aliasing=3
else ifneq ($(CROSS_COMPILE),)
# Allow userspace to override CLANG_CROSS_FLAGS to specify their own
# sysroots and flags or to avoid the GCC call in pure Clang builds.
ifeq ($(CLANG_CROSS_FLAGS),)
CLANG_CROSS_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)gcc 2>/dev/null))
ifneq ($(GCC_TOOLCHAIN_DIR),)
@ -96,6 +99,7 @@ CLANG_CROSS_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE))
CLANG_CROSS_FLAGS += --sysroot=$(shell $(CROSS_COMPILE)gcc -print-sysroot)
CLANG_CROSS_FLAGS += --gcc-toolchain=$(realpath $(GCC_TOOLCHAIN_DIR)/..)
endif # GCC_TOOLCHAIN_DIR
endif # CLANG_CROSS_FLAGS
CFLAGS += $(CLANG_CROSS_FLAGS)
AFLAGS += $(CLANG_CROSS_FLAGS)
endif # CROSS_COMPILE


@ -31,6 +31,7 @@ test_tcp_check_syncookie_user
test_sysctl
xdping
test_cpp
*.subskel.h
*.skel.h
*.lskel.h
/no_alu32


@ -25,7 +25,7 @@ CFLAGS += -g -O0 -rdynamic -Wall -Werror $(GENFLAGS) $(SAN_CFLAGS) \
-I$(CURDIR) -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR) \
-I$(TOOLSINCDIR) -I$(APIDIR) -I$(OUTPUT)
LDFLAGS += $(SAN_CFLAGS)
LDLIBS += -lcap -lelf -lz -lrt -lpthread
LDLIBS += -lelf -lz -lrt -lpthread
# Silence some warnings when compiled with clang
ifneq ($(LLVM),)
@ -195,6 +195,7 @@ $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(BPFOBJ)
CGROUP_HELPERS := $(OUTPUT)/cgroup_helpers.o
TESTING_HELPERS := $(OUTPUT)/testing_helpers.o
TRACE_HELPERS := $(OUTPUT)/trace_helpers.o
CAP_HELPERS := $(OUTPUT)/cap_helpers.o
$(OUTPUT)/test_dev_cgroup: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_skb_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS)
@ -211,7 +212,7 @@ $(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS)
$(OUTPUT)/xdping: $(TESTING_HELPERS)
$(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS)
$(OUTPUT)/test_maps: $(TESTING_HELPERS)
$(OUTPUT)/test_verifier: $(TESTING_HELPERS)
$(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS)
BPFTOOL ?= $(DEFAULT_BPFTOOL)
$(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \
@ -326,7 +327,13 @@ endef
SKEL_BLACKLIST := btf__% test_pinning_invalid.c test_sk_assign.c
LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \
linked_vars.skel.h linked_maps.skel.h
linked_vars.skel.h linked_maps.skel.h \
test_subskeleton.skel.h test_subskeleton_lib.skel.h
# In the subskeleton case, we want the test_subskeleton_lib.subskel.h file
# but that's created as a side-effect of the skel.h generation.
test_subskeleton.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o test_subskeleton.o
test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o
LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \
@ -404,6 +411,7 @@ $(TRUNNER_BPF_SKELS): %.skel.h: %.o $(BPFTOOL) | $(TRUNNER_OUTPUT)
$(Q)$$(BPFTOOL) gen object $$(<:.o=.linked3.o) $$(<:.o=.linked2.o)
$(Q)diff $$(<:.o=.linked2.o) $$(<:.o=.linked3.o)
$(Q)$$(BPFTOOL) gen skeleton $$(<:.o=.linked3.o) name $$(notdir $$(<:.o=)) > $$@
$(Q)$$(BPFTOOL) gen subskeleton $$(<:.o=.linked3.o) name $$(notdir $$(<:.o=)) > $$(@:.skel.h=.subskel.h)
$(TRUNNER_BPF_LSKELS): %.lskel.h: %.o $(BPFTOOL) | $(TRUNNER_OUTPUT)
$$(call msg,GEN-SKEL,$(TRUNNER_BINARY),$$@)
@ -421,6 +429,7 @@ $(TRUNNER_BPF_SKELS_LINKED): $(TRUNNER_BPF_OBJS) $(BPFTOOL) | $(TRUNNER_OUTPUT)
$(Q)diff $$(@:.skel.h=.linked2.o) $$(@:.skel.h=.linked3.o)
$$(call msg,GEN-SKEL,$(TRUNNER_BINARY),$$@)
$(Q)$$(BPFTOOL) gen skeleton $$(@:.skel.h=.linked3.o) name $$(notdir $$(@:.skel.h=)) > $$@
$(Q)$$(BPFTOOL) gen subskeleton $$(@:.skel.h=.linked3.o) name $$(notdir $$(@:.skel.h=)) > $$(@:.skel.h=.subskel.h)
endif
# ensure we set up tests.h header generation rule just once
@ -479,7 +488,8 @@ TRUNNER_TESTS_DIR := prog_tests
TRUNNER_BPF_PROGS_DIR := progs
TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \
network_helpers.c testing_helpers.c \
btf_helpers.c flow_dissector_load.h
btf_helpers.c flow_dissector_load.h \
cap_helpers.c
TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \
ima_setup.sh \
$(wildcard progs/btf_dump_test_case_*.c)
@ -557,6 +567,6 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
prog_tests/tests.h map_tests/tests.h verifier/tests.h \
feature bpftool \
$(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h no_alu32 bpf_gcc bpf_testmod.ko)
$(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h *.subskel.h no_alu32 bpf_gcc bpf_testmod.ko)
.PHONY: docs docs-clean


@ -32,11 +32,19 @@ For more information about using the script, run:
$ tools/testing/selftests/bpf/vmtest.sh -h
In case of linker errors when running selftests, try using static linking:
.. code-block:: console
$ LDLIBS=-static vmtest.sh
.. note:: Some distros may not support static linking.
.. note:: The script uses pahole and clang based on the host environment settings.
If you want to use a different pahole or llvm, you can change the `PATH`
environment variable at the beginning of the script.
.. note:: The script currently only supports x86_64.
.. note:: The script currently only supports x86_64 and s390x architectures.
Additional information about selftest failures is
documented here.


@ -33,6 +33,10 @@ struct bpf_testmod_btf_type_tag_2 {
struct bpf_testmod_btf_type_tag_1 __user *p;
};
struct bpf_testmod_btf_type_tag_3 {
struct bpf_testmod_btf_type_tag_1 __percpu *p;
};
noinline int
bpf_testmod_test_btf_type_tag_user_1(struct bpf_testmod_btf_type_tag_1 __user *arg) {
BTF_TYPE_EMIT(func_proto_typedef);
@ -46,6 +50,16 @@ bpf_testmod_test_btf_type_tag_user_2(struct bpf_testmod_btf_type_tag_2 *arg) {
return arg->p->a;
}
noinline int
bpf_testmod_test_btf_type_tag_percpu_1(struct bpf_testmod_btf_type_tag_1 __percpu *arg) {
return arg->a;
}
noinline int
bpf_testmod_test_btf_type_tag_percpu_2(struct bpf_testmod_btf_type_tag_3 *arg) {
return arg->p->a;
}
noinline int bpf_testmod_loop_test(int n)
{
int i, sum = 0;


@ -0,0 +1,67 @@
// SPDX-License-Identifier: GPL-2.0
#include "cap_helpers.h"
/* Avoid including <sys/capability.h> from the libcap-devel package;
* instead, declare capget()/capset() here and use the glibc syscall wrappers.
*/
int capget(cap_user_header_t header, cap_user_data_t data);
int capset(cap_user_header_t header, const cap_user_data_t data);
int cap_enable_effective(__u64 caps, __u64 *old_caps)
{
struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
struct __user_cap_header_struct hdr = {
.version = _LINUX_CAPABILITY_VERSION_3,
};
__u32 cap0 = caps;
__u32 cap1 = caps >> 32;
int err;
err = capget(&hdr, data);
if (err)
return err;
if (old_caps)
*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
if ((data[0].effective & cap0) == cap0 &&
(data[1].effective & cap1) == cap1)
return 0;
data[0].effective |= cap0;
data[1].effective |= cap1;
err = capset(&hdr, data);
if (err)
return err;
return 0;
}
int cap_disable_effective(__u64 caps, __u64 *old_caps)
{
struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
struct __user_cap_header_struct hdr = {
.version = _LINUX_CAPABILITY_VERSION_3,
};
__u32 cap0 = caps;
__u32 cap1 = caps >> 32;
int err;
err = capget(&hdr, data);
if (err)
return err;
if (old_caps)
*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
if (!(data[0].effective & cap0) && !(data[1].effective & cap1))
return 0;
data[0].effective &= ~cap0;
data[1].effective &= ~cap1;
err = capset(&hdr, data);
if (err)
return err;
return 0;
}


@ -0,0 +1,19 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __CAP_HELPERS_H
#define __CAP_HELPERS_H
#include <linux/types.h>
#include <linux/capability.h>
#ifndef CAP_PERFMON
#define CAP_PERFMON 38
#endif
#ifndef CAP_BPF
#define CAP_BPF 39
#endif
int cap_enable_effective(__u64 caps, __u64 *old_caps);
int cap_disable_effective(__u64 caps, __u64 *old_caps);
#endif


@ -12,7 +12,7 @@ LOG_FILE="$(mktemp /tmp/ima_setup.XXXX.log)"
usage()
{
echo "Usage: $0 <setup|cleanup|run> <existing_tmp_dir>"
echo "Usage: $0 <setup|cleanup|run|modify-bin|restore-bin|load-policy> <existing_tmp_dir>"
exit 1
}
@ -51,6 +51,7 @@ setup()
ensure_mount_securityfs
echo "measure func=BPRM_CHECK fsuuid=${mount_uuid}" > ${IMA_POLICY_FILE}
echo "measure func=BPRM_CHECK fsuuid=${mount_uuid}" > ${mount_dir}/policy_test
}
cleanup() {
@ -77,6 +78,32 @@ run()
exec "${copied_bin_path}"
}
modify_bin()
{
local tmp_dir="$1"
local mount_dir="${tmp_dir}/mnt"
local copied_bin_path="${mount_dir}/$(basename ${TEST_BINARY})"
echo "mod" >> "${copied_bin_path}"
}
restore_bin()
{
local tmp_dir="$1"
local mount_dir="${tmp_dir}/mnt"
local copied_bin_path="${mount_dir}/$(basename ${TEST_BINARY})"
truncate -s -4 "${copied_bin_path}"
}
load_policy()
{
local tmp_dir="$1"
local mount_dir="${tmp_dir}/mnt"
echo ${mount_dir}/policy_test > ${IMA_POLICY_FILE} 2> /dev/null
}
catch()
{
local exit_code="$1"
@ -105,6 +132,12 @@ main()
cleanup "${tmp_dir}"
elif [[ "${action}" == "run" ]]; then
run "${tmp_dir}"
elif [[ "${action}" == "modify-bin" ]]; then
modify_bin "${tmp_dir}"
elif [[ "${action}" == "restore-bin" ]]; then
restore_bin "${tmp_dir}"
elif [[ "${action}" == "load-policy" ]]; then
load_policy "${tmp_dir}"
else
echo "Unknown action: ${action}"
exit 1


@ -1,18 +1,25 @@
// SPDX-License-Identifier: GPL-2.0-only
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <arpa/inet.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <linux/err.h>
#include <linux/in.h>
#include <linux/in6.h>
#include <linux/limits.h>
#include "bpf_util.h"
#include "network_helpers.h"
#include "test_progs.h"
#define clean_errno() (errno == 0 ? "None" : strerror(errno))
#define log_err(MSG, ...) ({ \
@ -356,3 +363,82 @@ char *ping_command(int family)
}
return "ping";
}
struct nstoken {
int orig_netns_fd;
};
static int setns_by_fd(int nsfd)
{
int err;
err = setns(nsfd, CLONE_NEWNET);
close(nsfd);
if (!ASSERT_OK(err, "setns"))
return err;
/* Switch /sys to the new namespace so that e.g. /sys/class/net
* reflects the devices in the new namespace.
*/
err = unshare(CLONE_NEWNS);
if (!ASSERT_OK(err, "unshare"))
return err;
/* Make our /sys mount private, so the following umount won't
* trigger the global umount in case it's shared.
*/
err = mount("none", "/sys", NULL, MS_PRIVATE, NULL);
if (!ASSERT_OK(err, "remount private /sys"))
return err;
err = umount2("/sys", MNT_DETACH);
if (!ASSERT_OK(err, "umount2 /sys"))
return err;
err = mount("sysfs", "/sys", "sysfs", 0, NULL);
if (!ASSERT_OK(err, "mount /sys"))
return err;
err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL);
if (!ASSERT_OK(err, "mount /sys/fs/bpf"))
return err;
return 0;
}
struct nstoken *open_netns(const char *name)
{
int nsfd;
char nspath[PATH_MAX];
int err;
struct nstoken *token;
token = malloc(sizeof(struct nstoken));
if (!ASSERT_OK_PTR(token, "malloc token"))
return NULL;
token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net"))
goto fail;
snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
if (!ASSERT_GE(nsfd, 0, "open netns fd"))
goto fail;
err = setns_by_fd(nsfd);
if (!ASSERT_OK(err, "setns_by_fd"))
goto fail;
return token;
fail:
free(token);
return NULL;
}
void close_netns(struct nstoken *token)
{
ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd");
free(token);
}


@ -55,4 +55,13 @@ int make_sockaddr(int family, const char *addr_str, __u16 port,
struct sockaddr_storage *addr, socklen_t *len);
char *ping_command(int family);
struct nstoken;
/**
* open_netns() - Switch to specified network namespace by name.
*
* Returns a token with which to restore the original namespace
* using close_netns().
*/
struct nstoken *open_netns(const char *name);
void close_netns(struct nstoken *token);
#endif
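/* Sketch (not part of this header): run socket setup inside a named
 * netns and restore the original namespace afterwards. "testns" is a
 * placeholder, e.g. created beforehand with `ip netns add testns`.
 */
static void run_in_netns(void)
{
	struct nstoken *tok = open_netns("testns");

	if (!tok)
		return;
	/* ... sockets opened here live in "testns" ... */
	close_netns(tok);
}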


@ -4,9 +4,9 @@
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/capability.h>
#include "test_progs.h"
#include "cap_helpers.h"
#include "bind_perm.skel.h"
static int duration;
@ -49,41 +49,11 @@ close_socket:
close(fd);
}
bool cap_net_bind_service(cap_flag_value_t flag)
{
const cap_value_t cap_net_bind_service = CAP_NET_BIND_SERVICE;
cap_flag_value_t original_value;
bool was_effective = false;
cap_t caps;
caps = cap_get_proc();
if (CHECK(!caps, "cap_get_proc", "errno %d", errno))
goto free_caps;
if (CHECK(cap_get_flag(caps, CAP_NET_BIND_SERVICE, CAP_EFFECTIVE,
&original_value),
"cap_get_flag", "errno %d", errno))
goto free_caps;
was_effective = (original_value == CAP_SET);
if (CHECK(cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_net_bind_service,
flag),
"cap_set_flag", "errno %d", errno))
goto free_caps;
if (CHECK(cap_set_proc(caps), "cap_set_proc", "errno %d", errno))
goto free_caps;
free_caps:
CHECK(cap_free(caps), "cap_free", "errno %d", errno);
return was_effective;
}
void test_bind_perm(void)
{
bool cap_was_effective;
const __u64 net_bind_svc_cap = 1ULL << CAP_NET_BIND_SERVICE;
struct bind_perm *skel;
__u64 old_caps = 0;
int cgroup_fd;
if (create_netns())
@ -105,7 +75,8 @@ void test_bind_perm(void)
if (!ASSERT_OK_PTR(skel, "bind_v6_prog"))
goto close_skeleton;
cap_was_effective = cap_net_bind_service(CAP_CLEAR);
ASSERT_OK(cap_disable_effective(net_bind_svc_cap, &old_caps),
"cap_disable_effective");
try_bind(AF_INET, 110, EACCES);
try_bind(AF_INET6, 110, EACCES);
@ -113,8 +84,9 @@ void test_bind_perm(void)
try_bind(AF_INET, 111, 0);
try_bind(AF_INET6, 111, 0);
if (cap_was_effective)
cap_net_bind_service(CAP_SET);
if (old_caps & net_bind_svc_cap)
ASSERT_OK(cap_enable_effective(net_bind_svc_cap, NULL),
"cap_enable_effective");
close_skeleton:
bind_perm__destroy(skel);


@ -7,6 +7,7 @@
#include <unistd.h>
#include <test_progs.h>
#include "test_bpf_cookie.skel.h"
#include "kprobe_multi.skel.h"
/* uprobe attach point */
static void trigger_func(void)
@ -63,6 +64,178 @@ cleanup:
bpf_link__destroy(retlink2);
}
static void kprobe_multi_test_run(struct kprobe_multi *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
prog_fd = bpf_program__fd(skel->progs.trigger);
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run");
ASSERT_EQ(topts.retval, 0, "test_run");
ASSERT_EQ(skel->bss->kprobe_test1_result, 1, "kprobe_test1_result");
ASSERT_EQ(skel->bss->kprobe_test2_result, 1, "kprobe_test2_result");
ASSERT_EQ(skel->bss->kprobe_test3_result, 1, "kprobe_test3_result");
ASSERT_EQ(skel->bss->kprobe_test4_result, 1, "kprobe_test4_result");
ASSERT_EQ(skel->bss->kprobe_test5_result, 1, "kprobe_test5_result");
ASSERT_EQ(skel->bss->kprobe_test6_result, 1, "kprobe_test6_result");
ASSERT_EQ(skel->bss->kprobe_test7_result, 1, "kprobe_test7_result");
ASSERT_EQ(skel->bss->kprobe_test8_result, 1, "kprobe_test8_result");
ASSERT_EQ(skel->bss->kretprobe_test1_result, 1, "kretprobe_test1_result");
ASSERT_EQ(skel->bss->kretprobe_test2_result, 1, "kretprobe_test2_result");
ASSERT_EQ(skel->bss->kretprobe_test3_result, 1, "kretprobe_test3_result");
ASSERT_EQ(skel->bss->kretprobe_test4_result, 1, "kretprobe_test4_result");
ASSERT_EQ(skel->bss->kretprobe_test5_result, 1, "kretprobe_test5_result");
ASSERT_EQ(skel->bss->kretprobe_test6_result, 1, "kretprobe_test6_result");
ASSERT_EQ(skel->bss->kretprobe_test7_result, 1, "kretprobe_test7_result");
ASSERT_EQ(skel->bss->kretprobe_test8_result, 1, "kretprobe_test8_result");
}
static void kprobe_multi_link_api_subtest(void)
{
int prog_fd, link1_fd = -1, link2_fd = -1;
struct kprobe_multi *skel = NULL;
LIBBPF_OPTS(bpf_link_create_opts, opts);
unsigned long long addrs[8];
__u64 cookies[8];
if (!ASSERT_OK(load_kallsyms(), "load_kallsyms"))
goto cleanup;
skel = kprobe_multi__open_and_load();
if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load"))
goto cleanup;
skel->bss->pid = getpid();
skel->bss->test_cookie = true;
#define GET_ADDR(__sym, __addr) ({ \
__addr = ksym_get_addr(__sym); \
if (!ASSERT_NEQ(__addr, 0, "ksym_get_addr " #__sym)) \
goto cleanup; \
})
GET_ADDR("bpf_fentry_test1", addrs[0]);
GET_ADDR("bpf_fentry_test2", addrs[1]);
GET_ADDR("bpf_fentry_test3", addrs[2]);
GET_ADDR("bpf_fentry_test4", addrs[3]);
GET_ADDR("bpf_fentry_test5", addrs[4]);
GET_ADDR("bpf_fentry_test6", addrs[5]);
GET_ADDR("bpf_fentry_test7", addrs[6]);
GET_ADDR("bpf_fentry_test8", addrs[7]);
#undef GET_ADDR
cookies[0] = 1;
cookies[1] = 2;
cookies[2] = 3;
cookies[3] = 4;
cookies[4] = 5;
cookies[5] = 6;
cookies[6] = 7;
cookies[7] = 8;
opts.kprobe_multi.addrs = (const unsigned long *) &addrs;
opts.kprobe_multi.cnt = ARRAY_SIZE(addrs);
opts.kprobe_multi.cookies = (const __u64 *) &cookies;
prog_fd = bpf_program__fd(skel->progs.test_kprobe);
link1_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, &opts);
if (!ASSERT_GE(link1_fd, 0, "link1_fd"))
goto cleanup;
cookies[0] = 8;
cookies[1] = 7;
cookies[2] = 6;
cookies[3] = 5;
cookies[4] = 4;
cookies[5] = 3;
cookies[6] = 2;
cookies[7] = 1;
opts.kprobe_multi.flags = BPF_F_KPROBE_MULTI_RETURN;
prog_fd = bpf_program__fd(skel->progs.test_kretprobe);
link2_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, &opts);
if (!ASSERT_GE(link2_fd, 0, "link2_fd"))
goto cleanup;
kprobe_multi_test_run(skel);
cleanup:
close(link1_fd);
close(link2_fd);
kprobe_multi__destroy(skel);
}
static void kprobe_multi_attach_api_subtest(void)
{
struct bpf_link *link1 = NULL, *link2 = NULL;
LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
LIBBPF_OPTS(bpf_test_run_opts, topts);
struct kprobe_multi *skel = NULL;
const char *syms[8] = {
"bpf_fentry_test1",
"bpf_fentry_test2",
"bpf_fentry_test3",
"bpf_fentry_test4",
"bpf_fentry_test5",
"bpf_fentry_test6",
"bpf_fentry_test7",
"bpf_fentry_test8",
};
__u64 cookies[8];
skel = kprobe_multi__open_and_load();
if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load"))
goto cleanup;
skel->bss->pid = getpid();
skel->bss->test_cookie = true;
cookies[0] = 1;
cookies[1] = 2;
cookies[2] = 3;
cookies[3] = 4;
cookies[4] = 5;
cookies[5] = 6;
cookies[6] = 7;
cookies[7] = 8;
opts.syms = syms;
opts.cnt = ARRAY_SIZE(syms);
opts.cookies = cookies;
link1 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe,
NULL, &opts);
if (!ASSERT_OK_PTR(link1, "bpf_program__attach_kprobe_multi_opts"))
goto cleanup;
cookies[0] = 8;
cookies[1] = 7;
cookies[2] = 6;
cookies[3] = 5;
cookies[4] = 4;
cookies[5] = 3;
cookies[6] = 2;
cookies[7] = 1;
opts.retprobe = true;
link2 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kretprobe,
NULL, &opts);
if (!ASSERT_OK_PTR(link2, "bpf_program__attach_kprobe_multi_opts"))
goto cleanup;
kprobe_multi_test_run(skel);
cleanup:
bpf_link__destroy(link2);
bpf_link__destroy(link1);
kprobe_multi__destroy(skel);
}
static void uprobe_subtest(struct test_bpf_cookie *skel)
{
DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts);
@ -199,7 +372,7 @@ static void pe_subtest(struct test_bpf_cookie *skel)
attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_CPU_CLOCK;
attr.freq = 1;
attr.sample_freq = 4000;
attr.sample_freq = 1000;
pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
if (!ASSERT_GE(pfd, 0, "perf_fd"))
goto cleanup;
@ -249,6 +422,10 @@ void test_bpf_cookie(void)
if (test__start_subtest("kprobe"))
kprobe_subtest(skel);
if (test__start_subtest("multi_kprobe_link_api"))
kprobe_multi_link_api_subtest();
if (test__start_subtest("multi_kprobe_attach_api"))
kprobe_multi_attach_api_subtest();
if (test__start_subtest("uprobe"))
uprobe_subtest(skel);
if (test__start_subtest("tracepoint"))


@ -10,6 +10,7 @@ struct btf_type_tag_test {
};
#include "btf_type_tag.skel.h"
#include "btf_type_tag_user.skel.h"
#include "btf_type_tag_percpu.skel.h"
static void test_btf_decl_tag(void)
{
@ -43,38 +44,81 @@ static void test_btf_type_tag(void)
btf_type_tag__destroy(skel);
}
static void test_btf_type_tag_mod_user(bool load_test_user1)
/* Loads vmlinux_btf as well as module_btf. If the caller passes NULL as
* module_btf, it will not load module BTF.
*
* Returns 0 on success.
* Returns -1 on error. In case of error, the loaded BTF is freed and the
* input parameters are set to NULL.
*/
static int load_btfs(struct btf **vmlinux_btf, struct btf **module_btf,
bool needs_vmlinux_tag)
{
const char *module_name = "bpf_testmod";
-struct btf *vmlinux_btf, *module_btf;
-struct btf_type_tag_user *skel;
__s32 type_id;
-int err;
if (!env.has_testmod) {
test__skip();
-return;
return -1;
}
-/* skip the test if the module does not have __user tags */
-vmlinux_btf = btf__load_vmlinux_btf();
-if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF"))
-return;
*vmlinux_btf = btf__load_vmlinux_btf();
if (!ASSERT_OK_PTR(*vmlinux_btf, "could not load vmlinux BTF"))
return -1;
-module_btf = btf__load_module_btf(module_name, vmlinux_btf);
-if (!ASSERT_OK_PTR(module_btf, "could not load module BTF"))
if (!needs_vmlinux_tag)
goto load_module_btf;
/* skip the test if the vmlinux does not have __user tags */
type_id = btf__find_by_name_kind(*vmlinux_btf, "user", BTF_KIND_TYPE_TAG);
if (type_id <= 0) {
printf("%s:SKIP: btf_type_tag attribute not in vmlinux btf", __func__);
test__skip();
goto free_vmlinux_btf;
}
load_module_btf:
/* skip loading module_btf, if not requested by caller */
if (!module_btf)
return 0;
*module_btf = btf__load_module_btf(module_name, *vmlinux_btf);
if (!ASSERT_OK_PTR(*module_btf, "could not load module BTF"))
goto free_vmlinux_btf;
-type_id = btf__find_by_name_kind(module_btf, "user", BTF_KIND_TYPE_TAG);
/* skip the test if the module does not have __user tags */
type_id = btf__find_by_name_kind(*module_btf, "user", BTF_KIND_TYPE_TAG);
if (type_id <= 0) {
printf("%s:SKIP: btf_type_tag attribute not in %s", __func__, module_name);
test__skip();
goto free_module_btf;
}
return 0;
free_module_btf:
btf__free(*module_btf);
free_vmlinux_btf:
btf__free(*vmlinux_btf);
*vmlinux_btf = NULL;
if (module_btf)
*module_btf = NULL;
return -1;
}
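
A sketch of the calling convention this helper expects, distilled from the callers that follow (the function name is illustrative): NULL-initialize the pointers, treat a non-zero return as skip/fail with nothing left to free, and free both BTFs on the success path.

/* Illustrative usage of load_btfs() (not part of the diff), mirroring
 * the callers below; btf__free() is a no-op on NULL, so freeing both
 * pointers unconditionally at the end is safe.
 */
static void example_load_btfs_caller(void)
{
	struct btf *vmlinux_btf = NULL, *module_btf = NULL;

	if (load_btfs(&vmlinux_btf, &module_btf, /*needs_vmlinux_tag=*/false))
		return; /* helper already freed and reset everything */

	/* ... use vmlinux_btf and module_btf ... */

	btf__free(module_btf);
	btf__free(vmlinux_btf);
}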
static void test_btf_type_tag_mod_user(bool load_test_user1)
{
struct btf *vmlinux_btf = NULL, *module_btf = NULL;
struct btf_type_tag_user *skel;
int err;
if (load_btfs(&vmlinux_btf, &module_btf, /*needs_vmlinux_tag=*/false))
return;
skel = btf_type_tag_user__open();
if (!ASSERT_OK_PTR(skel, "btf_type_tag_user"))
-goto free_module_btf;
goto cleanup;
bpf_program__set_autoload(skel->progs.test_sys_getsockname, false);
if (load_test_user1)
@@ -87,34 +131,23 @@ static void test_btf_type_tag_mod_user(bool load_test_user1)
btf_type_tag_user__destroy(skel);
-free_module_btf:
cleanup:
btf__free(module_btf);
-free_vmlinux_btf:
btf__free(vmlinux_btf);
}
static void test_btf_type_tag_vmlinux_user(void)
{
struct btf_type_tag_user *skel;
-struct btf *vmlinux_btf;
-__s32 type_id;
struct btf *vmlinux_btf = NULL;
int err;
-/* skip the test if the vmlinux does not have __user tags */
-vmlinux_btf = btf__load_vmlinux_btf();
-if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF"))
if (load_btfs(&vmlinux_btf, NULL, /*needs_vmlinux_tag=*/true))
return;
-type_id = btf__find_by_name_kind(vmlinux_btf, "user", BTF_KIND_TYPE_TAG);
-if (type_id <= 0) {
-printf("%s:SKIP: btf_type_tag attribute not in vmlinux btf", __func__);
-test__skip();
-goto free_vmlinux_btf;
-}
skel = btf_type_tag_user__open();
if (!ASSERT_OK_PTR(skel, "btf_type_tag_user"))
-goto free_vmlinux_btf;
goto cleanup;
bpf_program__set_autoload(skel->progs.test_user2, false);
bpf_program__set_autoload(skel->progs.test_user1, false);
@@ -124,7 +157,70 @@ static void test_btf_type_tag_vmlinux_user(void)
btf_type_tag_user__destroy(skel);
-free_vmlinux_btf:
cleanup:
btf__free(vmlinux_btf);
}
static void test_btf_type_tag_mod_percpu(bool load_test_percpu1)
{
struct btf *vmlinux_btf, *module_btf;
struct btf_type_tag_percpu *skel;
int err;
if (load_btfs(&vmlinux_btf, &module_btf, /*needs_vmlinux_tag=*/false))
return;
skel = btf_type_tag_percpu__open();
if (!ASSERT_OK_PTR(skel, "btf_type_tag_percpu"))
goto cleanup;
bpf_program__set_autoload(skel->progs.test_percpu_load, false);
bpf_program__set_autoload(skel->progs.test_percpu_helper, false);
if (load_test_percpu1)
bpf_program__set_autoload(skel->progs.test_percpu2, false);
else
bpf_program__set_autoload(skel->progs.test_percpu1, false);
err = btf_type_tag_percpu__load(skel);
ASSERT_ERR(err, "btf_type_tag_percpu");
btf_type_tag_percpu__destroy(skel);
cleanup:
btf__free(module_btf);
btf__free(vmlinux_btf);
}
static void test_btf_type_tag_vmlinux_percpu(bool load_test)
{
struct btf_type_tag_percpu *skel;
struct btf *vmlinux_btf = NULL;
int err;
if (load_btfs(&vmlinux_btf, NULL, /*needs_vmlinux_tag=*/true))
return;
skel = btf_type_tag_percpu__open();
if (!ASSERT_OK_PTR(skel, "btf_type_tag_percpu"))
goto cleanup;
bpf_program__set_autoload(skel->progs.test_percpu2, false);
bpf_program__set_autoload(skel->progs.test_percpu1, false);
if (load_test) {
bpf_program__set_autoload(skel->progs.test_percpu_helper, false);
err = btf_type_tag_percpu__load(skel);
ASSERT_ERR(err, "btf_type_tag_percpu_load");
} else {
bpf_program__set_autoload(skel->progs.test_percpu_load, false);
err = btf_type_tag_percpu__load(skel);
ASSERT_OK(err, "btf_type_tag_percpu_helper");
}
btf_type_tag_percpu__destroy(skel);
cleanup:
btf__free(vmlinux_btf);
}
@@ -134,10 +230,20 @@ void test_btf_tag(void)
test_btf_decl_tag();
if (test__start_subtest("btf_type_tag"))
test_btf_type_tag();
if (test__start_subtest("btf_type_tag_user_mod1"))
test_btf_type_tag_mod_user(true);
if (test__start_subtest("btf_type_tag_user_mod2"))
test_btf_type_tag_mod_user(false);
if (test__start_subtest("btf_type_tag_sys_user_vmlinux"))
test_btf_type_tag_vmlinux_user();
if (test__start_subtest("btf_type_tag_percpu_mod1"))
test_btf_type_tag_mod_percpu(true);
if (test__start_subtest("btf_type_tag_percpu_mod2"))
test_btf_type_tag_mod_percpu(false);
if (test__start_subtest("btf_type_tag_percpu_vmlinux_load"))
test_btf_type_tag_vmlinux_percpu(true);
if (test__start_subtest("btf_type_tag_percpu_vmlinux_helper"))
test_btf_type_tag_vmlinux_percpu(false);
}

Some files were not shown because too many files have changed in this diff.