1121129 commits

Author SHA1 Message Date
Stanislav Fomichev
dea6a4e170 bpf: Introduce cgroup_{common,current}_func_proto
Split cgroup_base_func_proto into the following:

* cgroup_common_func_proto - common helpers for all cgroup hooks
* cgroup_current_func_proto - common helpers for all cgroup hooks
  running in the process context (== have meaningful 'current').

Move bpf_{g,s}et_retval and other cgroup-related helpers into
kernel/bpf/cgroup.c so they are closer to where they are used.
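
The split can be pictured with a small userspace model. This is a sketch, not kernel code:

- Helpers are identified by name strings rather than enum bpf_func_id.
- The function names mirror the kernel ones only for readability.
- The helper groupings are an illustrative subset.

Every hook consults the common table. Only hooks with a meaningful 'current' also consult the process-context table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a helper prototype: just the helper's name. */
typedef const char *proto_t;

/* Return the matching entry of a NULL-terminated table, or NULL. */
static proto_t lookup(const char *const *table, const char *helper)
{
	for (size_t i = 0; table[i] != NULL; i++)
		if (strcmp(table[i], helper) == 0)
			return table[i];
	return NULL;
}

/* Helpers usable from every cgroup hook (illustrative subset). */
proto_t cgroup_common_func_proto(const char *helper)
{
	static const char *const common[] = { "get_local_storage", NULL };

	return lookup(common, helper);
}

/* Helpers that read 'current', so only meaningful in process context
 * (illustrative subset). */
proto_t cgroup_current_func_proto(const char *helper)
{
	static const char *const cur[] = {
		"get_current_pid_tgid", "get_current_comm",
		"get_current_uid_gid", NULL
	};

	return lookup(cur, helper);
}

/* Hypothetical per-hook lookup: try the common table first, and the
 * 'current' table only when the hook runs in process context. */
proto_t hook_func_proto(const char *helper, bool process_context)
{
	proto_t p = cgroup_common_func_proto(helper);

	if (p || !process_context)
		return p;
	return cgroup_current_func_proto(helper);
}
```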

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220823222555.523590-2-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23 16:08:21 -07:00
Quentin Monnet
92ec1cc378 scripts/bpf: Set date attribute for bpf-helpers(7) man page
The bpf-helpers(7) manual page shipped in the man-pages project is
generated from the documentation contained in the BPF UAPI header, in
the Linux repository, parsed by scripts/bpf_doc.py and then fed to
rst2man.

The man page should contain the date of last modification of the
documentation. This commit adds the relevant date when generating the
page.

Before:

    $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH'
    .TH BPF-HELPERS 7 "" "Linux v5.19-14022-g30d2a4d74e11" ""

After:

    $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH'
    .TH BPF-HELPERS 7 "2022-08-15" "Linux v5.19-14022-g30d2a4d74e11" ""

We get the date by using "git log" to look for the commit date of the
latest change to the section of the BPF header containing the
documentation. If the command fails, we just skip the date field and
keep generating the page.

Reported-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Link: https://lore.kernel.org/bpf/20220823155327.98888-2-quentin@isovalent.com
2022-08-23 22:51:04 +02:00
Quentin Monnet
fd0a38f9c3 scripts/bpf: Set version attribute for bpf-helpers(7) man page
The bpf-helpers(7) manual page shipped in the man-pages project is
generated from the documentation contained in the BPF UAPI header, in
the Linux repository, parsed by scripts/bpf_doc.py and then fed to
rst2man.

After a recent update of that page [0], Alejandro reported that the
linter used to validate the man pages complains about the generated
document [1]. The header for the page is supposed to contain some
attributes that we do not set correctly with the script. This commit
updates the "project and version" field. We discussed the format of
those fields in [1] and [2].

Before:

    $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH'
    .TH BPF-HELPERS 7 "" "" ""

After:

    $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH'
    .TH BPF-HELPERS 7 "" "Linux v5.19-14022-g30d2a4d74e11" ""

We get the version from "git describe", but if unavailable, we fall back
on "make kernelversion". If neither works, for example because neither git
nor make is installed, we just set the field to "Linux" and keep
generating the page.
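
The fallback order described above can be sketched in C. bpf_doc.py itself is a Python script, so this only models the decision:

- Each command's output is passed in as an argument, or NULL when the command failed.
- A successful "make kernelversion" result is assumed to get the same "Linux " prefix as the "git describe" result.

```c
#include <stdio.h>
#include <string.h>

/* Build the "project and version" field from the first probe that
 * succeeded: "git describe" first, then "make kernelversion".
 * If both failed (NULL or empty), the field is just "Linux". */
void pick_version(char *buf, size_t len, const char *git_describe,
		  const char *make_kernelversion)
{
	const char *v = NULL;

	if (git_describe && git_describe[0])
		v = git_describe;
	else if (make_kernelversion && make_kernelversion[0])
		v = make_kernelversion;

	if (v)
		snprintf(buf, len, "Linux %s", v);
	else
		snprintf(buf, len, "Linux");
}
```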

[0] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man7/bpf-helpers.7?id=19c7f78393f2b038e76099f87335ddf43a87f039
[1] https://lore.kernel.org/all/20220823084719.13613-1-quentin@isovalent.com/t/#m58a418a318642c6428e14ce9bb84eba5183b06e8
[2] https://lore.kernel.org/all/20220721110821.8240-1-alx.manpages@gmail.com/t/#m8e689a822e03f6e2530a0d6de9d128401916c5de

Reported-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Link: https://lore.kernel.org/bpf/20220823155327.98888-1-quentin@isovalent.com
2022-08-23 22:51:04 +02:00
Shmulik Ladkani
d6513727c2 bpf, selftests: Test BPF_FLOW_DISSECTOR_CONTINUE
The dissector program returns BPF_FLOW_DISSECTOR_CONTINUE (and avoids
setting skb->flow_keys or last_dissection map) in case it encounters
IP packets whose (outer) source address is 127.0.0.127.

An additional test is added to prog_tests/flow_dissector.c which sets
this address as the test's pkt.iph.saddr, with the expected retval of
BPF_FLOW_DISSECTOR_CONTINUE.

Also, the legacy test_flow_dissector.sh was similarly augmented.
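
Below is a userspace model of the selftest program's check, for illustration only. It is not the actual BPF program from the selftests, and model_dissect_ipv4() is a hypothetical name. The return-code values are those of enum bpf_ret_code in include/uapi/linux/bpf.h.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Values as in enum bpf_ret_code (include/uapi/linux/bpf.h). */
enum { BPF_OK = 0, BPF_DROP = 2, BPF_FLOW_DISSECTOR_CONTINUE = 129 };

/* Decision for an IPv4 packet; saddr is in network byte order, as in
 * struct iphdr. */
int model_dissect_ipv4(uint32_t saddr)
{
	/* 127.0.0.127: hand the packet back to the standard dissector,
	 * without touching flow_keys or the last_dissection map. */
	if (saddr == htonl(0x7f00007f))
		return BPF_FLOW_DISSECTOR_CONTINUE;

	/* Otherwise the real program parses the headers and fills
	 * flow_keys (not modeled here). */
	return BPF_OK;
}
```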

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-5-shmulik.ladkani@gmail.com
2022-08-23 22:48:12 +02:00
Shmulik Ladkani
5deedfbee8 bpf, test_run: Propagate bpf_flow_dissect's retval to user's bpf_attr.test.retval
Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK
was stored in 'bpf_attr.test.retval'.

Augment this so that users can check the actual return code of the
dissector program under test.

Existing prog_tests/flow_dissector*.c tests were correspondingly changed
to check against each test's expected retval.

Also, each test's resulting 'flow_keys' are verified only when the expected
retval is BPF_OK. This allows adding new tests that expect a non-BPF_OK retval.
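
The resulting check in the tests can be modeled roughly as follows. struct keys and check_result() are hypothetical simplifications; the real tests compare the full struct bpf_flow_keys.

```c
#include <stdbool.h>

enum { BPF_OK = 0, BPF_DROP = 2, BPF_FLOW_DISSECTOR_CONTINUE = 129 };

/* Hypothetical, heavily trimmed stand-in for struct bpf_flow_keys. */
struct keys {
	unsigned short sport;
	unsigned short dport;
};

struct test_case {
	int expected_retval;
	struct keys expected_keys;	/* only meaningful for BPF_OK */
};

/* The retval is always checked; flow_keys are compared only when the
 * test expects BPF_OK, so tests expecting other codes skip them. */
bool check_result(const struct test_case *tc, int retval,
		  const struct keys *got)
{
	if (retval != tc->expected_retval)
		return false;
	if (tc->expected_retval != BPF_OK)
		return true;
	return got->sport == tc->expected_keys.sport &&
	       got->dport == tc->expected_keys.dport;
}
```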

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
2022-08-23 22:48:03 +02:00
Shmulik Ladkani
91350fe152 bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progs
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely
replaces the flow-dissector logic with custom dissection logic. This
forces implementors to write programs that handle dissection for any
flows expected in the namespace.

It makes sense for flow-dissector BPF programs to just augment the
dissector with custom logic (e.g. dissecting certain flows or custom
protocols), while enjoying the broad capabilities of the standard
dissector for any other traffic.

Introduce the BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF
programs may return it to indicate that no dissection was made and that
a fallback to the standard dissector is requested.
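
Below is a minimal userspace model of the new fallback behavior. It assumes a simplified packet type and a stub standing in for the built-in dissector. flow_dissect(), standard_dissect() and the example programs are hypothetical names. Any verdict other than BPF_FLOW_DISSECTOR_CONTINUE is final; BPF_FLOW_DISSECTOR_CONTINUE hands the packet to the standard dissector.

```c
#include <stdbool.h>
#include <stddef.h>

enum { BPF_OK = 0, BPF_DROP = 2, BPF_FLOW_DISSECTOR_CONTINUE = 129 };

struct pkt { int proto; };		/* hypothetical packet stand-in */
typedef int (*dissector_prog_t)(const struct pkt *p);

/* Hypothetical stub for the built-in dissector: succeeds on known protos. */
static bool standard_dissect(const struct pkt *p)
{
	return p->proto != 0;
}

/* With no program attached, or when the program returns
 * BPF_FLOW_DISSECTOR_CONTINUE, use the standard dissector; otherwise
 * the program's verdict is final and only BPF_OK counts as success. */
bool flow_dissect(dissector_prog_t prog, const struct pkt *p)
{
	if (prog) {
		int ret = prog(p);

		if (ret != BPF_FLOW_DISSECTOR_CONTINUE)
			return ret == BPF_OK;
	}
	return standard_dissect(p);
}

/* Example programs. */
int prog_continue(const struct pkt *p) { (void)p; return BPF_FLOW_DISSECTOR_CONTINUE; }
int prog_drop(const struct pkt *p) { (void)p; return BPF_DROP; }
```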

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
2022-08-23 22:47:55 +02:00
Shmulik Ladkani
0ba985024a flow_dissector: Make 'bpf_flow_dissect' return the bpf program retcode
Let 'bpf_flow_dissect' callers know the BPF program's retcode and act
accordingly.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-2-shmulik.ladkani@gmail.com
2022-08-23 22:47:42 +02:00
Martin KaFai Lau
b979f005d9 selftest/bpf: Add setget_sockopt to DENYLIST.s390x
Trampolines are not supported on s390.

Fixes: 31123c0360 ("selftests/bpf: bpf_setsockopt tests")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220819192155.91713-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-19 12:46:18 -07:00
Colin Ian King
e918cd231e selftests/bpf: Fix spelling mistake.
There is a spelling mistake in an ASSERT_OK literal string. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Mykola Lysenko <mykolal@fb.com>
Link: https://lore.kernel.org/r/20220817213242.101277-1-colin.i.king@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-19 12:45:14 -07:00
Alexei Starovoitov
75179e2b7f Merge branch 'bpf: net: Remove duplicated code from bpf_setsockopt()'
Martin KaFai Lau says:

====================

The code in bpf_setsockopt() is mostly a copy-and-paste from
sock_setsockopt(), do_tcp_setsockopt(), do_ipv6_setsockopt(),
and do_ip_setsockopt().  As the set of allowed optnames in
bpf_setsockopt() grows, so does the duplicated code.  The copies
have also slowly drifted apart.

This set is an effort to clean this up and reuse the existing
{sock,do_tcp,do_ipv6,do_ip}_setsockopt() as much as possible.

After the cleanup, this set also adds to bpf_setsockopt() a few
allowed optnames that we need.

The initial attempt was to clean up both bpf_setsockopt() and
bpf_getsockopt() together.  However, the patch set was getting
too long, so it is better to leave bpf_getsockopt() for
another patch set.  Thus, this set focuses on
bpf_setsockopt().

v4:
- This set now depends on commit f574f7f839 ("net: bpf: Use the protocol's set_rcvlowat behavior if there is one")
  in the net-next tree.  That commit calls a specific protocol's
  set_rcvlowat, and it changed bpf_setsockopt(),
  which this set also changes.

  Because of this, patch 9 of this set has also been adjusted,
  and a 'sock' NULL check is added to sk_setsockopt()
  because some of the bpf hooks have a NULL sk->sk_socket.
  This removes more dup code from the bpf_setsockopt() side.
- Avoid mentioning specific prog types in the comment of
  the has_current_bpf_ctx(). (Andrii)
- Replace signed with unsigned int bitfield in the
  patch 15 selftest. (Daniel)

v3:
- s/in_bpf/has_current_bpf_ctx/ (Andrii)
- Add comment to has_current_bpf_ctx() and sockopt_lock_sock()
  (Stanislav)
- Use vmlinux.h in selftest and add defines to bpf_tracing_net.h
  (Stanislav)
- Use bpf_getsockopt(SO_MARK) in selftest (Stanislav)
- Use BPF_CORE_READ_BITFIELD in selftest (Yonghong)

v2:
- A major change is to use in_bpf() to test whether setsockopt()
  is called by a bpf prog and, if so, to skip the capable
  check.  Suggested by Stanislav.
- Instead of passing is_locked through sockptr_t or through an extra
  argument to sk_setsockopt(), v2 also uses in_bpf() to skip lock_sock(),
  because the bpf prog has already acquired the lock.
- No change to the current sockptr_t in this revision
- s/codes/code/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:14 -07:00
Martin KaFai Lau
31123c0360 selftests/bpf: bpf_setsockopt tests
This patch adds tests to exercise optnames that are allowed
in bpf_setsockopt().

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061847.4182339-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:14 -07:00
Martin KaFai Lau
7e41df5dbb bpf: Add a few optnames to bpf_setsockopt
This patch adds a few optnames for bpf_setsockopt:
SO_REUSEADDR, IPV6_AUTOFLOWLABEL, TCP_MAXSEG, TCP_NODELAY,
and TCP_THIN_LINEAR_TIMEOUTS.

Thanks to the previous patches in this set, all additions can reuse
sk_setsockopt(), do_ipv6_setsockopt(), and do_tcp_setsockopt().
The only change here is to allow them in bpf_setsockopt().

A bpf prog has been able to read all members of a sk by
using PTR_TO_BTF_ID of a sk, and the optnames added here can
be read the same way, so there is already a way to read
the values back.

These optnames can also be added to bpf_getsockopt() later with
another patch set that makes bpf_getsockopt() reuse
sock_getsockopt(), tcp_getsockopt(), and ip[v6]_getsockopt().
Thus, this patch does not add more duplicated code to
bpf_getsockopt() now.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061841.4181642-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:14 -07:00
Martin KaFai Lau
75b64b68ee bpf: Change bpf_setsockopt(SOL_IPV6) to reuse do_ipv6_setsockopt()
After the prep work in the previous patches,
this patch removes the dup code from bpf_setsockopt(SOL_IPV6)
and reuses the implementation in do_ipv6_setsockopt().

ipv6 can be built as a module.  Just as other code solved this
with stubs in ipv6_stubs.h, this patch adds do_ipv6_setsockopt
to ipv6_bpf_stub.

The current bpf_setsockopt(IPV6_TCLASS) does not take
INET_ECN_MASK into account for tcp.
do_ipv6_setsockopt(IPV6_TCLASS) handles it correctly.

The existing optname white-list is refactored into a new
function sol_ipv6_setsockopt().

After this last SOL_IPV6 dup code removal, the __bpf_setsockopt()
is simplified enough that the extra "{ }" around the if statement
can be removed.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061834.4181198-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
ee7f1e1302 bpf: Change bpf_setsockopt(SOL_IP) to reuse do_ip_setsockopt()
After the prep work in the previous patches,
this patch removes the dup code from bpf_setsockopt(SOL_IP)
and reuses the implementation in do_ip_setsockopt().

The existing optname white-list is refactored into a new
function sol_ip_setsockopt().

NOTE:
the current bpf_setsockopt(IP_TOS) is quite different from
do_ip_setsockopt(IP_TOS).  For example, it does not take
INET_ECN_MASK into account for tcp, and it does not adjust
sk->sk_priority either.  It looks like the current bpf_setsockopt(IP_TOS)
was modeled on the IPV6_TCLASS implementation instead of IP_TOS.
This patch tries to rectify that by using do_ip_setsockopt(IP_TOS).
While this is a behavior change, the do_ip_setsockopt(IP_TOS) behavior
is arguably what the user expects.  At least, the INET_ECN_MASK bits
should be masked out for tcp.
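
For tcp, the ECN handling referred to above can be sketched as a pure function on the TOS byte. INET_ECN_MASK is 3 in the kernel, covering the two low ECN bits. apply_ip_tos() is a hypothetical name, and the sk_priority update that do_ip_setsockopt(IP_TOS) also performs is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define INET_ECN_MASK 3		/* the two low bits of the TOS byte carry ECN */

/* For a stream (tcp) socket, apply the caller's upper TOS bits but keep
 * the ECN bits already in use on the socket; other sockets take the
 * value as given. */
uint8_t apply_ip_tos(uint8_t old_tos, uint8_t val, bool is_stream)
{
	if (is_stream) {
		val &= (uint8_t)~INET_ECN_MASK;
		val |= old_tos & INET_ECN_MASK;
	}
	return val;
}
```

For example, a tcp socket currently using ECT(0) (0x02) that is asked to set 0x11 ends up with 0x12, not 0x11.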

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061826.4180990-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
0c751f7071 bpf: Change bpf_setsockopt(SOL_TCP) to reuse do_tcp_setsockopt()
After the prep work in the previous patches,
this patch removes all the dup code from bpf_setsockopt(SOL_TCP)
and reuses do_tcp_setsockopt().

The existing optname white-list is refactored into a new
function sol_tcp_setsockopt().  sol_tcp_setsockopt()
also calls bpf_sol_tcp_setsockopt() to handle
the TCP_BPF_XXX specific optnames.

bpf_setsockopt(TCP_SAVE_SYN) now also accepts the value 2, which
additionally saves the ethernet header; this comes for free from
do_tcp_setsockopt().
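
The allowlist-then-delegate structure can be sketched in userspace as follows:

- The optnames accepted here are an illustrative subset, not the kernel's actual list.
- model_do_tcp_setsockopt() is a hypothetical stub standing in for do_tcp_setsockopt().
- The TCP_BPF_XXX branch handled by bpf_sol_tcp_setsockopt() is omitted.

```c
#include <errno.h>
#include <netinet/tcp.h>

/* Hypothetical stub for do_tcp_setsockopt(): records what it was given. */
static int last_optname = -1, last_val;

static int model_do_tcp_setsockopt(int optname, int val)
{
	last_optname = optname;
	last_val = val;
	return 0;
}

/* Reject optnames a BPF prog may not set, then delegate to the same code
 * path that setsockopt(2) uses instead of duplicating it. */
int model_sol_tcp_setsockopt(int optname, int val)
{
	switch (optname) {
	case TCP_NODELAY:
	case TCP_MAXSEG:
		break;
	default:
		return -EINVAL;
	}
	return model_do_tcp_setsockopt(optname, val);
}
```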

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061819.4180146-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
57db31a1a3 bpf: Refactor bpf specific tcp optnames to a new function
The patch moves all bpf specific tcp optnames (TCP_BPF_XXX)
to a new function bpf_sol_tcp_setsockopt().  This will make
the next patch easier to follow.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061812.4179645-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
29003875bd bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt()
After the prep work in the previous patches,
this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET)
and reuses the implementation in sk_setsockopt().

A NULL test on the sock ptr is added to SO_RCVLOWAT because
sk->sk_socket could be NULL in some of the bpf hooks.

The existing optname white-list is refactored into a new
function sol_socket_setsockopt().

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061804.4178920-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
ebf9e8e653 bpf: Embed kernel CONFIG check into the if statement in bpf_setsockopt
This patch moves the "#ifdef CONFIG_XXX" checks into the "if/else"
statements themselves.  The change is done for the bpf_setsockopt()
function only.  It will make the later patches easier to follow
without the surrounding ifdef macros.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061758.4178374-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
2b5a2ecbfd bpf: Initialize the bpf_run_ctx in bpf_iter_run_prog()
The bpf-iter-prog for tcp and unix sk can call bpf_setsockopt(),
which needs has_current_bpf_ctx() to decide whether it is called by a
bpf prog.  This patch initializes the bpf_run_ctx in
bpf_iter_run_prog() for has_current_bpf_ctx() to use.
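
Below is a userspace sketch of the idea, assuming a single task and a global 'current'. bpf_set_run_ctx() and bpf_reset_run_ctx() follow the spirit of the kernel helpers with those names. iter_run_prog() and prog_sees_ctx() are hypothetical. The point is that anything the program calls can now observe a non-NULL run ctx, and the previous one is restored afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

struct bpf_run_ctx { int unused; };		/* opaque in this model */
struct task { struct bpf_run_ctx *bpf_ctx; };

static struct task init_task;			/* the only task in this model */
static struct task *current = &init_task;	/* model of the kernel's 'current' */

bool has_current_bpf_ctx(void)
{
	return current->bpf_ctx != NULL;
}

static struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
{
	struct bpf_run_ctx *old_ctx = current->bpf_ctx;

	current->bpf_ctx = new_ctx;
	return old_ctx;
}

static void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
{
	current->bpf_ctx = old_ctx;
}

/* Run 'prog' with a run ctx installed, restoring the previous one after. */
bool iter_run_prog(bool (*prog)(void))
{
	struct bpf_run_ctx run_ctx = { 0 };
	struct bpf_run_ctx *old_ctx;
	bool ret;

	old_ctx = bpf_set_run_ctx(&run_ctx);
	ret = prog();
	bpf_reset_run_ctx(old_ctx);
	return ret;
}

/* Example "program": reports whether it can see a run ctx. */
bool prog_sees_ctx(void)
{
	return has_current_bpf_ctx();
}
```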

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061751.4177657-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
40cd308ea5 bpf: net: Change do_ipv6_setsockopt() to use the sockopt's lock_sock() and capable()
Similar to the earlier patch that keeps sk_setsockopt() from
taking the sk lock and doing the capable test when called by bpf, this
patch changes do_ipv6_setsockopt() to use sockopt_{lock,release}_sock()
and sockopt_[ns_]capable().

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061744.4176893-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
1df055d3c7 bpf: net: Change do_ip_setsockopt() to use the sockopt's lock_sock() and capable()
Similar to the earlier patch that keeps sk_setsockopt() from
taking the sk lock and doing the capable test when called by bpf, this
patch changes do_ip_setsockopt() to use sockopt_{lock,release}_sock()
and sockopt_[ns_]capable().

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061737.4176402-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
cb388e7ee3 bpf: net: Change do_tcp_setsockopt() to use the sockopt's lock_sock() and capable()
Similar to the earlier patch that keeps sk_setsockopt() from
taking the sk lock and doing the capable test when called by bpf, this
patch changes do_tcp_setsockopt() to use sockopt_{lock,release}_sock()
and sockopt_[ns_]capable().

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061730.4176021-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:12 -07:00
Martin KaFai Lau
e42c7beee7 bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()
When a bpf program calls bpf_setsockopt(SOL_SOCKET),
it could be running in softirq, where the capable check
does not make sense.  There was a similar situation in bpf_setsockopt(TCP_CONGESTION).
In commit 8d650cdeda ("tcp: fix tcp_set_congestion_control() use from bpf hook"),
tcp_set_congestion_control(..., cap_net_admin) was added to skip
the cap check for bpf progs.

This patch adds sockopt_ns_capable() and sockopt_capable() for
sk_setsockopt() to use.  They check
has_current_bpf_ctx() before doing the ns_capable() and capable() tests.
They are exported with EXPORT_SYMBOL for the ipv6 module to use in a later patch.
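
Below is a userspace model of sockopt_capable(), with the task and capability state reduced to two globals. current_bpf_ctx, granted_cap and model_capable() are hypothetical. MODEL_CAP_NET_ADMIN uses the value of CAP_NET_ADMIN (12). sockopt_ns_capable() follows the same shape with an extra namespace argument.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAP_NET_ADMIN 12	/* value of CAP_NET_ADMIN */

static void *current_bpf_ctx;	/* model of current->bpf_ctx */
static int granted_cap = -1;	/* model: the one capability 'current' holds */

static bool has_current_bpf_ctx(void) { return current_bpf_ctx != NULL; }
static bool model_capable(int cap) { return cap == granted_cap; }

/* A BPF caller (possibly in softirq, where 'current' has nothing to do
 * with the socket) skips the capability test; everyone else must pass it. */
bool sockopt_capable(int cap)
{
	return has_current_bpf_ctx() || model_capable(cap);
}
```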

Suggested-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061723.4175820-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:12 -07:00
Martin KaFai Lau
24426654ed bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf
Most of the code in bpf_setsockopt(SOL_SOCKET) is duplicated from
sk_setsockopt().  The number of supported optnames keeps
increasing, and so does the duplicated code.

One issue in reusing sk_setsockopt() is that the bpf prog
has already acquired the sk lock.  This patch adds
has_current_bpf_ctx() to tell whether sk_setsockopt() is called from
a bpf prog.  The bpf prog calling bpf_setsockopt() is either running
in_task() or in_serving_softirq().  In both cases current->bpf_ctx
is initialized.  Thus, has_current_bpf_ctx() only needs to
test !!current->bpf_ctx.

This patch also adds sockopt_{lock,release}_sock() helpers
for sk_setsockopt() to use.  These helpers test
has_current_bpf_ctx() before acquiring/releasing the lock.  They are
exported with EXPORT_SYMBOL for the ipv6 module to use in a later patch.

Note on the change in sock_setbindtodevice(): sockopt_lock_sock()
is done in sock_setbindtodevice() instead of doing the lock_sock
in sock_bindtoindex(..., lock_sk = true).
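
The lock helpers can be sketched in userspace as follows. lock_sock() and release_sock() are replaced by a hypothetical depth counter so the effect is observable, and current_bpf_ctx stands in for current->bpf_ctx.

```c
#include <stdbool.h>
#include <stddef.h>

struct sock { int lock_depth; };	/* hypothetical: counts lock_sock() */
static void *current_bpf_ctx;		/* model of current->bpf_ctx */

static bool has_current_bpf_ctx(void) { return current_bpf_ctx != NULL; }
static void lock_sock(struct sock *sk) { sk->lock_depth++; }
static void release_sock(struct sock *sk) { sk->lock_depth--; }

/* A BPF caller already holds the sk lock, so these become no-ops for it;
 * regular setsockopt(2) callers still take and release the lock. */
void sockopt_lock_sock(struct sock *sk)
{
	if (has_current_bpf_ctx())
		return;
	lock_sock(sk);
}

void sockopt_release_sock(struct sock *sk)
{
	if (has_current_bpf_ctx())
		return;
	release_sock(sk);
}
```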

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061717.4175589-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:12 -07:00
Martin KaFai Lau
4d748f9916 net: Add sk_setsockopt() to take the sk ptr instead of the sock ptr
A later patch refactors bpf_setsockopt(SOL_SOCKET) with
sock_setsockopt() to avoid code duplication and code
drift between the two copies.

The current sock_setsockopt() takes a sock ptr as the argument.
The very first thing this function does is to get back the sk ptr
by 'sk = sock->sk'.

bpf_setsockopt() could be called when the sk does not have
the sock ptr created, meaning sk->sk_socket is NULL.  This happens,
for example, when a passive tcp connection has just been established
but has not yet been accept()-ed.  Thus, it cannot use
sock_setsockopt(sk->sk_socket), or else it would pass a NULL ptr.

This patch moves the whole sock_setsockopt implementation to the newly
added sk_setsockopt().  The new sk_setsockopt() takes a sk ptr
and immediately gets the sock ptr by 'sock = sk->sk_socket'.

The existing sock_setsockopt(sock) is changed to call
sk_setsockopt(sock->sk).  All existing callers have both the sock->sk
and sk->sk_socket pointers.

The later patch will make bpf_setsockopt(SOL_SOCKET) call
sk_setsockopt(sk) directly.  bpf_setsockopt(SOL_SOCKET) does
not use the optnames that require sk->sk_socket, so it will
be safe.
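
Below is a trimmed userspace model of the resulting layering. Its simplifications:

- The structures keep only the fields needed here.
- set_rcvlowat is moved from sock->ops onto the socket itself.
- MODEL_SO_RCVLOWAT is a hypothetical constant.
- Negative values, which the kernel maps to INT_MAX, are not handled.

The NULL test on sock in the SO_RCVLOWAT case corresponds to the check described in commit 29003875bd above.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_SO_RCVLOWAT 1	/* hypothetical optname for this model */

struct sock;
struct socket {
	struct sock *sk;
	/* stands in for sock->ops->set_rcvlowat */
	int (*set_rcvlowat)(struct sock *sk, int val);
};
struct sock {
	struct socket *sk_socket;	/* NULL until e.g. accept() creates it */
	int sk_rcvlowat;
};

/* All logic lives here and works from the sk; sock may be NULL. */
int sk_setsockopt(struct sock *sk, int optname, int val)
{
	struct socket *sock = sk->sk_socket;

	switch (optname) {
	case MODEL_SO_RCVLOWAT:
		if (sock && sock->set_rcvlowat)
			return sock->set_rcvlowat(sk, val);
		sk->sk_rcvlowat = val ? val : 1;
		return 0;
	default:
		return -ENOPROTOOPT;
	}
}

/* The socket-based entry point becomes a thin wrapper. */
int sock_setsockopt(struct socket *sock, int optname, int val)
{
	return sk_setsockopt(sock->sk, optname, val);
}
```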

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061711.4175048-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:12 -07:00
Maxime Chevallier
fb8d784b53 net: ethernet: altera: Add use of ethtool_op_get_ts_info
Add the ethtool_op_get_ts_info() callback to ethtool ops, so that we can
at least use software timestamping.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/r/20220817095725.97444-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18 10:25:39 -07:00
Wong Vee Khee
e34cfee65e stmmac: intel: remove unused 'has_crossts' flag
The 'has_crossts' flag is not used anywhere in the stmmac driver,
so remove it from both the header file and the dwmac-intel driver.

Signed-off-by: Wong Vee Khee <veekhee@apple.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220817064324.10025-1-veekhee@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 20:40:52 -07:00
Jakub Kicinski
3f5f728a72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2022-08-17

We've added 45 non-merge commits during the last 14 day(s) which contain
a total of 61 files changed, 986 insertions(+), 372 deletions(-).

The main changes are:

1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt
   Kanzenbach and Jesper Dangaard Brouer.

2) A few cleanups and improvements for libbpf 1.0, from Andrii Nakryiko.

3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov.

4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires.

5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle
   unsupported names on old kernels, from Hangbin Liu.

6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton,
   from Hao Luo.

7) Relax libbpf's requirement for shared libs to be marked executable, from
   Hengqi Chen.

8) Improve bpf_iter internals handling of error returns, from Hao Luo.

9) A few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard.

10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong.

11) bpftool improvements to handle full BPF program names better, from Manu
    Bretelle.

12) bpftool fixes around libcap use, from Quentin Monnet.

13) BPF map internals clean ups and improvements around memory allocations,
    from Yafang Shao.

14) Allow using cgroup_get_from_file() on cgroupv1, enabling the BPF cgroup
    iterator to work on cgroupv1, from Yosry Ahmed.

15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong.

16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel
    Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
  selftests/bpf: Few fixes for selftests/bpf built in release mode
  libbpf: Clean up deprecated and legacy aliases
  libbpf: Streamline bpf_attr and perf_event_attr initialization
  libbpf: Fix potential NULL dereference when parsing ELF
  selftests/bpf: Tests libbpf autoattach APIs
  libbpf: Allows disabling auto attach
  selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
  libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
  selftests/bpf: Update CI kconfig
  selftests/bpf: Add connmark read test
  selftests/bpf: Add existing connection bpf_*_ct_lookup() test
  bpftool: Clear errno after libcap's checks
  bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
  bpftool: Fix a typo in a comment
  libbpf: Add names for auxiliary maps
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_free instread of kvfree
  bpf: Remove unneeded memset in queue_stack_map creation
  libbpf: preserve errno across pr_warn/pr_info/pr_debug
  ...
====================

Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 20:29:36 -07:00
Andrii Nakryiko
df78da2726 selftests/bpf: Few fixes for selftests/bpf built in release mode
Fix a few issues found when building and running test_progs in
release mode.

First, the idx variable in xskxceiver is potentially uninitialized;
force-initialize it to zero to satisfy the compiler.

A few uprobe trigger functions break in release mode unless marked
as noinline, because they are static. Add noinline to make
sure everything works.
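
Below is a minimal sketch of the pattern, assuming GCC or Clang attribute syntax. uprobe_trigger() and run_trigger() are hypothetical names; the selftests' own trigger functions differ.

```c
/* Without noinline, an optimizing build may inline a static function into
 * its caller and drop the standalone copy, leaving no address for a
 * uprobe to attach to. The empty volatile asm gives the body a side
 * effect, which helps keep calls to it from being optimized away. */
__attribute__((noinline)) static int uprobe_trigger(int x)
{
	__asm__ __volatile__("");
	return x + 1;
}

/* Hypothetical caller, standing in for the test code that fires the probe. */
int run_trigger(int x)
{
	return uprobe_trigger(x);
}
```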

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-5-andrii@kernel.org
2022-08-17 22:43:58 +02:00
Andrii Nakryiko
abf84b64e3 libbpf: Clean up deprecated and legacy aliases
Remove three missed deprecated APIs that were aliased to new APIs:
bpf_object__unload, bpf_prog_attach_xattr and btf__load.

Also move legacy API libbpf_find_kernel_btf (aliased to
btf__load_vmlinux_btf) into libbpf_legacy.h.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-4-andrii@kernel.org
2022-08-17 22:42:56 +02:00
Andrii Nakryiko
813847a314 libbpf: Streamline bpf_attr and perf_event_attr initialization
Make sure that the entire libbpf code base initializes bpf_attr and
perf_event_attr with memset(0). Also, for bpf_attr, make sure we
clear and pass to the kernel only the relevant parts of bpf_attr. bpf_attr
is a huge union of independent sub-command attributes, so there is no need
to clear and pass the entire union bpf_attr. The union grows quite a lot
over time, and for most commands this growth is completely irrelevant.

A few cases where we were relying on compiler initialization of BPF UAPI
structs (like bpf_prog_info, bpf_map_info, etc.) with `= {};` were
switched to the memset(0) pattern for future-proofing.
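
The "clear and pass only the relevant prefix" pattern can be sketched with a hypothetical two-command union in place of union bpf_attr. offsetofend() below is equivalent to the kernel macro of that name. The returned size is what would be passed as the size argument of bpf(2).

```c
#include <stddef.h>
#include <string.h>

#define offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* Hypothetical stand-in for union bpf_attr with two sub-commands. */
union model_attr {
	struct { unsigned int map_type, key_size, value_size; };	/* "create" */
	struct { unsigned long long big[16]; };			/* larger command */
};

/* Zero only the bytes the "create" command uses and return their size. */
size_t prepare_create_attr(union model_attr *attr, unsigned int type)
{
	const size_t attr_sz = offsetofend(union model_attr, value_size);

	memset(attr, 0, attr_sz);
	attr->map_type = type;
	return attr_sz;
}
```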

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-3-andrii@kernel.org
2022-08-17 22:42:10 +02:00
Andrii Nakryiko
d4e6d684f3 libbpf: Fix potential NULL dereference when parsing ELF
Fix the if condition filtering empty ELF sections to prevent a NULL
dereference.
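
Below is a NULL-safe form of such a filter, as an illustrative sketch. struct sec_desc, struct sec_data and sec_is_empty() are hypothetical stand-ins for libbpf internals, and the sketch does not reproduce the exact libbpf diff.

```c
#include <stdbool.h>
#include <stddef.h>

struct sec_data { size_t d_size; };		/* hypothetical Elf_Data stand-in */
struct sec_desc { struct sec_data *data; };	/* hypothetical section descriptor */

/* A section with no data at all counts as empty; data is dereferenced
 * only after it has been checked for NULL. */
bool sec_is_empty(const struct sec_desc *s)
{
	return !s->data || s->data->d_size == 0;
}
```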

Fixes: 47ea7417b0 ("libbpf: Skip empty sections in bpf_object__init_global_data_maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20220816001929.369487-2-andrii@kernel.org
2022-08-17 22:42:10 +02:00
Jakub Kicinski
fd78d07c7c Merge branch 'net-dsa-bcm_sf2-utilize-phylink-for-all-ports'
Florian Fainelli says:

====================
net: dsa: bcm_sf2: Utilize PHYLINK for all ports

This patch series has the bcm_sf2 driver utilize PHYLINK to configure
the CPU port link parameters to unify the configuration and pave the way
for DSA to utilize PHYLINK for all ports in the future.

Tested on BCM7445 and BCM7278
====================

Link: https://lore.kernel.org/r/20220815175009.2681932-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 10:55:07 -07:00
Florian Fainelli
4d2f6dde4d net: dsa: bcm_sf2: Have PHYLINK configure CPU/IMP port(s)
Remove the artificial limitations imposed upon
bcm_sf2_sw_mac_link_{up,down} and allow us to override the link
parameters for IMP port(s) as well as regular ports by accounting for
the special differences that exist there.

Remove the code that did override the link parameters in
bcm_sf2_imp_setup().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 10:55:03 -07:00
Florian Fainelli
1ed26ce485 net: dsa: bcm_sf2: Introduce helper for port override offset
Depending upon the generation of switches, we have different offsets for
configuring a given port's status override where link parameters are
applied. Introduce a helper function that we re-use throughout the code
in order to let phylink callbacks configure the IMP/CPU port(s) in
subsequent changes.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 10:55:03 -07:00
Beniamin Sandu
815f5f5741 net: sfp: use simplified HWMON_CHANNEL_INFO macro
This makes the code look cleaner and easier to read.

Signed-off-by: Beniamin Sandu <beniaminsandu@gmail.com>
Link: https://lore.kernel.org/r/20220813204658.848372-1-beniaminsandu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 10:16:41 -07:00
Hao Luo
738a2f2f91 selftests/bpf: Tests libbpf autoattach APIs
Add a test for the libbpf APIs that toggle bpf program auto-attaching.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220816234012.910255-2-haoluo@google.com
2022-08-17 09:42:07 -07:00
Hao Luo
43cb8cbadf libbpf: Allows disabling auto attach
Adds libbpf APIs for disabling auto-attach for individual functions.
This is motivated by the use case of cgroup iter [1]. Some iter
types require their parameters to be non-zero, therefore applying
auto-attach on them will fail. With these two new APIs, users who
want to use auto-attach and these types of iters can disable
auto-attach on the program and perform manual attach.
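In libbpf the pair is believed to be bpf_program__set_autoattach() and bpf_program__autoattach(); check libbpf.h for the exact signatures. The self-contained C model below does not call libbpf. Its struct and attach_all() are hypothetical and only illustrate the behaviour described above: a program with auto-attach disabled is skipped by the bulk attach pass and left for manual attachment, so an iter that needs non-zero parameters no longer makes the whole pass fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded BPF program; not libbpf's struct. */
struct prog {
	const char *name;
	bool autoattach;	/* cleared by the user to opt out */
	bool needs_params;	/* e.g. an iter that requires non-zero params */
	bool attached;
};

/*
 * Model of a skeleton-style attach pass: programs with autoattach
 * disabled are skipped so the caller can attach them manually.
 * Returns the number of programs attached, or -1 if one failed.
 */
static int attach_all(struct prog *progs, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!progs[i].autoattach)
			continue;	/* left for manual attach */
		if (progs[i].needs_params)
			return -1;	/* auto-attach with zero params fails */
		progs[i].attached = true;
		count++;
	}
	return count;
}
```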

[1] https://lore.kernel.org/bpf/CAEf4BzZ+a2uDo_t6kGBziqdz--m2gh2_EUwkGLDtMd65uwxUjA@mail.gmail.com/

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220816234012.910255-1-haoluo@google.com
2022-08-17 09:40:47 -07:00
David S. Miller
5417197dd5 Merge branch 'wwan-t7xx-fw-flashing-and-coredump-support'
M Chetan Kumar says:

====================
net: wwan: t7xx: fw flashing & coredump support

This patch series brings in support for FM350 wwan device firmware
flashing & coredump collection using the devlink interface.

Below is a high-level description of the individual patches.
Refer to each patch's commit message for details.

PATCH1: Enables AP CLDMA communication for firmware flashing &
coredump collection.

PATCH2: Enables the infrastructure & queue configuration required
for early ports enumeration.

PATCH3: Implements device reset and rescan logic required to enter
or exit fastboot mode.

PATCH4: Implements devlink interface & uses the fastboot protocol for
fw flashing and coredump collection.

PATCH5: t7xx devlink commands documentation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:53:53 +01:00
M Chetan Kumar
b0bc1709b7 net: wwan: t7xx: Devlink documentation
Document the usage of the t7xx devlink commands for fw flashing &
coredump collection.

Refer to the t7xx.rst file for details.

Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: Devegowda Chandrashekar <chandrashekar.devegowda@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:53:53 +01:00
M Chetan Kumar
87dae9e70b net: wwan: t7xx: Enable devlink based fw flashing and coredump collection
This patch brings in support for t7xx wwan device firmware flashing &
coredump collection using devlink.

The driver registers with the devlink framework and implements the
devlink flash_update callback, which programs the modem firmware. It
also creates the region and snapshot required for device coredump log
collection. On early detection of the wwan device in fastboot mode, the
driver sets up the CLDMA0 HW tx/rx queues for raw data transfer and then
registers with the devlink framework. Upon receiving the firmware image
and partition details, the driver sends fastboot commands to flash the
firmware.

In this flow, fastboot commands and responses are exchanged between the
driver and the device. Once firmware flashing succeeds, the completion
status is reported to the user space application.

Below is the devlink command used for firmware flashing:

devlink dev flash pci/$BDF file ABC.img component ABC

Note: ABC.img is the firmware to be programmed to "ABC" partition.
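The command and response exchange can be sketched in C against the generic Android fastboot wire format, where the host sends ASCII commands such as "download:<8 hex digits>" and "flash:<partition>" and the device answers with a 4-byte OKAY, FAIL, DATA or INFO prefix. Whether t7xx uses exactly these strings is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum fb_status { FB_OKAY, FB_FAIL, FB_DATA, FB_INFO, FB_UNKNOWN };

/* Classify a device reply by its 4-byte prefix. */
static enum fb_status fb_parse_reply(const char *reply, unsigned long *data_len)
{
	if (strncmp(reply, "OKAY", 4) == 0)
		return FB_OKAY;
	if (strncmp(reply, "FAIL", 4) == 0)
		return FB_FAIL;
	if (strncmp(reply, "INFO", 4) == 0)
		return FB_INFO;
	if (strncmp(reply, "DATA", 4) == 0) {
		/* DATA carries the accepted transfer size as hex digits. */
		if (data_len)
			*data_len = strtoul(reply + 4, NULL, 16);
		return FB_DATA;
	}
	return FB_UNKNOWN;
}

/* Build "download:%08lx" announcing an image of @len bytes. */
static int fb_cmd_download(char *buf, size_t sz, unsigned long len)
{
	return snprintf(buf, sz, "download:%08lx", len);
}

/* Build "flash:<partition>", e.g. for the "ABC" component above. */
static int fb_cmd_flash(char *buf, size_t sz, const char *partition)
{
	return snprintf(buf, sz, "flash:%s", partition);
}
```

A flashing session would then be: send the download command, expect DATA, stream the image, expect OKAY, send the flash command, expect OKAY (with any INFO lines in between).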

For coredump collection: when the wwan device encounters an exception,
it reboots and stays in fastboot mode so that the host driver can
collect the coredump. On detecting the exception state, the driver
collects the core dump, creates the devlink region and reports an event
to the user space application, which then invokes the devlink region
read command to collect the dump.

Below are the devlink commands used for coredump collection.

devlink region new pci/$BDF/mr_dump
devlink region read pci/$BDF/mr_dump snapshot $ID address $ADD length $LEN
devlink region del pci/$BDF/mr_dump snapshot $ID
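When the snapshot is larger than a single read should fetch, user space can issue several region reads. The hypothetical helper below only computes the address/length pairs for such chunked reads; the chunk size is an arbitrary caller choice, not something the driver mandates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill @addrs/@lens with the (address, length)
 * pairs needed to read a @total-byte snapshot in @chunk-byte pieces.
 * Returns the number of pairs written, or -1 if @chunk is zero or
 * @max is too small.
 */
static int region_read_plan(uint64_t total, uint64_t chunk,
			    uint64_t *addrs, uint64_t *lens, int max)
{
	int n = 0;

	if (chunk == 0)
		return -1;
	for (uint64_t off = 0; off < total; off += chunk) {
		if (n == max)
			return -1;
		addrs[n] = off;
		/* The last piece may be shorter than @chunk. */
		lens[n] = (total - off < chunk) ? total - off : chunk;
		n++;
	}
	return n;
}
```

Each pair maps onto one "devlink region read ... address $ADD length $LEN" invocation.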

Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: Devegowda Chandrashekar <chandrashekar.devegowda@intel.com>
Signed-off-by: Mishra Soumya Prakash <soumya.prakash.mishra@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:53:53 +01:00
Haijun Liu
140424d901 net: wwan: t7xx: PCIe reset rescan
The PCI rescan module implements a "rescan work queue". During the
firmware flashing or coredump collection procedure, the WWAN device is
programmed to boot in fastboot mode and a work item is scheduled for
device removal and re-detection. The WWAN device is reset using an ACPI
call as part of the driver removal flow. The work queue rescans the PCI
bus at a fixed interval and exits once the device is detected.
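The rescan logic amounts to bounded polling. The C sketch below models it with a hypothetical probe callback and a retry limit; the fixed-interval sleep and the actual PCI bus rescan are left out and would live in the driver's delayed work item.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probe: returns true once the device re-enumerates. */
typedef bool (*probe_fn)(void *ctx);

/*
 * Model of the rescan work item: rescan until the device is detected
 * or @max_tries is exhausted. Returns the attempt on which the device
 * was found, or -1 on timeout.
 */
static int rescan_until_detected(probe_fn probe, void *ctx, int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		if (probe(ctx))
			return i;	/* found: stop rescheduling the work */
		/* In a driver, the next attempt runs after a fixed delay. */
	}
	return -1;
}

/* Fake device for illustration: becomes visible on the Nth probe. */
struct fake_dev {
	int calls;
	int ready_after;
};

static bool fake_probe(void *ctx)
{
	struct fake_dev *d = ctx;

	return ++d->calls >= d->ready_after;
}
```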

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: Devegowda Chandrashekar <chandrashekar.devegowda@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:53:53 +01:00
Haijun Liu
007f26f0d6 net: wwan: t7xx: Infrastructure for early port configuration
To support cases such as FW update or Core dump, the t7xx device
is capable of signaling the host that a special port needs
to be created before the handshake phase.

This patch adds the infrastructure required to create the
early ports which also requires a different configuration of
CLDMA queues.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Signed-off-by: Devegowda Chandrashekar <chandrashekar.devegowda@intel.com>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:53:53 +01:00
Haijun Liu
d20ef656f9 net: wwan: t7xx: Add AP CLDMA
The t7xx device contains two Cross Layer DMA (CLDMA) interfaces to
communicate with the AP and Modem processors respectively. So far only
MD-CLDMA was being used; this patch enables AP-CLDMA.

Rename small Application Processor (sAP) to AP.

Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Moises Veleta <moises.veleta@linux.intel.com>
Signed-off-by: Devegowda Chandrashekar <chandrashekar.devegowda@intel.com>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:53:53 +01:00
Florian Fainelli
0630f64d25 net: phy: broadcom: Implement suspend/resume for AC131 and BCM5241
Implement the suspend/resume procedure for the Broadcom AC131 and BCM5241 type
of PHYs (10/100 only) by entering the standard power down followed by the
proprietary standby mode in the auxiliary mode 4 shadow register. On resume,
the PHY software reset is enough to make it come out of standby mode so we can
utilize brcm_fet_config_init() as the resume hook.
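A register-level model of this sequence follows, assuming a simulated PHY rather than real MDIO access. BMCR_PDOWN and BMCR_RESET are the standard clause 22 BMCR bits. The standby bit in the auxiliary mode 4 shadow register is a hypothetical placeholder, and reset is modelled as returning both registers to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MII BMCR bits (IEEE 802.3 clause 22). */
#define BMCR_RESET		0x8000u
#define BMCR_PDOWN		0x0800u
/* Hypothetical placeholder, NOT the real Broadcom standby bit. */
#define AUXMODE4_STANDBY_HYP	0x0001u

/* Simulated register state of a 10/100 PHY. */
struct sim_phy {
	uint16_t bmcr;
	uint16_t auxmode4;	/* shadow register, reached indirectly on HW */
};

/* Model a BMCR write; reset self-clears and drops the PHY out of standby. */
static void sim_write_bmcr(struct sim_phy *phy, uint16_t val)
{
	if (val & BMCR_RESET) {
		phy->bmcr = 0;		/* defaults modelled as zero */
		phy->auxmode4 = 0;
		return;
	}
	phy->bmcr = val;
}

/* Suspend: standard power down first, then proprietary standby. */
static void sim_suspend(struct sim_phy *phy)
{
	sim_write_bmcr(phy, phy->bmcr | BMCR_PDOWN);
	phy->auxmode4 |= AUXMODE4_STANDBY_HYP;
}

/* Resume: a software reset is enough to leave standby. */
static void sim_resume(struct sim_phy *phy)
{
	sim_write_bmcr(phy, BMCR_RESET);
}
```

After the modelled reset, a config_init-style hook would reapply the PHY settings, which is why reusing it as the resume hook fits.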

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 11:49:23 +01:00
David S. Miller
95657e6a4b Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
ice: detect and report PTP timestamp issues

Jacob Keller says:

This series fixes a few small issues with the cached PTP Hardware Clock
timestamp used for timestamp extension. It also introduces extra checks to
help detect issues with this logic, such as the cached timestamp not
being updated within the 2-second window.

This introduces a few statistics similar to the ones already available in
other Intel drivers, including tx_hwtstamp_skipped and tx_hwtstamp_timeouts.

It is intended to aid in debugging issues we're seeing with some setups
which might be related to incorrect cached timestamp values.
====================
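The timestamp extension that the cached PHC time feeds can be sketched as follows, assuming 32-bit nanosecond hardware timestamps. This is the generic technique, not a copy of the ice code: the 32-bit value is placed relative to the cached 64-bit time, which is unambiguous only while the two are within 2^31 ns (about 2.1 s) of each other, hence the 2-second window for refreshing the cache. cache_is_stale() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extend a 32-bit nanosecond timestamp to 64 bits using a cached
 * 64-bit PHC time. Correct only if the two are within 2^31 ns.
 */
static uint64_t extend_32b_ts(uint64_t cached_phc_time, uint32_t in_tstamp)
{
	uint32_t phc_lo = (uint32_t)cached_phc_time;
	uint32_t delta = in_tstamp - phc_lo;	/* wraps modulo 2^32 */

	if (delta > UINT32_MAX / 2) {
		/* Timestamp was captured slightly before the cached time. */
		delta = phc_lo - in_tstamp;
		return cached_phc_time - delta;
	}
	/* Timestamp is at or after the cached time. */
	return cached_phc_time + delta;
}

/* Hypothetical check: has the cache gone un-refreshed for over 2 s? */
static int cache_is_stale(uint64_t now_ns, uint64_t cached_at_ns)
{
	return now_ns - cached_at_ns > 2000000000ull;
}
```

If the cache is stale, the extension above may silently pick the wrong 2^32 ns epoch, which is why counting skipped or timed-out timestamps helps surface the problem.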

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 10:20:45 +01:00
Jie Meng
8ea731d4c2 tcp: Make SYN ACK RTO tunable by BPF programs with TFO
Instead of using the hardcoded TCP_TIMEOUT_INIT, this diff calls
tcp_timeout_init() to initialize req->timeout, as in the non-TFO SYN ACK
case.

Tested using the following packetdrill script, on a host with a BPF
program that sets the initial connect timeout to 10ms.

`../../common/defaults.sh`

// Initialize connection
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,FO TFO_COOKIE>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.01 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.02 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.04 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
   +.01 < . 1:1(0) ack 1 win 32792

   +0 accept(3, ..., ...) = 4
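In the script the SYN-ACK retransmissions are spaced 10, 20 and 40 ms apart, i.e. the 10ms initial timeout set by the BPF program doubling after each timeout. The arithmetic can be checked with the small C sketch below; the cap is a parameter here (Linux bounds the RTO by TCP_RTO_MAX, 120 s), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delay before the n-th SYN-ACK retransmission (n starts at 0),
 * assuming the timeout doubles after each expiry and is capped.
 */
static uint32_t synack_retrans_delay_ms(uint32_t init_ms, unsigned n,
					uint32_t max_ms)
{
	uint64_t d = (uint64_t)init_ms << (n < 32 ? n : 32);

	return d > max_ms ? max_ms : (uint32_t)d;
}

/* Send time of the n-th retransmission, relative to the first SYN-ACK. */
static uint32_t synack_retrans_time_ms(uint32_t init_ms, unsigned n,
				       uint32_t max_ms)
{
	uint32_t t = 0;

	for (unsigned i = 0; i <= n; i++)
		t += synack_retrans_delay_ms(init_ms, i, max_ms);
	return t;
}
```

With init_ms = 10 this yields retransmissions at 10, 30 and 70 ms after the first SYN-ACK, matching the +.01, +.02 and +.04 steps in the script.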

Signed-off-by: Jie Meng <jmeng@fb.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17 10:19:22 +01:00
Zhengchao Shao
cfc111d539 net: sched: delete unused input parameter in qdisc_create
The input parameter p is unused in qdisc_create. Delete it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220815061023.51318-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-16 19:49:56 -07:00
Stefan Wahren
56cb6a59da net: vertexcom: mse102x: Update email address
in-tech smart charging is now chargebyte. So update the email address
accordingly.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/20220815080626.9688-2-stefan.wahren@i2se.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-16 19:49:51 -07:00
Stefan Wahren
d56ef29afb dt-bindings: vertexcom-mse102x: Update email address
in-tech smart charging is now chargebyte. So update the email address
accordingly.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220815080626.9688-1-stefan.wahren@i2se.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-16 19:49:51 -07:00