Commit graph

68809 commits

Author SHA1 Message Date
Menglong Dong
b384c95a86 net: icmp: add skb drop reasons to icmp protocol
Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with
kfree_skb_reason().

In order to get the reasons of the skb drops after icmp message handle,
we change the return type of 'handler()' in 'struct icmp_control' from
'bool' to 'enum skb_drop_reason'. This may change its original
intention, as 'false' means failure, but 'SKB_NOT_DROPPED_YET' means
success now. Therefore, all 'handler' and the call of them need to be
handled. Following 'handler' functions are involved:

icmp_unreach()
icmp_redirect()
icmp_echo()
icmp_timestamp()
icmp_discard()

And following new drop reasons are added:

SKB_DROP_REASON_ICMP_CSUM
SKB_DROP_REASON_INVALID_PROTO

The reason 'INVALID_PROTO' is introduced for the case that the packet
doesn't follow rfc 1122 and is dropped. This is not a common case, and
I believe we can locate the problem from the data in the packet. For now,
this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types.

Maybe there should be a document file for these reasons. For example,
list all the case that causes the 'UNHANDLED_PROTO' and 'INVALID_PROTO'
drop reason. Therefore, users can locate their problems according to the
document.

Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Menglong Dong
41a95a00eb net: icmp: introduce __ping_queue_rcv_skb() to report drop reasons
In order to avoid to change the return value of ping_queue_rcv_skb(),
introduce the function __ping_queue_rcv_skb(), which is able to report
the reasons of skb drop as its return value, as Paolo suggested.

Meanwhile, make ping_queue_rcv_skb() a simple call to
__ping_queue_rcv_skb().

The kfree_skb() and sock_queue_rcv_skb() used in ping_queue_rcv_skb()
are replaced with kfree_skb_reason() and sock_queue_rcv_skb_reason()
now.

Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Menglong Dong
9f8ed577c2 net: skb: rename SKB_DROP_REASON_PTYPE_ABSENT
As David Ahern suggested, the reasons for skb drops should be more
general and not be code based.

Therefore, rename SKB_DROP_REASON_PTYPE_ABSENT to
SKB_DROP_REASON_UNHANDLED_PROTO, which is used for the cases of no
L3 protocol handler, no L4 protocol handler, version extensions, etc.

From previous discussion, now we have the aim to make these reasons
more abstract and users based, avoiding code based.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Menglong Dong
c1b8a56755 net: sock: introduce sock_queue_rcv_skb_reason()
In order to report the reasons of skb drops in 'sock_queue_rcv_skb()',
introduce the function 'sock_queue_rcv_skb_reason()'.

As the return value of 'sock_queue_rcv_skb()' is used as the error code,
we can't make it as drop reason and have to pass extra output argument.
'sock_queue_rcv_skb()' is used in many places, so we can't change it
directly.

Introduce the new function 'sock_queue_rcv_skb_reason()' and make
'sock_queue_rcv_skb()' an inline call to it.

Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Jakub Kicinski
f940b6efb1 tls: rx: jump out for cases which need to leave skb on list
The current invese logic is harder to follow (and adds extra
tests to the fast path). We have to enumerate all cases which
need to keep the skb before consuming it. It's simpler to
jump out of the full record flow as we detect those cases.

This makes it clear that partial consumption and peek can
only reach end of the function thru the !zc case so move
the code up there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:12 +01:00
Jakub Kicinski
b1a2c17863 tls: rx: clear ctx->recv_pkt earlier
Whatever we do in the loop the skb should not remain on as
ctx->recv_pkt afterwards. We can clear that pointer and
restart strparser earlier.

This adds overhead of extra linking and unlinking to rx_list
but that's not large (upcoming change will switch to unlocked
skb list operations).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:12 +01:00
Jakub Kicinski
465ea73535 tls: rx: inline consuming the skb at the end of the loop
tls_sw_advance_skb() always consumes the skb at the end of the loop.

To fall here the following must be true:

 !async && !is_peek && !retain_skb
   retain_skb => !zc && rxm->full_len > len
     # but non-full record implies !zc, so above can be simplified as
   retain_skb => rxm->full_len > len

 !async && !is_peek && !(rxm->full_len > len)
 !async && !is_peek && rxm->full_len <= len

tls_sw_advance_skb() returns false if len < rxm->full_len
which can't be true given conditions above.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:12 +01:00
Jakub Kicinski
ba13609df1 tls: rx: pull most of zc check out of the loop
Most of the conditions deciding if zero-copy can be used
do not change throughout the iterations, so pre-calculate
them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
7da18bcc5e tls: rx: don't track the async count
We track both if the last record was handled by async crypto
and how many records were async. This is not necessary. We
implicitly assume once crypto goes async it will stay that
way, otherwise we'd reorder records. So just track if we're
in async mode, the exact number of records is not necessary.

This change also forces us into "async" mode more consistently
in case crypto ever decided to interleave async and sync.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
fc8da80f99 tls: rx: don't handle async in tls_sw_advance_skb()
tls_sw_advance_skb() caters to the async case when skb argument
is NULL. In that case it simply unpauses the strparser.

These are surprising semantics to a person reading the code,
and result in higher LoC, so inline the __strp_unpause and
only call tls_sw_advance_skb() when we actually move past
an skb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
06554f4ffc tls: rx: factor out writing ContentType to cmsg
cmsg can be filled in during rx_list processing or normal
receive. Consolidate the code.

We don't need to keep the boolean to track if the cmsg was
created. 0 is an invalid content type.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
37943f047b tls: rx: simplify async wait
Since we are protected from async completions by decrypt_compl_lock
we can drop the async_notify and reinit the completion before we
start waiting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
4175eac371 tls: rx: wrap decryption arguments in a structure
We pass zc as a pointer to bool a few functions down as an in/out
argument. This is error prone since C will happily evalue a pointer
as a boolean (IOW forgetting *zc and writing zc leads to loss of
developer time..). Wrap the arguments into a structure.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
9bdf75ccff tls: rx: don't report text length from the bowels of decrypt
We plumb pointer to chunk all the way to the decryption method.
It's set to the length of the text when decrypt_skb_update()
returns.

I think the code is written this way because original TLS
implementation passed &chunk to zerocopy_from_iter() and this
was carried forward as the code gotten more complex, without
any refactoring.

The fix for peek() introduced a new variable - to_decrypt
which for all practical purposes is what chunk is going to
get set to. Spare ourselves the pointer passing, use to_decrypt.

Use this opportunity to clean things up a little further.

Note that chunk / to_decrypt was mostly needed for the async
path, since the sync path would access rxm->full_len (decryption
transforms full_len from record size to text size). Use the
right source of truth more explicitly.

We have three cases:
 - async - it's TLS 1.2 only, so chunk == to_decrypt, but we
           need the min() because to_decrypt is a whole record
	   and we don't want to underflow len. Note that we can't
	   handle partial record by falling back to sync as it
	   would introduce reordering against records in flight.
 - zc - again, TLS 1.2 only for now, so chunk == to_decrypt,
        we don't do zc if len < to_decrypt, no need to check again.
 - normal - it already handles chunk > len, we can factor out the
            assignment to rxm->full_len and share it with zc.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
d4bd88e676 tls: rx: drop unnecessary arguments from tls_setup_from_iter()
sk is unused, remove it to make it clear the function
doesn't poke at the socket.

size_used is always 0 on input and @length on success.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
34ba23b44c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-04-09

We've added 63 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 4852 insertions(+), 619 deletions(-).

The main changes are:

1) Add libbpf support for USDT (User Statically-Defined Tracing) probes.
   USDTs are an abstraction built on top of uprobes, critical for tracing
   and BPF, and widely used in production applications, from Andrii Nakryiko.

2) While Andrii was adding support for x86{-64}-specific logic of parsing
   USDT argument specification, Ilya followed-up with USDT support for s390
   architecture, from Ilya Leoshkevich.

3) Support name-based attaching for uprobe BPF programs in libbpf. The format
   supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g.
   attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc")
   now, from Alan Maguire.

4) Various load/store optimizations for the arm64 JIT to shrink the image
   size by using arm64 str/ldr immediate instructions. Also enable pointer
   authentication to verify return address for JITed code, from Xu Kuohai.

5) BPF verifier fixes for write access checks to helper functions, e.g.
   rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that
   write into passed buffers, from Kumar Kartikeya Dwivedi.

6) Fix overly excessive stack map allocation for its base map structure and
   buckets which slipped-in from cleanups during the rlimit accounting removal
   back then, from Yuntao Wang.

7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter
   connection tracking tuple direction, from Lorenzo Bianconi.

8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde.

9) Minor cleanups all over the place from various others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits)
  bpf: Fix excessive memory allocation in stack_map_alloc()
  selftests/bpf: Fix return value checks in perf_event_stackmap test
  selftests/bpf: Add CO-RE relos into linked_funcs selftests
  libbpf: Use weak hidden modifier for USDT BPF-side API functions
  libbpf: Don't error out on CO-RE relos for overriden weak subprogs
  samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread
  libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
  libbpf: Use strlcpy() in path resolution fallback logic
  libbpf: Add s390-specific USDT arg spec parsing logic
  libbpf: Make BPF-side of USDT support work on big-endian machines
  libbpf: Minor style improvements in USDT code
  libbpf: Fix use #ifdef instead of #if to avoid compiler warning
  libbpf: Potential NULL dereference in usdt_manager_attach_usdt()
  selftests/bpf: Uprobe tests should verify param/return values
  libbpf: Improve string parsing for uprobe auto-attach
  libbpf: Improve library identification for uprobe binary path resolution
  selftests/bpf: Test for writes to map key from BPF helpers
  selftests/bpf: Test passing rdonly mem to global func
  bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
  bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
  ...
====================

Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08 17:07:29 -07:00
Ido Schimmel
fd23e0e250 net/sched: flower: Avoid overwriting error messages
The various error paths of tc_setup_offload_action() now report specific
error messages. Remove the generic messages to avoid overwriting the
more specific ones.

Before:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower skip_sw dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 Error: cls_flower: Failed to setup flow action.
 We have an error talking to the kernel

After:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower skip_sw dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 Error: act_police: Offload not supported when conform/exceed action is "reclassify".
 We have an error talking to the kernel

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
0cba5c34b8 net/sched: matchall: Avoid overwriting error messages
The various error paths of tc_setup_offload_action() now report specific
error messages. Remove the generic messages to avoid overwriting the
more specific ones.

Before:

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

After:

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000
 Error: act_police: Offload not supported when conform/exceed action is "reclassify".
 We have an error talking to the kernel

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
c440615ffb net/sched: cls_api: Add extack message for unsupported action offload
For better error reporting to user space, add an extack message when the
requested action does not support offload.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action nat ingress 192.0.2.1 198.51.100.1
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-181     [000] b..1.    88.406093: netlink_extack: msg=Action does not support offload
       tc-181     [000] .....    88.406108: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
f8fab31694 net/sched: act_vlan: Add extack message for offload failure
For better error reporting to user space, add an extack message when
vlan action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than the current
set of modes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
ee367d44b9 net/sched: act_tunnel_key: Add extack message for offload failure
For better error reporting to user space, add an extack message when
tunnel_key action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than set/release
modes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
a9c64939b6 net/sched: act_skbedit: Add extack messages for offload failure
For better error reporting to user space, add extack messages when
skbedit action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit queue_mapping 1234
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-185     [002] b..1.    31.802414: netlink_extack: msg=act_skbedit: Offload not supported when "queue_mapping" option is used
       tc-185     [002] .....    31.802418: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action skbedit inheritdsfield
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-187     [002] b..1.    45.985145: netlink_extack: msg=act_skbedit: Offload not supported when "inheritdsfield" option is used
       tc-187     [002] .....    45.985160: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
b50e462bc2 net/sched: act_police: Add extack messages for offload failure
For better error reporting to user space, add extack messages when
police action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-182     [000] b..1.    21.592969: netlink_extack: msg=act_police: Offload not supported when conform/exceed action is "reclassify"
       tc-182     [000] .....    21.592982: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action police rate 100Mbit burst 10000 conform-exceed drop/continue
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-184     [000] b..1.    38.882579: netlink_extack: msg=act_police: Offload not supported when conform/exceed action is "continue"
       tc-184     [000] .....    38.882593: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
bf3b99e4f9 net/sched: act_pedit: Add extack message for offload failure
For better error reporting to user space, add an extack message when
pedit action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than set/add
commands.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
bca3821d19 net/sched: act_mpls: Add extack messages for offload failure
For better error reporting to user space, add extack messages when mpls
action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action mpls dec_ttl
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-182     [000] b..1.    18.693915: netlink_extack: msg=act_mpls: Offload not supported when "dec_ttl" option is used
       tc-182     [000] .....    18.693921: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
4dcaa50d02 net/sched: act_mirred: Add extack message for offload failure
For better error reporting to user space, add an extack message when
mirred action offload fails.

Currently, the failure cannot be triggered, but add a message in case
the action is extended in the future to support more than ingress/egress
mirror/redirect.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
69642c2ab2 net/sched: act_gact: Add extack messages for offload failure
For better error reporting to user space, add extack messages when gact
action offload fails.

Example:

 # echo 1 > /sys/kernel/tracing/events/netlink/netlink_extack/enable

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action continue
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-181     [002] b..1.   105.493450: netlink_extack: msg=act_gact: Offload of "continue" action is not supported
       tc-181     [002] .....   105.493466: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action reclassify
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-183     [002] b..1.   124.126477: netlink_extack: msg=act_gact: Offload of "reclassify" action is not supported
       tc-183     [002] .....   124.126489: netlink_extack: msg=cls_matchall: Failed to setup flow action

 # tc filter add dev dummy0 ingress pref 1 proto all matchall skip_sw action pipe action drop
 Error: cls_matchall: Failed to setup flow action.
 We have an error talking to the kernel

 # cat /sys/kernel/tracing/trace_pipe
       tc-185     [002] b..1.   137.097791: netlink_extack: msg=act_gact: Offload of "pipe" action is not supported
       tc-185     [002] .....   137.097804: netlink_extack: msg=cls_matchall: Failed to setup flow action

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
c2ccf84ecb net/sched: act_api: Add extack to offload_act_setup() callback
The callback is used by various actions to populate the flow action
structure prior to offload. Pass extack to this callback so that the
various actions will be able to report accurate error messages to user
space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
11c95317bc net/sched: flower: Take verbose flag into account when logging error messages
The verbose flag was added in commit 81c7288b17 ("sched: cls: enable
verbose logging") to avoid suppressing logging of error messages that
occur "when the rule is not to be exclusively executed by the hardware".

However, such error messages are currently suppressed when setup of flow
action fails. Take the verbose flag into account to avoid suppressing
error messages. This is done by using the extack pointer initialized by
tc_cls_common_offload_init(), which performs the necessary checks.

Before:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 # tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000

After:

 # tc filter add dev dummy0 ingress pref 1 proto ip flower dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 # tc filter add dev dummy0 ingress pref 2 proto ip flower verbose dst_ip 198.51.100.1 action police rate 100Mbit burst 10000
 Warning: cls_flower: Failed to setup flow action.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:43 +01:00
Ido Schimmel
4c096ea2d6 net/sched: matchall: Take verbose flag into account when logging error messages
The verbose flag was added in commit 81c7288b17 ("sched: cls: enable
verbose logging") to avoid suppressing logging of error messages that
occur "when the rule is not to be exclusively executed by the hardware".

However, such error messages are currently suppressed when setup of flow
action fails. Take the verbose flag into account to avoid suppressing
error messages. This is done by using the extack pointer initialized by
tc_cls_common_offload_init(), which performs the necessary checks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:45:42 +01:00
Jakub Kicinski
71471ca325 tls: hw: rx: use return value of tls_device_decrypted() to carry status
Instead of tls_device poking into internals of the message
return 1 from tls_device_decrypted() if the device handled
the decryption.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:09 +01:00
Jakub Kicinski
3764ae5ba6 tls: rx: refactor decrypt_skb_update()
Use early return and a jump label to remove two indentation levels.
No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
5dbda02d32 tls: rx: don't issue wake ups when data is decrypted
We inform the applications that data is available when
the record is received. Decryption happens inline inside
recvmsg or splice call. Generating another wakeup inside
the decryption handler seems pointless as someone must
be actively reading the socket if we are executing this
code.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
5deee41b19 tls: rx: replace 'back' with 'offset'
The padding length TLS 1.3 logic is searching for content_type from
the end of text. IMHO the code is easier to parse if we calculate
offset and decrement it rather than try to maintain positive offset
from the end of the record called "back".

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
a8340cc02b tls: rx: use a define for tag length
TLS 1.3 has to strip padding, and it starts out 16 bytes
from the end of the record. Make it clear this is because
of the auth tag.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
863533e316 tls: rx: init decrypted status in tls_read_size()
We set the record type in tls_read_size(), can as well init
the tlm->decrypted field there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
7dc59c33d6 tls: rx: don't store the decryption status in socket context
Similar justification to previous change, the information
about decryption status belongs in the skb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
c3f6bb7413 tls: rx: don't store the record type in socket context
Original TLS implementation was handling one record at a time.
It stashed the type of the record inside tls context (per socket
structure) for convenience. When async crypto support was added
[1] the author had to use skb->cb to store the type per-message.

The use of skb->cb overlaps with strparser, however, so a hybrid
approach was taken where type is stored in context while parsing
(since we parse a message at a time) but once parsed its copied
to skb->cb.

Recently a workaround for sockmaps [2] exposed the previously
private struct _strp_msg and started a trend of adding user
fields directly in strparser's header. This is cleaner than
storing information about an skb in the context.

This change is not strictly necessary, but IMHO the ownership
of the context field is confusing. Information naturally
belongs to the skb.

[1] commit 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
[2] commit b2c4618162 ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:08 +01:00
Jakub Kicinski
d5123edd10 tls: rx: drop pointless else after goto
Pointless else branch after goto makes the code harder to refactor
down the line.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:07 +01:00
Jakub Kicinski
bfc06e1aaa tls: rx: jump to a more appropriate label
'recv_end:' checks num_async and decrypted, and is then followed
by the 'end' label. Since we know that decrypted and num_async
are 0 at the start we can jump to 'end'.

Move the init of decrypted and num_async to let the compiler
catch if I'm wrong.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:49:07 +01:00
Jakub Kicinski
dc2e0617f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 23:24:23 -07:00
GONG, Ruiqi
27a5a5685d net: mpls: fix memdup.cocci warning
Simply use kmemdup instead of explicitly allocating and copying memory.

Generated by: scripts/coccinelle/api/memdup.cocci

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Link: https://lore.kernel.org/r/20220406114629.182833-1-gongruiqi1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 21:06:41 -07:00
Ian Wienand
e9f656b7a2 net: ethernet: set default assignment identifier to NET_NAME_ENUM
As noted in the original commit 685343fc3b ("net: add
name_assign_type netdev attribute")

  ... when the kernel has given the interface a name using global
  device enumeration based on order of discovery (ethX, wlanY, etc)
  ... are labelled NET_NAME_ENUM.

That describes this case, so set the default for the devices here to
NET_NAME_ENUM.  Current popular network setup tools like systemd use
this only to warn if you're setting static settings on interfaces that
might change, so it is expected this only leads to better user
information, but not changing of interfaces, etc.

Signed-off-by: Ian Wienand <iwienand@redhat.com>
Link: https://lore.kernel.org/r/20220406093635.1601506-1-iwienand@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 21:04:03 -07:00
Ping Gan
15fcdf6ae1 tcp: Add tracepoint for tcp_set_ca_state
The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense to
add tracepoint for congestion status set function to summate cc
status duration and evaluate the performance of network
and congestion algorithm. the backgound of this patch is below.

Link: https://github.com/iovisor/bcc/pull/3899

Signed-off-by: Ping Gan <jacky_gam_2001@163.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 20:33:15 -07:00
Jeffrey Ji
794c24e992 net-core: rx_otherhost_dropped to core_stats
Increment rx_otherhost_dropped counter when packet dropped due to
mismatched dest MAC addr.

An example when this drop can occur is when manually crafting raw
packets that will be consumed by a user space application via a tap
device. For testing purposes local traffic was generated using trafgen
for the client and netcat to start a server

Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other
with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the
counter was incremented. (Also had to modify iproute2 to show the stat,
additional patch for that coming next.)

Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 20:32:49 -07:00
Jakub Kicinski
6264f58ca0 net: extract a few internals from netdevice.h
There's a number of functions and static variables used
under net/core/ but not from the outside. We currently
dump most of them into netdevice.h. That bad for many
reasons:
 - netdevice.h is very cluttered, hard to figure out
   what the APIs are;
 - netdevice.h is very long;
 - we have to touch netdevice.h more which causes expensive
   incremental builds.

Create a header under net/core/ and move some declarations.

The new header is also a bit of a catch-all but that's
fine, if we create more specific headers people will
likely over-think where their declaration fit best.
And end up putting them in netdevice.h, again.

More work should be done on splitting netdevice.h into more
targeted headers, but that'd be more time consuming so small
steps.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 20:32:09 -07:00
Jakub Kicinski
2cc6cdd44a net: unexport a handful of dev_* functions
We have a bunch of functions which are only used under
net/core/ yet they get exported. Remove the exports.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 20:32:07 -07:00
Niels Dossche
51454ea42c ipv6: fix locking issues with loops over idev->addr_list
idev->addr_list needs to be protected by idev->lock. However, it is not
always possible to do so while iterating and performing actions on
inet6_ifaddr instances. For example, multiple functions (like
addrconf_{join,leave}_anycast) eventually call down to other functions
that acquire the idev->lock. The current code temporarily unlocked the
idev->lock during the loops, which can cause race conditions. Moving the
locks up is also not an appropriate solution as the ordering of lock
acquisition will be inconsistent with for example mc_lock.

This solution adds an additional field to inet6_ifaddr that is used
to temporarily add the instances to a temporary list while holding
idev->lock. The temporary list can then be traversed without holding
idev->lock. This change was done in two places. In addrconf_ifdown, the
list_for_each_entry_safe variant of the list loop is also no longer
necessary as there is no deletion within that specific loop.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06 22:09:39 -07:00
Jakub Kicinski
8e9d0d7a76 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2022-04-06

We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).

The main changes are:

1) rethook related fixes, from Jiri and Masami.

2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.

3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
  bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
  bpf: selftests: Test fentry tracing a struct_ops program
  bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
  rethook: Fix to use WRITE_ONCE() for rethook:: Handler
  selftests/bpf: Fix warning comparing pointer to 0
  bpf: Fix sparse warnings in kprobe_multi_resolve_syms
  bpftool: Explicit errno handling in skeletons
====================

Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06 21:58:50 -07:00
Eric Dumazet
4057037535 tcp: add accessors to read/set tp->snd_cwnd
We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.

Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321ff7 ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.

Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06 12:05:41 -07:00