Commit graph

1088963 commits

Author SHA1 Message Date
Horatiu Vultur
8f2c7d9ad7 net: lan966x: Expose functions that are needed by FDMA
Expose the following functions 'lan966x_hw_offload',
'lan966x_ifh_get_src_port' and 'lan966x_ifh_get_timestamp' in
lan966x_main.h so they can be accessed by FDMA.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 20:49:32 -07:00
Horatiu Vultur
fdb2981c00 net: lan966x: Add registers that are used for FDMA.
Add the registers that are used to configure the FDMA.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 20:49:31 -07:00
Jonathan Neuschäfer
d6967d0414 net: calxedaxgmac: Fix typo (doubled "the")
Fix a doubled word in the comment above xgmac_poll.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20220409182147.2509788-1-j.neuschaefer@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 20:49:21 -07:00
YueHaibing
bfa323c659 net: ethernet: ti: am65-cpsw: Fix build error without PHYLINK
If PHYLINK is n, build fails:

drivers/net/ethernet/ti/am65-cpsw-ethtool.o: In function `am65_cpsw_set_link_ksettings':
am65-cpsw-ethtool.c:(.text+0x118): undefined reference to `phylink_ethtool_ksettings_set'
drivers/net/ethernet/ti/am65-cpsw-ethtool.o: In function `am65_cpsw_get_link_ksettings':
am65-cpsw-ethtool.c:(.text+0x138): undefined reference to `phylink_ethtool_ksettings_get'
drivers/net/ethernet/ti/am65-cpsw-ethtool.o: In function `am65_cpsw_set_eee':
am65-cpsw-ethtool.c:(.text+0x158): undefined reference to `phylink_ethtool_set_eee'

Select PHYLINK for TI_K3_AM65_CPSW_NUSS to fix this.

Fixes: e8609e6947 ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220409105931.9080-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 20:49:10 -07:00
Jakub Kicinski
e69a837f58 Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
Mellanox shared branch that includes:

 * Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com

  Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code
  is unmaintained, untested and not in-use by any upstream/distro oriented
  customers. In order to reduce code complexity, drop the kernel code,
  clean build config options and delete useless kTLS vs. TLS separation.

  [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf

 * Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com

  Together with FPGA TLS, the IPsec went to EOL state in the November of
  2019 [1]. Exactly like FPGA TLS, no active customers exist for this
  upstream code and all the complexity around that area can be deleted.

  [2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf

 * Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (23 commits)
  net/mlx5: Remove not-implemented IPsec capabilities
  net/mlx5: Remove ipsec_ops function table
  net/mlx5: Reduce kconfig complexity while building crypto support
  net/mlx5: Move IPsec file to relevant directory
  net/mlx5: Remove not-needed IPsec config
  net/mlx5: Align flow steering allocation namespace to common style
  net/mlx5: Unify device IPsec capabilities check
  net/mlx5: Remove useless IPsec device checks
  net/mlx5: Remove ipsec vs. ipsec offload file separation
  RDMA/core: Delete IPsec flow action logic from the core
  RDMA/mlx5: Drop crypto flow steering API
  RDMA/mlx5: Delete never supported IPsec flow action
  net/mlx5: Remove FPGA ipsec specific statistics
  net/mlx5: Remove XFRM no_trailer flag
  net/mlx5: Remove not-used IDA field from IPsec struct
  net/mlx5: Delete metadata handling logic
  net/mlx5_fpga: Drop INNOVA IPsec support
  IB/mlx5: Fix undefined behavior due to shift overflowing the constant
  net/mlx5: Cleanup kTLS function names and their exposure
  net/mlx5: Remove tls vs. ktls separation as it is the same
  ...
====================

Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 20:34:02 -07:00
Minghao Chi
e2d0acd40c net: stmmac: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
Using pm_runtime_resume_and_get is more appropriate
for simplifing code

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/r/20220408081250.2494588-1-chi.minghao@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 20:31:59 -07:00
Haiyang Zhang
1cb9d3b618 hv_netvsc: Add support for XDP_REDIRECT
Handle XDP_REDIRECT action in netvsc driver.
Also, transparently pass ndo_xdp_xmit to VF when available.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1649362894-20077-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 18:25:47 -07:00
Jakub Kicinski
2e36437f44 Merge branch 'ipv4-convert-several-tos-fields-to-dscp_t'
Guillaume Nault says:

====================
ipv4: Convert several tos fields to dscp_t

Continue the work started with commit a410a0cf98 ("ipv6: Define
dscp_t and stop taking ECN bits into account in fib6-rules") and
convert more structure fields and variables to dscp_t. This series
focuses on struct fib_rt_info, struct fib_entry_notifier_info and their
users (networking drivers).

The purpose of dscp_t is to ensure that ECN bits don't influence IP
route lookups. It does so by ensuring that dscp_t variables have the
ECN bits cleared.

Notes:
  * This series is entirely about type annotation and isn't supposed
to have any user visible effect.

  * The first two patches have to introduce a few dsfield <-> dscp
conversions in the affected drivers, but those are then removed when
converting the internal driver structures (patches 3-5). In the end,
drivers don't have to handle any conversion.
====================

Link: https://lore.kernel.org/r/cover.1649445279.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 17:38:03 -07:00
Guillaume Nault
9f6982e9a3 net: marvell: prestera: Use dscp_t in struct prestera_kern_fib_cache
Use the new dscp_t type to replace the kern_tos field of struct
prestera_kern_fib_cache. This ensures ECN bits are ignored and makes it
compatible with the dscp fields of struct fib_entry_notifier_info and
struct fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 17:38:02 -07:00
Guillaume Nault
046eabbf19 mlxsw: Use dscp_t in struct mlxsw_sp_fib4_entry
Use the new dscp_t type to replace the tos field of struct
mlxsw_sp_fib4_entry. This ensures ECN bits are ignored and makes it
compatible with the dscp fields of fib_entry_notifier_info and
fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 17:38:01 -07:00
Guillaume Nault
20bbf32efe netdevsim: Use dscp_t in struct nsim_fib4_rt
Use the new dscp_t type to replace the tos field of struct
nsim_fib4_rt. This ensures ECN bits are ignored and makes it compatible
with the dscp fields of struct fib_entry_notifier_info and struct
fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 17:37:58 -07:00
Guillaume Nault
568a3f33b4 ipv4: Use dscp_t in struct fib_entry_notifier_info
Use the new dscp_t type to replace the tos field of struct
fib_entry_notifier_info. This ensures ECN bits are ignored and makes it
compatible with the dscp field of struct fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 17:37:50 -07:00
Guillaume Nault
888ade8f90 ipv4: Use dscp_t in struct fib_rt_info
Use the new dscp_t type to replace the tos field of struct fib_rt_info.
This ensures ECN bits are ignored and makes it compatible with the
fa_dscp field of struct fib_alias.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11 17:37:50 -07:00
Grygorii Strashko
d072c88c28 net: ethernet: ti: cpsw: drop CPSW_HEADROOM define
Since commit 1771afd474 ("net: cpsw: avoid alignment faults by taking
NET_IP_ALIGN into account") the TI CPSW driver was switched to use correct
define CPSW_HEADROOM_NA to avoid alignment faults, but there are two places
left where CPSW_HEADROOM is still used (without causing issues).

Hence, completely drop CPSW_HEADROOM define and use CPSW_HEADROOM_NA
everywhere to avoid further mistakes in code.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 12:27:33 +01:00
David S. Miller
e782f5bad3 Merge branch 'mptcp-next'
Mat Martineau says:

====================
mptcp: Miscellaneous changes for 5.19

Four separate groups of patches here:

Patch 1 optimizes flag checking when releasing mptcp socket locks.

Patches 2 and 3 update the packet scheduler when subflow priorities
change.

Patch 4 adds some pernet helper functions for MPTCP.

Patches 5-8 add diag support for MPTCP listeners, including a selftest.

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:54 +01:00
Florian Westphal
f2ae0fa68e selftests/mptcp: add diag listen tests
Check dumping of mptcp listener sockets:
1. filter by dport should not return any results
2. filter by sport should return listen sk
3. filter by saddr+sport should return listen sk
4. no filter should return listen sk

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:54 +01:00
Florian Westphal
4fa39b701c mptcp: listen diag dump support
makes 'ss -Ml' show mptcp listen sockets.

Iterate over the tcp listen sockets and pick those that have mptcp ulp
info attached.

mptcp_diag_get_info() is modified to prefer msk->first for mptcp sockets
in listen state.  This reports accurate number for recv and send queue
(pending / max connection backlog counters).

Sample output:
ss -Mil
State        Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN       0      20     127.0.0.1:12000     0.0.0.0:*
         subflows_max:2

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
Florian Westphal
e8887b7161 mptcp: remove locking in mptcp_diag_fill_info
Problem is that listener iteration would call this from atomic context
so this locking is not allowed.

One way is to drop locks before calling the helper, but afaics the lock
isn't really needed, all values are fetched via READ_ONCE().

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
Florian Westphal
6b9ea5c81e mptcp: diag: switch to context structure
Raw access to cb->arg[] is deprecated, use a context structure.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
Geliang Tang
c682bf536c mptcp: add pm_nl_pernet helpers
This patch adds two pm_nl_pernet related helpers, named pm_nl_get_pernet()
and pm_nl_get_pernet_from_msk() to get pm_nl_pernet from 'net' or 'msk'.
Use these helpers instead of using net_generic() directly.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
Paolo Abeni
0e203c3247 mptcp: reset the packet scheduler on PRIO change
Similar to the previous patch, for priority changes
requested by the local PM.

Reported-and-suggested-by: Davide Caratti <dcaratti@redhat.com>
Fixes: 067065422f ("mptcp: add the outgoing MP_PRIO support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
Paolo Abeni
43f5b111d1 mptcp: reset the packet scheduler on incoming MP_PRIO
When an incoming MP_PRIO option changes the backup
status of any subflow, we need to reset the packet
scheduler status, or the next send could keep using
the previously selected subflow, without taking in account
the new priorities.

Reported-by: Davide Caratti <dcaratti@redhat.com>
Fixes: 40453a5c61 ("mptcp: add the incoming MP_PRIO support")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
Paolo Abeni
65a569b03c mptcp: optimize release_cb for the common case
The mptcp release callback checks several flags in atomic
context, but only MPTCP_CLEAN_UNA can be up frequently.

Reorganize the code to avoid multiple conditionals in the
most common scenarios.

Additional clarify a related comment.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:55:53 +01:00
David S. Miller
4696ad36d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Replace unnecessary list_for_each_entry_continue() in nf_tables,
   from Jakob Koschel.

2) Add struct nf_conntrack_net_ecache to conntrack event cache and
   use it, from Florian Westphal.

3) Refactor ctnetlink_dump_list(), also from Florian.

4) Bump module reference counter on cttimeout object addition/removal,
   from Florian.

5) Consolidate nf_log MAC printer, from Phil Sutter.

6) Add basic logging support for unknown ethertype, from Phil Sutter.

7) Consolidate check for sysctl nf_log_all_netns toggle, also from Phil.

8) Replace hardcode value in nft_bitwise, from Jeremy Sowden.

9) Rename BASIC-like goto tags in nft_bitwise to more meaningful names,
   also from Jeremy.

10) nft_fib support for reverse path filtering with policy-based routing
    on iif. Extend selftests to cover for this new usecase, from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 11:47:58 +01:00
Florian Westphal
0c7b27616f selftests: netfilter: add fib expression forward test case
Its now possible to use fib expression in the forward chain (where both
the input and output interfaces are known).

Add a simple test case for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-11 12:10:09 +02:00
Pablo Neira Ayuso
be8be04e5d netfilter: nft_fib: reverse path filter for policy-based routing on iif
If policy-based routing using the iif selector is used, then the fib
expression fails to look up for the reverse path from the prerouting
hook because the input interface cannot be inferred. In order to support
this scenario, extend the fib expression to allow to use after the route
lookup, from the forward hook.

This patch also adds support for the input hook for usability reasons.
Since the prerouting hook cannot be used for the scenario described
above, users need two rules: one for the forward chain and another rule
for the input chain to check for the reverse path check for locally
targeted traffic.

Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-11 12:10:09 +02:00
Lv Ruyi
a21437d2b4 bnx2x: Fix spelling mistake "regiser" -> "register"
There are some spelling mistakes in the comments for macro. Fix it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:52:09 +01:00
Felix Fietkau
4d65f9b686 net: ethernet: mtk_eth_soc/wed: fix sparse endian warnings
Descriptor fields are little-endian

Fixes: 804775dfc2 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:49:41 +01:00
Yang Yingliang
b559edfaf3 net: ethernet: mtk_eth_soc: fix return value check in mtk_wed_add_hw()
If syscon_regmap_lookup_by_phandle() fails, it never return NULL pointer,
change the check to IS_ERR().

Fixes: 804775dfc2 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:45:34 +01:00
David S. Miller
750d019d70 Merge branch 'icmp-skb-reason'
Menglong Dong says:

====================
net: icmp: add skb drop reasons to icmp
In the commit c504e5c2f9 ("net: skb: introduce kfree_skb_reason()"),
we added the support of reporting the reasons of skb drops to kfree_skb
tracepoint. And in this series patches, reasons for skb drops are added
to ICMP protocol.

In order to report the reasons of skb drops in 'sock_queue_rcv_skb()',
the function 'sock_queue_rcv_skb_reason()' is introduced in the 1th
patch, which is used in the 3th patch.

As David Ahern suggested, the reasons for skb drops should be more
general and not be code based. Therefore, in the 2th patch,
SKB_DROP_REASON_PTYPE_ABSENT is renamed to
SKB_DROP_REASON_UNHANDLED_PROTO, which is used for the cases of no
L3 protocol handler, no L4 protocol handler, version extensions, etc.

In the 3th patch, we introduce the new function __ping_queue_rcv_skb()
to report drop reasons by its return value and keep the return value of
ping_queue_rcv_skb() still.

In the 4th patch, we make ICMP message handler functions return drop
reasons, which means we change the return type of 'handler()' in
'struct icmp_control' from 'bool' to 'enum skb_drop_reason'. This
changed its original intention, as 'false' means failure, but
'SKB_NOT_DROPPED_YET', which is 0, means success now. Therefore, we
have to change all usages of these handler. Following "handler"
functions are involved:

icmp_unreach()
icmp_redirect()
icmp_echo()
icmp_timestamp()
icmp_discard()

And following drop reasons are added(what they mean can be see
in the document for them):

SKB_DROP_REASON_ICMP_CSUM
SKB_DROP_REASON_INVALID_PROTO

The reason 'INVALID_PROTO' is introduced for the case that the packet
doesn't follow rfc 1122 and is dropped. I think this reason is different
from the 'UNHANDLED_PROTO', as the 'UNHANDLED_PROTO' means the packet is
fine, and it is just not supported. This is not a common case, and I
believe we can locate the problem from the data in the packet. For now,
this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types.

Maybe there should be a document file for these reasons. For example,
list all the case that causes the 'INVALID_PROTO' drop reason. Therefore,
users can locate their problems according to the document.

Changes since v4:
- rename SKB_DROP_REASON_RFC_1122 to SKB_DROP_REASON_INVALID_PROTO

Changes since v3:
- rename SKB_DROP_REASON_PTYPE_ABSENT to SKB_DROP_REASON_UNHANDLED_PROTO
  in the 2th patch
- fix the return value problem of ping_queue_rcv_skb() in the 3th patch
- remove SKB_DROP_REASON_ICMP_TYPE and SKB_DROP_REASON_ICMP_BROADCAST
  and introduce the SKB_DROP_REASON_RFC_1122 in the 4th patch

Changes since v2:
- fix aliegnment problem in the 2th patch

Changes since v1:
- introduce __ping_queue_rcv_skb() instead of change the return value
  of ping_queue_rcv_skb() in the 2th patch, as Paolo suggested
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:38 +01:00
Menglong Dong
b384c95a86 net: icmp: add skb drop reasons to icmp protocol
Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with
kfree_skb_reason().

In order to get the reasons of the skb drops after icmp message handle,
we change the return type of 'handler()' in 'struct icmp_control' from
'bool' to 'enum skb_drop_reason'. This may change its original
intention, as 'false' means failure, but 'SKB_NOT_DROPPED_YET' means
success now. Therefore, all 'handler' and the call of them need to be
handled. Following 'handler' functions are involved:

icmp_unreach()
icmp_redirect()
icmp_echo()
icmp_timestamp()
icmp_discard()

And following new drop reasons are added:

SKB_DROP_REASON_ICMP_CSUM
SKB_DROP_REASON_INVALID_PROTO

The reason 'INVALID_PROTO' is introduced for the case that the packet
doesn't follow rfc 1122 and is dropped. This is not a common case, and
I believe we can locate the problem from the data in the packet. For now,
this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types.

Maybe there should be a document file for these reasons. For example,
list all the case that causes the 'UNHANDLED_PROTO' and 'INVALID_PROTO'
drop reason. Therefore, users can locate their problems according to the
document.

Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Menglong Dong
41a95a00eb net: icmp: introduce __ping_queue_rcv_skb() to report drop reasons
In order to avoid to change the return value of ping_queue_rcv_skb(),
introduce the function __ping_queue_rcv_skb(), which is able to report
the reasons of skb drop as its return value, as Paolo suggested.

Meanwhile, make ping_queue_rcv_skb() a simple call to
__ping_queue_rcv_skb().

The kfree_skb() and sock_queue_rcv_skb() used in ping_queue_rcv_skb()
are replaced with kfree_skb_reason() and sock_queue_rcv_skb_reason()
now.

Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Menglong Dong
9f8ed577c2 net: skb: rename SKB_DROP_REASON_PTYPE_ABSENT
As David Ahern suggested, the reasons for skb drops should be more
general and not be code based.

Therefore, rename SKB_DROP_REASON_PTYPE_ABSENT to
SKB_DROP_REASON_UNHANDLED_PROTO, which is used for the cases of no
L3 protocol handler, no L4 protocol handler, version extensions, etc.

From previous discussion, now we have the aim to make these reasons
more abstract and users based, avoiding code based.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
Menglong Dong
c1b8a56755 net: sock: introduce sock_queue_rcv_skb_reason()
In order to report the reasons of skb drops in 'sock_queue_rcv_skb()',
introduce the function 'sock_queue_rcv_skb_reason()'.

As the return value of 'sock_queue_rcv_skb()' is used as the error code,
we can't make it as drop reason and have to pass extra output argument.
'sock_queue_rcv_skb()' is used in many places, so we can't change it
directly.

Introduce the new function 'sock_queue_rcv_skb_reason()' and make
'sock_queue_rcv_skb()' an inline call to it.

Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11 10:38:37 +01:00
David S. Miller
516a2f1f6f Merge branch 'tls-rx-refactoring-part-2'
Jakub Kicinski says:

====================
tls: rx: random refactoring part 2

TLS Rx refactoring. Part 2 of 3. This one focusing on the main loop.
A couple of features to follow.
====================
2022-04-10 17:32:12 +01:00
Jakub Kicinski
f940b6efb1 tls: rx: jump out for cases which need to leave skb on list
The current invese logic is harder to follow (and adds extra
tests to the fast path). We have to enumerate all cases which
need to keep the skb before consuming it. It's simpler to
jump out of the full record flow as we detect those cases.

This makes it clear that partial consumption and peek can
only reach end of the function thru the !zc case so move
the code up there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:12 +01:00
Jakub Kicinski
b1a2c17863 tls: rx: clear ctx->recv_pkt earlier
Whatever we do in the loop the skb should not remain on as
ctx->recv_pkt afterwards. We can clear that pointer and
restart strparser earlier.

This adds overhead of extra linking and unlinking to rx_list
but that's not large (upcoming change will switch to unlocked
skb list operations).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:12 +01:00
Jakub Kicinski
465ea73535 tls: rx: inline consuming the skb at the end of the loop
tls_sw_advance_skb() always consumes the skb at the end of the loop.

To fall here the following must be true:

 !async && !is_peek && !retain_skb
   retain_skb => !zc && rxm->full_len > len
     # but non-full record implies !zc, so above can be simplified as
   retain_skb => rxm->full_len > len

 !async && !is_peek && !(rxm->full_len > len)
 !async && !is_peek && rxm->full_len <= len

tls_sw_advance_skb() returns false if len < rxm->full_len
which can't be true given conditions above.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:12 +01:00
Jakub Kicinski
ba13609df1 tls: rx: pull most of zc check out of the loop
Most of the conditions deciding if zero-copy can be used
do not change throughout the iterations, so pre-calculate
them.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
7da18bcc5e tls: rx: don't track the async count
We track both if the last record was handled by async crypto
and how many records were async. This is not necessary. We
implicitly assume once crypto goes async it will stay that
way, otherwise we'd reorder records. So just track if we're
in async mode, the exact number of records is not necessary.

This change also forces us into "async" mode more consistently
in case crypto ever decided to interleave async and sync.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
fc8da80f99 tls: rx: don't handle async in tls_sw_advance_skb()
tls_sw_advance_skb() caters to the async case when skb argument
is NULL. In that case it simply unpauses the strparser.

These are surprising semantics to a person reading the code,
and result in higher LoC, so inline the __strp_unpause and
only call tls_sw_advance_skb() when we actually move past
an skb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
06554f4ffc tls: rx: factor out writing ContentType to cmsg
cmsg can be filled in during rx_list processing or normal
receive. Consolidate the code.

We don't need to keep the boolean to track if the cmsg was
created. 0 is an invalid content type.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
37943f047b tls: rx: simplify async wait
Since we are protected from async completions by decrypt_compl_lock
we can drop the async_notify and reinit the completion before we
start waiting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
4175eac371 tls: rx: wrap decryption arguments in a structure
We pass zc as a pointer to bool a few functions down as an in/out
argument. This is error prone since C will happily evalue a pointer
as a boolean (IOW forgetting *zc and writing zc leads to loss of
developer time..). Wrap the arguments into a structure.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
9bdf75ccff tls: rx: don't report text length from the bowels of decrypt
We plumb pointer to chunk all the way to the decryption method.
It's set to the length of the text when decrypt_skb_update()
returns.

I think the code is written this way because original TLS
implementation passed &chunk to zerocopy_from_iter() and this
was carried forward as the code gotten more complex, without
any refactoring.

The fix for peek() introduced a new variable - to_decrypt
which for all practical purposes is what chunk is going to
get set to. Spare ourselves the pointer passing, use to_decrypt.

Use this opportunity to clean things up a little further.

Note that chunk / to_decrypt was mostly needed for the async
path, since the sync path would access rxm->full_len (decryption
transforms full_len from record size to text size). Use the
right source of truth more explicitly.

We have three cases:
 - async - it's TLS 1.2 only, so chunk == to_decrypt, but we
           need the min() because to_decrypt is a whole record
	   and we don't want to underflow len. Note that we can't
	   handle partial record by falling back to sync as it
	   would introduce reordering against records in flight.
 - zc - again, TLS 1.2 only for now, so chunk == to_decrypt,
        we don't do zc if len < to_decrypt, no need to check again.
 - normal - it already handles chunk > len, we can factor out the
            assignment to rxm->full_len and share it with zc.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jakub Kicinski
d4bd88e676 tls: rx: drop unnecessary arguments from tls_setup_from_iter()
sk is unused, remove it to make it clear the function
doesn't poke at the socket.

size_used is always 0 on input and @length on success.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10 17:32:11 +01:00
Jeremy Sowden
00bd435208 netfilter: bitwise: improve error goto labels
Replace two labels (`err1` and `err2`) with more informative ones.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-04-09 12:02:23 +02:00
Jeremy Sowden
c70b921fc1 netfilter: bitwise: replace hard-coded size with sizeof expression
When calculating the length of an array, use the appropriate `sizeof`
expression for its type, rather than an integer literal.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-04-09 12:02:22 +02:00
Leon Romanovsky
2984287c4c net/mlx5: Remove not-implemented IPsec capabilities
Clean a capabilities enum to remove not-implemented bits.

Link: https://lore.kernel.org/r/1044bb7b779107ff38e48e3f6553421104f3f819.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:07 +03:00
Leon Romanovsky
f2b41b32cd net/mlx5: Remove ipsec_ops function table
There is only one IPsec implementation and ipsec_ops is not needed
at all in this situation. Together with removal of ipsec_ops, we can
drop the entry checks as these functions are called for IPsec devices
only.

Link: https://lore.kernel.org/r/bc8dd1c8a77b65dbf5e2cf92c813ffaca2505c5f.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:07 +03:00