Commit graph

1201172 commits

Author SHA1 Message Date
Shuah Khan
04786c0659 selftests: connector: Fix input argument error paths to skip
Fix input argument parsing paths to skip from their error legs.
This fix helps to avoid false test failure reports without running
the test.

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Link: https://lore.kernel.org/r/20230729002403.4278-1-skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 20:11:42 -07:00
Martin KaFai Lau
079082c60a tcx: Fix splat during dev unregister
During unregister_netdevice_many_notify(), the ordering of our concerned
function calls is like this:

  unregister_netdevice_many_notify
    dev_shutdown
	qdisc_put
            clsact_destroy
    tcx_uninstall

The syzbot reproducer triggered a case that the qdisc refcnt is not
zero during dev_shutdown().

tcx_uninstall() will then WARN_ON_ONCE(tcx_entry(entry)->miniq_active)
because the miniq is still active and the entry should not be freed.
The latter assumed that qdisc destruction happens before tcx teardown.

This fix is to avoid tcx_uninstall() doing tcx_entry_free() when the
miniq is still alive and let the clsact_destroy() do the free later, so
that we do not assume any specific ordering for either of them.

If still active, tcx_uninstall() does clear the entry when flushing out
the prog/link. clsact_destroy() will then notice the "!tcx_entry_is_active()"
and then does the tcx_entry_free() eventually.

Fixes: e420bed025 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+376a289e86a0fd02b9ba@syzkaller.appspotmail.com
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/222255fe07cb58f15ee662e7ee78328af5b438e4.1690549248.git.daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:44:02 -07:00
Florian Fainelli
df41fa677d net: bcmgenet: Remove TX ring full logging
There is no need to spam the kernel log with such an indication, remove
this message.

Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230728183945.760531-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:42:08 -07:00
Yue Haibing
634e449719 vsock: Remove unused function declarations
These are never implemented since introduction in
commit d021c34405 ("VSOCK: Introduce VM Sockets")

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230729122036.32988-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:41:08 -07:00
Yue Haibing
4cbc32a8a2 net/smc: Remove unused function declarations
commit f9aab6f2ce ("net/smc: immediate freeing in smc_lgr_cleanup_early()")
left behind smc_lgr_schedule_free_work_fast() declaration.
And since commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
smc_ib_modify_qp_reset() is not used anymore.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20230729121929.17180-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:40:26 -07:00
Jakub Kicinski
bfd95b56a3 Merge branch 'connector-proc_filter-test-fixes'
Shuah Khan says:

====================
Connector/proc_filter test fixes

The first patch fixes the LKFT reported compile error, second
one adds .gitignore.
====================

Applying the first 2 patches, third one resent separately.

Link: https://lore.kernel.org/r/cover.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:39:37 -07:00
Shuah Khan
f4dcfa6fa1 selftests: connector: Add .gitignore and poupulate it with test
Add gitignore and poupulate it with test name - proc_filter

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/e3d04cc34e9af07909dc882b50fb1b6f1ce7705b.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:39:35 -07:00
Shuah Khan
165f689058 selftests: connector: Fix Makefile to include KHDR_INCLUDES
The test compile fails with following errors. Fix the Makefile
CFLAGS to include KHDR_INCLUDES to pull in uapi defines.

gcc -Wall     proc_filter.c  -o ../tools/testing/selftests/connector/proc_filter
proc_filter.c: In function ‘send_message’:
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
   22 |                          sizeof(struct proc_input))
      |                                 ^~~~~~
proc_filter.c:42:19: note: in expansion of macro ‘NL_MESSAGE_SIZE’
   42 |         char buff[NL_MESSAGE_SIZE];
      |                   ^~~~~~~~~~~~~~~
proc_filter.c:22:33: error: invalid application of ‘sizeof’ to incomplete type ‘struct proc_input’
   22 |                          sizeof(struct proc_input))
      |                                 ^~~~~~
proc_filter.c:48:34: note: in expansion of macro ‘NL_MESSAGE_SIZE’
   48 |                 hdr->nlmsg_len = NL_MESSAGE_SIZE;
      |                                  ^~~~~~~~~~~~~~~
`

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/all/CA+G9fYt=6ysz636XcQ=-KJp7vJcMZ=NjbQBrn77v7vnTcfP2cA@mail.gmail.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/d0055c8cdf18516db8ba9edec99cfc5c08f32a7c.1690564372.git.skhan@linuxfoundation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:39:32 -07:00
Jan Sokolowski
230f3d53a5 i40e: remove i40e_status
Replace uses of i40e_status to as equivalent as possible error codes.
Remove enum i40e_status as it is no longer needed

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230728171336.2446156-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:37:25 -07:00
Yue Haibing
68223f9699 tcp: Remove unused function declarations
commit 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
left behind tcp_bpf_get_proto() declaration. And tcp_v4_tw_remember_stamp()
function is remove in ccb7c410dd ("timewait_sock: Create and use getpeer op.").
Since commit 686989700c ("tcp: simplify tcp_mark_skb_lost")
tcp_skb_mark_lost_uncond_verify() declaration is not used anymore.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230729122644.10648-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:33:55 -07:00
Yue Haibing
2628d40899 devlink: Remove unused extern declaration devlink_port_region_destroy()
devlink_port_region_destroy() is never implemented since
commit 544e7c33ec ("net: devlink: Add support for port regions").

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230728132113.32888-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:11:00 -07:00
Kuniyuki Iwashima
8936bf53a0 net: Use sockaddr_storage for getsockopt(SO_PEERNAME).
Commit df8fc4e934 ("kbuild: Enable -fstrict-flex-arrays=3") started
applying strict rules to standard string functions.

It does not work well with conventional socket code around each protocol-
specific sockaddr_XXX struct, which is cast from sockaddr_storage and has
a bigger size than fortified functions expect.  See these commits:

 commit 06d4c8a808 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
 commit ecb4534b6a ("af_unix: Terminate sun_path when bind()ing pathname socket.")
 commit a0ade8404c ("af_packet: Fix warning of fortified memcpy() in packet_getname().")

We must cast the protocol-specific address back to sockaddr_storage
to call such functions.

However, in the case of getsockaddr(SO_PEERNAME), the rationale is a bit
unclear as the buffer is defined by char[128] which is the same size as
sockaddr_storage.

Let's use sockaddr_storage explicitly.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-31 09:14:16 +01:00
Ratheesh Kannoth
2b3082c6ef net: flow_dissector: Use 64bits for used_keys
As 32bits of dissector->used_keys are exhausted,
increase the size to 64bits.

This is base change for ESP/AH flow dissector patch.
Please find patch and discussions at
https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Tested-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-31 09:11:24 +01:00
Yang Li
64a37272fa team: Remove NULL check before dev_{put, hold}
The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold},
remove it to silence the warning:

./drivers/net/team/team.c:2325:3-10: WARNING: NULL check before dev_{put, hold} functions is not needed.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5991
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 16:58:00 +01:00
Lorenzo Bianconi
88efedf517 net: ethernet: mtk_eth_soc: enable nft hw flowtable_offload for MT7988 SoC
Enable hw Packet Process Engine (PPE) for MT7988 SoC.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/5e86341b0220a49620dadc02d77970de5ded9efc.1690441576.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:47:26 -07:00
Lorenzo Bianconi
58ea461b69 net: ethernet: mtk_eth_soc: enable page_pool support for MT7988 SoC
In order to recycle pages, enable page_pool allocator for MT7988 SoC.

Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/fd4e8693980e47385a543e7b002eec0b88bd09df.1690440675.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:47:26 -07:00
Chen Jiahao
c88c157d25 net: bcmasp: Clean up redundant dev_err_probe()
Referring to platform_get_irq()'s definition, the return value has
already been checked, error message also been printed via
dev_err_probe() if ret < 0. Calling dev_err_probe() one more time
outside platform_get_irq() is obviously redundant.

Removing dev_err_probe() outside platform_get_irq() to clean up
above problem.

Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Justin Chen <justin.chen@broadcom.com>
Link: https://lore.kernel.org/r/20230727115551.2655840-1-chenjiahao16@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:47:23 -07:00
YueHaibing
61c5145317 bonding: 3ad: Remove unused declaration bond_3ad_update_lacp_active()
This is not used since commit 3a755cd8b7 ("bonding: add new option lacp_active")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726143816.15280-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:06:30 -07:00
Jakub Kicinski
4e1db4a87f Merge branch 'r8152-reduce-control-transfer'
Hayes Wang says:

====================
r8152: reduce control transfer

The two patches are used to reduce the number of control transfer when
access the registers in bulk.
====================

Link: https://lore.kernel.org/r/20230726030808.9093-417-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:01:24 -07:00
Hayes Wang
e5c266a611 r8152: set bp in bulk
PLA_BP_0 ~ PLA_BP_15 (0xfc28 ~ 0xfc46) are continuous registers, so we
could combine the control transfers into one control transfer.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230726030808.9093-419-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:01:22 -07:00
Hayes Wang
57df0fb9d5 r8152: adjust generic_ocp_write function
Reduce the control transfer if all bytes of first or the last DWORD are
written.

The original method is to split the control transfer into three parts
(the first DWORD, middle continuous data, and the last DWORD). However,
they could be combined if whole bytes of the first DWORD or last DWORD
are written. That is, the first DWORD or the last DWORD could be combined
with the middle continuous data, if the byte_en is 0xff.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230726030808.9093-418-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:01:22 -07:00
Colin Ian King
3bdd85e2e3 net: ethernet: slicoss: remove redundant increment of pointer data
The pointer data is being incremented but this change to the pointer
is not used afterwards. The increment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Link: https://lore.kernel.org/r/20230726164522.369206-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 15:37:08 -07:00
Jakub Kicinski
05191d8896 Merge branch 'in-kernel-support-for-the-tls-alert-protocol'
Chuck Lever says:

====================
In-kernel support for the TLS Alert protocol

IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS
Alert protocol. Instead, a set of small helper functions can be used
to handle sending and receiving TLS Alerts for in-kernel TLS
consumers.
====================

Merged on top of a tag in case it's needed in the NFS tree.

Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:08:02 -07:00
Chuck Lever
b470985c76 net/handshake: Trace events for TLS Alert helpers
Add observability for the new TLS Alert infrastructure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
39067dda1d SUNRPC: Use new helpers to handle TLS Alerts
Use the helpers to parse the level and description fields in
incoming alerts. "Warning" alerts are discarded, and "fatal"
alerts mean the session is no longer valid.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047944747.5241.1974889594004407123.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
39d0e38dcc net/handshake: Add helpers for parsing incoming TLS Alerts
Kernel TLS consumers can replace common TLS Alert parsing code with
these helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047942074.5241.13791647439480672048.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
5dd5ad682c SUNRPC: Send TLS Closure alerts before closing a TCP socket
Before closing a TCP connection, the TLS protocol wants peers to
send session close Alert notifications. Add those in both the RPC
client and server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047939404.5241.14392506226409865832.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
35b1b538d4 net/handshake: Add API for sending TLS Closure alerts
This helper sends an alert only if a TLS session was established.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047936730.5241.618595693821012638.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
0257427146 net/tls: Add TLS Alert definitions
I'm about to add support for kernel handshake API consumers to send
TLS Alerts, so introduce the needed protocol definitions in the new
header tls_prot.h.

This presages support for Closure alerts. Also, support for alerts
is a pre-requite for handling session re-keying, where one peer will
signal the need for a re-key by sending a TLS Alert.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047934064.5241.8377890858495063518.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Chuck Lever
6a7eccef47 net/tls: Move TLS protocol elements to a separate header
Kernel TLS consumers will need definitions of various parts of the
TLS protocol, but often do not need the function declarations and
other infrastructure provided in <net/tls.h>.

Break out existing standardized protocol elements into a separate
header, and make room for a few more elements in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:07:59 -07:00
Suman Ghosh
222a6c42e9 octeontx2-af: Initialize 'cntr_val' to fix uninitialized symbol error
drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c:860
otx2_tc_update_mcam_table_del_req()
error: uninitialized symbol 'cntr_val'.

Fixes: ec87f05402 ("octeontx2-af: Install TC filter rules in hardware based on priority")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20230727163101.2793453-1-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 14:02:11 -07:00
Jakub Kicinski
a4989bee92 Merge branch 'eth-bnxt-fix-a-couple-of-w-1-c-1-warnings'
Jakub Kicinski says:

====================
eth: bnxt: fix a couple of W=1 C=1 warnings

Fix a couple of build warnings.
====================

Link: https://lore.kernel.org/r/20230727190726.1859515-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:47:36 -07:00
Jakub Kicinski
9f49db62f5 eth: bnxt: fix warning for define in struct_group
Fix C=1 warning with sparse 0.6.4:

drivers/net/ethernet/broadcom/bnxt/bnxt.c: note: in included file:
drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h:30:1: warning: directive in macro's argument list

Don't put defines in a struct_group().

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230727190726.1859515-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:47:34 -07:00
Jakub Kicinski
833c4a8105 eth: bnxt: fix one of the W=1 warnings about fortified memcpy()
Fix a W=1 warning with gcc 13.1:

In function ‘fortify_memcpy_chk’,
    inlined from ‘bnxt_hwrm_queue_cos2bw_cfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:133:3:
include/linux/fortify-string.h:592:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  592 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The field group is already defined and starts at queue_id:

struct bnxt_cos2bw_cfg {
	u8			pad[3];
	struct_group_attr(cfg, __packed,
		u8		queue_id;
		__le32		min_bw;

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230727190726.1859515-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:47:34 -07:00
Jakub Kicinski
b10d10a7c1 mlx5-updates-2023-07-24
1) Generalize devcom implementation to be independent of number of ports
    or device's GUID.
 
 2) Save memory on command interface statistics.
 
 3) General code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTCuWsACgkQSD+KveBX
 +j7u8Qf+Kqu8R8inJBKMDLlGBejdlA25HacsaRvYGvfPEhpmZMLdol/QsAdD0xZ7
 p7NOy+2YUplVeobEEPH8aQnfocLu+VaO7p1vX0Em65u6GEE3TazggSn0CRgFIROn
 trGM+Rg6Z9uv8LIFJjblpgAUIomuZ5xKjYAqxvK4YoWh6xt0gQwDa8nXBYT+7HA1
 LXh1eGFO0JJuEhJ/xq4EK52P1LmErgEBX5MhbCRG30PENJMRuja15bMDJ1wwNFyS
 wvYiNrVTemO5SP2ZBlkWizm8rMcL6NAGVAYlQLS3iR00C/1MPI8vFJlTRv+1LmM1
 NkcKVpUcxAn6xsghrfxLLrK2r98zSQ==
 =363t
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-07-24

1) Generalize devcom implementation to be independent of number of ports
   or device's GUID.

2) Save memory on command interface statistics.

3) General code cleanups

* tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
  net/mlx5: Make mlx5_eswitch_load/unload_vport() static
  net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
  net/mlx5: Remove pointless devlink_rate checks
  net/mlx5: Don't check vport->enabled in port ops
  net/mlx5e: Make flow classification filters static
  net/mlx5e: Remove duplicate code for user flow
  net/mlx5: Allocate command stats with xarray
  net/mlx5: split mlx5_cmd_init() to probe and reload routines
  net/mlx5: Remove redundant cmdif revision check
  net/mlx5: Re-organize mlx5_cmd struct
  net/mlx5e: E-Switch, Allow devcom initialization on more vports
  net/mlx5e: E-Switch, Register devcom device with switch id key
  net/mlx5: Devcom, Infrastructure changes
  net/mlx5: Use shared code for checking lag is supported
====================

Link: https://lore.kernel.org/r/20230727183914.69229-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:41:59 -07:00
Jakub Kicinski
97d0dca794 Merge branch 'mlxsw-avoid-non-tracker-helpers-when-holding-and-putting-netdevices'
Petr Machata says:

====================
mlxsw: Avoid non-tracker helpers when holding and putting netdevices

Using the tracking helpers, netdev_hold() and netdev_put(), makes it easier
to debug netdevice refcount imbalances when CONFIG_NET_DEV_REFCNT_TRACKER
is enabled. For example, the following traceback shows the callpath to the
point of an outstanding hold that was never put:

    unregister_netdevice: waiting for swp3 to become free. Usage count = 6
    ref_tracker: eth%d@ffff888123c9a580 has 1/5 users at
	mlxsw_sp_switchdev_event+0x6bd/0xcc0 [mlxsw_spectrum]
	notifier_call_chain+0xbf/0x3b0
	atomic_notifier_call_chain+0x78/0x200
	br_switchdev_fdb_notify+0x25f/0x2c0 [bridge]
	fdb_notify+0x16a/0x1a0 [bridge]
	[...]

In this patchset, get rid of all non-ref-tracking helpers in mlxsw.

- Patch #1 drops two functions that are not used anymore, but contain
  dev_hold() / dev_put() calls.

- Patch #2 avoids taking a reference in one function which is called
  under RTNL.

- The remaining patches convert individual hold/put sites one by one
  from trackerless to tracker-enabled.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/netdev/4c056da27c19d95ffeaba5acf1427ecadfc3f94c.camel@redhat.com/
====================

Link: https://lore.kernel.org/r/cover.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:48 -07:00
Petr Machata
cb21162041 mlxsw: spectrum_router: IPv6 events: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with IPv6 address events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f0af6ad4722b4ca6e598fd4fda8311a3041651ec.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:46 -07:00
Petr Machata
d0e0e88012 mlxsw: spectrum_router: RIF: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with RIF allocation.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/8b7701a7b439ac268e4be4040eff99d01e27ae47.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
b17b2d57b7 mlxsw: spectrum_router: hw_stats: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with hw_stats events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/b972314cfef4f4c24e66e60d13cffa5d606d1bf3.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
deeaa3716f mlxsw: spectrum_router: FIB: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with FIB events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/5221a92e751c40447c55959f622267ccc999ed04.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
1ae489ab43 mlxsw: spectrum_switchdev: Use tracker helpers to hold & put netdevices
Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
switchdev module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/774c3d7b5b0231f1435df2ec9dd660192e382756.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
16f8c846cd mlxsw: spectrum_nve: Do not take reference when looking up netdevice
mlxsw_sp_nve_fid_disable() is always called under RTNL. It is therefore
safe to call __dev_get_by_index() to get the netdevice pointer without
bumping the reference count, because we can be sure the netdevice is not
going away. That then obviates the need to put the netdevice later in the
function.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/341d1046f89d8d839d9d00e4a3d58cdc351e9397.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Petr Machata
569f98b36b mlxsw: spectrum: Drop unused functions mlxsw_sp_port_lower_dev_hold/_put()
As of commit 151b89f602 ("mlxsw: spectrum_router: Reuse work neighbor
initialization in work scheduler"), the functions
mlxsw_sp_port_lower_dev_hold() and mlxsw_sp_port_dev_put() have no users.
Drop them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/d0adcd7cb4ea19416294a0f861100edba84c9f36.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:38:45 -07:00
Patrick Rohr
5027d54a9c net: change accept_ra_min_rtr_lft to affect all RA lifetimes
accept_ra_min_rtr_lft only considered the lifetime of the default route
and discarded entire RAs accordingly.

This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and
applies the value to individual RA sections; in particular, router
lifetime, PIO preferred lifetime, and RIO lifetime. If any of those
lifetimes are lower than the configured value, the specific RA section
is ignored.

In order for the sysctl to be useful to Android, it should really apply
to all lifetimes in the RA, since that is what determines the minimum
frequency at which RAs must be processed by the kernel. Android uses
hardware offloads to drop RAs for a fraction of the minimum of all
lifetimes present in the RA (some networks have very frequent RAs (5s)
with high lifetimes (2h)). Despite this, we have encountered networks
that set the router lifetime to 30s which results in very frequent CPU
wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the
WiFi firmware) entirely on such networks, it seems better to ignore the
misconfigured routers while still processing RAs from other IPv6 routers
on the same network (i.e. to support IoT applications).

The previous implementation dropped the entire RA based on router
lifetime. This turned out to be hard to expand to the other lifetimes
present in the RA in a consistent manner; dropping the entire RA based
on RIO/PIO lifetimes would essentially require parsing the whole thing
twice.

Fixes: 1671bcfd76 ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 13:30:51 -07:00
Jakub Kicinski
5bdc312c1d Merge branch 'net-store-netdevs-in-an-xarray'
Jakub Kicinski says:

====================
net: store netdevs in an xarray

One of more annoying developer experience gaps we have in netlink
is iterating over netdevs. It's painful. Add an xarray to make
it trivial.

v1: https://lore.kernel.org/all/20230722014237.4078962-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230726185530.2247698-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:36:00 -07:00
Jakub Kicinski
84e00d9bd4 net: convert some netlink netdev iterators to depend on the xarray
Reap the benefits of easier iteration thanks to the xarray.
Convert just the genetlink ones, those are easier to test.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:35:58 -07:00
Jakub Kicinski
759ab1edb5 net: store netdevs in an xarray
Iterating over the netdev hash table for netlink dumps is hard.
Dumps are done in "chunks" so we need to save the position
after each chunk, so we know where to restart from. Because
netdevs are stored in a hash table we remember which bucket
we were in and how many devices we dumped.

Since we don't hold any locks across the "chunks" - devices may
come and go while we're dumping. If that happens we may miss
a device (if device is deleted from the bucket we were in).
We indicate to user space that this may have happened by setting
NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
if it sees that. Somehow I doubt most user space gets this right..

To illustrate let's look at an example:

               System state:
  start:       # [A, B, C]
  del:  B      # [A, C]

with the hash table we may dump [A, B], missing C completely even
tho it existed both before and after the "del B".

Add an xarray and use it to allocate ifindexes. This way we
can iterate ifindexes in order, without the worry that we'll
skip one. We may still generate a dump of a state which "never
existed", for example for a set of values and sequence of ops:

               System state:
  start:       # [A, B]
  add:  C      # [A, C, B]
  del:  B      # [A, C]

we may generate a dump of [A], if C got an index between A and B.
System has never been in such state. But I'm 90% sure that's perfectly
fine, important part is that we can't _miss_ devices which exist before
and after. User space which wants to mirror kernel's state subscribes
to notifications and does periodic dumps so it will know that C exists
from the notification about its creation or from the next dump
(next dump is _guaranteed_ to include C, if it doesn't get removed).

To avoid any perf regressions keep the hash table for now. Most
net namespaces have very few devices and microbenchmarking 1M lookups
on Skylake I get the following results (not counting loopback
to number of devs):

 #devs | hash |  xa  | delta
    2  | 18.3 | 20.1 | + 9.8%
   16  | 18.3 | 20.1 | + 9.5%
   64  | 18.3 | 26.3 | +43.8%
  128  | 20.4 | 26.3 | +28.6%
  256  | 20.0 | 26.4 | +32.1%
 1024  | 26.6 | 26.7 | + 0.2%
 8192  |541.3 | 33.5 | -93.8%

No surprises since the hash table has 256 entries.
The microbenchmark scans indexes in order, if the pattern is more
random xa starts to win at 512 devices already. But that's a lot
of devices, in practice.

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230726185530.2247698-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 11:35:58 -07:00
Jakub Kicinski
083476a202 Merge branch 'ynl-couple-of-unrelated-fixes'
Stanislav Fomichev says:

====================
ynl: couple of unrelated fixes

- spelling of xdp-features
- s/xdp_zc_max_segs/xdp-zc-max-segs/
- expose xdp-zc-max-segs
- add /* private: */
- regenerate headers
- print xdp_zc_max_segs from sample
====================

Link: https://lore.kernel.org/r/20230727163001.3952878-1-sdf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 09:33:14 -07:00
Stanislav Fomichev
26fdb67e8b ynl: print xdp-zc-max-segs in the sample
Technically we don't have to keep extending the sample, but it
feels useful to run these tools locally to confirm everything
is working.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230727163001.3952878-5-sdf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 09:33:12 -07:00
Stanislav Fomichev
25b5a2a190 ynl: regenerate all headers
Also add support to pass topdir to ynl-regen.sh (Jakub) and call
it from the makefile to update the UAPI headers.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230727163001.3952878-4-sdf@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 09:33:12 -07:00