Commit graph

1046394 commits

Author SHA1 Message Date
Guenter Roeck
8db3cbc507 net: macb: Fix mdio child node detection
Commit 4d98bb0d7e ("net: macb: Use mdio child node for MDIO bus if it
exists") added code to detect if a 'mdio' child node exists to the macb
driver. Ths added code does, however, not actually check if the child node
exists, but if the parent node exists. This results in errors such as

macb 10090000.ethernet eth0: Could not attach PHY (-19)

if there is no 'mdio' child node. Fix the code to actually check for
the child node.

Fixes: 4d98bb0d7e ("net: macb: Use mdio child node for MDIO bus if it exists")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20211026173950.353636-1-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 17:12:18 -07:00
Seth Forshee
85c0c3eb9a net: sch: simplify condtion for selecting mini_Qdisc_pair buffer
The only valid values for a miniq pointer are NULL or a pointer to
miniq1 or miniq2, so testing for miniq_old != &miniq1 is functionally
equivalent to testing that it is NULL or equal to &miniq2.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Link: https://lore.kernel.org/r/20211026183721.137930-1-seth@forshee.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 17:09:26 -07:00
Seth Forshee
267463823a net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
Currently rcu_barrier() is used to ensure that no readers of the
inactive mini_Qdisc buffer remain before it is reused. This waits for
any pending RCU callbacks to complete, when all that is actually
required is to wait for one RCU grace period to elapse after the buffer
was made inactive. This means that using rcu_barrier() may result in
unnecessary waits.

To improve this, store the current RCU state when a buffer is made
inactive and use poll_state_synchronize_rcu() to check whether a full
grace period has elapsed before reusing it. If a full grace period has
not elapsed, wait for a grace period to elapse, and in the non-RT case
use synchronize_rcu_expedited() to hasten it.

Since this approach eliminates the RCU callback it is no longer
necessary to synchronize_rcu() in the tp_head==NULL case. However, the
RCU state should still be saved for the previously active buffer.

Before this change I would typically see mini_qdisc_pair_swap() take
tens of milliseconds to complete. After this change it typcially
finishes in less than 1 ms, and often it takes just a few microseconds.

Thanks to Paul for walking me through the options for improving this.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Link: https://lore.kernel.org/r/20211026130700.121189-1-seth@forshee.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 17:09:26 -07:00
Arnd Bergmann
f25c0515c5 net: sched: gred: dynamically allocate tc_gred_qopt_offload
The tc_gred_qopt_offload structure has grown too big to be on the
stack for 32-bit architectures after recent changes.

net/sched/sch_gred.c:903:13: error: stack frame size (1180) exceeds limit (1024) in 'gred_destroy' [-Werror,-Wframe-larger-than]
net/sched/sch_gred.c:310:13: error: stack frame size (1212) exceeds limit (1024) in 'gred_offload' [-Werror,-Wframe-larger-than]

Use dynamic allocation per qdisc to avoid this.

Fixes: 50dc9a8572 ("net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types")
Fixes: 67c9e6270f ("net: sched: Protect Qdisc::bstats with u64_stats")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20211026100711.nalhttf6mbe6sudx@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 12:06:52 -07:00
Jakub Kicinski
4796e2518a Merge branch 'two-reverts-to-calm-down-devlink-discussion'
Leon Romanovsky says:

====================
Two reverts to calm down devlink discussion

Two reverts as was discussed in [1], fast, easy and wrong in long run
solution to syzkaller bug [2].

[1] https://lore.kernel.org/all/20211026120234.3408fbcc@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com
[2] https://lore.kernel.org/netdev/000000000000af277405cf0a7ef0@google.com/
====================

Link: https://lore.kernel.org/r/cover.1635276828.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 11:58:11 -07:00
Leon Romanovsky
c5e0321e43 Revert "devlink: Remove not-executed trap policer notifications"
This reverts commit 22849b5ea5 as it
revealed that mlxsw and netdevsim (copy/paste from mlxsw) reregisters
devlink objects during another devlink user triggered command.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 11:58:07 -07:00
Leon Romanovsky
fb9d19c2d8 Revert "devlink: Remove not-executed trap group notifications"
This reverts commit 8bbeed4858 as it
revealed that mlxsw and netdevsim (copy/paste from mlxsw) reregisters
devlink objects during another devlink user triggered command.

Fixes: 22849b5ea5 ("devlink: Remove not-executed trap policer notifications")
Reported-by: syzbot+93d5accfaefceedf43c1@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 11:58:07 -07:00
David S. Miller
6487c81939 Merge branch 'br-fdb-refactoring'
Vladimir Oltean says:

====================
Bridge FDB refactoring

This series refactors the br_fdb.c, br_switchdev.c and switchdev.c files
to offer the same level of functionality with a bit less code, and to
clarify the purpose of some functions.

No functional change intended.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
716a30a97a net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device
To reduce code churn, the same patch makes multiple changes, since they
all touch the same lines:

1. The implementations for these two are identical, just with different
   function pointers. Reduce duplications and name the function pointers
   "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.

2. Drop the "const" attribute from "orig_dev". If the driver needs to
   check whether orig_dev belongs to itself and then
   call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it
   can't, because call_switchdev_notifiers takes a non-const struct
   net_device *.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
fab9eca884 net: bridge: create a common function for populating switchdev FDB entries
There are two places where a switchdev FDB entry is constructed, one is
br_switchdev_fdb_notify() and the other is br_fdb_replay(). One uses a
struct initializer, and the other declares the structure as
uninitialized and populates the elements one by one.

One problem when introducing new members of struct
switchdev_notifier_fdb_info is that there is a risk for one of these
functions to run with an uninitialized value.

So centralize the logic of populating such structure into a dedicated
function. Being the primary location where these structures are created,
using an uninitialized variable and populating the members one by one
should be fine, since this one function is supposed to assign values to
all its members.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
5cda5272a4 net: bridge: move br_fdb_replay inside br_switchdev.c
br_fdb_replay is only called from switchdev code paths, so it makes
sense to be disabled if switchdev is not enabled in the first place.

As opposed to br_mdb_replay and br_vlan_replay which might be turned off
depending on bridge support for multicast and VLANs, FDB support is
always on. So moving br_mdb_replay and br_vlan_replay inside
br_switchdev.c would mean adding some #ifdef's in br_switchdev.c, so we
keep those where they are.

The reason for the movement is that in future changes there will be some
code reuse between br_switchdev_fdb_notify and br_fdb_replay.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
9574fb5580 net: bridge: reduce indentation level in fdb_create
We can express the same logic without an "if" condition as big as the
function, just return early if the kmem_cache_alloc() call fails.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
f6814fdcfe net: bridge: rename br_fdb_insert to br_fdb_add_local
br_fdb_insert() is a wrapper over fdb_insert() that also takes the
bridge hash_lock.

With fdb_insert() being renamed to fdb_add_local(), rename
br_fdb_insert() to br_fdb_add_local().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
4731b6d6b2 net: bridge: rename fdb_insert to fdb_add_local
fdb_insert() is not a descriptive name for this function, and also easy
to confuse with __br_fdb_add(), fdb_add_entry(), br_fdb_update().
Even more confusingly, it is not even related in any way with those
functions, neither one calls the other.

Since fdb_insert() basically deals with the creation of a BR_FDB_LOCAL
entry and is called only from functions where that is the intention:

- br_fdb_changeaddr
- br_fdb_change_mac_address
- br_fdb_insert

then rename it to fdb_add_local(), because its removal counterpart is
called fdb_delete_local().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
5f94a5e276 net: bridge: remove fdb_insert forward declaration
fdb_insert() has a forward declaration because its first caller,
br_fdb_changeaddr(), is declared before fdb_create(), a function which
fdb_insert() needs.

This patch moves the 2 functions above br_fdb_changeaddr() and deletes
the forward declaration for fdb_insert().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean
4682048af0 net: bridge: remove fdb_notify forward declaration
fdb_notify() has a forward declaration because its first caller,
fdb_delete(), is declared before 3 functions that fdb_notify() needs:
fdb_to_nud(), fdb_fill_info() and fdb_nlmsg_size().

This patch moves the aforementioned 4 functions above fdb_delete() and
deletes the forward declaration.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
David S. Miller
e334df1d33 Merge branch 'mvneta-phylink'
Russell King says:

====================
Convert mvneta to phylink supported_interfaces

This patch series converts mvneta to use phylinks supported_interfaces
bitmap to simplify the validate() implementation. The patches:

1) Add the supported interface modes the supported_interfaces bitmap.
2) Removes the checks for the interface type being supported from
   the validate callback
3) Removes the now unnecessary checks and call to
   phylink_helper_basex_speed() to support switching between
   1000base-X and 2500base-X for SFPs

(3) becomes possible because when asking the MAC for its complete
support, we walk all supported interfaces which will include 1000base-X
and 2500base-X only if the comphy is present.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:50:11 +01:00
Russell King (Oracle)
099cbfa286 net: mvneta: drop use of phylink_helper_basex_speed()
Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:50:11 +01:00
Russell King (Oracle)
d9ca72807e net: mvneta: remove interface checks in mvneta_validate()
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:50:11 +01:00
Russell King
fdedb695e6 net: mvneta: populate supported_interfaces member
Populate the phy_interface_t bitmap for the Marvell mvneta driver with
interfaces modes supported by the MAC.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:50:11 +01:00
David S. Miller
c230dc8627 mlx5-updates-2021-10-26
HW-GRO support in mlx5
 
 Beside the HW GRO this series includes two trivial non-mlx5 patches:
  - net: Prevent HW-GRO and LRO features operate together
  - lib: bitmap: Introduce node-aware alloc API
 
 Khalid Manaa Says:
 ==================
 This series implements the HW-GRO offload using the HW feature SHAMPO.
 
 HW-GRO: Hardware offload for the Generic Receive Offload feature.
 
 SHAMPO: Split Headers And Merge Payload Offload.
 
 This feature performs headers data split for each received packed and
 merge the payloads of the packets of the same session.
 
 There are new HW components for this feature:
 
 The headers buffer:
 – cyclic buffer where the packets headers will be located
 
 Reservation buffer:
 – capability to divide RQ WQEs to reservations, a definite size in
   granularity of 4KB, the reservation is used to define the largest segment
   that we can create by packets stitching.
 
 Each reservation will have a session and the new received packet can be merged
 to the session, terminate it, or open a new one according to the match criteria.
 
 When a new packet is received the headers will be written to the headers buffer
 and the data will be written to the reservation, in case the packet matches
 the session the data will be written continuously otherwise it will be written
 after performing an alignment.
 
 SHAMPO RQ, WQ and CQE changes:
 -----------------------------
 RQ (receive queue) new params:
 
  -shampo_no_match_alignment_granularity: the HW alignment granularity in case
   the received packet doesn't match the current session.
 
  -shampo_match_criteria_type: the type of match criteria.
 
  -reservation_timeout: the maximum time that the HW will hold the reservation.
 
  -Each RQ has SKB that represents the current opened flow.
 
 WQ (work queue) new params:
 
  -headers_mkey: mkey that represents the headers buffer, where the packets
   headers will be written by the HW.
 
  -shampo_enable: flag to verify if the WQ supports SHAMPO feature.
 
  -log_reservation_size: the log of the reservation size where the data of
   the packet will be written by the HW.
 
  -log_max_num_of_packets_per_reservation: log of the maximum number of packets
   that can be written to the same reservation.
 
  -log_headers_entry_size: log of the header entry size of the headers buffer.
 
  -log_headers_buffer_entry_num: log of the entries number of the headers buffer.
 
 CQEs (Completion queue entry) SHAMPO fields:
 
  -match: in case it is set, then the current packet matches the opened session.
 
  -flush: in case it is set, the opened session must be flushed.
 
  -header_size: the size of the packet’s headers.
 
  -header_entry_index: the entry index in the headers buffer of the received
   packet headers.
 
  -data_offset: the offset of the received packet data in the WQE.
 
 HW-GRO works as follow:
 ----------------------
 The feature can be enabled on the interface using the ethtool command by
 setting on rx-gro-hw. When the feature is on the mlx5 driver will reopen
 the RQ to support the SHAMPO feature:
 
 Will allocate the headers buffer and fill the parameters regarding the
 reservation and the match criteria.
 
 Receive packet flow:
 
 each RQ will hold SKB that represents the current GRO opened session.
 
 The driver has a new CQE handler mlx5e_handle_rx_cqe_mpwrq_shampo which will
 use the CQE SHAMPO params to extract the location of the packet’s headers
 in the headers buffer and the location of the packets data in the RQ.
 
 Also, the CQE has two flags flush and match that indicate if the current
 packet matches the current session or not and if we need to close the session.
 
 In case there is an opened session, and we receive a matched packet then the
 handler will merge the packet's payload to the current SKB, in case we receive
 no match then the handler will flush the SKB and create a new one for the new packet.
 
 In case the flash flag is set then the driver will close the session, the SKB
 will be passed to the network stack.
 
 In case the driver merges packets in the SKB, before passing the SKB to the network
 stack the driver will update the checksum of the packet’s headers.
 
 SKB build:
 ---------
 The driver will build a new SKB in the following situations:
 in case there is no current opened session.
 In case the current packet doesn’t match the current session.
 In case there is no place to add the packets data to the SKB that represents the
 current session.
 
 Otherwise, the driver will add the packet’s data to the SKB.
 
 When the driver builds a new SKB, the linear area will contain only the packet headers
 and the data will be added to the SKB fragments.
 
 In case the entry size of the headers buffer is sufficient to build the SKB
 it will be used, otherwise the driver will allocate new memory to build the SKB.
 
 ==================
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmF4udIACgkQSD+KveBX
 +j7XBgf+OOLEJrYrbQpffdZQksn37oT98XrsGAoPbcxhaqioH+yErcVPdVyK8lEK
 sERyRkjAn4+2svp15aM7VM/JSN/JbikVWY9DQ5joCVmeJ9wIKttAACybM+VmT8tw
 0RBW+XfXyAmGzlV260/MuO4tzCfyrXuCEw0jDyBgxKyFX2aIvzll64ehryv4tbFq
 tc6P60FuM9E9YTvtXY4KHYx1taIzpOOQ6FjEx4UgvcLiARmuWTYR+Gs872A/NUKb
 9KEyz+wSnX2QzpHXXVr6EP9kVzAUqlBYRYtNc55cWpcOn8zqep4T/AFr3BtB02mJ
 NylP3+EktqBV8ISNRr5QtoY95GLWcQ==
 =v8rX
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-10-26

HW-GRO support in mlx5

Beside the HW GRO this series includes two trivial non-mlx5 patches:
 - net: Prevent HW-GRO and LRO features operate together
 - lib: bitmap: Introduce node-aware alloc API

Khalid Manaa Says:
==================
This series implements the HW-GRO offload using the HW feature SHAMPO.

HW-GRO: Hardware offload for the Generic Receive Offload feature.

SHAMPO: Split Headers And Merge Payload Offload.

This feature performs headers data split for each received packed and
merge the payloads of the packets of the same session.

There are new HW components for this feature:

The headers buffer:
– cyclic buffer where the packets headers will be located

Reservation buffer:
– capability to divide RQ WQEs to reservations, a definite size in
  granularity of 4KB, the reservation is used to define the largest segment
  that we can create by packets stitching.

Each reservation will have a session and the new received packet can be merged
to the session, terminate it, or open a new one according to the match criteria.

When a new packet is received the headers will be written to the headers buffer
and the data will be written to the reservation, in case the packet matches
the session the data will be written continuously otherwise it will be written
after performing an alignment.

SHAMPO RQ, WQ and CQE changes:
-----------------------------
RQ (receive queue) new params:

 -shampo_no_match_alignment_granularity: the HW alignment granularity in case
  the received packet doesn't match the current session.

 -shampo_match_criteria_type: the type of match criteria.

 -reservation_timeout: the maximum time that the HW will hold the reservation.

 -Each RQ has SKB that represents the current opened flow.

WQ (work queue) new params:

 -headers_mkey: mkey that represents the headers buffer, where the packets
  headers will be written by the HW.

 -shampo_enable: flag to verify if the WQ supports SHAMPO feature.

 -log_reservation_size: the log of the reservation size where the data of
  the packet will be written by the HW.

 -log_max_num_of_packets_per_reservation: log of the maximum number of packets
  that can be written to the same reservation.

 -log_headers_entry_size: log of the header entry size of the headers buffer.

 -log_headers_buffer_entry_num: log of the entries number of the headers buffer.

CQEs (Completion queue entry) SHAMPO fields:

 -match: in case it is set, then the current packet matches the opened session.

 -flush: in case it is set, the opened session must be flushed.

 -header_size: the size of the packet’s headers.

 -header_entry_index: the entry index in the headers buffer of the received
  packet headers.

 -data_offset: the offset of the received packet data in the WQE.

HW-GRO works as follow:
----------------------
The feature can be enabled on the interface using the ethtool command by
setting on rx-gro-hw. When the feature is on the mlx5 driver will reopen
the RQ to support the SHAMPO feature:

Will allocate the headers buffer and fill the parameters regarding the
reservation and the match criteria.

Receive packet flow:

each RQ will hold SKB that represents the current GRO opened session.

The driver has a new CQE handler mlx5e_handle_rx_cqe_mpwrq_shampo which will
use the CQE SHAMPO params to extract the location of the packet’s headers
in the headers buffer and the location of the packets data in the RQ.

Also, the CQE has two flags flush and match that indicate if the current
packet matches the current session or not and if we need to close the session.

In case there is an opened session, and we receive a matched packet then the
handler will merge the packet's payload to the current SKB, in case we receive
no match then the handler will flush the SKB and create a new one for the new packet.

In case the flash flag is set then the driver will close the session, the SKB
will be passed to the network stack.

In case the driver merges packets in the SKB, before passing the SKB to the network
stack the driver will update the checksum of the packet’s headers.

SKB build:
---------
The driver will build a new SKB in the following situations:
in case there is no current opened session.
In case the current packet doesn’t match the current session.
In case there is no place to add the packets data to the SKB that represents the
current session.

Otherwise, the driver will add the packet’s data to the SKB.

When the driver builds a new SKB, the linear area will contain only the packet headers
and the data will be added to the SKB fragments.

In case the entry size of the headers buffer is sufficient to build the SKB
it will be used, otherwise the driver will allocate new memory to build the SKB.

==================

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:39:57 +01:00
Maor Dickman
8ca9caee85 net/mlx5: Lag, Make mlx5_lag_is_multipath() be static inline
Fix "no previous prototype" W=1 warnings when CONFIG_MLX5_CORE_EN is not set:

  drivers/net/ethernet/mellanox/mlx5/core/lag_mp.h:34:6: error: no previous prototype for ‘mlx5_lag_is_multipath’ [-Werror=missing-prototypes]
     34 | bool mlx5_lag_is_multipath(struct mlx5_core_dev *dev) { return false; }
        |      ^~~~~~~~~~~~~~~~~~~~~

Fixes: 14fe2471c6 ("net/mlx5: Lag, change multipath and bonding to be mutually exclusive")
Signed-off-by: Maor Dickman <maord@nvidia.com>
2021-10-26 19:30:42 -07:00
Khalid Manaa
ae3452995b net/mlx5e: Prevent HW-GRO and CQE-COMPRESS features operate together
HW-GRO and CQE-COMPRESS are mutually exclusive, this commit adds this
restriction.

Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:42 -07:00
Khalid Manaa
83439f3c37 net/mlx5e: Add HW-GRO offload
This commit introduces HW-GRO offload by using the SHAMPO feature
- Add set feature handler for HW-GRO.

Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:41 -07:00
Khalid Manaa
def09e7bbc net/mlx5e: Add HW_GRO statistics
This patch adds HW_GRO counters to RX packets statistics:
 - gro_match_packets: counter of received packets with set match flag.

 - gro_packets: counter of received packets over the HW_GRO feature,
                this counter is increased by one for every received
                HW_GRO cqe.

 - gro_bytes: counter of received bytes over the HW_GRO feature,
              this counter is increased by the received bytes for every
              received HW_GRO cqe.

 - gro_skbs: counter of built HW_GRO skbs,
             increased by one when we flush HW_GRO skb
             (when we call a napi_gro_receive with hw_gro skb).

 - gro_large_hds: counter of received packets with large headers size,
                  in case the packet needs new SKB, the driver will allocate
                  new one and will not use the headers entry to build it.

Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:41 -07:00
Khalid Manaa
92552d3abd net/mlx5e: HW_GRO cqe handler implementation
this patch updates the SHAMPO CQE handler to support HW_GRO,

changes in the SHAMPO CQE handler:
- CQE match and flush fields are used to determine if to build new skb
  using the new received packet,
  or to add the received packet data to the existing RQ.hw_gro_skb,
  also this fields are used to determine when to flush the skb.
- in the end of the function mlx5e_poll_rx_cq the RQ.hw_gro_skb is flushed.

Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:41 -07:00
Ben Ben-Ishay
64509b0525 net/mlx5e: Add data path for SHAMPO feature
The header buffer is used to store the headers of the rx packets.
The header buffer size deduced from WorkQueue size + restriction
of max packets per WorkQueueElement.
This commit adds the functionality for posting/updating memory for
the header buffer during the posting/updating of WQEs.

Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:40 -07:00
Khalid Manaa
f97d5c2a45 net/mlx5e: Add handle SHAMPO cqe support
This patch adds the new CQE SHAMPO fields:
- flush: indicates that we must close the current session and pass the SKB
         to the network stack.

- match: indicates that the current packet matches the oppened session,
         the packet will be merge into the current SKB.

- header_size: the size of the packet headers that written into the headers
               buffer.

- header_entry_index: the entry index in the headers buffer.

- data_offset: packets data offset in the WQE.

Also new cqe handler is added to handle SHAMPO packets:
- The new handler uses CQE SHAMPO fields to build the SKB.
  CQE's Flush and match fields are not used in this patch, packets are not
  merged in this patch.

Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:40 -07:00
Ben Ben-Ishay
e5ca8fb08a net/mlx5e: Add control path for SHAMPO feature
This commit introduces the control path infrastructure for SHAMPO feature.

SHAMPO feature enables packet stitching by splitting packets to
header and payload, the header is placed on a dedicated buffer
and the payload on the RX ring, this allows stitching the data part
of a flow together continuously in the receive buffer.

SHAMPO feature is implemented as linked list striding RQ feature.
To support packets splitting and payload stitching:
- Enlarge the ICOSQ and the correspond CQ to support the header buffer
  memory regions.
- Add support to create linked list striding RQ with SHAMPO feature set
  in the open_rq function.
- Add deallocation function and corresponded calls for SHAMPO header
  buffer.
- Add mlx5e_create_umr_klm_mkey to support KLM mkey for the header
  buffer.
- Rename mlx5e_create_umr_mkey to mlx5e_create_umr_mtt_mkey.

Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:40 -07:00
Ben Ben-Ishay
d7b896acbd net/mlx5e: Add support to klm_umr_wqe
This commit adds the needed definitions for using the klm_umr_wqe.
UMR stands for user-mode memory registration, is a mechanism to alter
address translation properties of MKEY by posting WorkQueueElement
aka WQE on send queue.
MKEY stands for memory key, MKEY are used to describe a region in memory that
can be later used by HW.
KLM stands for {Key, Length, MemVa}, KLM_MKEY is indirect MKEY that enables
to map multiple memory spaces with different sizes in unified MKEY.
klm_umr_wqe is a UMR that use to update a KLM_MKEY.
SHAMPO feature uses KLM_MKEY for memory registration of his header buffer.

Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:39 -07:00
Khalid Manaa
eaee12f046 net/mlx5e: Rename TIR lro functions to TIR packet merge functions
This series introduces new packet merge type, therefore rename lro
functions to packet merge to support the new merge type:
- Generalize + rename mlx5e_build_tir_ctx_lro to
  mlx5e_build_tir_ctx_packet_merge.
- Rename mlx5e_modify_tirs_lro to mlx5e_modify_tirs_packet_merge.
- Rename lro bit in mlx5_ifc_modify_tir_bitmask_bits to packet_merge.
- Rename lro_en in mlx5e_params to packet_merge_type type and combine
  packet_merge params into one struct mlx5e_packet_merge_param.

Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:39 -07:00
Ben Ben-Ishay
7025329d20 net/mlx5: Add SHAMPO caps, HW bits and enumerations
This commit adds SHAMPO bit to hca_cap and SHAMPO capabilities structure,
SHAMPO related HW spec hardware fields and enumerations.
SHAMPO stands for: split headers and merge payload offload.
SHAMPO new fields:
WQ:
 - headers_mkey: mkey that represents the headers buffer, where the packets
   headers will be written by the HW.

 - shampo_enable: flag to verify if the WQ supports SHAMPO feature.

 - log_reservation_size: the log of the reservation size where the data of
   the packet will be written by the HW.

 - log_max_num_of_packets_per_reservation: log of the maximum number of
   packets that can be written to the same reservation.

 - log_headers_entry_size: log of the header entry size of the headers buffer.

 - log_headers_buffer_entry_num: log of the entries number of the headers buffer.

RQ:
 - shampo_no_match_alignment_granularity: the HW alignment granularity
   in case the received packet doesn't match the current session.

 - shampo_match_criteria_type: the type of match criteria.

 - reservation_timeout: the maximum time that the HW will hold the
   reservation.

mlx5_ifc_shampo_cap_bits, the capabilities of the SHAMPO feature:
 - shampo_log_max_reservation_size: the maximum allowed value of the field
   WQ.log_reservation_size.

 - log_reservation_size: the minimum allowed value of the field
   WQ.log_reservation_size.

 - shampo_min_mss_size: the minimum payload size of packet that can open
   a new session or be merged to a session.

 - shampo_max_log_headers_entry_size: the maximum allowed value of the field
   WQ.log_headers_entry_size

Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:39 -07:00
Ben Ben-Ishay
50f477fe99 net/mlx5e: Rename lro_timeout to packet_merge_timeout
TIR stands for transport interface receive, the TIR object is
responsible for performing all transport related operations on
the receive side like packet processing, demultiplexing the packets
to different RQ's, etc.
lro_timeout is a field in the TIR that is used to set the timeout for lro
session, this series introduces new packet merge type, therefore rename
lro_timeout to packet_merge_timeout for all packet merge types.

Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:38 -07:00
Ben Ben-ishay
54b2b3ecca net: Prevent HW-GRO and LRO features operate together
LRO and HW-GRO are mutually exclusive, this commit adds this restriction
in netdev_fix_feature. HW-GRO is preferred, that means in case both
HW-GRO and LRO features are requested, LRO is cleared.

Signed-off-by: Ben Ben-ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:38 -07:00
Tariq Toukan
7529cc7fbd lib: bitmap: Introduce node-aware alloc API
Expose new node-aware API for bitmap allocation:
bitmap_alloc_node() / bitmap_zalloc_node().

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26 19:30:38 -07:00
Luo Jie
06338ceff9 net: phy: fixed warning: Function parameter not described
Fixed warning: Function parameter or member 'enable' not
described in 'genphy_c45_fast_retrain'

Signed-off-by: Luo Jie <luoj@codeaurora.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20211026102957.17100-1-luoj@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26 14:09:50 -07:00
Jakub Kicinski
6b3671746a net/mlx5: remove the recent devlink params
revert commit 46ae40b94d ("net/mlx5: Let user configure io_eq_size param")
revert commit a6cb08daa3 ("net/mlx5: Let user configure event_eq_size param")
revert commit 5546040619 ("net/mlx5: Let user configure max_macs param")

The EQE parameters are applicable to more drivers, they should
be configured via standard API, probably ethtool. Example of
another driver needing something similar:

https://lore.kernel.org/all/1633454136-14679-3-git-send-email-sbhatta@marvell.com/

The last param for "max_macs" is probably fine but the documentation
is severely lacking. The meaning and implications for changing the
param need to be stated.

Link: https://lore.kernel.org/r/20211026152939.3125950-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26 10:18:32 -07:00
David S. Miller
4d2af64bb7 Merge branch 'phy-supported-interfaces-bitmap'
Russell King says:

====================
Introduce supported interfaces bitmap

This series introduces a new bitmap to allow us to indicate which
phy_interface_t modes are supported.

Currently, phylink will call ->validate with PHY_INTERFACE_MODE_NA to
request all link mode capabilities from the MAC driver before choosing
an interface to use. This leads in some cases to some rather hairly
code. This can be simplified if phylink is aware of the interface modes
that  the MAC supports, and it can instead walk those modes, calling
->validate for each one, and combining the results.

This series merely introduces the support; there is no change of
behaviour until MAC drivers populate their supported_interfaces bitmap.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:10:37 +01:00
Russell King (Oracle)
d25f3a74f3 net: phylink: use supported_interfaces for phylink validation
If the network device supplies a supported interface bitmap, we can use
that during phylink's validation to simplify MAC drivers in two ways by
using the supported_interfaces bitmap to:

1. reject unsupported interfaces before calling into the MAC driver.
2. generate the set of all supported link modes across all supported
   interfaces (used mainly for SFP, but also some 10G PHYs.)

Suggested-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:10:36 +01:00
Russell King
38c310eb46 net: phylink: add MAC phy_interface_t bitmap
Add a phy_interface_t bitmap so the MAC driver can specifiy which PHY
interface modes it supports.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:10:36 +01:00
Russell King (Oracle)
8e20f591f2 net: phy: add phy_interface_t bitmap support
Add support for a bitmap for phy interface modes, which includes:
- a macro to declare the interface bitmap
- an inline helper to zero the interface bitmap
- an inline helper to detect an empty interface bitmap
- inline helpers to do a bitwise AND and OR operations on two interface
  bitmaps

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:10:36 +01:00
David S. Miller
656bcd5db8 Merge branch 'dsa-isolation-prep'
Vladimir Oltean says:

====================
DSA preparations for FDB isolation between bridges

This series makes 2 small changes to DSA's SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
handler, which will make it possible to offer switch drivers a stable
association between a FDB entry and a bridge device in a future series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:07:36 +01:00
Vladimir Oltean
425d19cede net: dsa: stop calling dev_hold in dsa_slave_fdb_event
Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have
finished executing by the time we leave our bridge upper interface,
we've established a stronger boundary condition for how long the
dsa_slave_switchdev_event_work() might run.

As such, it is no longer possible for DSA slave interfaces to become
unregistered, since they are still bridge ports.

So delete the unnecessary dev_hold() and dev_put().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:07:35 +01:00
Vladimir Oltean
d7d0d423db net: dsa: flush switchdev workqueue when leaving the bridge
DSA is preparing to offer switch drivers an API through which they can
associate each FDB entry with a struct net_device *bridge_dev. This can
be used to perform FDB isolation (the FDB lookup performed on the
ingress of a standalone, or bridged port, should not find an FDB entry
that is present in the FDB of another bridge).

In preparation of that work, DSA needs to ensure that by the time we
call the switch .port_fdb_add and .port_fdb_del methods, the
dp->bridge_dev pointer is still valid, i.e. the port is still a bridge
port.

This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API
requires drivers that must have sleepable context to handle those events
to schedule the deferred work themselves. DSA does this through the
dsa_owq.

It can happen that a port leaves a bridge, del_nbp() flushes the FDB on
that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context,
DSA schedules its deferred work, but del_nbp() finishes unlinking the
bridge as a master from the port before DSA's deferred work is run.

Fundamentally, the port must not be unlinked from the bridge until all
FDB deletion deferred work items have been flushed. The bridge must wait
for the completion of these hardware accesses.

An attempt has been made to address this issue centrally in switchdev by
making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev
level, which would offer implicit synchronization with del_nbp:

https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/

but it seems that any attempt to modify switchdev's behavior and make
the events blocking there would introduce undesirable side effects in
other switchdev consumers.

The most undesirable behavior seems to be that
switchdev_deferred_process_work() takes the rtnl_mutex itself, which
would be worse off than having the rtnl_mutex taken individually from
drivers which is what we have now (except DSA which has removed that
lock since commit 0faf890fc5 ("net: dsa: drop rtnl_lock from
dsa_slave_switchdev_event_work")).

So to offer the needed guarantee to DSA switch drivers, I have come up
with a compromise solution that does not require switchdev rework:
we already have a hook at the last moment in time when the bridge is
still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush
the dsa_owq manually from there, which makes all FDB deletions
synchronous.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:07:35 +01:00
Lukas Wunner
046178e726 ifb: Depend on netfilter alternatively to tc
IFB originally depended on NET_CLS_ACT for traffic redirection.
But since v4.5, that may be achieved with NFT_FWD_NETDEV as well.

Fixes: 39e6dea28a ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <stable@vger.kernel.org> # v4.5+: bcfabee1af: netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 15:03:37 +01:00
Jeremy Kerr
99ce45d5e7 mctp: Implement extended addressing
This change allows an extended address struct - struct sockaddr_mctp_ext
- to be passed to sendmsg/recvmsg. This allows userspace to specify
output ifindex and physical address information (for sendmsg) or receive
the input ifindex/physaddr for incoming messages (for recvmsg). This is
typically used by userspace for MCTP address discovery and assignment
operations.

The extended addressing facility is conditional on a new sockopt:
MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before
the kernel will consume/populate the extended address data.

Includes a fix for an uninitialised var:
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:58:45 +01:00
Nathan Chancellor
971f5c4079 net: ax88796c: Remove pointless check in ax88796c_open()
Clang warns:

drivers/net/ethernet/asix/ax88796c_main.c:851:24: error: address of
array 'ax_local->phydev->advertising' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
        if (ax_local->phydev->advertising &&
            ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ~~

advertising cannot be NULL here if ax_local is not NULL, which cannot
happen due to the check in ax88796c_probe(). Remove the check.

Link: https://github.com/ClangBuiltLinux/linux/issues/1492
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:57:23 +01:00
Nathan Chancellor
3c5548812a net: ax88796c: Fix clang -Wimplicit-fallthrough in ax88796c_set_mac()
Clang warns:

drivers/net/ethernet/asix/ax88796c_main.c:696:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
        case SPEED_10:
        ^
drivers/net/ethernet/asix/ax88796c_main.c:696:2: note: insert 'break;' to avoid fall-through
        case SPEED_10:
        ^
        break;
drivers/net/ethernet/asix/ax88796c_main.c:706:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
        case DUPLEX_HALF:
        ^
drivers/net/ethernet/asix/ax88796c_main.c:706:2: note: insert 'break;' to avoid fall-through
        case DUPLEX_HALF:
        ^
        break;

Clang is a little more pedantic than GCC, which permits implicit
fallthroughs to cases that contain just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing breaks to fix
the warning.

Link: https://github.com/ClangBuiltLinux/linux/issues/1491
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:57:23 +01:00
Haiyang Zhang
a137c069fb net: mana: Allow setting the number of queues while the NIC is down
The existing code doesn't allow setting the number of queues while the
NIC is down.

Update the ethtool handler functions to support setting the number of
queues while the NIC is at down state.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:56:19 +01:00
Andreas Oetken
eafaa88b3e net: hsr: Add support for redbox supervision frames
added support for the redbox supervision frames
as defined in the IEC-62439-3:2018.

Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:52:17 +01:00