docs: net: dsa: update information about multiple CPU ports

DSA now supports multiple CPU ports. Explain the use cases that are
covered, the new UAPI, the permitted degrees of freedom, and the driver API,
and remove some old "hanging fruits".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vladimir Oltean 2022-09-11 04:07:05 +03:00 committed by Paolo Abeni
parent acc43b7bf5
commit 0773e3a851
2 changed files with 128 additions and 6 deletions

@ -49,6 +49,9 @@ In this documentation the following Ethernet interfaces are used:
*eth0*
the master interface
*eth1*
another master interface
*lan1*
a slave interface
@ -360,3 +363,96 @@ the ``self`` flag) has been removed. This results in the following changes:
Script writers are therefore encouraged to use the ``master static`` set of
flags when working with bridge FDB entries on DSA switch interfaces.
Affinity of user ports to CPU ports
-----------------------------------

Typically, DSA switches are attached to the host via a single Ethernet
interface, but in cases where the switch chip is discrete, the hardware design
may permit the use of two or more ports connected to the host, for an increase
in termination throughput.

DSA can make use of multiple CPU ports in two ways. First, it is possible to
statically assign the termination traffic associated with a certain user port
to be processed by a certain CPU port. This way, user space can implement
custom policies of static load balancing between user ports, by spreading the
affinities according to the available CPU ports.

Secondly, it is possible to perform load balancing between CPU ports on a
per-packet basis, rather than statically assigning user ports to CPU ports.
This can be achieved by placing the DSA masters under a LAG interface (bonding
or team). DSA monitors this operation and creates a mirror of this software LAG
on the CPU ports facing the physical DSA masters that constitute the LAG slave
devices.

To make use of multiple CPU ports, the firmware (device tree) description of
the switch must mark all the links between CPU ports and their DSA masters
using the ``ethernet`` reference/phandle. At startup, only a single CPU port
and DSA master will be used: the numerically first port from the firmware
description which has an ``ethernet`` property. It is up to the user to
configure the system for the switch to use other masters.

DSA uses the ``rtnl_link_ops`` mechanism (with a "dsa" ``kind``) to allow
changing the DSA master of a user port. The ``IFLA_DSA_MASTER`` u32 netlink
attribute contains the ifindex of the master device that handles each slave
device. The DSA master must be a valid candidate based on firmware node
information, or a LAG interface which contains only slaves which are valid
candidates.

Using iproute2, the following manipulations are possible:

.. code-block:: sh

  # See the DSA master in current use
  ip -d link show dev swp0
      (...)
      dsa master eth0

  # Static CPU port distribution
  ip link set swp0 type dsa master eth1
  ip link set swp1 type dsa master eth0
  ip link set swp2 type dsa master eth1
  ip link set swp3 type dsa master eth0

  # CPU ports in LAG, using explicit assignment of the DSA master
  ip link add bond0 type bond mode balance-xor && ip link set bond0 up
  ip link set eth1 down && ip link set eth1 master bond0
  ip link set swp0 type dsa master bond0
  ip link set swp1 type dsa master bond0
  ip link set swp2 type dsa master bond0
  ip link set swp3 type dsa master bond0
  ip link set eth0 down && ip link set eth0 master bond0
  ip -d link show dev swp0
      (...)
      dsa master bond0

  # CPU ports in LAG, relying on implicit migration of the DSA master
  ip link add bond0 type bond mode balance-xor && ip link set bond0 up
  ip link set eth0 down && ip link set eth0 master bond0
  ip link set eth1 down && ip link set eth1 master bond0
  ip -d link show dev swp0
      (...)
      dsa master bond0

Notice that in the case of CPU ports under a LAG, the use of the
``IFLA_DSA_MASTER`` netlink attribute is not strictly needed, but rather, DSA
reacts to the ``IFLA_MASTER`` attribute change of its present master (``eth0``)
and migrates all user ports to the new upper of ``eth0``, ``bond0``. Similarly,
when ``bond0`` is destroyed using ``RTM_DELLINK``, DSA migrates the user ports
that were assigned to this interface to the first physical DSA master which is
eligible, based on the firmware description (it effectively reverts to the
startup configuration).
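
Because the migration is implicit, a script may want to confirm which master a
user port ended up on. The following is a minimal sketch, not part of the
kernel or of iproute2: the helper names ``dsa_master_from_ip_output`` and
``dsa_master_is`` are hypothetical, and the parser assumes the
``dsa master <name>`` token sequence shown in the examples above, which other
iproute2 versions may print differently.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -d link show` output on stdin and print the
# name that follows the "dsa master" tokens (assumed output format).
dsa_master_from_ip_output() {
    awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "dsa" && $(i + 1) == "master") { print $(i + 2); exit }
    }'
}

# Hypothetical wrapper: succeed only if user port $1 currently reports
# DSA master $2.
dsa_master_is() {
    [ "$(ip -d link show dev "$1" | dsa_master_from_ip_output)" = "$2" ]
}
```

For instance, ``dsa_master_is swp0 bond0 || echo "swp0 not migrated"`` could be
run after enslaving ``eth0`` to ``bond0``.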

In a setup with more than 2 physical CPU ports, it is therefore possible to mix
static user-to-CPU-port assignment with LAG between DSA masters. It is not
possible to statically assign a user port towards a DSA master that has any
upper interfaces (this includes LAG devices; the master must always be the LAG
in this case).

Live changing of the DSA master (and thus CPU port) affinity of a user port is
permitted, in order to allow dynamic redistribution in response to traffic.
Physical DSA masters are allowed to join and leave a LAG interface used as a
DSA master at any time; however, DSA will reject a LAG interface as a valid
candidate for being a DSA master unless it has at least one physical DSA master
as a slave device.
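
Since affinities may be changed live, a user space policy can be as simple as
regenerating the assignments. The sketch below is one possible static policy
(round-robin), not something DSA itself provides; the function name
``dsa_spread_cmds`` is hypothetical. It only prints the ``ip link set ... type
dsa master ...`` commands, so that they can be reviewed before being applied.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the iproute2 commands that spread
# user ports over DSA masters round-robin.
# $1: DSA masters, separated by single spaces; remaining args: user ports.
dsa_spread_cmds() {
    masters=$1
    shift
    # Arithmetic expansion normalizes any padding in the wc output.
    nmasters=$(( $(echo "$masters" | wc -w) ))
    [ "$nmasters" -gt 0 ] || return 1
    i=0
    for port in "$@"; do
        # cut counts fields from 1, hence the "+ 1".
        master=$(echo "$masters" | cut -d ' ' -f $((i % nmasters + 1)))
        echo "ip link set $port type dsa master $master"
        i=$((i + 1))
    done
}

dsa_spread_cmds "eth0 eth1" swp0 swp1 swp2 swp3
```

With the arguments above, ``swp0`` and ``swp2`` are directed to ``eth0``, and
``swp1`` and ``swp3`` to ``eth1``. The printed commands can be piped to ``sh``
once checked; each master named must still be a valid candidate as described
above.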

@ -303,6 +303,20 @@ These frames are then queued for transmission using the master network device
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.
When using multiple CPU ports, it is possible to stack a LAG (bonding/team)
device between the DSA slave devices and the physical DSA masters. The LAG
device is thus also a DSA master, but the LAG slave devices continue to be DSA
masters as well (just with no user port assigned to them; this is needed for
recovery in case the LAG DSA master disappears). Thus, the data path of the LAG
DSA master is used asymmetrically. On RX, the ``ETH_P_XDSA`` handler, which
calls ``dsa_switch_rcv()``, is invoked early (on the physical DSA master;
LAG slave). Therefore, the RX data path of the LAG DSA master is not used.
On the other hand, TX takes place linearly: ``dsa_slave_xmit`` calls
``dsa_enqueue_skb``, which calls ``dev_queue_xmit`` towards the LAG DSA master.
The latter calls ``dev_queue_xmit`` towards one physical DSA master or the
other, and in both cases, the packet exits the system through a hardware path
towards the switch.
Graphical representation
------------------------
@ -629,6 +643,24 @@ Switch configuration
PHY cannot be found. In this case, probing of the DSA switch continues
without that particular port.
- ``port_change_master``: method through which the affinity (association used
for traffic termination purposes) between a user port and a CPU port can be
changed. By default, all user ports from a tree are assigned to the first
available CPU port that makes sense for them (most of the time this means
the user ports of a tree are all assigned to the same CPU port, except for H
topologies as described in commit 2c0b03258b8b). The ``port`` argument
represents the index of the user port, and the ``master`` argument represents
the new DSA master ``net_device``. The CPU port associated with the new
master can be retrieved by looking at ``struct dsa_port *cpu_dp =
master->dsa_ptr``. Additionally, the master can also be a LAG device where
all the slave devices are physical DSA masters. LAG DSA masters also have a
valid ``master->dsa_ptr`` pointer, however this is not unique, but rather a
duplicate of the first physical DSA master's (LAG slave) ``dsa_ptr``. In case
of a LAG DSA master, a further call to ``port_lag_join`` will be emitted
separately for the physical CPU ports associated with the physical DSA
masters, requesting them to create a hardware LAG associated with the LAG
interface.
PHY devices and link management
-------------------------------
@ -1095,9 +1127,3 @@ capable hardware, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model, and deals with
most of the switch specifics. At some point we should envision a merger between these
two subsystems and get the best of both worlds.
Other hanging fruits
--------------------
- allowing more than one CPU/management interface:
http://comments.gmane.org/gmane.linux.network/365657