net: dsa: ocelot: add driver for Felix switch family
This supports an Ethernet switching core from Vitesse / Microsemi /
Microchip (VSC9959) which is part of the Ocelot family (a brand name),
and whose code name is Felix. The switch can be (and is) integrated on
different SoCs as a PCIe endpoint device.
The functionality is provided by the core of the Ocelot switch driver
(drivers/net/ethernet/mscc). In this regard, the current driver is an
instance of Microsemi's Ocelot core driver, with a DSA front-end. It
inherits its name from VSC9959's code name, to distinguish itself from
the switchdev ocelot driver.
The patch adds the logic for probing a PCI device and defines the
register map for the VSC9959 switch core, since it has some differences
in register addresses and bitfield mappings compared to the other Ocelot
switches (VSC7511, VSC7512, VSC7513, VSC7514).
The Felix driver declares the register map as part of the "instance
table". Currently the VSC9959 inside NXP LS1028A is the only instance,
but presumably it can support other switches in the Ocelot family, when
used in DSA mode (Linux running on the external CPU, and not on the
embedded MIPS).
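
As a rough illustration of the "instance table" idea, the sketch below
indexes a per-instance array by register ID and marks registers that a
given switch lacks with a sentinel. All DEMO_/demo_ names and the sentinel
value are hypothetical and are not the driver's actual macros; only the two
offsets are taken from the VSC9959 analyzer map in this file.

```c
#include <stdint.h>

/* Hypothetical register identifiers; a real map has many more. */
enum demo_reg {
	DEMO_ANA_ADVLEARN,
	DEMO_ANA_VLANMASK,
	DEMO_ANA_PORT_B_DOMAIN,
	DEMO_REG_COUNT,
};

/* Assumed sentinel for "not present on this instance". */
#define DEMO_RESERVED_ADDR	0xffffffffu

/* Designated initializers index the table by register ID. */
#define DEMO_REG(reg, offset)	[reg] = (offset)
#define DEMO_REG_RESERVED(reg)	[reg] = DEMO_RESERVED_ADDR

/* One table per switch instance. */
static const uint32_t demo_vsc9959_ana_regmap[DEMO_REG_COUNT] = {
	DEMO_REG(DEMO_ANA_ADVLEARN, 0x0089a0),
	DEMO_REG(DEMO_ANA_VLANMASK, 0x0089a4),
	DEMO_REG_RESERVED(DEMO_ANA_PORT_B_DOMAIN),
};

/* Returns 0 and stores the offset, or -1 if the register is absent. */
static int demo_reg_offset(const uint32_t *map, enum demo_reg reg,
			   uint32_t *offset)
{
	if (reg >= DEMO_REG_COUNT || map[reg] == DEMO_RESERVED_ADDR)
		return -1;

	*offset = map[reg];
	return 0;
}
```

Common code can then stay identical across instances and only the table
passed to demo_reg_offset() changes.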
In a few cases, some h/w operations have to be done differently on
VSC9959 due to missing bitfields. This is the case for the switch core
reset and init. Because for this operation Ocelot uses some bits that
are not present on Felix, the latter has to use a register from the
global registers block (GCB) instead.
Although it is a PCI driver, it relies on DT bindings for compatibility
with DSA (CPU port link, PHY library). It does not define any custom
device tree bindings, though, since we would like to minimize its
dependency on device tree.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 15:03:30 +00:00
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Copyright 2017 Microsemi Corporation
2021-09-17 11:17:35 +00:00
 * Copyright 2018-2019 NXP
2019-11-14 15:03:30 +00:00
 */
net: dsa: felix: Add PCS operations for PHYLINK
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc.) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
|                   +--------+ GMII (typically disabled via RCW)         |
| ENETC PCI         |  ENETC |--------------------------+                |
| Root Complex      | port 3 |-----------------------+  |                |
| Integrated        +--------+                       |  |                |
| Endpoint                                           |  |                |
|                   +--------+ 2.5G GMII             |  |                |
|                   |  ENETC |--------------+        |  |                |
|                   | port 2 |-----------+  |        |  |                |
|                   +--------+           |  |        |  |                |
|                                     +--------+  +--------+             |
|                                     |  Felix |  |  Felix |             |
|                                     | port 4 |  | port 5 |             |
|                                     +--------+  +--------+             |
|                                                                        |
| +--------+  +--------+  +--------+  +--------+  +--------+  +--------+ |
| |  ENETC |  |  ENETC |  |  Felix |  |  Felix |  |  Felix |  |  Felix | |
| | port 0 |  | port 1 |  | port 0 |  | port 1 |  | port 2 |  | port 3 | |
+------------------------------------------------------------------------+
|    ||||  SerDes |          ||||        ||||        ||||        ||||    |
| +--------+block |       +--------------------------------------------+ |
| |  ENETC |      |       |       ENETC port 2 internal MDIO bus       | |
| | port 0 |      |       |  PCS         PCS         PCS         PCS   | |
| |  PCS   |      |       |   0           1           2           3    | |
+-----------------|------------------------------------------------------+
       v          v           v           v           v           v
    SGMII/      RGMII     QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
   USXGMII/   (bypasses
  1000Base-X/  SerDes)
  2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map lent from ENETC port 2
(PF2), for accessing its associated PCS's.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not handled
in any way (and I don't think it can be handled).
All of this is so SoC-specific that it has been contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree for 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
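
A minimal sketch of how pre-validating the phy-mode breaks that circular
dependency: once the mode is known from the device tree, the MDIO clause
follows from it, and any later request for a different mode is refused.
The demo_ names are hypothetical, and the mapping used here (USXGMII on
clause 45, the other SerDes protocols on clause 22) is an assumption made
for illustration, not something this commit message states.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's phy_interface_t values. */
enum demo_phy_mode {
	DEMO_MODE_UNKNOWN,
	DEMO_MODE_SGMII,
	DEMO_MODE_QSGMII,
	DEMO_MODE_1000BASEX,
	DEMO_MODE_2500BASEX,
	DEMO_MODE_USXGMII,
};

enum demo_mdio_clause {
	DEMO_CLAUSE_NONE = 0,
	DEMO_CLAUSE_22 = 22,
	DEMO_CLAUSE_45 = 45,
};

/* Assumed mapping from SerDes protocol to the clause the PCS speaks. */
static enum demo_mdio_clause demo_pcs_clause(enum demo_phy_mode mode)
{
	switch (mode) {
	case DEMO_MODE_USXGMII:
		return DEMO_CLAUSE_45;
	case DEMO_MODE_SGMII:
	case DEMO_MODE_QSGMII:
	case DEMO_MODE_1000BASEX:
	case DEMO_MODE_2500BASEX:
		return DEMO_CLAUSE_22;
	default:
		return DEMO_CLAUSE_NONE;
	}
}

struct demo_port {
	enum demo_phy_mode dt_mode;	/* parsed once from the device tree */
};

/* The SerDes protocol is fixed by SoC design, so only dt_mode is valid. */
static bool demo_mode_change_allowed(const struct demo_port *port,
				     enum demo_phy_mode requested)
{
	return requested == port->dt_mode;
}
```

With the mode fixed up front, the PCS lookup on the internal MDIO bus can
pick the clause before any link negotiation happens.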
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one that
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't handle fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00
#include <linux/fsl/enetc_mdio.h>
2020-05-13 02:25:09 +00:00
#include <soc/mscc/ocelot_qsys.h>
2020-02-29 14:31:14 +00:00
#include <soc/mscc/ocelot_vcap.h>
2021-11-18 10:12:00 +00:00
#include <soc/mscc/ocelot_ana.h>
2023-01-19 12:27:04 +00:00
#include <soc/mscc/ocelot_dev.h>
2020-05-13 02:25:09 +00:00
#include <soc/mscc/ocelot_ptp.h>
2019-11-14 15:03:30 +00:00
#include <soc/mscc/ocelot_sys.h>
2021-11-18 10:12:01 +00:00
#include <net/tc_act/tc_gate.h>
2019-11-14 15:03:30 +00:00
#include <soc/mscc/ocelot.h>
2021-02-13 22:37:56 +00:00
#include <linux/dsa/ocelot.h>
2020-08-30 08:34:02 +00:00
#include <linux/pcs-lynx.h>
2020-05-13 02:25:09 +00:00
#include <net/pkt_sched.h>
2019-11-14 15:03:30 +00:00
#include <linux/iopoll.h>
2020-07-19 22:03:34 +00:00
#include <linux/mdio.h>
2019-11-14 15:03:30 +00:00
#include <linux/pci.h>
2022-06-28 14:52:38 +00:00
#include <linux/time.h>
2019-11-14 15:03:30 +00:00
#include "felix.h"

2022-02-26 22:36:50 +00:00
#define VSC9959_NUM_PORTS		6

2020-05-13 02:25:09 +00:00
#define VSC9959_TAS_GCL_ENTRY_MAX	63
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
The blamed commit broke tc-taprio schedules such as this one:
tc qdisc replace dev $swp1 root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 0x7f 990000 \
sched-entry S 0x80 10000 \
flags 0x2
because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.
Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
1518, and at gigabit this introduces a static guard band (independent
of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
time of 20 octets => 160 ns). But this is larger than the time window
itself, of 10000 ns. So, the queue system never considers a frame with
TC 7 as eligible for transmission, since the gate practically never
opens, and these frames are forever stuck in the TX queues and hang
the port.
2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
enabling oversized frame dropping, we make an effort to set
QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
one more role, which we did not take into account: per-tc static guard
band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
There is a discrepancy between what the driver thinks (that there is
no guard band, and 100% of min_gate_len[tc] is available for egress
scheduling) and what the hardware actually does (crops the equivalent
of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 ns)
leaves 0 ns for scheduling in the queue system proper.
Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.
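
The arithmetic above can be sketched as follows, assuming a gigabit link
(8 ns per octet) and the 20 octet FRM_ADJ overhead quoted in the text. The
demo_ helpers are hypothetical and only mirror the numbers in this
message; they are not the code of vsc9959_tas_guard_bands_update().

```c
#include <stdint.h>

#define DEMO_NS_PER_OCTET_1G	8	/* 8 bit times of 1 ns at 1 Gbps */
#define DEMO_FRM_ADJ_OCTETS	20	/* L1 overhead added by FRM_ADJ */
#define DEMO_MIN_GATE_OPEN_NS	33	/* gate-open time the queue system needs */

/* Guard band in ns that the hardware carves out for a given max SDU. */
static uint64_t demo_guard_band_ns(uint32_t max_sdu_octets)
{
	return (uint64_t)(max_sdu_octets + DEMO_FRM_ADJ_OCTETS) *
	       DEMO_NS_PER_OCTET_1G;
}

/* Time left for scheduling after the guard band (0 if nothing is left). */
static uint64_t demo_sched_window_ns(uint64_t gate_len_ns,
				     uint32_t max_sdu_octets)
{
	uint64_t guard_band = demo_guard_band_ns(max_sdu_octets);

	return gate_len_ns > guard_band ? gate_len_ns - guard_band : 0;
}

/* Largest max SDU that still leaves reserve_ns of gate-open time. */
static uint32_t demo_max_sdu_octets(uint64_t gate_len_ns, uint64_t reserve_ns)
{
	uint64_t octets;

	if (gate_len_ns <= reserve_ns)
		return 0;

	octets = (gate_len_ns - reserve_ns) / DEMO_NS_PER_OCTET_1G;
	if (octets <= DEMO_FRM_ADJ_OCTETS)
		return 0;

	return (uint32_t)(octets - DEMO_FRM_ADJ_OCTETS);
}
```

For the 10000 ns gate of TC 7, demo_sched_window_ns(10000, 1230) is 0,
which is the hang described above. Reserving DEMO_MIN_GATE_OPEN_NS gives
demo_max_sdu_octets(10000, 33) == 1225 in this model, leaving a 40 ns
window, enough for the queue system to schedule one frame.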
Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:23 +00:00
#define VSC9959_TAS_MIN_GATE_LEN_NS	33
2021-11-18 10:12:02 +00:00
#define VSC9959_VCAP_POLICER_BASE	63
#define VSC9959_VCAP_POLICER_MAX	383
2021-12-07 17:00:27 +00:00
#define VSC9959_SWITCH_PCI_BAR		4
#define VSC9959_IMDIO_PCI_BAR		0
2020-05-13 02:25:09 +00:00
2022-02-26 22:36:50 +00:00
#define VSC9959_PORT_MODE_SERDES	(OCELOT_PORT_MODE_SGMII | \
					 OCELOT_PORT_MODE_QSGMII | \
2022-05-10 16:43:20 +00:00
					 OCELOT_PORT_MODE_1000BASEX | \
2022-02-26 22:36:50 +00:00
					 OCELOT_PORT_MODE_2500BASEX | \
					 OCELOT_PORT_MODE_USXGMII)

static const u32 vsc9959_port_modes[VSC9959_NUM_PORTS] = {
	VSC9959_PORT_MODE_SERDES,
	VSC9959_PORT_MODE_SERDES,
	VSC9959_PORT_MODE_SERDES,
	VSC9959_PORT_MODE_SERDES,
	OCELOT_PORT_MODE_INTERNAL,
2022-03-18 19:58:12 +00:00
	OCELOT_PORT_MODE_INTERNAL,
2022-02-26 22:36:50 +00:00
};
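
A per-port mode table like the one above is typically consulted with a
bitwise test. The sketch below shows one way that check could look; the
DEMO_ flag values and demo_port_supports() are hypothetical and are not
the real OCELOT_PORT_MODE_* definitions or the driver's validation code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, one bit per supported port mode. */
#define DEMO_PORT_MODE_INTERNAL		(1u << 0)
#define DEMO_PORT_MODE_SGMII		(1u << 1)
#define DEMO_PORT_MODE_QSGMII		(1u << 2)
#define DEMO_PORT_MODE_2500BASEX	(1u << 3)
#define DEMO_PORT_MODE_USXGMII		(1u << 4)
#define DEMO_PORT_MODE_1000BASEX	(1u << 5)

#define DEMO_PORT_MODE_SERDES	(DEMO_PORT_MODE_SGMII | \
				 DEMO_PORT_MODE_QSGMII | \
				 DEMO_PORT_MODE_1000BASEX | \
				 DEMO_PORT_MODE_2500BASEX | \
				 DEMO_PORT_MODE_USXGMII)

#define DEMO_NUM_PORTS	6

/* Ports 0-3 face the SerDes, ports 4-5 are internal, as in the table. */
static const uint32_t demo_port_modes[DEMO_NUM_PORTS] = {
	DEMO_PORT_MODE_SERDES,
	DEMO_PORT_MODE_SERDES,
	DEMO_PORT_MODE_SERDES,
	DEMO_PORT_MODE_SERDES,
	DEMO_PORT_MODE_INTERNAL,
	DEMO_PORT_MODE_INTERNAL,
};

/* True if the requested mode bit is set in the port's capability mask. */
static bool demo_port_supports(int port, uint32_t mode)
{
	if (port < 0 || port >= DEMO_NUM_PORTS)
		return false;

	return (demo_port_modes[port] & mode) != 0;
}
```

Keeping the capabilities in a flat array means adding a new switch
instance only requires a different table, not new validation logic.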
2019-11-14 15:03:30 +00:00
static const u32 vsc9959_ana_regmap[] = {
	REG(ANA_ADVLEARN, 0x0089a0),
	REG(ANA_VLANMASK, 0x0089a4),
	REG_RESERVED(ANA_PORT_B_DOMAIN),
	REG(ANA_ANAGEFIL, 0x0089ac),
	REG(ANA_ANEVENTS, 0x0089b0),
	REG(ANA_STORMLIMIT_BURST, 0x0089b4),
	REG(ANA_STORMLIMIT_CFG, 0x0089b8),
	REG(ANA_ISOLATED_PORTS, 0x0089c8),
	REG(ANA_COMMUNITY_PORTS, 0x0089cc),
	REG(ANA_AUTOAGE, 0x0089d0),
	REG(ANA_MACTOPTIONS, 0x0089d4),
	REG(ANA_LEARNDISC, 0x0089d8),
	REG(ANA_AGENCTRL, 0x0089dc),
	REG(ANA_MIRRORPORTS, 0x0089e0),
	REG(ANA_EMIRRORPORTS, 0x0089e4),
	REG(ANA_FLOODING, 0x0089e8),
	REG(ANA_FLOODING_IPMC, 0x008a08),
	REG(ANA_SFLOW_CFG, 0x008a0c),
	REG(ANA_PORT_MODE, 0x008a28),
	REG(ANA_CUT_THRU_CFG, 0x008a48),
	REG(ANA_PGID_PGID, 0x008400),
	REG(ANA_TABLES_ANMOVED, 0x007f1c),
	REG(ANA_TABLES_MACHDATA, 0x007f20),
	REG(ANA_TABLES_MACLDATA, 0x007f24),
	REG(ANA_TABLES_STREAMDATA, 0x007f28),
	REG(ANA_TABLES_MACACCESS, 0x007f2c),
	REG(ANA_TABLES_MACTINDX, 0x007f30),
	REG(ANA_TABLES_VLANACCESS, 0x007f34),
	REG(ANA_TABLES_VLANTIDX, 0x007f38),
	REG(ANA_TABLES_ISDXACCESS, 0x007f3c),
	REG(ANA_TABLES_ISDXTIDX, 0x007f40),
	REG(ANA_TABLES_ENTRYLIM, 0x007f00),
	REG(ANA_TABLES_PTP_ID_HIGH, 0x007f44),
	REG(ANA_TABLES_PTP_ID_LOW, 0x007f48),
	REG(ANA_TABLES_STREAMACCESS, 0x007f4c),
	REG(ANA_TABLES_STREAMTIDX, 0x007f50),
	REG(ANA_TABLES_SEQ_HISTORY, 0x007f54),
	REG(ANA_TABLES_SEQ_MASK, 0x007f58),
	REG(ANA_TABLES_SFID_MASK, 0x007f5c),
	REG(ANA_TABLES_SFIDACCESS, 0x007f60),
	REG(ANA_TABLES_SFIDTIDX, 0x007f64),
	REG(ANA_MSTI_STATE, 0x008600),
	REG(ANA_OAM_UPM_LM_CNT, 0x008000),
	REG(ANA_SG_ACCESS_CTRL, 0x008a64),
	REG(ANA_SG_CONFIG_REG_1, 0x007fb0),
	REG(ANA_SG_CONFIG_REG_2, 0x007fb4),
	REG(ANA_SG_CONFIG_REG_3, 0x007fb8),
	REG(ANA_SG_CONFIG_REG_4, 0x007fbc),
	REG(ANA_SG_CONFIG_REG_5, 0x007fc0),
	REG(ANA_SG_GCL_GS_CONFIG, 0x007f80),
	REG(ANA_SG_GCL_TI_CONFIG, 0x007f90),
	REG(ANA_SG_STATUS_REG_1, 0x008980),
	REG(ANA_SG_STATUS_REG_2, 0x008984),
	REG(ANA_SG_STATUS_REG_3, 0x008988),
	REG(ANA_PORT_VLAN_CFG, 0x007800),
	REG(ANA_PORT_DROP_CFG, 0x007804),
	REG(ANA_PORT_QOS_CFG, 0x007808),
	REG(ANA_PORT_VCAP_CFG, 0x00780c),
	REG(ANA_PORT_VCAP_S1_KEY_CFG, 0x007810),
	REG(ANA_PORT_VCAP_S2_CFG, 0x00781c),
	REG(ANA_PORT_PCP_DEI_MAP, 0x007820),
	REG(ANA_PORT_CPU_FWD_CFG, 0x007860),
	REG(ANA_PORT_CPU_FWD_BPDU_CFG, 0x007864),
	REG(ANA_PORT_CPU_FWD_GARP_CFG, 0x007868),
	REG(ANA_PORT_CPU_FWD_CCM_CFG, 0x00786c),
	REG(ANA_PORT_PORT_CFG, 0x007870),
	REG(ANA_PORT_POL_CFG, 0x007874),
	REG(ANA_PORT_PTP_CFG, 0x007878),
	REG(ANA_PORT_PTP_DLY1_CFG, 0x00787c),
	REG(ANA_PORT_PTP_DLY2_CFG, 0x007880),
	REG(ANA_PORT_SFID_CFG, 0x007884),
	REG(ANA_PFC_PFC_CFG, 0x008800),
	REG_RESERVED(ANA_PFC_PFC_TIMER),
	REG_RESERVED(ANA_IPT_OAM_MEP_CFG),
	REG_RESERVED(ANA_IPT_IPT),
	REG_RESERVED(ANA_PPT_PPT),
	REG_RESERVED(ANA_FID_MAP_FID_MAP),
	REG(ANA_AGGR_CFG, 0x008a68),
	REG(ANA_CPUQ_CFG, 0x008a6c),
	REG_RESERVED(ANA_CPUQ_CFG2),
	REG(ANA_CPUQ_8021_CFG, 0x008a74),
	REG(ANA_DSCP_CFG, 0x008ab4),
	REG(ANA_DSCP_REWR_CFG, 0x008bb4),
	REG(ANA_VCAP_RNG_TYPE_CFG, 0x008bf4),
	REG(ANA_VCAP_RNG_VAL_CFG, 0x008c14),
	REG_RESERVED(ANA_VRAP_CFG),
	REG_RESERVED(ANA_VRAP_HDR_DATA),
	REG_RESERVED(ANA_VRAP_HDR_MASK),
	REG(ANA_DISCARD_CFG, 0x008c40),
	REG(ANA_FID_CFG, 0x008c44),
	REG(ANA_POL_PIR_CFG, 0x004000),
	REG(ANA_POL_CIR_CFG, 0x004004),
	REG(ANA_POL_MODE_CFG, 0x004008),
	REG(ANA_POL_PIR_STATE, 0x00400c),
	REG(ANA_POL_CIR_STATE, 0x004010),
	REG_RESERVED(ANA_POL_STATE),
	REG(ANA_POL_FLOWC, 0x008c48),
	REG(ANA_POL_HYST, 0x008cb4),
	REG_RESERVED(ANA_POL_MISC_CFG),
};

static const u32 vsc9959_qs_regmap[] = {
	REG(QS_XTR_GRP_CFG, 0x000000),
	REG(QS_XTR_RD, 0x000008),
	REG(QS_XTR_FRM_PRUNING, 0x000010),
	REG(QS_XTR_FLUSH, 0x000018),
	REG(QS_XTR_DATA_PRESENT, 0x00001c),
	REG(QS_XTR_CFG, 0x000020),
	REG(QS_INJ_GRP_CFG, 0x000024),
	REG(QS_INJ_WR, 0x00002c),
	REG(QS_INJ_CTRL, 0x000034),
	REG(QS_INJ_STATUS, 0x00003c),
	REG(QS_INJ_ERR, 0x000040),
	REG_RESERVED(QS_INH_DBG),
};

net: mscc: ocelot: generalize existing code for VCAP
In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which
have the same configuration interface, but different sets of keys and
actions. The driver currently only supports VCAP IS2.
In preparation for VCAP IS1 and ES0 support, the existing code must be
generalized to work with any VCAP.
In that direction, we should move the structures that depend upon VCAP
instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct
ocelot and into struct vcap_props .keys and .actions, a structure that
is replicated 3 times, once per VCAP. We'll pass that structure as an
argument to each function that does the key and action packing - only
the control logic needs to distinguish between ocelot->vcap[VCAP_IS2]
or IS1 or ES0.
Another change is to make use of the newly introduced ocelot_target_read
and ocelot_target_write API, since the 3 VCAPs have the same registers
but placed at different addresses.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 22:27:23 +00:00
static const u32 vsc9959_vcap_regmap[] = {
	/* VCAP_CORE_CFG */
	REG(VCAP_CORE_UPDATE_CTRL, 0x000000),
	REG(VCAP_CORE_MV_CFG, 0x000004),
	/* VCAP_CORE_CACHE */
	REG(VCAP_CACHE_ENTRY_DAT, 0x000008),
	REG(VCAP_CACHE_MASK_DAT, 0x000108),
	REG(VCAP_CACHE_ACTION_DAT, 0x000208),
	REG(VCAP_CACHE_CNT_DAT, 0x000308),
	REG(VCAP_CACHE_TG_DAT, 0x000388),
2020-09-29 22:27:26 +00:00
	/* VCAP_CONST */
	REG(VCAP_CONST_VCAP_VER, 0x000398),
	REG(VCAP_CONST_ENTRY_WIDTH, 0x00039c),
	REG(VCAP_CONST_ENTRY_CNT, 0x0003a0),
	REG(VCAP_CONST_ENTRY_SWCNT, 0x0003a4),
	REG(VCAP_CONST_ENTRY_TG_WIDTH, 0x0003a8),
	REG(VCAP_CONST_ACTION_DEF_CNT, 0x0003ac),
	REG(VCAP_CONST_ACTION_WIDTH, 0x0003b0),
	REG(VCAP_CONST_CNT_WIDTH, 0x0003b4),
	REG(VCAP_CONST_CORE_CNT, 0x0003b8),
	REG(VCAP_CONST_IF_CNT, 0x0003bc),
};

static const u32 vsc9959_qsys_regmap[] = {
	REG(QSYS_PORT_MODE, 0x00f460),
	REG(QSYS_SWITCH_PORT_MODE, 0x00f480),
	REG(QSYS_STAT_CNT_CFG, 0x00f49c),
	REG(QSYS_EEE_CFG, 0x00f4a0),
	REG(QSYS_EEE_THRES, 0x00f4b8),
	REG(QSYS_IGR_NO_SHARING, 0x00f4bc),
	REG(QSYS_EGR_NO_SHARING, 0x00f4c0),
	REG(QSYS_SW_STATUS, 0x00f4c4),
	REG(QSYS_EXT_CPU_CFG, 0x00f4e0),
	REG_RESERVED(QSYS_PAD_CFG),
	REG(QSYS_CPU_GROUP_MAP, 0x00f4e8),
	REG_RESERVED(QSYS_QMAP),
	REG_RESERVED(QSYS_ISDX_SGRP),
	REG_RESERVED(QSYS_TIMED_FRAME_ENTRY),
	REG(QSYS_TFRM_MISC, 0x00f50c),
	REG(QSYS_TFRM_PORT_DLY, 0x00f510),
	REG(QSYS_TFRM_TIMER_CFG_1, 0x00f514),
	REG(QSYS_TFRM_TIMER_CFG_2, 0x00f518),
	REG(QSYS_TFRM_TIMER_CFG_3, 0x00f51c),
	REG(QSYS_TFRM_TIMER_CFG_4, 0x00f520),
	REG(QSYS_TFRM_TIMER_CFG_5, 0x00f524),
	REG(QSYS_TFRM_TIMER_CFG_6, 0x00f528),
	REG(QSYS_TFRM_TIMER_CFG_7, 0x00f52c),
	REG(QSYS_TFRM_TIMER_CFG_8, 0x00f530),
	REG(QSYS_RED_PROFILE, 0x00f534),
	REG(QSYS_RES_QOS_MODE, 0x00f574),
	REG(QSYS_RES_CFG, 0x00c000),
	REG(QSYS_RES_STAT, 0x00c004),
	REG(QSYS_EGR_DROP_MODE, 0x00f578),
	REG(QSYS_EQ_CTRL, 0x00f57c),
	REG_RESERVED(QSYS_EVENTS_CORE),
	REG(QSYS_QMAXSDU_CFG_0, 0x00f584),
	REG(QSYS_QMAXSDU_CFG_1, 0x00f5a0),
	REG(QSYS_QMAXSDU_CFG_2, 0x00f5bc),
	REG(QSYS_QMAXSDU_CFG_3, 0x00f5d8),
	REG(QSYS_QMAXSDU_CFG_4, 0x00f5f4),
	REG(QSYS_QMAXSDU_CFG_5, 0x00f610),
	REG(QSYS_QMAXSDU_CFG_6, 0x00f62c),
	REG(QSYS_QMAXSDU_CFG_7, 0x00f648),
	REG(QSYS_PREEMPTION_CFG, 0x00f664),
2020-05-13 02:25:10 +00:00
	REG(QSYS_CIR_CFG, 0x000000),
	REG(QSYS_EIR_CFG, 0x000004),
	REG(QSYS_SE_CFG, 0x000008),
	REG(QSYS_SE_DWRR_CFG, 0x00000c),
	REG_RESERVED(QSYS_SE_CONNECT),
	REG(QSYS_SE_DLB_SENSE, 0x000040),
	REG(QSYS_CIR_STATE, 0x000044),
	REG(QSYS_EIR_STATE, 0x000048),
	REG_RESERVED(QSYS_SE_STATE),
	REG(QSYS_HSCH_MISC_CFG, 0x00f67c),
	REG(QSYS_TAG_CONFIG, 0x00f680),
	REG(QSYS_TAS_PARAM_CFG_CTRL, 0x00f698),
	REG(QSYS_PORT_MAX_SDU, 0x00f69c),
	REG(QSYS_PARAM_CFG_REG_1, 0x00f440),
	REG(QSYS_PARAM_CFG_REG_2, 0x00f444),
	REG(QSYS_PARAM_CFG_REG_3, 0x00f448),
	REG(QSYS_PARAM_CFG_REG_4, 0x00f44c),
	REG(QSYS_PARAM_CFG_REG_5, 0x00f450),
	REG(QSYS_GCL_CFG_REG_1, 0x00f454),
	REG(QSYS_GCL_CFG_REG_2, 0x00f458),
	REG(QSYS_PARAM_STATUS_REG_1, 0x00f400),
	REG(QSYS_PARAM_STATUS_REG_2, 0x00f404),
	REG(QSYS_PARAM_STATUS_REG_3, 0x00f408),
	REG(QSYS_PARAM_STATUS_REG_4, 0x00f40c),
	REG(QSYS_PARAM_STATUS_REG_5, 0x00f410),
	REG(QSYS_PARAM_STATUS_REG_6, 0x00f414),
	REG(QSYS_PARAM_STATUS_REG_7, 0x00f418),
	REG(QSYS_PARAM_STATUS_REG_8, 0x00f41c),
	REG(QSYS_PARAM_STATUS_REG_9, 0x00f420),
	REG(QSYS_GCL_STATUS_REG_1, 0x00f424),
	REG(QSYS_GCL_STATUS_REG_2, 0x00f428),
};

static const u32 vsc9959_rew_regmap[] = {
	REG(REW_PORT_VLAN_CFG, 0x000000),
	REG(REW_TAG_CFG, 0x000004),
	REG(REW_PORT_CFG, 0x000008),
	REG(REW_DSCP_CFG, 0x00000c),
	REG(REW_PCP_DEI_QOS_MAP_CFG, 0x000010),
	REG(REW_PTP_CFG, 0x000050),
	REG(REW_PTP_DLY1_CFG, 0x000054),
	REG(REW_RED_TAG_CFG, 0x000058),
	REG(REW_DSCP_REMAP_DP1_CFG, 0x000410),
	REG(REW_DSCP_REMAP_CFG, 0x000510),
	REG_RESERVED(REW_STAT_CFG),
	REG_RESERVED(REW_REW_STICKY),
	REG_RESERVED(REW_PPT),
};

static const u32 vsc9959_sys_regmap[] = {
	REG(SYS_COUNT_RX_OCTETS, 0x000000),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_RX_UNICAST, 0x000004),
	REG(SYS_COUNT_RX_MULTICAST, 0x000008),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_RX_BROADCAST, 0x00000c),
	REG(SYS_COUNT_RX_SHORTS, 0x000010),
	REG(SYS_COUNT_RX_FRAGMENTS, 0x000014),
	REG(SYS_COUNT_RX_JABBERS, 0x000018),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_RX_CRC_ALIGN_ERRS, 0x00001c),
	REG(SYS_COUNT_RX_SYM_ERRS, 0x000020),
	REG(SYS_COUNT_RX_64, 0x000024),
	REG(SYS_COUNT_RX_65_127, 0x000028),
	REG(SYS_COUNT_RX_128_255, 0x00002c),
net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters

Only ocelot_get_stats64() from the ocelot switchdev driver reads stats
using the SYS_COUNT_* register definitions; however, the bucket
definitions are currently incorrect.

Separately, on both RX and TX, we have the following problems:
- a 256-1023 bucket which actually tracks the 256-511 packets
- the 1024-1526 bucket actually tracks the 512-1023 packets
- the 1527-max bucket actually tracks the 1024-1526 packets
=> nobody tracks the packets from the real 1527-max bucket

Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS
all track the wrong thing. However, this doesn't seem to have any
consequence, since ocelot_get_stats64() doesn't use these.

Even though this problem only manifests itself for the switchdev driver,
we cannot split the fix for ocelot and for DSA, since it requires fixing
the bucket definitions from enum ocelot_reg, which in turn requires
adapting the structures from felix and seville as well.

Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch")
Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family")
Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-16 13:53:46 +00:00
	REG(SYS_COUNT_RX_256_511, 0x000030),
	REG(SYS_COUNT_RX_512_1023, 0x000034),
	REG(SYS_COUNT_RX_1024_1526, 0x000038),
	REG(SYS_COUNT_RX_1527_MAX, 0x00003c),
	REG(SYS_COUNT_RX_PAUSE, 0x000040),
	REG(SYS_COUNT_RX_CONTROL, 0x000044),
	REG(SYS_COUNT_RX_LONGS, 0x000048),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_RX_CLASSIFIED_DROPS, 0x00004c),
	REG(SYS_COUNT_RX_RED_PRIO_0, 0x000050),
	REG(SYS_COUNT_RX_RED_PRIO_1, 0x000054),
	REG(SYS_COUNT_RX_RED_PRIO_2, 0x000058),
	REG(SYS_COUNT_RX_RED_PRIO_3, 0x00005c),
	REG(SYS_COUNT_RX_RED_PRIO_4, 0x000060),
	REG(SYS_COUNT_RX_RED_PRIO_5, 0x000064),
	REG(SYS_COUNT_RX_RED_PRIO_6, 0x000068),
	REG(SYS_COUNT_RX_RED_PRIO_7, 0x00006c),
	REG(SYS_COUNT_RX_YELLOW_PRIO_0, 0x000070),
	REG(SYS_COUNT_RX_YELLOW_PRIO_1, 0x000074),
	REG(SYS_COUNT_RX_YELLOW_PRIO_2, 0x000078),
	REG(SYS_COUNT_RX_YELLOW_PRIO_3, 0x00007c),
	REG(SYS_COUNT_RX_YELLOW_PRIO_4, 0x000080),
	REG(SYS_COUNT_RX_YELLOW_PRIO_5, 0x000084),
	REG(SYS_COUNT_RX_YELLOW_PRIO_6, 0x000088),
	REG(SYS_COUNT_RX_YELLOW_PRIO_7, 0x00008c),
	REG(SYS_COUNT_RX_GREEN_PRIO_0, 0x000090),
	REG(SYS_COUNT_RX_GREEN_PRIO_1, 0x000094),
	REG(SYS_COUNT_RX_GREEN_PRIO_2, 0x000098),
	REG(SYS_COUNT_RX_GREEN_PRIO_3, 0x00009c),
	REG(SYS_COUNT_RX_GREEN_PRIO_4, 0x0000a0),
	REG(SYS_COUNT_RX_GREEN_PRIO_5, 0x0000a4),
	REG(SYS_COUNT_RX_GREEN_PRIO_6, 0x0000a8),
	REG(SYS_COUNT_RX_GREEN_PRIO_7, 0x0000ac),
net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959
The Felix VSC9959 switch supports frame preemption and has a MAC Merge
layer. In addition to the structured stats that exist for the eMAC,
export the counters associated with its pMAC (pause, RMON, MAC, PHY,
control) plus the high-level MAC Merge layer stats. The unstructured
ethtool counters, as well as the rtnl_link_stats64 were left to report
only the eMAC counters.
Because statistics processing is quite self-contained in ocelot_stats.c
now, I've opted for introducing an ocelot->mm_supported bool, based on
which the common switch lib does everything, rather than pushing the
TSN-specific code in felix_vsc9959.c, as happens for other TSN stuff.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-19 12:27:03 +00:00
	REG(SYS_COUNT_RX_ASSEMBLY_ERRS, 0x0000b0),
	REG(SYS_COUNT_RX_SMD_ERRS, 0x0000b4),
	REG(SYS_COUNT_RX_ASSEMBLY_OK, 0x0000b8),
	REG(SYS_COUNT_RX_MERGE_FRAGMENTS, 0x0000bc),
	REG(SYS_COUNT_RX_PMAC_OCTETS, 0x0000c0),
	REG(SYS_COUNT_RX_PMAC_UNICAST, 0x0000c4),
	REG(SYS_COUNT_RX_PMAC_MULTICAST, 0x0000c8),
	REG(SYS_COUNT_RX_PMAC_BROADCAST, 0x0000cc),
	REG(SYS_COUNT_RX_PMAC_SHORTS, 0x0000d0),
	REG(SYS_COUNT_RX_PMAC_FRAGMENTS, 0x0000d4),
	REG(SYS_COUNT_RX_PMAC_JABBERS, 0x0000d8),
	REG(SYS_COUNT_RX_PMAC_CRC_ALIGN_ERRS, 0x0000dc),
	REG(SYS_COUNT_RX_PMAC_SYM_ERRS, 0x0000e0),
	REG(SYS_COUNT_RX_PMAC_64, 0x0000e4),
	REG(SYS_COUNT_RX_PMAC_65_127, 0x0000e8),
	REG(SYS_COUNT_RX_PMAC_128_255, 0x0000ec),
	REG(SYS_COUNT_RX_PMAC_256_511, 0x0000f0),
	REG(SYS_COUNT_RX_PMAC_512_1023, 0x0000f4),
	REG(SYS_COUNT_RX_PMAC_1024_1526, 0x0000f8),
	REG(SYS_COUNT_RX_PMAC_1527_MAX, 0x0000fc),
	REG(SYS_COUNT_RX_PMAC_PAUSE, 0x000100),
	REG(SYS_COUNT_RX_PMAC_CONTROL, 0x000104),
	REG(SYS_COUNT_RX_PMAC_LONGS, 0x000108),
	REG(SYS_COUNT_TX_OCTETS, 0x000200),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_TX_UNICAST, 0x000204),
	REG(SYS_COUNT_TX_MULTICAST, 0x000208),
	REG(SYS_COUNT_TX_BROADCAST, 0x00020c),
	REG(SYS_COUNT_TX_COLLISION, 0x000210),
	REG(SYS_COUNT_TX_DROPS, 0x000214),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_TX_PAUSE, 0x000218),
	REG(SYS_COUNT_TX_64, 0x00021c),
	REG(SYS_COUNT_TX_65_127, 0x000220),
	REG(SYS_COUNT_TX_128_255, 0x000224),
	REG(SYS_COUNT_TX_256_511, 0x000228),
	REG(SYS_COUNT_TX_512_1023, 0x00022c),
	REG(SYS_COUNT_TX_1024_1526, 0x000230),
	REG(SYS_COUNT_TX_1527_MAX, 0x000234),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_TX_YELLOW_PRIO_0, 0x000238),
	REG(SYS_COUNT_TX_YELLOW_PRIO_1, 0x00023c),
	REG(SYS_COUNT_TX_YELLOW_PRIO_2, 0x000240),
	REG(SYS_COUNT_TX_YELLOW_PRIO_3, 0x000244),
	REG(SYS_COUNT_TX_YELLOW_PRIO_4, 0x000248),
	REG(SYS_COUNT_TX_YELLOW_PRIO_5, 0x00024c),
	REG(SYS_COUNT_TX_YELLOW_PRIO_6, 0x000250),
	REG(SYS_COUNT_TX_YELLOW_PRIO_7, 0x000254),
	REG(SYS_COUNT_TX_GREEN_PRIO_0, 0x000258),
	REG(SYS_COUNT_TX_GREEN_PRIO_1, 0x00025c),
	REG(SYS_COUNT_TX_GREEN_PRIO_2, 0x000260),
	REG(SYS_COUNT_TX_GREEN_PRIO_3, 0x000264),
	REG(SYS_COUNT_TX_GREEN_PRIO_4, 0x000268),
	REG(SYS_COUNT_TX_GREEN_PRIO_5, 0x00026c),
	REG(SYS_COUNT_TX_GREEN_PRIO_6, 0x000270),
	REG(SYS_COUNT_TX_GREEN_PRIO_7, 0x000274),
2022-09-08 16:48:14 +00:00
	REG(SYS_COUNT_TX_AGED, 0x000278),
	REG(SYS_COUNT_TX_MM_HOLD, 0x00027c),
	REG(SYS_COUNT_TX_MERGE_FRAGMENTS, 0x000280),
	REG(SYS_COUNT_TX_PMAC_OCTETS, 0x000284),
	REG(SYS_COUNT_TX_PMAC_UNICAST, 0x000288),
	REG(SYS_COUNT_TX_PMAC_MULTICAST, 0x00028c),
	REG(SYS_COUNT_TX_PMAC_BROADCAST, 0x000290),
	REG(SYS_COUNT_TX_PMAC_PAUSE, 0x000294),
	REG(SYS_COUNT_TX_PMAC_64, 0x000298),
	REG(SYS_COUNT_TX_PMAC_65_127, 0x00029c),
	REG(SYS_COUNT_TX_PMAC_128_255, 0x0002a0),
	REG(SYS_COUNT_TX_PMAC_256_511, 0x0002a4),
	REG(SYS_COUNT_TX_PMAC_512_1023, 0x0002a8),
	REG(SYS_COUNT_TX_PMAC_1024_1526, 0x0002ac),
	REG(SYS_COUNT_TX_PMAC_1527_MAX, 0x0002b0),
2022-08-16 13:53:51 +00:00
	REG(SYS_COUNT_DROP_LOCAL, 0x000400),
	REG(SYS_COUNT_DROP_TAIL, 0x000404),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_0, 0x000408),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_1, 0x00040c),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_2, 0x000410),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_3, 0x000414),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_4, 0x000418),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_5, 0x00041c),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_6, 0x000420),
	REG(SYS_COUNT_DROP_YELLOW_PRIO_7, 0x000424),
	REG(SYS_COUNT_DROP_GREEN_PRIO_0, 0x000428),
	REG(SYS_COUNT_DROP_GREEN_PRIO_1, 0x00042c),
	REG(SYS_COUNT_DROP_GREEN_PRIO_2, 0x000430),
	REG(SYS_COUNT_DROP_GREEN_PRIO_3, 0x000434),
	REG(SYS_COUNT_DROP_GREEN_PRIO_4, 0x000438),
	REG(SYS_COUNT_DROP_GREEN_PRIO_5, 0x00043c),
	REG(SYS_COUNT_DROP_GREEN_PRIO_6, 0x000440),
	REG(SYS_COUNT_DROP_GREEN_PRIO_7, 0x000444),
2022-09-08 16:48:03 +00:00
	REG(SYS_COUNT_SF_MATCHING_FRAMES, 0x000800),
	REG(SYS_COUNT_SF_NOT_PASSING_FRAMES, 0x000804),
	REG(SYS_COUNT_SF_NOT_PASSING_SDU, 0x000808),
	REG(SYS_COUNT_SF_RED_FRAMES, 0x00080c),
	REG(SYS_RESET_CFG, 0x000e00),
	REG(SYS_SR_ETYPE_CFG, 0x000e04),
	REG(SYS_VLAN_ETYPE_CFG, 0x000e08),
	REG(SYS_PORT_MODE, 0x000e0c),
	REG(SYS_FRONT_PORT_MODE, 0x000e2c),
	REG(SYS_FRM_AGING, 0x000e44),
	REG(SYS_STAT_CFG, 0x000e48),
	REG(SYS_SW_STATUS, 0x000e4c),
	REG_RESERVED(SYS_MISC_CFG),
	REG(SYS_REW_MAC_HIGH_CFG, 0x000e6c),
	REG(SYS_REW_MAC_LOW_CFG, 0x000e84),
	REG(SYS_TIMESTAMP_OFFSET, 0x000e9c),
	REG(SYS_PAUSE_CFG, 0x000ea0),
	REG(SYS_PAUSE_TOT_CFG, 0x000ebc),
	REG(SYS_ATOP, 0x000ec0),
	REG(SYS_ATOP_TOT_CFG, 0x000edc),
	REG(SYS_MAC_FC_CFG, 0x000ee0),
	REG(SYS_MMGT, 0x000ef8),
	REG_RESERVED(SYS_MMGT_FAST),
	REG_RESERVED(SYS_EVENTS_DIF),
	REG_RESERVED(SYS_EVENTS_CORE),
	REG(SYS_PTP_STATUS, 0x000f14),
	REG(SYS_PTP_TXSTAMP, 0x000f18),
	REG(SYS_PTP_NXT, 0x000f1c),
	REG(SYS_PTP_CFG, 0x000f20),
	REG(SYS_RAM_INIT, 0x000f24),
	REG_RESERVED(SYS_CM_ADDR),
	REG_RESERVED(SYS_CM_DATA_WR),
	REG_RESERVED(SYS_CM_DATA_RD),
	REG_RESERVED(SYS_CM_OP),
	REG_RESERVED(SYS_CM_DATA),
};
2019-11-20 08:23:17 +00:00
static const u32 vsc9959_ptp_regmap[] = {
2020-09-18 10:57:49 +00:00
	REG(PTP_PIN_CFG, 0x000000),
	REG(PTP_PIN_TOD_SEC_MSB, 0x000004),
	REG(PTP_PIN_TOD_SEC_LSB, 0x000008),
	REG(PTP_PIN_TOD_NSEC, 0x00000c),
	REG(PTP_PIN_WF_HIGH_PERIOD, 0x000014),
	REG(PTP_PIN_WF_LOW_PERIOD, 0x000018),
	REG(PTP_CFG_MISC, 0x0000a0),
	REG(PTP_CLK_CFG_ADJ_CFG, 0x0000a4),
	REG(PTP_CLK_CFG_ADJ_FREQ, 0x0000a8),
2019-11-20 08:23:17 +00:00
};
2019-11-14 15:03:30 +00:00
static const u32 vsc9959_gcb_regmap[] = {
	REG(GCB_SOFT_RST, 0x000004),
};
2020-07-13 16:57:01 +00:00
static const u32 vsc9959_dev_gmii_regmap[] = {
	REG(DEV_CLOCK_CFG, 0x0),
	REG(DEV_PORT_MISC, 0x4),
	REG(DEV_EVENTS, 0x8),
	REG(DEV_EEE_CFG, 0xc),
	REG(DEV_RX_PATH_DELAY, 0x10),
	REG(DEV_TX_PATH_DELAY, 0x14),
	REG(DEV_PTP_PREDICT_CFG, 0x18),
	REG(DEV_MAC_ENA_CFG, 0x1c),
	REG(DEV_MAC_MODE_CFG, 0x20),
	REG(DEV_MAC_MAXLEN_CFG, 0x24),
	REG(DEV_MAC_TAGS_CFG, 0x28),
	REG(DEV_MAC_ADV_CHK_CFG, 0x2c),
	REG(DEV_MAC_IFG_CFG, 0x30),
	REG(DEV_MAC_HDX_CFG, 0x34),
	REG(DEV_MAC_DBG_CFG, 0x38),
	REG(DEV_MAC_FC_MAC_LOW_CFG, 0x3c),
	REG(DEV_MAC_FC_MAC_HIGH_CFG, 0x40),
	REG(DEV_MAC_STICKY, 0x44),
2023-01-19 12:27:04 +00:00
	REG(DEV_MM_ENABLE_CONFIG, 0x48),
	REG(DEV_MM_VERIF_CONFIG, 0x4C),
	REG(DEV_MM_STATUS, 0x50),
2020-07-13 16:57:01 +00:00
	REG_RESERVED(PCS1G_CFG),
	REG_RESERVED(PCS1G_MODE_CFG),
	REG_RESERVED(PCS1G_SD_CFG),
	REG_RESERVED(PCS1G_ANEG_CFG),
	REG_RESERVED(PCS1G_ANEG_NP_CFG),
	REG_RESERVED(PCS1G_LB_CFG),
	REG_RESERVED(PCS1G_DBG_CFG),
	REG_RESERVED(PCS1G_CDET_CFG),
	REG_RESERVED(PCS1G_ANEG_STATUS),
	REG_RESERVED(PCS1G_ANEG_NP_STATUS),
	REG_RESERVED(PCS1G_LINK_STATUS),
	REG_RESERVED(PCS1G_LINK_DOWN_CNT),
	REG_RESERVED(PCS1G_STICKY),
	REG_RESERVED(PCS1G_DEBUG_STATUS),
	REG_RESERVED(PCS1G_LPI_CFG),
	REG_RESERVED(PCS1G_LPI_WAKE_ERROR_CNT),
	REG_RESERVED(PCS1G_LPI_STATUS),
	REG_RESERVED(PCS1G_TSTPAT_MODE_CFG),
	REG_RESERVED(PCS1G_TSTPAT_STATUS),
	REG_RESERVED(DEV_PCS_FX100_CFG),
	REG_RESERVED(DEV_PCS_FX100_STATUS),
};

static const u32 *vsc9959_regmap[TARGET_MAX] = {
2019-11-14 15:03:30 +00:00
	[ANA] = vsc9959_ana_regmap,
	[QS] = vsc9959_qs_regmap,
	[QSYS] = vsc9959_qsys_regmap,
	[REW] = vsc9959_rew_regmap,
	[SYS] = vsc9959_sys_regmap,
2020-09-29 22:27:25 +00:00
	[S0] = vsc9959_vcap_regmap,
2020-09-29 22:27:24 +00:00
	[S1] = vsc9959_vcap_regmap,
net: mscc: ocelot: generalize existing code for VCAP
In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which
have the same configuration interface, but different sets of keys and
actions. The driver currently only supports VCAP IS2.
In preparation of VCAP IS1 and ES0 support, the existing code must be
generalized to work with any VCAP.
In that direction, we should move the structures that depend upon VCAP
instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct
ocelot and into struct vcap_props .keys and .actions, a structure that
is replicated 3 times, once per VCAP. We'll pass that structure as an
argument to each function that does the key and action packing - only
the control logic needs to distinguish between ocelot->vcap[VCAP_IS2]
or IS1 or ES0.
Another change is to make use of the newly introduced ocelot_target_read
and ocelot_target_write API, since the 3 VCAPs have the same registers
but put at different addresses.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 22:27:23 +00:00
	[S2] = vsc9959_vcap_regmap,
2019-11-20 08:23:17 +00:00
	[PTP] = vsc9959_ptp_regmap,
2019-11-14 15:03:30 +00:00
	[GCB] = vsc9959_gcb_regmap,
2020-07-13 16:57:01 +00:00
	[DEV_GMII] = vsc9959_dev_gmii_regmap,
2019-11-14 15:03:30 +00:00
};
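The table above gives every target its own array of register offsets, and S0, S1 and S2 all point at the same VCAP array. A minimal userspace sketch of that idea follows, assuming a simplified model in which a register's address is the target's base plus the offset stored in the target's table. All `sketch_*` and `SK_*` names are hypothetical, and the two VCAP offsets are placeholders rather than VSC9959 values. The base addresses mirror the "s0", "s1", "s2" and "devcpu_gcb" entries of vsc9959_resources, and the GCB_SOFT_RST offset is the one listed in vsc9959_gcb_regmap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down target list (not the kernel's enum ocelot_target). */
enum sketch_target { SK_S0, SK_S1, SK_S2, SK_GCB, SK_TARGET_MAX };

/* Hypothetical per-target register indices. */
enum sketch_vcap_reg { SK_VCAP_REG_A, SK_VCAP_REG_B, SK_VCAP_REG_MAX };
enum sketch_gcb_reg { SK_GCB_SOFT_RST, SK_GCB_REG_MAX };

/* Placeholder offsets: illustrative only, not taken from the VSC9959. */
static const uint32_t sketch_vcap_regmap[SK_VCAP_REG_MAX] = {
	[SK_VCAP_REG_A] = 0x000000,
	[SK_VCAP_REG_B] = 0x000004,
};

/* Offset as listed in vsc9959_gcb_regmap. */
static const uint32_t sketch_gcb_regmap[SK_GCB_REG_MAX] = {
	[SK_GCB_SOFT_RST] = 0x000004,
};

/* One offset table per target; the three VCAPs share a single table. */
static const uint32_t *const sketch_regmap[SK_TARGET_MAX] = {
	[SK_S0] = sketch_vcap_regmap,
	[SK_S1] = sketch_vcap_regmap,
	[SK_S2] = sketch_vcap_regmap,
	[SK_GCB] = sketch_gcb_regmap,
};

/* Target base addresses, relative to the start of the switch memory space. */
static const uint32_t sketch_base[SK_TARGET_MAX] = {
	[SK_S0] = 0x0040000,
	[SK_S1] = 0x0050000,
	[SK_S2] = 0x0060000,
	[SK_GCB] = 0x0070000,
};

/* Resolve (target, register index) to an address relative to the switch base. */
static uint32_t sketch_reg_addr(enum sketch_target target, unsigned int reg)
{
	assert(target < SK_TARGET_MAX);
	return sketch_base[target] + sketch_regmap[target][reg];
}
```

With this layout, the same register index resolves to a different address for each VCAP purely because the target base differs, which is why a single offset table can serve all three.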
2020-05-22 08:54:34 +00:00
/* Addresses are relative to the PCI device's base address */
net: dsa: felix: update regmap requests to be string-based
Existing felix DSA drivers (vsc9959, vsc9953) are all switches that were
integrated in NXP SoCs, which makes them a bit unusual compared to the
usual Microchip branded Ocelot switches.
To be precise, looking at
Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml, one can
see 21 memory regions for the "switch" node, and these correspond to the
"targets" of the switch IP, which are spread throughout the guts of that
SoC's memory space.
In NXP integrations, those targets still exist, but they were condensed
within a single memory region, with no other peripheral in between them,
so it made more sense for the driver to ioremap the entire memory space
of the switch, and then find the targets within that memory space via
some offsets hardcoded in the driver.
The effect of this design decision is that now, the felix driver expects
hardware instantiations to provide their own resource definitions, which
is kind of odd when considering a typical device (those are retrieved
from 'reg' properties in the device tree, using platform_get_resource()
or similar).
Allow other hardware instantiations that share the felix driver to not
provide a hardcoded array of resources in the future. Instead, make the
common denominator based on which regmaps are created be just the
resource "names". Each instantiation comes with its own array of names
that are mandatory for it, and with an optional array of resources.
So we split the resources in 2 arrays, one is what's requested and the
other is what's provided. There is one pool of provided resources, in
felix->info->resources (of length felix->info->num_resources). There are
2 different ways of requesting a resource. One is by enum ocelot_target
(this handles the global regmaps), and one is by int port (this handles
the per-port ones).
For the existing vsc9959 and vsc9953, it would be a bit stupid to
request something that's not provided, given that the 2 arrays are both
defined in the same place.
The advantage is that we can now modify felix_request_regmap_by_name()
to make felix->info->resources[] optional, and if absent, the
implementation can call dev_get_regmap() and this is something that is
compatible with MFD.
Co-developed-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27 19:15:20 +00:00
static const struct resource vsc9959_resources[] = {
	DEFINE_RES_MEM_NAMED(0x0010000, 0x0010000, "sys"),
	DEFINE_RES_MEM_NAMED(0x0030000, 0x0010000, "rew"),
	DEFINE_RES_MEM_NAMED(0x0040000, 0x0000400, "s0"),
	DEFINE_RES_MEM_NAMED(0x0050000, 0x0000400, "s1"),
	DEFINE_RES_MEM_NAMED(0x0060000, 0x0000400, "s2"),
	DEFINE_RES_MEM_NAMED(0x0070000, 0x0000200, "devcpu_gcb"),
	DEFINE_RES_MEM_NAMED(0x0080000, 0x0000100, "qs"),
	DEFINE_RES_MEM_NAMED(0x0090000, 0x00000cc, "ptp"),
2022-09-27 19:15:19 +00:00
	DEFINE_RES_MEM_NAMED(0x0100000, 0x0010000, "port0"),
	DEFINE_RES_MEM_NAMED(0x0110000, 0x0010000, "port1"),
	DEFINE_RES_MEM_NAMED(0x0120000, 0x0010000, "port2"),
	DEFINE_RES_MEM_NAMED(0x0130000, 0x0010000, "port3"),
	DEFINE_RES_MEM_NAMED(0x0140000, 0x0010000, "port4"),
	DEFINE_RES_MEM_NAMED(0x0150000, 0x0010000, "port5"),
2022-09-27 19:15:20 +00:00
	DEFINE_RES_MEM_NAMED(0x0200000, 0x0020000, "qsys"),
	DEFINE_RES_MEM_NAMED(0x0280000, 0x0010000, "ana"),
};

static const char * const vsc9959_resource_names[TARGET_MAX] = {
	[SYS] = "sys",
	[REW] = "rew",
	[S0] = "s0",
	[S1] = "s1",
	[S2] = "s2",
	[GCB] = "devcpu_gcb",
	[QS] = "qs",
	[PTP] = "ptp",
	[QSYS] = "qsys",
	[ANA] = "ana",
2019-11-14 15:03:30 +00:00
};
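The split between requested names (vsc9959_resource_names) and provided resources (vsc9959_resources) can be illustrated with a small userspace sketch. It assumes a plain linear search by name over the provided pool. `struct sketch_resource` and `sketch_find_resource()` are hypothetical stand-ins, not the kernel's `struct resource` or `felix_request_regmap_by_name()`, and only a few entries from the table above are copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct resource. */
struct sketch_resource {
	uint32_t start;   /* offset from the switch base address */
	uint32_t size;    /* length of the region in bytes */
	const char *name; /* name used to request the region */
};

/* A few entries copied from vsc9959_resources; this is the "provided" pool. */
static const struct sketch_resource sketch_resources[] = {
	{ 0x0010000, 0x0010000, "sys" },
	{ 0x0070000, 0x0000200, "devcpu_gcb" },
	{ 0x0090000, 0x00000cc, "ptp" },
	{ 0x0100000, 0x0010000, "port0" },
};

#define SKETCH_NUM_RESOURCES \
	(sizeof(sketch_resources) / sizeof(sketch_resources[0]))

/* Return the provided resource with the requested name, or NULL if absent. */
static const struct sketch_resource *
sketch_find_resource(const struct sketch_resource *pool, size_t n,
		     const char *name)
{
	assert(pool != NULL && name != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(pool[i].name, name) == 0)
			return &pool[i];
	}

	return NULL;
}
```

A caller would request "devcpu_gcb" for the GCB target and get the region starting at 0x0070000. A NULL result marks the point where, according to the commit message, an implementation could fall back to dev_get_regmap() instead.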
net: dsa: felix: Add PCS operations for PHYLINK
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
| +--------+ GMII (typically disabled via RCW) |
| ENETC PCI | ENETC |--------------------------+ |
| Root Complex | port 3 |-----------------------+ | |
| Integrated +--------+ | | |
| Endpoint | | |
| +--------+ 2.5G GMII | | |
| | ENETC |--------------+ | | |
| | port 2 |-----------+ | | | |
| +--------+ | | | | |
| +--------+ +--------+ |
| | Felix | | Felix | |
| | port 4 | | port 5 | |
| +--------+ +--------+ |
| |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | |
| | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | |
+------------------------------------------------------------------------+
| |||| SerDes | |||| |||| |||| |||| |
| +--------+block | +--------------------------------------------+ |
| | ENETC | | | ENETC port 2 internal MDIO bus | |
| | port 0 | | | PCS PCS PCS PCS | |
| | PCS | | | 0 1 2 3 | |
+-----------------|------------------------------------------------------+
v v v v v v
SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
USXGMII/ (bypasses
1000Base-X/ SerDes)
2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map borrowed from ENETC port 2
(PF2), for accessing its associated PCS's.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not treated
in any way (and I don't think it can be treated).
All of this is so SoC-specific, that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree because of 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one who
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00
/* Port MAC 0 Internal MDIO bus through which the SerDes acting as an
 * SGMII/QSGMII MAC PCS can be found.
 */
2022-09-27 19:15:19 +00:00
static const struct resource vsc9959_imdio_res =
2023-02-24 15:52:34 +00:00
	DEFINE_RES_MEM_NAMED(0x8030, 0x10, "imdio");
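Because the resource above is expressed relative to a PCI BAR, it has to be shifted by that BAR's base address before it describes real bus addresses. The sketch below shows only that arithmetic, assuming an inclusive end address as in the kernel's `struct resource`. `struct sketch_range` and `sketch_relocate()` are hypothetical names, and the driver's actual relocation code is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address range with an inclusive end. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* Shift a BAR-relative (start, size) pair by the BAR's base address. */
static struct sketch_range sketch_relocate(uint64_t bar_base,
					   uint64_t rel_start, uint64_t size)
{
	struct sketch_range r;

	assert(size > 0);
	r.start = bar_base + rel_start;
	r.end = r.start + size - 1; /* inclusive end */

	return r;
}
```

For the "imdio" region (start 0x8030, size 0x10), any BAR base B yields the range B + 0x8030 through B + 0x803f.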
2020-01-06 01:34:17 +00:00
2020-07-13 16:57:02 +00:00
static const struct reg_field vsc9959_regfields[REGFIELD_MAX] = {
2019-11-14 15:03:30 +00:00
	[ANA_ADVLEARN_VLAN_CHK] = REG_FIELD(ANA_ADVLEARN, 6, 6),
	[ANA_ADVLEARN_LEARN_MIRROR] = REG_FIELD(ANA_ADVLEARN, 0, 5),
	[ANA_ANEVENTS_FLOOD_DISCARD] = REG_FIELD(ANA_ANEVENTS, 30, 30),
	[ANA_ANEVENTS_AUTOAGE] = REG_FIELD(ANA_ANEVENTS, 26, 26),
	[ANA_ANEVENTS_STORM_DROP] = REG_FIELD(ANA_ANEVENTS, 24, 24),
	[ANA_ANEVENTS_LEARN_DROP] = REG_FIELD(ANA_ANEVENTS, 23, 23),
	[ANA_ANEVENTS_AGED_ENTRY] = REG_FIELD(ANA_ANEVENTS, 22, 22),
	[ANA_ANEVENTS_CPU_LEARN_FAILED] = REG_FIELD(ANA_ANEVENTS, 21, 21),
	[ANA_ANEVENTS_AUTO_LEARN_FAILED] = REG_FIELD(ANA_ANEVENTS, 20, 20),
	[ANA_ANEVENTS_LEARN_REMOVE] = REG_FIELD(ANA_ANEVENTS, 19, 19),
	[ANA_ANEVENTS_AUTO_LEARNED] = REG_FIELD(ANA_ANEVENTS, 18, 18),
	[ANA_ANEVENTS_AUTO_MOVED] = REG_FIELD(ANA_ANEVENTS, 17, 17),
	[ANA_ANEVENTS_CLASSIFIED_DROP] = REG_FIELD(ANA_ANEVENTS, 15, 15),
	[ANA_ANEVENTS_CLASSIFIED_COPY] = REG_FIELD(ANA_ANEVENTS, 14, 14),
	[ANA_ANEVENTS_VLAN_DISCARD] = REG_FIELD(ANA_ANEVENTS, 13, 13),
	[ANA_ANEVENTS_FWD_DISCARD] = REG_FIELD(ANA_ANEVENTS, 12, 12),
	[ANA_ANEVENTS_MULTICAST_FLOOD] = REG_FIELD(ANA_ANEVENTS, 11, 11),
	[ANA_ANEVENTS_UNICAST_FLOOD] = REG_FIELD(ANA_ANEVENTS, 10, 10),
	[ANA_ANEVENTS_DEST_KNOWN] = REG_FIELD(ANA_ANEVENTS, 9, 9),
	[ANA_ANEVENTS_BUCKET3_MATCH] = REG_FIELD(ANA_ANEVENTS, 8, 8),
	[ANA_ANEVENTS_BUCKET2_MATCH] = REG_FIELD(ANA_ANEVENTS, 7, 7),
	[ANA_ANEVENTS_BUCKET1_MATCH] = REG_FIELD(ANA_ANEVENTS, 6, 6),
	[ANA_ANEVENTS_BUCKET0_MATCH] = REG_FIELD(ANA_ANEVENTS, 5, 5),
	[ANA_ANEVENTS_CPU_OPERATION] = REG_FIELD(ANA_ANEVENTS, 4, 4),
	[ANA_ANEVENTS_DMAC_LOOKUP] = REG_FIELD(ANA_ANEVENTS, 3, 3),
	[ANA_ANEVENTS_SMAC_LOOKUP] = REG_FIELD(ANA_ANEVENTS, 2, 2),
	[ANA_ANEVENTS_SEQ_GEN_ERR_0] = REG_FIELD(ANA_ANEVENTS, 1, 1),
	[ANA_ANEVENTS_SEQ_GEN_ERR_1] = REG_FIELD(ANA_ANEVENTS, 0, 0),
	[ANA_TABLES_MACACCESS_B_DOM] = REG_FIELD(ANA_TABLES_MACACCESS, 16, 16),
	[ANA_TABLES_MACTINDX_BUCKET] = REG_FIELD(ANA_TABLES_MACTINDX, 11, 12),
	[ANA_TABLES_MACTINDX_M_INDEX] = REG_FIELD(ANA_TABLES_MACTINDX, 0, 10),
	[SYS_RESET_CFG_CORE_ENA] = REG_FIELD(SYS_RESET_CFG, 0, 0),
	[GCB_SOFT_RST_SWC_RST] = REG_FIELD(GCB_SOFT_RST, 0, 0),
net: mscc: ocelot: convert QSYS_SWITCH_PORT_MODE and SYS_PORT_MODE to regfields
Currently Felix and Ocelot share the same bit layout in these per-port
registers, but Seville does not. So we need reg_fields for that.
Actually since these are per-port registers, we need to also specify the
number of ports, and register size per port, and use the regmap API for
multiple ports.
There's a more subtle point to be made about the other 2 register
fields:
- QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG
- QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE
which we are not writing any longer, for 2 reasons:
- Using the previous API (ocelot_write_rix), we were only writing 1 for
Felix and Ocelot, which was their hardware-default value, and which
there was no intention of changing.
- In the case of SCH_NEXT_CFG, in fact Seville does not have this
register field at all, and therefore, if we want to have common code
we would be required to not write to it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 16:57:03 +00:00
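To make the per-port register fields more concrete, here is a minimal user-space sketch of how a field described by a base register, a bit range, a port count and a per-port stride could be resolved. The struct and function names are hypothetical stand-ins, and the base address used in any example is made up; only the shape of the description (lsb, msb, number of ports, register size per port) follows the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for a per-port register field description. */
struct sketch_reg_field {
	uint32_t reg;		/* address of the register for port 0 */
	unsigned int lsb;	/* least significant bit of the field */
	unsigned int msb;	/* most significant bit of the field */
	unsigned int id_size;	/* number of replicated instances (ports) */
	unsigned int id_offset;	/* byte stride between two instances */
};

/* Address of the register instance that belongs to a given port. */
static uint32_t sketch_field_addr(const struct sketch_reg_field *f,
				  unsigned int port)
{
	assert(port < f->id_size);
	return f->reg + port * f->id_offset;
}

/* Bit mask covering bits lsb..msb, both inclusive. */
static uint32_t sketch_field_mask(const struct sketch_reg_field *f)
{
	uint32_t upto_msb, below_lsb;

	assert(f->lsb <= f->msb && f->msb < 32);
	upto_msb = (uint32_t)((UINT64_C(1) << (f->msb + 1)) - 1);
	below_lsb = (UINT32_C(1) << f->lsb) - 1;
	return upto_msb & ~below_lsb;
}

/* Extract the field value from a raw 32-bit register value. */
static uint32_t sketch_field_get(const struct sketch_reg_field *f,
				 uint32_t raw)
{
	return (raw & sketch_field_mask(f)) >> f->lsb;
}
```

With a field at bits 11..13, 7 ports and a stride of 4 bytes, port 3 resolves to the base address plus 12, and the mask is 0x3800.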
	/* Replicated per number of ports (7), register size 4 per port */
	[QSYS_SWITCH_PORT_MODE_PORT_ENA] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 14, 14, 7, 4),
	[QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 11, 13, 7, 4),
	[QSYS_SWITCH_PORT_MODE_YEL_RSRVD] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 10, 10, 7, 4),
	[QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 9, 9, 7, 4),
	[QSYS_SWITCH_PORT_MODE_TX_PFC_ENA] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 1, 8, 7, 4),
	[QSYS_SWITCH_PORT_MODE_TX_PFC_MODE] = REG_FIELD_ID(QSYS_SWITCH_PORT_MODE, 0, 0, 7, 4),
	[SYS_PORT_MODE_DATA_WO_TS] = REG_FIELD_ID(SYS_PORT_MODE, 5, 6, 7, 4),
	[SYS_PORT_MODE_INCL_INJ_HDR] = REG_FIELD_ID(SYS_PORT_MODE, 3, 4, 7, 4),
	[SYS_PORT_MODE_INCL_XTR_HDR] = REG_FIELD_ID(SYS_PORT_MODE, 1, 2, 7, 4),
	[SYS_PORT_MODE_INCL_HDR_ERR] = REG_FIELD_ID(SYS_PORT_MODE, 0, 0, 7, 4),
2020-07-13 16:57:07 +00:00
	[SYS_PAUSE_CFG_PAUSE_START] = REG_FIELD_ID(SYS_PAUSE_CFG, 10, 18, 7, 4),
	[SYS_PAUSE_CFG_PAUSE_STOP] = REG_FIELD_ID(SYS_PAUSE_CFG, 1, 9, 7, 4),
	[SYS_PAUSE_CFG_PAUSE_ENA] = REG_FIELD_ID(SYS_PAUSE_CFG, 0, 1, 7, 4),
2019-11-14 15:03:30 +00:00
};
2020-09-29 22:27:25 +00:00
static const struct vcap_field vsc9959_vcap_es0_keys[] = {
	[VCAP_ES0_EGR_PORT] = { 0, 3},
	[VCAP_ES0_IGR_PORT] = { 3, 3},
	[VCAP_ES0_RSV] = { 6, 2},
	[VCAP_ES0_L2_MC] = { 8, 1},
	[VCAP_ES0_L2_BC] = { 9, 1},
	[VCAP_ES0_VID] = { 10, 12},
	[VCAP_ES0_DP] = { 22, 1},
	[VCAP_ES0_PCP] = { 23, 3},
};

static const struct vcap_field vsc9959_vcap_es0_actions[] = {
	[VCAP_ES0_ACT_PUSH_OUTER_TAG] = { 0, 2},
	[VCAP_ES0_ACT_PUSH_INNER_TAG] = { 2, 1},
	[VCAP_ES0_ACT_TAG_A_TPID_SEL] = { 3, 2},
	[VCAP_ES0_ACT_TAG_A_VID_SEL] = { 5, 1},
	[VCAP_ES0_ACT_TAG_A_PCP_SEL] = { 6, 2},
	[VCAP_ES0_ACT_TAG_A_DEI_SEL] = { 8, 2},
	[VCAP_ES0_ACT_TAG_B_TPID_SEL] = { 10, 2},
	[VCAP_ES0_ACT_TAG_B_VID_SEL] = { 12, 1},
	[VCAP_ES0_ACT_TAG_B_PCP_SEL] = { 13, 2},
	[VCAP_ES0_ACT_TAG_B_DEI_SEL] = { 15, 2},
	[VCAP_ES0_ACT_VID_A_VAL] = { 17, 12},
	[VCAP_ES0_ACT_PCP_A_VAL] = { 29, 3},
	[VCAP_ES0_ACT_DEI_A_VAL] = { 32, 1},
	[VCAP_ES0_ACT_VID_B_VAL] = { 33, 12},
	[VCAP_ES0_ACT_PCP_B_VAL] = { 45, 3},
	[VCAP_ES0_ACT_DEI_B_VAL] = { 48, 1},
	[VCAP_ES0_ACT_RSV] = { 49, 23},
	[VCAP_ES0_ACT_HIT_STICKY] = { 72, 1},
};
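Each entry in these tables is an {offset, length} pair, and the ES0 action fields are laid out back to back. As a sanity check, the sketch below copies the ES0 action layout from the table above and verifies that the fields are contiguous and that everything before HIT_STICKY adds up to the 72 bits declared as `.width` for ES0 in vsc9959_vcap_props. The struct and helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_vcap_field {
	int offset;
	int length;
};

/* ES0 action layout copied from the table above, HIT_STICKY last. */
static const struct sketch_vcap_field sketch_es0_actions[] = {
	{  0,  2 }, {  2,  1 }, {  3,  2 }, {  5,  1 }, {  6,  2 },
	{  8,  2 }, { 10,  2 }, { 12,  1 }, { 13,  2 }, { 15,  2 },
	{ 17, 12 }, { 29,  3 }, { 32,  1 }, { 33, 12 }, { 45,  3 },
	{ 48,  1 }, { 49, 23 }, { 72,  1 },
};

/*
 * Walk the first n fields, check that each one starts where the previous
 * one ended, and return the number of bits they occupy in total.
 * Returns -1 if a gap or an overlap is found.
 */
static int sketch_packed_width(const struct sketch_vcap_field *f, size_t n)
{
	int next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(f[i].length > 0);
		if (f[i].offset != next)
			return -1;
		next += f[i].length;
	}
	return next;
}
```

Passing all entries except the last one yields 72, which is why the action width is annotated as not including HIT_STICKY.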
2020-09-29 22:27:24 +00:00
static const struct vcap_field vsc9959_vcap_is1_keys[] = {
	[VCAP_IS1_HK_TYPE] = { 0, 1},
	[VCAP_IS1_HK_LOOKUP] = { 1, 2},
	[VCAP_IS1_HK_IGR_PORT_MASK] = { 3, 7},
	[VCAP_IS1_HK_RSV] = { 10, 9},
	[VCAP_IS1_HK_OAM_Y1731] = { 19, 1},
	[VCAP_IS1_HK_L2_MC] = { 20, 1},
	[VCAP_IS1_HK_L2_BC] = { 21, 1},
	[VCAP_IS1_HK_IP_MC] = { 22, 1},
	[VCAP_IS1_HK_VLAN_TAGGED] = { 23, 1},
	[VCAP_IS1_HK_VLAN_DBL_TAGGED] = { 24, 1},
	[VCAP_IS1_HK_TPID] = { 25, 1},
	[VCAP_IS1_HK_VID] = { 26, 12},
	[VCAP_IS1_HK_DEI] = { 38, 1},
	[VCAP_IS1_HK_PCP] = { 39, 3},
	/* Specific Fields for IS1 Half Key S1_NORMAL */
	[VCAP_IS1_HK_L2_SMAC] = { 42, 48},
	[VCAP_IS1_HK_ETYPE_LEN] = { 90, 1},
	[VCAP_IS1_HK_ETYPE] = { 91, 16},
	[VCAP_IS1_HK_IP_SNAP] = {107, 1},
	[VCAP_IS1_HK_IP4] = {108, 1},
	/* Layer-3 Information */
	[VCAP_IS1_HK_L3_FRAGMENT] = {109, 1},
	[VCAP_IS1_HK_L3_FRAG_OFS_GT0] = {110, 1},
	[VCAP_IS1_HK_L3_OPTIONS] = {111, 1},
	[VCAP_IS1_HK_L3_DSCP] = {112, 6},
	[VCAP_IS1_HK_L3_IP4_SIP] = {118, 32},
	/* Layer-4 Information */
	[VCAP_IS1_HK_TCP_UDP] = {150, 1},
	[VCAP_IS1_HK_TCP] = {151, 1},
	[VCAP_IS1_HK_L4_SPORT] = {152, 16},
	[VCAP_IS1_HK_L4_RNG] = {168, 8},
	/* Specific Fields for IS1 Half Key S1_5TUPLE_IP4 */
	[VCAP_IS1_HK_IP4_INNER_TPID] = { 42, 1},
	[VCAP_IS1_HK_IP4_INNER_VID] = { 43, 12},
	[VCAP_IS1_HK_IP4_INNER_DEI] = { 55, 1},
	[VCAP_IS1_HK_IP4_INNER_PCP] = { 56, 3},
	[VCAP_IS1_HK_IP4_IP4] = { 59, 1},
	[VCAP_IS1_HK_IP4_L3_FRAGMENT] = { 60, 1},
	[VCAP_IS1_HK_IP4_L3_FRAG_OFS_GT0] = { 61, 1},
	[VCAP_IS1_HK_IP4_L3_OPTIONS] = { 62, 1},
	[VCAP_IS1_HK_IP4_L3_DSCP] = { 63, 6},
	[VCAP_IS1_HK_IP4_L3_IP4_DIP] = { 69, 32},
	[VCAP_IS1_HK_IP4_L3_IP4_SIP] = {101, 32},
	[VCAP_IS1_HK_IP4_L3_PROTO] = {133, 8},
	[VCAP_IS1_HK_IP4_TCP_UDP] = {141, 1},
	[VCAP_IS1_HK_IP4_TCP] = {142, 1},
	[VCAP_IS1_HK_IP4_L4_RNG] = {143, 8},
	[VCAP_IS1_HK_IP4_IP_PAYLOAD_S1_5TUPLE] = {151, 32},
};

static const struct vcap_field vsc9959_vcap_is1_actions[] = {
	[VCAP_IS1_ACT_DSCP_ENA] = { 0, 1},
	[VCAP_IS1_ACT_DSCP_VAL] = { 1, 6},
	[VCAP_IS1_ACT_QOS_ENA] = { 7, 1},
	[VCAP_IS1_ACT_QOS_VAL] = { 8, 3},
	[VCAP_IS1_ACT_DP_ENA] = { 11, 1},
	[VCAP_IS1_ACT_DP_VAL] = { 12, 1},
	[VCAP_IS1_ACT_PAG_OVERRIDE_MASK] = { 13, 8},
	[VCAP_IS1_ACT_PAG_VAL] = { 21, 8},
	[VCAP_IS1_ACT_RSV] = { 29, 9},
net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1
VCAP IS1 is a VCAP module which can filter on the most common L2/L3/L4
Ethernet keys, and modify the results of the basic QoS classification
and VLAN classification based on those flow keys.
There are 3 VCAP IS1 lookups, mapped over chains 10000, 11000 and 12000.
Currently the driver is hardcoded to use IS1_ACTION_TYPE_NORMAL half
keys.
Note that the VLAN_MANGLE has been omitted for now. In hardware, the
VCAP_IS1_ACT_VID_REPLACE_ENA field replaces the classified VLAN
(metadata associated with the frame) and not the VLAN from the header
itself. There are currently some issues which need to be addressed when
operating in standalone, or in bridge with vlan_filtering=0 modes,
because in those cases the switch ports have VLAN awareness disabled,
and changing the classified VLAN to anything other than the pvid causes
the packets to be dropped. Another issue is that on egress, we expect
port tagging to push the classified VLAN, but port tagging is disabled
in the modes mentioned above, so although the classified VLAN is
replaced, it is not visible in the packet transmitted by the switch.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02 12:02:23 +00:00
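The chain numbering mentioned in the commit message can be sketched as a small mapping from a chain index to an IS1 lookup index. The macro and function names below are hypothetical; only the three chain numbers (10000, 11000 and 12000) and the count of three lookups come from the text.

```c
#include <assert.h>

#define SKETCH_IS1_FIRST_CHAIN	10000	/* chain of lookup 0 */
#define SKETCH_IS1_CHAIN_STEP	1000	/* distance between two lookups */
#define SKETCH_IS1_NUM_LOOKUPS	3

/*
 * Return the IS1 lookup index (0..2) that a chain maps to, or -1 if the
 * chain does not correspond to any IS1 lookup.
 */
static int sketch_is1_chain_to_lookup(int chain)
{
	int rel = chain - SKETCH_IS1_FIRST_CHAIN;
	int lookup;

	if (rel < 0 || rel % SKETCH_IS1_CHAIN_STEP != 0)
		return -1;

	lookup = rel / SKETCH_IS1_CHAIN_STEP;
	if (lookup >= SKETCH_IS1_NUM_LOOKUPS)
		return -1;

	/* Round trip: the lookup must map back to the same chain. */
	assert(SKETCH_IS1_FIRST_CHAIN + lookup * SKETCH_IS1_CHAIN_STEP == chain);
	return lookup;
}
```

Chains that fall between or beyond the three values are rejected rather than rounded, so a rule placed on, say, chain 10500 would not silently land in lookup 0.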
	/* The fields below are incorrectly shifted by 2 in the manual */
2020-09-29 22:27:24 +00:00
	[VCAP_IS1_ACT_VID_REPLACE_ENA] = { 38, 1},
	[VCAP_IS1_ACT_VID_ADD_VAL] = { 39, 12},
	[VCAP_IS1_ACT_FID_SEL] = { 51, 2},
	[VCAP_IS1_ACT_FID_VAL] = { 53, 13},
	[VCAP_IS1_ACT_PCP_DEI_ENA] = { 66, 1},
	[VCAP_IS1_ACT_PCP_VAL] = { 67, 3},
	[VCAP_IS1_ACT_DEI_VAL] = { 70, 1},
	[VCAP_IS1_ACT_VLAN_POP_CNT_ENA] = { 71, 1},
	[VCAP_IS1_ACT_VLAN_POP_CNT] = { 72, 2},
	[VCAP_IS1_ACT_CUSTOM_ACE_TYPE_ENA] = { 74, 4},
	[VCAP_IS1_ACT_HIT_STICKY] = { 78, 1},
};
2020-06-20 15:43:36 +00:00
static struct vcap_field vsc9959_vcap_is2_keys[] = {
2020-02-29 14:31:14 +00:00
	/* Common: 41 bits */
	[VCAP_IS2_TYPE] = { 0, 4},
	[VCAP_IS2_HK_FIRST] = { 4, 1},
	[VCAP_IS2_HK_PAG] = { 5, 8},
	[VCAP_IS2_HK_IGR_PORT_MASK] = { 13, 7},
	[VCAP_IS2_HK_RSV2] = { 20, 1},
	[VCAP_IS2_HK_HOST_MATCH] = { 21, 1},
	[VCAP_IS2_HK_L2_MC] = { 22, 1},
	[VCAP_IS2_HK_L2_BC] = { 23, 1},
	[VCAP_IS2_HK_VLAN_TAGGED] = { 24, 1},
	[VCAP_IS2_HK_VID] = { 25, 12},
	[VCAP_IS2_HK_DEI] = { 37, 1},
	[VCAP_IS2_HK_PCP] = { 38, 3},
	/* MAC_ETYPE / MAC_LLC / MAC_SNAP / OAM common */
	[VCAP_IS2_HK_L2_DMAC] = { 41, 48},
	[VCAP_IS2_HK_L2_SMAC] = { 89, 48},
	/* MAC_ETYPE (TYPE=000) */
	[VCAP_IS2_HK_MAC_ETYPE_ETYPE] = {137, 16},
	[VCAP_IS2_HK_MAC_ETYPE_L2_PAYLOAD0] = {153, 16},
	[VCAP_IS2_HK_MAC_ETYPE_L2_PAYLOAD1] = {169, 8},
	[VCAP_IS2_HK_MAC_ETYPE_L2_PAYLOAD2] = {177, 3},
	/* MAC_LLC (TYPE=001) */
	[VCAP_IS2_HK_MAC_LLC_L2_LLC] = {137, 40},
	/* MAC_SNAP (TYPE=010) */
	[VCAP_IS2_HK_MAC_SNAP_L2_SNAP] = {137, 40},
	/* MAC_ARP (TYPE=011) */
	[VCAP_IS2_HK_MAC_ARP_SMAC] = { 41, 48},
	[VCAP_IS2_HK_MAC_ARP_ADDR_SPACE_OK] = { 89, 1},
	[VCAP_IS2_HK_MAC_ARP_PROTO_SPACE_OK] = { 90, 1},
	[VCAP_IS2_HK_MAC_ARP_LEN_OK] = { 91, 1},
	[VCAP_IS2_HK_MAC_ARP_TARGET_MATCH] = { 92, 1},
	[VCAP_IS2_HK_MAC_ARP_SENDER_MATCH] = { 93, 1},
	[VCAP_IS2_HK_MAC_ARP_OPCODE_UNKNOWN] = { 94, 1},
	[VCAP_IS2_HK_MAC_ARP_OPCODE] = { 95, 2},
	[VCAP_IS2_HK_MAC_ARP_L3_IP4_DIP] = { 97, 32},
	[VCAP_IS2_HK_MAC_ARP_L3_IP4_SIP] = {129, 32},
	[VCAP_IS2_HK_MAC_ARP_DIP_EQ_SIP] = {161, 1},
	/* IP4_TCP_UDP / IP4_OTHER common */
	[VCAP_IS2_HK_IP4] = { 41, 1},
	[VCAP_IS2_HK_L3_FRAGMENT] = { 42, 1},
	[VCAP_IS2_HK_L3_FRAG_OFS_GT0] = { 43, 1},
	[VCAP_IS2_HK_L3_OPTIONS] = { 44, 1},
	[VCAP_IS2_HK_IP4_L3_TTL_GT0] = { 45, 1},
	[VCAP_IS2_HK_L3_TOS] = { 46, 8},
	[VCAP_IS2_HK_L3_IP4_DIP] = { 54, 32},
	[VCAP_IS2_HK_L3_IP4_SIP] = { 86, 32},
	[VCAP_IS2_HK_DIP_EQ_SIP] = {118, 1},
	/* IP4_TCP_UDP (TYPE=100) */
	[VCAP_IS2_HK_TCP] = {119, 1},
2020-09-21 22:56:36 +00:00
	[VCAP_IS2_HK_L4_DPORT] = {120, 16},
	[VCAP_IS2_HK_L4_SPORT] = {136, 16},
2020-02-29 14:31:14 +00:00
	[VCAP_IS2_HK_L4_RNG] = {152, 8},
	[VCAP_IS2_HK_L4_SPORT_EQ_DPORT] = {160, 1},
	[VCAP_IS2_HK_L4_SEQUENCE_EQ0] = {161, 1},
2020-09-21 22:56:36 +00:00
	[VCAP_IS2_HK_L4_FIN] = {162, 1},
	[VCAP_IS2_HK_L4_SYN] = {163, 1},
	[VCAP_IS2_HK_L4_RST] = {164, 1},
	[VCAP_IS2_HK_L4_PSH] = {165, 1},
	[VCAP_IS2_HK_L4_ACK] = {166, 1},
	[VCAP_IS2_HK_L4_URG] = {167, 1},
2020-02-29 14:31:14 +00:00
	[VCAP_IS2_HK_L4_1588_DOM] = {168, 8},
	[VCAP_IS2_HK_L4_1588_VER] = {176, 4},
	/* IP4_OTHER (TYPE=101) */
	[VCAP_IS2_HK_IP4_L3_PROTO] = {119, 8},
	[VCAP_IS2_HK_L3_PAYLOAD] = {127, 56},
	/* IP6_STD (TYPE=110) */
	[VCAP_IS2_HK_IP6_L3_TTL_GT0] = { 41, 1},
	[VCAP_IS2_HK_L3_IP6_SIP] = { 42, 128},
	[VCAP_IS2_HK_IP6_L3_PROTO] = {170, 8},
	/* OAM (TYPE=111) */
	[VCAP_IS2_HK_OAM_MEL_FLAGS] = {137, 7},
	[VCAP_IS2_HK_OAM_VER] = {144, 5},
	[VCAP_IS2_HK_OAM_OPCODE] = {149, 8},
	[VCAP_IS2_HK_OAM_FLAGS] = {157, 8},
	[VCAP_IS2_HK_OAM_MEPID] = {165, 16},
	[VCAP_IS2_HK_OAM_CCM_CNTS_EQ0] = {181, 1},
	[VCAP_IS2_HK_OAM_IS_Y1731] = {182, 1},
};
2020-06-20 15:43:36 +00:00
static struct vcap_field vsc9959_vcap_is2_actions[] = {
2020-02-29 14:31:14 +00:00
	[VCAP_IS2_ACT_HIT_ME_ONCE] = { 0, 1},
	[VCAP_IS2_ACT_CPU_COPY_ENA] = { 1, 1},
	[VCAP_IS2_ACT_CPU_QU_NUM] = { 2, 3},
	[VCAP_IS2_ACT_MASK_MODE] = { 5, 2},
	[VCAP_IS2_ACT_MIRROR_ENA] = { 7, 1},
	[VCAP_IS2_ACT_LRN_DIS] = { 8, 1},
	[VCAP_IS2_ACT_POLICE_ENA] = { 9, 1},
	[VCAP_IS2_ACT_POLICE_IDX] = { 10, 9},
	[VCAP_IS2_ACT_POLICE_VCAP_ONLY] = { 19, 1},
2020-09-29 11:20:24 +00:00
	[VCAP_IS2_ACT_PORT_MASK] = { 20, 6},
	[VCAP_IS2_ACT_REW_OP] = { 26, 9},
	[VCAP_IS2_ACT_SMAC_REPLACE_ENA] = { 35, 1},
	[VCAP_IS2_ACT_RSV] = { 36, 2},
	[VCAP_IS2_ACT_ACL_ID] = { 38, 6},
	[VCAP_IS2_ACT_HIT_CNT] = { 44, 32},
2020-02-29 14:31:14 +00:00
};
2020-09-29 22:27:26 +00:00
static struct vcap_props vsc9959_vcap_props[] = {
2020-09-29 22:27:25 +00:00
	[VCAP_ES0] = {
		.action_type_width = 0,
		.action_table = {
			[ES0_ACTION_TYPE_NORMAL] = {
				.width = 72, /* HIT_STICKY not included */
				.count = 1,
			},
		},
		.target = S0,
		.keys = vsc9959_vcap_es0_keys,
		.actions = vsc9959_vcap_es0_actions,
	},
2020-09-29 22:27:24 +00:00
	[VCAP_IS1] = {
		.action_type_width = 0,
		.action_table = {
			[IS1_ACTION_TYPE_NORMAL] = {
				.width = 78, /* HIT_STICKY not included */
				.count = 4,
			},
		},
		.target = S1,
		.keys = vsc9959_vcap_is1_keys,
		.actions = vsc9959_vcap_is1_actions,
	},
2020-02-29 14:31:14 +00:00
	[VCAP_IS2] = {
		.action_type_width = 1,
		.action_table = {
			[IS2_ACTION_TYPE_NORMAL] = {
				.width = 44,
				.count = 2
			},
			[IS2_ACTION_TYPE_SMAC_SIP] = {
				.width = 6,
				.count = 4
			},
		},
net: mscc: ocelot: generalize existing code for VCAP
In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which
have the same configuration interface, but different sets of keys and
actions. The driver currently only supports VCAP IS2.
In preparation of VCAP IS1 and ES0 support, the existing code must be
generalized to work with any VCAP.
In that direction, we should move the structures that depend upon VCAP
instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct
ocelot and into struct vcap_props .keys and .actions, a structure that
is replicated 3 times, once per VCAP. We'll pass that structure as an
argument to each function that does the key and action packing - only
the control logic needs to distinguish between ocelot->vcap[VCAP_IS2]
or IS1 or ES0.
Another change is to make use of the newly introduced ocelot_target_read
and ocelot_target_write API, since the 3 VCAPs have the same registers
but located at different addresses.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29 22:27:23 +00:00
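Since the key and action packing functions are driven purely by {offset, length} descriptions, the core operation can be sketched independently of any particular VCAP. The helper below is a hypothetical user-space illustration, not the driver's actual packing routine: it assumes the entry is an array of 32-bit words with bit 0 of word 0 as the first bit, and it limits fields to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_key_field {
	int offset;	/* first bit of the field within the entry */
	int length;	/* number of bits, at most 32 in this sketch */
};

/*
 * Store the low 'length' bits of value into a packed entry made of
 * 32-bit words, least significant bit first. A field may straddle a
 * word boundary, so bits are copied one at a time for clarity.
 */
static void sketch_vcap_pack(uint32_t *entry, struct sketch_key_field f,
			     uint32_t value)
{
	int i;

	assert(f.offset >= 0 && f.length > 0 && f.length <= 32);

	for (i = 0; i < f.length; i++) {
		int bit = f.offset + i;
		uint32_t mask = UINT32_C(1) << (bit % 32);

		if (value & (UINT32_C(1) << i))
			entry[bit / 32] |= mask;
		else
			entry[bit / 32] &= ~mask;
	}
}
```

Because the function only sees a field description and a buffer, the same code can serve ES0, IS1 and IS2; the caller picks the table that belongs to the VCAP being programmed.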
		.target = S2,
		.keys = vsc9959_vcap_is2_keys,
		.actions = vsc9959_vcap_is2_actions,
2020-02-29 14:31:14 +00:00
	},
};
2020-09-18 10:57:52 +00:00
static const struct ptp_clock_info vsc9959_ptp_caps = {
	.owner = THIS_MODULE,
	.name = "felix ptp",
	.max_adj = 0x7fffffff,
	.n_alarm = 0,
	.n_ext_ts = 0,
	.n_per_out = OCELOT_PTP_PINS_NUM,
	.n_pins = OCELOT_PTP_PINS_NUM,
	.pps = 0,
	.gettime64 = ocelot_ptp_gettime64,
	.settime64 = ocelot_ptp_settime64,
	.adjtime = ocelot_ptp_adjtime,
	.adjfine = ocelot_ptp_adjfine,
	.verify = ocelot_ptp_verify,
	.enable = ocelot_ptp_enable,
};
2019-11-14 15:03:30 +00:00
#define VSC9959_INIT_TIMEOUT 50000
#define VSC9959_GCB_RST_SLEEP 100
#define VSC9959_SYS_RAMINIT_SLEEP 80

static int vsc9959_gcb_soft_rst_status(struct ocelot *ocelot)
{
	int val;
2020-09-18 10:57:43 +00:00
	ocelot_field_read(ocelot, GCB_SOFT_RST_SWC_RST, &val);
2019-11-14 15:03:30 +00:00
	return val;
}

static int vsc9959_sys_ram_init_status(struct ocelot *ocelot)
{
	return ocelot_read(ocelot, SYS_RAM_INIT);
}
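The two status helpers above exist so that vsc9959_reset can poll them until they read zero, giving up after a timeout. The pattern can be modelled in user space as follows. This is a simplified sketch under stated assumptions: the device is a fake whose busy bit clears after a fixed number of reads, time is simulated by adding the sleep interval on each iteration instead of sleeping, and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Fake device: the status reads as busy (1) for a number of polls. */
struct sketch_dev {
	int busy_reads_left;
};

static int sketch_read_status(struct sketch_dev *dev)
{
	if (dev->busy_reads_left > 0) {
		dev->busy_reads_left--;
		return 1;
	}
	return 0;
}

/*
 * Poll until the status reads zero or the time budget is used up.
 * Time is simulated by adding sleep_us per iteration; nothing sleeps.
 * Returns 0 on success and -ETIMEDOUT otherwise.
 */
static int sketch_poll_until_clear(struct sketch_dev *dev,
				   unsigned int sleep_us,
				   unsigned int timeout_us)
{
	unsigned int elapsed_us = 0;

	assert(sleep_us > 0);

	for (;;) {
		if (!sketch_read_status(dev))
			return 0;
		if (elapsed_us >= timeout_us)
			return -ETIMEDOUT;
		elapsed_us += sleep_us;
	}
}
```

With the values from the defines above (a 100 us sleep and a 50000 us timeout for the core reset), a device that never clears its bit is reported as timed out after roughly 500 polls, while one that clears after a few reads succeeds almost immediately.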
2020-09-18 10:57:46 +00:00
/* CORE_ENA is in SYS:SYSTEM:RESET_CFG
 * RAM_INIT is in SYS:RAM_CTRL:RAM_INIT
 */
2019-11-14 15:03:30 +00:00
static int vsc9959_reset(struct ocelot *ocelot)
{
	int val, err;

	/* soft-reset the switch core */
2020-09-18 10:57:43 +00:00
	ocelot_field_write(ocelot, GCB_SOFT_RST_SWC_RST, 1);
2019-11-14 15:03:30 +00:00
	err = readx_poll_timeout(vsc9959_gcb_soft_rst_status, ocelot, val, !val,
				 VSC9959_GCB_RST_SLEEP, VSC9959_INIT_TIMEOUT);
	if (err) {
		dev_err(ocelot->dev, "timeout: switch core reset\n");
		return err;
	}

	/* initialize switch mem ~40us */
	ocelot_write(ocelot, SYS_RAM_INIT_RAM_INIT, SYS_RAM_INIT);
	err = readx_poll_timeout(vsc9959_sys_ram_init_status, ocelot, val, !val,
				 VSC9959_SYS_RAMINIT_SLEEP,
				 VSC9959_INIT_TIMEOUT);
	if (err) {
		dev_err(ocelot->dev, "timeout: switch sram init\n");
		return err;
	}

	/* enable switch core */
2020-09-18 10:57:43 +00:00
	ocelot_field_write(ocelot, SYS_RESET_CFG_CORE_ENA, 1);
2019-11-14 15:03:30 +00:00
	return 0;
}
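The reset sequence above follows a poll-with-timeout pattern: read a status value, stop when it reads zero, otherwise sleep and retry until a deadline passes. Below is a minimal userspace sketch of that pattern, not the kernel's readx_poll_timeout macro itself. The names poll_until_zero, status_fn, countdown_status and always_busy are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical status callback: returns 0 once the operation has completed. */
typedef uint32_t (*status_fn)(void *ctx);

static int64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Poll read_status() until it returns 0 or timeout_us elapses. */
static int poll_until_zero(status_fn read_status, void *ctx,
			   unsigned int sleep_us, unsigned int timeout_us)
{
	int64_t deadline = now_us() + timeout_us;
	struct timespec delay = {
		.tv_sec = sleep_us / 1000000,
		.tv_nsec = (long)(sleep_us % 1000000) * 1000,
	};

	for (;;) {
		if (read_status(ctx) == 0)
			return 0;
		if (now_us() > deadline)
			/* One last read, so a late completion is not reported as a timeout. */
			return read_status(ctx) == 0 ? 0 : -ETIMEDOUT;
		nanosleep(&delay, NULL);
	}
}

/* Hypothetical test doubles standing in for a hardware status register. */
static uint32_t countdown_status(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining)
		(*remaining)--;
	return *remaining;
}

static uint32_t always_busy(void *ctx)
{
	(void)ctx;
	return 1;
}
```

With countdown_status and a counter of 3, the third read returns zero and the helper reports success. With always_busy it returns -ETIMEDOUT once the deadline has passed.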
2020-07-13 16:57:08 +00:00
/* Watermark encode
 * Bit 8:   Unit; 0:1, 1:16
 * Bit 7-0: Value to be multiplied with unit
 */
static u16 vsc9959_wm_enc(u16 value)
{
2020-10-05 09:09:12 +00:00
	WARN_ON(value >= 16 * BIT(8));

2020-07-13 16:57:08 +00:00
	if (value >= BIT(8))
		return BIT(8) | (value / 16);

	return value;
}

2021-01-15 02:11:12 +00:00
static u16 vsc9959_wm_dec(u16 wm)
{
	WARN_ON(wm & ~GENMASK(8, 0));

	if (wm & BIT(8))
		return (wm & GENMASK(7, 0)) * 16;

	return wm;
}

static void vsc9959_wm_stat(u32 val, u32 *inuse, u32 *maxuse)
{
	*inuse = (val & GENMASK(23, 12)) >> 12;
	*maxuse = val & GENMASK(11, 0);
}

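The watermark helpers above pack a value into 9 bits: bit 8 selects a unit of 1 or 16, and bits 7-0 hold the multiplier. Values of 256 and above are therefore rounded down to a multiple of 16 when encoded. The sketch below restates the same arithmetic in plain userspace C so the round trip can be checked outside the kernel. The names wm_enc, wm_dec, wm_stat, WM_UNIT_BIT and WM_VALUE_MASK are hypothetical stand-ins, and the kernel's WARN_ON range checks are left out.

```c
#include <assert.h>
#include <stdint.h>

#define WM_UNIT_BIT   (1u << 8)	/* bit 8 selects the unit: 0 -> 1, 1 -> 16 */
#define WM_VALUE_MASK 0xffu	/* bits 7-0 hold the multiplier */

/* Encode a watermark; values >= 256 are stored in units of 16. */
static uint16_t wm_enc(uint16_t value)
{
	if (value >= WM_UNIT_BIT)
		return (uint16_t)(WM_UNIT_BIT | (value / 16));

	return value;
}

/* Decode a 9-bit watermark back into a plain value. */
static uint16_t wm_dec(uint16_t wm)
{
	if (wm & WM_UNIT_BIT)
		return (uint16_t)((wm & WM_VALUE_MASK) * 16);

	return wm;
}

/* Split a status word: bits 23-12 are "in use", bits 11-0 are "max use". */
static void wm_stat(uint32_t val, uint32_t *inuse, uint32_t *maxuse)
{
	*inuse = (val >> 12) & 0xfffu;
	*maxuse = val & 0xfffu;
}
```

For example, 200 is stored unchanged, while 1000 encodes to 0x13e (unit bit set, multiplier 62) and decodes to 992, so up to 15 units are lost to rounding in the coarse range.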
net: dsa: felix: Add PCS operations for PHYLINK
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
|                   +--------+ GMII (typically disabled via RCW)         |
| ENETC PCI         |  ENETC |--------------------------+                |
| Root Complex      | port 3 |-----------------------+  |                |
| Integrated        +--------+                       |  |                |
| Endpoint                                           |  |                |
|                   +--------+ 2.5G GMII             |  |                |
|                   |  ENETC |--------------+        |  |                |
|                   | port 2 |-----------+  |        |  |                |
|                   +--------+           |  |        |  |                |
|                                     +--------+  +--------+             |
|                                     |  Felix |  |  Felix |             |
|                                     | port 4 |  | port 5 |             |
|                                     +--------+  +--------+             |
|                                                                        |
| +--------+  +--------+  +--------+  +--------+  +--------+  +--------+ |
| |  ENETC |  |  ENETC |  |  Felix |  |  Felix |  |  Felix |  |  Felix | |
| | port 0 |  | port 1 |  | port 0 |  | port 1 |  | port 2 |  | port 3 | |
+------------------------------------------------------------------------+
|    ||||  SerDes |          ||||        ||||        ||||        ||||    |
| +--------+block |       +--------------------------------------------+ |
| |  ENETC |      |       |       ENETC port 2 internal MDIO bus       | |
| | port 0 |      |       |  PCS         PCS          PCS        PCS   | |
| |  PCS   |      |       |   0           1            2          3    | |
+-----------------|------------------------------------------------------+
       v          v           v           v            v          v
    SGMII/      RGMII    QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
   USXGMII/   (bypasses
  1000Base-X/  SerDes)
  2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map lent from ENETC port 2
(PF2), for accessing its associated PCSs.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not treated
in any way (and I don't think it can be treated).
All of this is so SoC-specific, that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree because of 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one who
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00
static int vsc9959_mdio_bus_alloc(struct ocelot *ocelot)
{
2022-09-27 19:15:17 +00:00
	struct pci_dev *pdev = to_pci_dev(ocelot->dev);
2020-01-06 01:34:17 +00:00
	struct felix *felix = ocelot_to_felix(ocelot);
	struct enetc_mdio_priv *mdio_priv;
	struct device *dev = ocelot->dev;
2022-09-27 19:15:17 +00:00
	resource_size_t imdio_base;
2020-01-06 01:34:17 +00:00
	void __iomem *imdio_regs;
2020-05-22 08:54:34 +00:00
	struct resource res;
2020-01-06 01:34:17 +00:00
	struct enetc_hw *hw;
	struct mii_bus *bus;
	int port;
	int rc;

	felix->pcs = devm_kcalloc(dev, felix->info->num_ports,
2021-12-29 05:03:06 +00:00
				  sizeof(struct phylink_pcs *),
2020-01-06 01:34:17 +00:00
				  GFP_KERNEL);
	if (!felix->pcs) {
		dev_err(dev, "failed to allocate array for PCS PHYs\n");
		return -ENOMEM;
	}

2022-09-27 19:15:17 +00:00
	imdio_base = pci_resource_start(pdev, VSC9959_IMDIO_PCI_BAR);

2022-09-27 19:15:16 +00:00
	memcpy(&res, &vsc9959_imdio_res, sizeof(res));
2022-09-27 19:15:17 +00:00
	res.start += imdio_base;
	res.end += imdio_base;
net: dsa: felix: Add PCS operations for PHYLINK
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
| +--------+ GMII (typically disabled via RCW) |
| ENETC PCI | ENETC |--------------------------+ |
| Root Complex | port 3 |-----------------------+ | |
| Integrated +--------+ | | |
| Endpoint | | |
| +--------+ 2.5G GMII | | |
| | ENETC |--------------+ | | |
| | port 2 |-----------+ | | | |
| +--------+ | | | | |
| +--------+ +--------+ |
| | Felix | | Felix | |
| | port 4 | | port 5 | |
| +--------+ +--------+ |
| |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | |
| | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | |
+------------------------------------------------------------------------+
| |||| SerDes | |||| |||| |||| |||| |
| +--------+block | +--------------------------------------------+ |
| | ENETC | | | ENETC port 2 internal MDIO bus | |
| | port 0 | | | PCS PCS PCS PCS | |
| | PCS | | | 0 1 2 3 | |
+-----------------|------------------------------------------------------+
v v v v v v
SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
USXGMII/ (bypasses
1000Base-X/ SerDes)
2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map lended from ENETC port 2
(PF2), for accessing its associated PCS's.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not treated
in any way (and I don't think it can be treated).
All of this is so SoC-specific, that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree because of 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
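The two reasons above can be sketched in plain C as follows. This is
only an illustration under stated assumptions: the serdes_mode enum,
pcs_probe_plan and plan_pcs_probe() are hypothetical names, the
mode-to-clause mapping (USXGMII on clause 45, the others on clause 22)
is assumed for the example, and the PCS of each port is assumed to have
been placed at an MDIO address equal to the port index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of SerDes protocols, for illustration only. */
enum serdes_mode {
	MODE_SGMII,
	MODE_QSGMII,
	MODE_1000BASEX,
	MODE_2500BASEX,
	MODE_USXGMII,
};

/* What we need to know before touching the internal MDIO bus. */
struct pcs_probe_plan {
	int mdio_addr;	/* address of the PCS on the internal bus */
	bool is_c45;	/* true: clause 45 access, false: clause 22 */
};

/* Decide how to look up the PCS from the fixed, pre-validated mode.
 * Returns 0 on success, -1 if the mode is not one we know about.
 */
int plan_pcs_probe(int port, enum serdes_mode mode,
		   struct pcs_probe_plan *plan)
{
	assert(plan);

	switch (mode) {
	case MODE_SGMII:
	case MODE_QSGMII:
	case MODE_1000BASEX:
	case MODE_2500BASEX:
		plan->is_c45 = false;	/* assumed mapping */
		break;
	case MODE_USXGMII:
		plan->is_c45 = true;	/* assumed mapping */
		break;
	default:
		return -1;		/* reject before probing the bus */
	}

	/* Assumption: PCS address equals the port index. */
	plan->mdio_addr = port;
	return 0;
}
```

Because the mode is known up front and never changes, the clause can be
decided once, before the first bus access, which is what breaks the
circular dependency.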
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation on Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one that
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't handle fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00
|
|
|
|
2020-05-22 08:54:34 +00:00
|
|
|
imdio_regs = devm_ioremap_resource(dev, &res);
|
2021-03-29 01:54:05 +00:00
|
|
|
if (IS_ERR(imdio_regs))
|
|
|
|
return PTR_ERR(imdio_regs);
|
|
|
|
|
|
|
|
hw = enetc_hw_alloc(dev, imdio_regs);
|
|
|
|
if (IS_ERR(hw)) {
|
|
|
|
dev_err(dev, "failed to allocate ENETC HW structure\n");
|
|
|
|
return PTR_ERR(hw);
|
|
|
|
}
|
|
|
|
|
net: dsa: felix: don't use devres for mdiobus
As explained in commits:
74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")
mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), and that mdiobus was
not previously unregistered.
The Felix VSC9959 switch is a PCI device, so the initial set of
constraints that I thought would cause this (I2C or SPI buses which call
->remove on ->shutdown) do not apply. But there is one more which
applies here.
If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the felix switch driver on shutdown.
So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.
The felix driver has the code structure in place for orderly mdiobus
removal, so just replace devm_mdiobus_alloc_size() with the non-devres
variant, and add manual free where necessary, to ensure that we don't
let devres free a still-registered bus.
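The ordering rule can be sketched with a small userspace mock. This is
not kernel code: mock_bus and the mock_bus_*() helpers are hypothetical
stand-ins for the mdiobus alloc/register/unregister/free calls, and the
mock reports an error where the real mdiobus_free() would panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mock of an MDIO bus object. */
struct mock_bus {
	bool registered;
};

struct mock_bus *mock_bus_alloc(void)
{
	return calloc(1, sizeof(struct mock_bus));
}

/* 'fail' lets the caller simulate a registration error. */
int mock_bus_register(struct mock_bus *bus, bool fail)
{
	if (fail)
		return -1;
	bus->registered = true;
	return 0;
}

void mock_bus_unregister(struct mock_bus *bus)
{
	bus->registered = false;
}

/* Freeing a still-registered bus is the bug being avoided; the mock
 * refuses and returns -1 instead of crashing.
 */
int mock_bus_free(struct mock_bus *bus)
{
	if (bus->registered)
		return -1;
	free(bus);
	return 0;
}

/* Setup path: allocate, register, and free by hand if that fails. */
struct mock_bus *bus_setup(bool fail_register)
{
	struct mock_bus *bus = mock_bus_alloc();

	if (!bus)
		return NULL;

	if (mock_bus_register(bus, fail_register) < 0) {
		int rc = mock_bus_free(bus);	/* never registered: safe */

		assert(rc == 0);
		(void)rc;
		return NULL;
	}
	return bus;
}

/* Teardown path: always unregister before freeing. */
int bus_teardown(struct mock_bus *bus)
{
	mock_bus_unregister(bus);
	return mock_bus_free(bus);
}
```

With manual management, both the error path and the teardown path own
the free, so nothing else can release the bus while it is still
registered.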
Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07 16:15:50 +00:00
|
|
|
bus = mdiobus_alloc_size(sizeof(*mdio_priv));
|
|
|
|
if (!bus)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
bus->name = "VSC9959 internal MDIO bus";
|
2023-01-12 15:15:16 +00:00
|
|
|
bus->read = enetc_mdio_read_c22;
|
|
|
|
bus->write = enetc_mdio_write_c22;
|
|
|
|
bus->read_c45 = enetc_mdio_read_c45;
|
|
|
|
bus->write_c45 = enetc_mdio_write_c45;
|
|
|
|
bus->parent = dev;
|
|
|
|
mdio_priv = bus->priv;
|
|
|
|
mdio_priv->hw = hw;
|
|
|
|
/* This gets added to imdio_regs, which already maps addresses
|
|
|
|
* starting with the proper offset.
|
|
|
|
*/
|
|
|
|
mdio_priv->mdio_base = 0;
|
|
|
|
snprintf(bus->id, MII_BUS_ID_SIZE, "%s-imdio", dev_name(dev));
|
|
|
|
|
|
|
|
/* Needed in order to initialize the bus mutex lock */
|
|
|
|
rc = mdiobus_register(bus);
|
|
|
|
if (rc < 0) {
|
|
|
|
dev_err(dev, "failed to register MDIO bus\n");
|
|
|
|
mdiobus_free(bus);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
|
|
|
felix->imdio = bus;
|
|
|
|
|
|
|
|
for (port = 0; port < felix->info->num_ports; port++) {
|
|
|
|
struct ocelot_port *ocelot_port = ocelot->ports[port];
|
2021-12-29 05:03:06 +00:00
|
|
|
struct phylink_pcs *phylink_pcs;
|
2021-12-29 05:03:07 +00:00
|
|
|
struct mdio_device *mdio_device;
|
|
|
|
|
2020-08-30 08:34:02 +00:00
|
|
|
if (dsa_is_unused_port(felix->ds, port))
|
|
|
|
continue;
|
net: dsa: felix: Add PCS operations for PHYLINK

Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc.) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
are one or more of these PCSes, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSes) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.

The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.

+------------------------------------------------------------------------+
|                   +--------+ GMII (typically disabled via RCW)         |
| ENETC PCI         |  ENETC |--------------------------+                |
| Root Complex      | port 3 |-----------------------+  |                |
| Integrated        +--------+                       |  |                |
| Endpoint                                           |  |                |
|                   +--------+ 2.5G GMII             |  |                |
|                   |  ENETC |--------------+        |  |                |
|                   | port 2 |-----------+  |        |  |                |
|                   +--------+           |  |        |  |                |
|                                     +--------+  +--------+             |
|                                     |  Felix |  |  Felix |             |
|                                     | port 4 |  | port 5 |             |
|                                     +--------+  +--------+             |
|                                                                        |
| +--------+  +--------+  +--------+  +--------+  +--------+  +--------+ |
| |  ENETC |  |  ENETC |  |  Felix |  |  Felix |  |  Felix |  |  Felix | |
| | port 0 |  | port 1 |  | port 0 |  | port 1 |  | port 2 |  | port 3 | |
+------------------------------------------------------------------------+
|    ||||  SerDes |          ||||        ||||        ||||        ||||    |
| +--------+block |       +--------------------------------------------+ |
| |  ENETC |      |       |       ENETC port 2 internal MDIO bus       | |
| | port 0 |      |       |   PCS         PCS         PCS         PCS  | |
| |   PCS  |      |       |    0           1           2           3   | |
+-----------------|------------------------------------------------------+
       v          v            v           v           v           v
     SGMII/     RGMII    QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
   USXGMII/   (bypasses
1000Base-X/    SerDes)
2500Base-X

In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map borrowed from ENETC port 2
  (PF2), for accessing its associated PCSes.

This explanation is necessary because the patch renames "pci_bar" to
"switch_pci_bar" for clarity, a change which would otherwise appear a
bit obscure.

The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but accessing it triggers
an access fault if PF2 (the ENETC DSA master) is not enabled. This is
not handled in any way (and I don't think it can be handled).

All of this is so SoC-specific that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.

We need to parse and pre-validate the device tree for 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
  design.
- There is a circular dependency in that we need to know what clause the
  PCS speaks in order to find it on the internal MDIO bus. But the
  clause of the PCS depends on what phy-mode it is configured for.
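
A minimal sketch of how the circular dependency can be broken, assuming
the clause is decided purely from the phy-mode read from the device tree
before any MDIO access. The mode-to-clause mapping below (USXGMII on
clause 45, the SGMII-family modes on clause 22) and every sketch_* name
are illustrative assumptions, not taken from this patch:

```c
#include <stdbool.h>

/* Illustrative subset of phy-modes; not the kernel's phy_interface_t. */
enum sketch_phy_mode {
	SKETCH_MODE_INTERNAL,
	SKETCH_MODE_SGMII,
	SKETCH_MODE_QSGMII,
	SKETCH_MODE_2500BASEX,
	SKETCH_MODE_USXGMII,
	SKETCH_MODE_RGMII,
};

enum sketch_mdio_clause {
	SKETCH_CLAUSE_NONE, /* no PCS to look for on the internal MDIO bus */
	SKETCH_CLAUSE_22,
	SKETCH_CLAUSE_45,
};

/* Assumed mapping: pick the clause from the phy-mode, before touching MDIO. */
static enum sketch_mdio_clause sketch_pcs_clause(enum sketch_phy_mode mode)
{
	switch (mode) {
	case SKETCH_MODE_SGMII:
	case SKETCH_MODE_QSGMII:
	case SKETCH_MODE_2500BASEX:
		return SKETCH_CLAUSE_22;
	case SKETCH_MODE_USXGMII:
		return SKETCH_CLAUSE_45;
	default:
		return SKETCH_CLAUSE_NONE;
	}
}

/*
 * Pre-validation at probe time: accept internal ports and modes with a
 * known PCS clause, reject everything else, since the mode cannot be
 * changed later at runtime.
 */
static bool sketch_validate_phy_mode(enum sketch_phy_mode mode)
{
	return mode == SKETCH_MODE_INTERNAL ||
	       sketch_pcs_clause(mode) != SKETCH_CLAUSE_NONE;
}
```

With this shape, the phy-mode is the only input: once it is validated,
the clause to use when searching the internal MDIO bus is already known.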

The goal of this patch is to take steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).

In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
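
To illustrate why distinct addresses are needed, here is a small model
of the internal MDIO bus. It is only a sketch: the sketch_* names are
hypothetical, and the choice of "address equals port index" is an
assumption based on the driver looking up each PCS at address `port`:

```c
#include <stddef.h>

#define SKETCH_NUM_PCS 4

/* One PCS on the internal MDIO bus, identified only by its PHY address. */
struct sketch_pcs {
	int mdio_addr;
};

/* Count how many PCSes would answer a transaction at a given address. */
static int sketch_responders(const struct sketch_pcs *pcs, size_t n, int addr)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (pcs[i].mdio_addr == addr)
			count++;

	return count;
}

/* Out of reset, every PCS sits at PHY address zero. */
static void sketch_reset(struct sketch_pcs *pcs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pcs[i].mdio_addr = 0;
}

/* Stand-in for the bootloader step: give each PCS a unique address. */
static void sketch_program_mdev_port(struct sketch_pcs *pcs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pcs[i].mdio_addr = (int)i;
}
```

After sketch_reset(), all PCSes respond at address 0 and none at any
other address, so they cannot be told apart. After
sketch_program_mdev_port(), exactly one PCS responds at each address.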

Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation on Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one that
performs the rate adaptation at lower link speeds, which the MAC does
not even need to know about. In fact, changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.

The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't handle fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00

2020-08-30 08:34:02 +00:00
		if (ocelot_port->phy_mode == PHY_INTERFACE_MODE_INTERNAL)
			continue;

2021-12-29 05:03:07 +00:00
		mdio_device = mdio_device_create(felix->imdio, port);
		if (IS_ERR(mdio_device))
2020-01-06 01:34:17 +00:00
			continue;

2021-12-29 05:03:07 +00:00
		phylink_pcs = lynx_pcs_create(mdio_device);
2021-12-29 05:03:06 +00:00
		if (!phylink_pcs) {
2021-12-29 05:03:07 +00:00
			mdio_device_free(mdio_device);
2020-08-30 08:34:02 +00:00
			continue;
		}

2021-12-29 05:03:06 +00:00
		felix->pcs[port] = phylink_pcs;
2020-01-06 01:34:17 +00:00

		dev_info(dev, "Found PCS at internal MDIO address %d\n", port);
	}

	return 0;
}

2020-09-18 10:57:50 +00:00
static void vsc9959_mdio_bus_free(struct ocelot *ocelot)
2020-01-06 01:34:17 +00:00
{
	struct felix *felix = ocelot_to_felix(ocelot);
	int port;

	for (port = 0; port < ocelot->num_phys_ports; port++) {
2021-12-29 05:03:06 +00:00
		struct phylink_pcs *phylink_pcs = felix->pcs[port];
		struct mdio_device *mdio_device;
2020-01-06 01:34:17 +00:00

2021-12-29 05:03:06 +00:00
		if (!phylink_pcs)
2020-01-06 01:34:17 +00:00
			continue;

2021-12-29 05:03:06 +00:00
		mdio_device = lynx_get_mdio_device(phylink_pcs);
		mdio_device_free(mdio_device);
		lynx_pcs_destroy(phylink_pcs);
net: dsa: felix: Add PCS operations for PHYLINK
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
|                    +--------+ GMII (typically disabled via RCW)        |
| ENETC PCI          |  ENETC |--------------------------+               |
| Root Complex       | port 3 |-----------------------+  |               |
| Integrated         +--------+                       |  |               |
| Endpoint                                            |  |               |
|                    +--------+ 2.5G GMII             |  |               |
|                    |  ENETC |--------------+        |  |               |
|                    | port 2 |-----------+  |        |  |               |
|                    +--------+           |  |        |  |               |
|                                     +--------+  +--------+             |
|                                     |  Felix |  |  Felix |             |
|                                     | port 4 |  | port 5 |             |
|                                     +--------+  +--------+             |
|                                                                        |
| +--------+  +--------+  +--------+  +--------+  +--------+  +--------+ |
| |  ENETC |  |  ENETC |  |  Felix |  |  Felix |  |  Felix |  |  Felix | |
| | port 0 |  | port 1 |  | port 0 |  | port 1 |  | port 2 |  | port 3 | |
+------------------------------------------------------------------------+
|    ||||  SerDes |          ||||        ||||        ||||        ||||    |
| +--------+block |       +--------------------------------------------+ |
| |  ENETC |      |       |       ENETC port 2 internal MDIO bus       | |
| | port 0 |      |       |  PCS         PCS         PCS         PCS   | |
| |  PCS   |      |       |   0           1           2           3    | |
+-----------------|------------------------------------------------------+
      v           v           v           v           v           v
    SGMII/      RGMII    QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
   USXGMII/   (bypasses
 1000Base-X/   SerDes)
 2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map lent from ENETC port 2
(PF2), for accessing its associated PCS's.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not treated
in any way (and I don't think it can be treated).
All of this is so SoC-specific, that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree for 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one who
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00
|
|
|
}
|
|
|
|
mdiobus_unregister(felix->imdio);
|
net: dsa: felix: don't use devres for mdiobus
As explained in commits:
74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")
mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), if that mdiobus was
not previously unregistered.
The Felix VSC9959 switch is a PCI device, so the initial set of
constraints that I thought would cause this (I2C or SPI buses which call
->remove on ->shutdown) do not apply. But there is one more which
applies here.
If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the felix switch driver on shutdown.
So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.
The felix driver has the code structure in place for orderly mdiobus
removal, so just replace devm_mdiobus_alloc_size() with the non-devres
variant, and add manual free where necessary, to ensure that we don't
let devres free a still-registered bus.
Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-07 16:15:50 +00:00
|
|
|
mdiobus_free(felix->imdio);
|
|
|
|
}
|
|
|
|
|
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
The blamed commit broke tc-taprio schedules such as this one:
tc qdisc replace dev $swp1 root taprio \
	num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 \
	sched-entry S 0x7f 990000 \
	sched-entry S 0x80 10000 \
	flags 0x2
because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.
Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
1518, and at gigabit this introduces a static guard band (independent
of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
time of 20 octets => 160 ns). But this is larger than the time window
itself, of 10000 ns. So, the queue system never considers a frame with
TC 7 as eligible for transmission, since the gate practically never
opens, and these frames are forever stuck in the TX queues and hang
the port.
2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
enabling oversized frame dropping, we make an effort to set
QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
one more role, which we did not take into account: per-tc static guard
band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
There is a discrepancy between what the driver thinks (that there is
no guard band, and 100% of min_gate_len[tc] is available for egress
scheduling) and what the hardware actually does (crops the equivalent
of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 ns)
leaves 0 ns for scheduling in the queue system proper.
Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.
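The arithmetic in the paragraphs above can be replayed as a small userspace sketch. It is only an illustration at gigabit speed (8 ns per octet) and is not the driver's code: the helper names sched_time_ns() and largest_sdu_keeping_min_gate() are hypothetical, and the result is what this simplified model yields, not necessarily the value the driver programs into QSYS_QMAXSDU_CFG_*.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_OCTET_1G  8   /* one octet takes 8 ns at 1 Gbps */
#define FRM_ADJ_OCTETS   20  /* L1 overhead added via HSCH_MISC_CFG.FRM_ADJ */
#define MIN_GATE_LEN_NS  33  /* open time the queue system needs per frame */

/* Time left for scheduling once the hardware has cropped the static guard
 * band (max SDU plus L1 overhead) out of the gate interval. May be negative.
 */
static int64_t sched_time_ns(int64_t gate_len_ns, int64_t max_sdu_octets)
{
	int64_t guard_ns = (max_sdu_octets + FRM_ADJ_OCTETS) * NS_PER_OCTET_1G;

	return gate_len_ns - guard_ns;
}

/* Largest max SDU (hypothetical helper) whose guard band still leaves
 * MIN_GATE_LEN_NS of open time in the interval.
 */
static int64_t largest_sdu_keeping_min_gate(int64_t gate_len_ns)
{
	assert(gate_len_ns >= MIN_GATE_LEN_NS + FRM_ADJ_OCTETS * NS_PER_OCTET_1G);

	return (gate_len_ns - MIN_GATE_LEN_NS) / NS_PER_OCTET_1G - FRM_ADJ_OCTETS;
}
```

For the 10000 ns window of TC 7, sched_time_ns(10000, 1230) evaluates to 0, which is the hang described above, while largest_sdu_keeping_min_gate(10000) evaluates to 1225 octets and leaves 40 ns of open time in this model.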
Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:23 +00:00
/* The switch considers any frame (regardless of size) as eligible for
 * transmission if the traffic class gate is open for at least 33 ns.
 * Overruns are prevented by cropping an interval at the end of the gate time
 * slot for which egress scheduling is blocked, but we need to still keep 33 ns
 * available for one packet to be transmitted, otherwise the port tc will hang.
 * This function returns the size of a gate interval that remains available for
 * setting the guard band, after reserving the space for one egress frame.
 */
static u64 vsc9959_tas_remaining_gate_len_ps(u64 gate_len_ns)
{
	/* Gate always open */
	if (gate_len_ns == U64_MAX)
		return U64_MAX;

	return (gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
}
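A userspace sketch of the same helper may make its cases easier to see. The constant values are assumptions taken from the comment (33 ns) and from the unit names (1000 ps per ns). The clamp for intervals shorter than 33 ns is hypothetical and is not part of the function shown above; as written there, the unsigned subtraction would wrap around for such inputs, and whether callers can pass them is not visible here.

```c
#include <assert.h>
#include <stdint.h>

#define U64_MAX                      UINT64_MAX
#define PSEC_PER_NSEC                1000ULL /* assumed: picoseconds per ns */
#define VSC9959_TAS_MIN_GATE_LEN_NS  33ULL   /* assumed from the "33 ns" comment */

/* Illustrative variant with a hypothetical clamp at zero. */
static uint64_t remaining_gate_len_ps_clamped(uint64_t gate_len_ns)
{
	/* Gate always open: nothing to reserve */
	if (gate_len_ns == U64_MAX)
		return U64_MAX;

	/* Hypothetical guard: too short to fit even the reserved 33 ns */
	if (gate_len_ns < VSC9959_TAS_MIN_GATE_LEN_NS)
		return 0;

	return (gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
}
```

With these assumptions, a 10000 ns gate leaves 9967000 ps for the guard band, and a gate of exactly 33 ns leaves none.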
|
|
|
|
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit a4ae997adcbd ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
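The two roles of the per-port-tc register can be modeled in a few lines. This is a sketch under stated assumptions, not the hardware's logic: struct qmaxsdu_model and its helpers are hypothetical names, and the exact comparison used for dropping (strictly greater than, on the frame length) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one {port, tc}; field names are illustrative. */
struct qmaxsdu_model {
	uint32_t q_max_sdu;    /* per-port-tc value, 0 means "not set" */
	uint32_t port_max_sdu; /* per-port fallback */
};

/* Role 1: SDU used for the guard band, falling back to the port value. */
static uint32_t guard_band_sdu(const struct qmaxsdu_model *m)
{
	return m->q_max_sdu ? m->q_max_sdu : m->port_max_sdu;
}

/* Role 2: oversized frame dropping is active only for a non-zero value. */
static bool oversize_drop_enabled(const struct qmaxsdu_model *m)
{
	return m->q_max_sdu != 0;
}

/* Assumed comparison: frames longer than the per-tc value are dropped. */
static bool frame_is_dropped(const struct qmaxsdu_model *m, uint32_t frame_len)
{
	return oversize_drop_enabled(m) && frame_len > m->q_max_sdu;
}
```

The point of the model is that a single write changes both behaviors at once: setting a non-zero per-tc value enables dropping and also replaces the port-wide guard band for that traffic class.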
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
Frames are not dropped on egress due to a queue system oversize
condition; instead, that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
The issue exists in various forms since the tc-taprio offload was introduced.
Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-28 14:52:37 +00:00
/* Extract shortest continuous gate open intervals in ns for each traffic class
 * of a cyclic tc-taprio schedule. If a gate is always open, the duration is
 * considered U64_MAX. If the gate is always closed, it is considered 0.
 */
static void vsc9959_tas_min_gate_lengths(struct tc_taprio_qopt_offload *taprio,
					 u64 min_gate_len[OCELOT_NUM_TC])
{
	struct tc_taprio_sched_entry *entry;
	u64 gate_len[OCELOT_NUM_TC];
|
net: dsa: felix: fix min gate len calculation for tc when its first gate is closed
min_gate_len[tc] is supposed to track the shortest interval of
continuously open gates for a traffic class. For example, in the
following case:
TC 76543210
t0 00000001b 200000 ns
t1 00000010b 200000 ns
min_gate_len[0] and min_gate_len[1] should be 200000, while
min_gate_len[2-7] should be 0.
However what happens is that min_gate_len[0] is 200000, but
min_gate_len[1] ends up being 0 (despite gate_len[1] being 200000 at the
point where the logic detects the gate close event for TC 1).
The problem is that the code considers a "gate close" event whenever it
sees that there is a 0 for that TC (essentially it's level rather than
edge triggered). By doing that, any time a gate is seen as closed
without having been open prior, gate_len, which is 0, will be written
into min_gate_len. Once min_gate_len becomes 0, it's impossible for it
to track anything higher than that (the length of actually open
intervals).
To fix this, we make the writing to min_gate_len[tc] be edge-triggered,
which avoids writes for gates that are closed in consecutive intervals.
However, this means we need to special-case the permanently closed
gates at the end.
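The edge-triggered logic can be tried out in userspace on the two-entry schedule from the example above. This is a standalone sketch, not the driver function: struct sched_entry is a hypothetical stand-in for struct tc_taprio_sched_entry, and NUM_TC stands in for OCELOT_NUM_TC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TC 8 /* stand-in for OCELOT_NUM_TC */

/* Hypothetical stand-in for struct tc_taprio_sched_entry */
struct sched_entry {
	uint8_t gate_mask;    /* bit N set means the gate of TC N is open */
	uint64_t interval_ns; /* duration of this schedule entry */
};

static void min_gate_lengths(const struct sched_entry *entries, size_t n,
			     uint64_t min_gate_len[NUM_TC])
{
	uint64_t gate_len[NUM_TC] = {0};
	uint8_t gates_ever_opened = 0;
	size_t i;
	int tc;

	for (tc = 0; tc < NUM_TC; tc++)
		min_gate_len[tc] = UINT64_MAX;

	/* Walk the list twice so that open windows wrapping around the
	 * end of the cycle are measured in full.
	 */
	for (i = 0; i < 2 * n; i++) {
		const struct sched_entry *e = &entries[i % n];

		for (tc = 0; tc < NUM_TC; tc++) {
			if (e->gate_mask & (1u << tc)) {
				gate_len[tc] += e->interval_ns;
				gates_ever_opened |= (uint8_t)(1u << tc);
			} else {
				/* Edge-triggered: only an open -> closed
				 * transition (gate_len != 0) may lower the
				 * minimum.
				 */
				if (gate_len[tc] &&
				    gate_len[tc] < min_gate_len[tc])
					min_gate_len[tc] = gate_len[tc];
				gate_len[tc] = 0;
			}
		}
	}

	/* Gates that never opened are permanently closed */
	for (tc = 0; tc < NUM_TC; tc++)
		if (!(gates_ever_opened & (1u << tc)))
			min_gate_len[tc] = 0;
}
```

Fed with { 0x01, 200000 } and { 0x02, 200000 }, the sketch yields 200000 for TC 0 and TC 1 and 0 for TC 2 to 7. Dropping the `gate_len[tc] &&` condition reproduces the level-triggered bug, where TC 1 ends up at 0.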
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220804202817.1677572-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-04 20:28:17 +00:00
	u8 gates_ever_opened = 0;
	int tc, i, n;

	/* Initialize arrays */
	for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
		min_gate_len[tc] = U64_MAX;
		gate_len[tc] = 0;
	}

	/* If we don't have taprio, consider all gates as permanently open */
	if (!taprio)
		return;

	n = taprio->num_entries;

	/* Walk through the gate list twice to determine the length
	 * of consecutively open gates for a traffic class, including
	 * open gates that wrap around. We are just interested in the
	 * minimum window size, and this doesn't change what the
	 * minimum is (if the gate never closes, min_gate_len will
	 * remain U64_MAX).
	 */
	for (i = 0; i < 2 * n; i++) {
		entry = &taprio->entries[i % n];

		for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
			if (entry->gate_mask & BIT(tc)) {
				gate_len[tc] += entry->interval;
				gates_ever_opened |= BIT(tc);
			} else {
				/* Gate closes now, record a potential new
				 * minimum and reinitialize length
				 */
				if (min_gate_len[tc] > gate_len[tc] &&
				    gate_len[tc])
					min_gate_len[tc] = gate_len[tc];
				gate_len[tc] = 0;
			}
		}
	}
|
net: dsa: felix: fix min gate len calculation for tc when its first gate is closed
min_gate_len[tc] is supposed to track the shortest interval of
continuously open gates for a traffic class. For example, in the
following case:
TC 76543210
t0 00000001b 200000 ns
t1 00000010b 200000 ns
min_gate_len[0] and min_gate_len[1] should be 200000, while
min_gate_len[2-7] should be 0.
However what happens is that min_gate_len[0] is 200000, but
min_gate_len[1] ends up being 0 (despite gate_len[1] being 200000 at the
point where the logic detects the gate close event for TC 1).
The problem is that the code considers a "gate close" event whenever it
sees that there is a 0 for that TC (essentially it's level rather than
edge triggered). By doing that, any time a gate is seen as closed
without having been open prior, gate_len, which is 0, will be written
into min_gate_len. Once min_gate_len becomes 0, it's impossible for it
to track anything higher than that (the length of actually open
intervals).
To fix this, we make the writing to min_gate_len[tc] be edge-triggered,
which avoids writes for gates that are closed in consecutive intervals.
However, this means we need to special-case the permanently closed gates
at the end.
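The edge-triggered tracking described above can be sketched in standalone C. This is an illustration under stated assumptions, not the driver's code: `struct gate_entry`, `min_gate_lengths()` and `NUM_TC` are hypothetical names, and intervals that are still open at the end of the cycle are flushed by an all-closed pseudo-entry instead of being joined across the cycle boundary. As a consequence, a permanently open gate ends up with the full cycle length here rather than U64_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TC 8 /* assumption: mirrors OCELOT_NUM_TC */

/* Hypothetical schedule entry: bit N of gate_mask set means TC N is open */
struct gate_entry {
	uint8_t gate_mask;
	uint64_t interval_ns;
};

static void min_gate_lengths(const struct gate_entry *entries, int n,
			     uint64_t min_gate_len[NUM_TC])
{
	uint64_t gate_len[NUM_TC] = {0};
	uint8_t gates_ever_opened = 0;
	uint8_t prev_mask = 0;
	int i, tc;

	assert(n >= 0);

	for (tc = 0; tc < NUM_TC; tc++)
		min_gate_len[tc] = UINT64_MAX;

	/* Index n is a pseudo-entry with all gates closed (simplification) */
	for (i = 0; i <= n; i++) {
		uint8_t mask = (i < n) ? entries[i].gate_mask : 0;
		uint64_t len = (i < n) ? entries[i].interval_ns : 0;

		for (tc = 0; tc < NUM_TC; tc++) {
			uint8_t bit = (uint8_t)(1u << tc);

			if (mask & bit) {
				gate_len[tc] += len;
				gates_ever_opened |= bit;
			} else if (prev_mask & bit) {
				/* Open -> closed edge: record a candidate minimum */
				if (gate_len[tc] < min_gate_len[tc])
					min_gate_len[tc] = gate_len[tc];
				gate_len[tc] = 0;
			}
			/* Closed -> closed: nothing is written (edge-triggered) */
		}
		prev_mask = mask;
	}

	/* Gates that never opened are permanently closed */
	for (tc = 0; tc < NUM_TC; tc++)
		if (!(gates_ever_opened & (1u << tc)))
			min_gate_len[tc] = 0;
}
```

With the two-entry schedule from the example above, this yields 200000 for TC 0 and TC 1, and 0 for TC 2-7.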
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220804202817.1677572-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-04 20:28:17 +00:00
	/* min_gate_len[tc] actually tracks minimum *open* gate time, so for
	 * permanently closed gates, min_gate_len[tc] will still be U64_MAX.
	 * Therefore they are currently indistinguishable from permanently
	 * open gates. Overwrite the gate len with 0 when we know they're
	 * actually permanently closed, i.e. after the loop above.
	 */
	for (tc = 0; tc < OCELOT_NUM_TC; tc++)
		if (!(gates_ever_opened & BIT(tc)))
			min_gate_len[tc] = 0;
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit a4ae997adcbd ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
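To illustrate why the MTU and the link speed both matter, the following standalone sketch computes the wire time of a maximum-sized frame and compares it with a gate length. It is a hedged illustration, not the driver's implementation: the function names and the constants `VLAN_HLEN_OCTETS` and `L1_AND_FCS_OCTETS` are assumptions (two VLAN tags of 4 octets each, plus 8 octets of preamble and SFD, 4 octets of FCS and 12 octets of inter-frame gap), and the speed is taken in Mbps.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_HLEN_OCTETS  4  /* assumption: one VLAN tag */
#define L1_AND_FCS_OCTETS 24 /* assumption: 8 preamble+SFD, 4 FCS, 12 IFG */

/* Picoseconds needed to serialize one octet at the given speed in Mbps */
static uint64_t picos_per_byte(unsigned int speed_mbps)
{
	assert(speed_mbps > 0);
	return (8ULL * 1000000ULL) / speed_mbps;
}

/* Wire time of the largest frame, given the configured maximum length */
static uint64_t needed_bit_time_ps(uint32_t maxlen_cfg, unsigned int speed_mbps)
{
	uint64_t maxlen = (uint64_t)maxlen_cfg + 2 * VLAN_HLEN_OCTETS;

	return (maxlen + L1_AND_FCS_OCTETS) * picos_per_byte(speed_mbps);
}

/* Nonzero if a gate of gate_len_ns cannot fit a maximum-sized frame */
static int gate_needs_oversize_dropping(uint64_t gate_len_ns,
					uint32_t maxlen_cfg,
					unsigned int speed_mbps)
{
	return gate_len_ns * 1000ULL <
	       needed_bit_time_ps(maxlen_cfg, speed_mbps);
}
```

For a maximum length of 1518 at gigabit speed, the largest frame needs 12400000 ps, so a 10000 ns gate would need oversized frame dropping while a 990000 ns gate would not.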
Frames are not dropped on egress due to a queue system oversize
condition; instead, that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
The issue has existed in various forms since the tc-taprio offload was introduced.
Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-28 14:52:37 +00:00
}

net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames even way larger than 601 octets are transmitted even though these
should be considered as oversized, according to the documentation, and
dropped.
Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.
Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.
There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose to read back from the registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that support
neither cut-through nor tc-taprio, and we don't have a felix-specific
extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.
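The read-back strategy can be sketched as follows. This is a simplified illustration rather than the body of vsc9959_cut_through_fwd(): the array `qmaxsdu` stands in for the values read from QSYS_QMAXSDU_CFG_* for one port, and `cut_through_tc_mask()` and `NUM_TC` are hypothetical names.

```c
#include <stdint.h>

#define NUM_TC 8 /* assumption: eight traffic classes per port */

/* Remove from candidate_mask every traffic class whose max SDU value is
 * non-zero, i.e. every class that has oversized frame dropping enabled.
 */
static uint8_t cut_through_tc_mask(const uint32_t qmaxsdu[NUM_TC],
				   uint8_t candidate_mask)
{
	uint8_t mask = candidate_mask;
	int tc;

	for (tc = 0; tc < NUM_TC; tc++)
		if (qmaxsdu[tc])
			mask &= (uint8_t)~(1u << tc);

	return mask;
}
```

If only TC 7 has a non-zero value (605, as in the experiment above) and all classes are candidates, the resulting mask is 0x7f.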
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:24 +00:00
/* ocelot_write_rix is a macro that concatenates QSYS_QMAXSDU_CFG_* with _RSZ,
 * so we need to spell out the register access to each traffic class in helper
 * functions, to simplify callers
 */
static void vsc9959_port_qmaxsdu_set(struct ocelot *ocelot, int port, int tc,
				     u32 max_sdu)
{
	switch (tc) {
	case 0:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_0,
				 port);
		break;
	case 1:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_1,
				 port);
		break;
	case 2:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_2,
				 port);
		break;
	case 3:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_3,
				 port);
		break;
	case 4:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_4,
				 port);
		break;
	case 5:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_5,
				 port);
		break;
	case 6:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_6,
				 port);
		break;
	case 7:
		ocelot_write_rix(ocelot, max_sdu, QSYS_QMAXSDU_CFG_7,
				 port);
		break;
	}
}

static u32 vsc9959_port_qmaxsdu_get(struct ocelot *ocelot, int port, int tc)
{
	switch (tc) {
	case 0: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_0, port);
	case 1: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_1, port);
	case 2: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_2, port);
	case 3: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_3, port);
	case 4: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_4, port);
	case 5: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_5, port);
	case 6: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_6, port);
	case 7: return ocelot_read_rix(ocelot, QSYS_QMAXSDU_CFG_7, port);
	default:
		return 0;
	}
}

2022-09-28 09:51:59 +00:00
static u32 vsc9959_tas_tc_max_sdu(struct tc_taprio_qopt_offload *taprio, int tc)
{
	if (!taprio || !taprio->max_sdu[tc])
		return 0;

	return taprio->max_sdu[tc] + ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN;
}

net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
/* Update QSYS_PORT_MAX_SDU to make sure the static guard bands added by the
 * switch (see the ALWAYS_GUARD_BAND_SCH_Q comment) are correct at all MTU
 * values (the default value is 1518). Also, for traffic class windows smaller
 * than one MTU sized frame, update QSYS_QMAXSDU_CFG to enable oversized frame
 * dropping, such that these won't hang the port, as they will never be sent.
 */
static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
{
	struct ocelot_port *ocelot_port = ocelot->ports[port];
2022-09-28 09:51:59 +00:00
	struct tc_taprio_qopt_offload *taprio;
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
	u64 min_gate_len[OCELOT_NUM_TC];
	int speed, picos_per_byte;
	u64 needed_bit_time_ps;
	u32 val, maxlen;
	u8 tas_speed;
	int tc;

	lockdep_assert_held(&ocelot->tas_lock);

2022-09-28 09:51:59 +00:00
	taprio = ocelot_port->taprio;

net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
	val = ocelot_read_rix(ocelot, QSYS_TAG_CONFIG, port);
	tas_speed = QSYS_TAG_CONFIG_LINK_SPEED_X(val);

	switch (tas_speed) {
	case OCELOT_SPEED_10:
		speed = SPEED_10;
		break;
	case OCELOT_SPEED_100:
		speed = SPEED_100;
		break;
	case OCELOT_SPEED_1000:
		speed = SPEED_1000;
		break;
	case OCELOT_SPEED_2500:
		speed = SPEED_2500;
		break;
	default:
		return;
	}

	picos_per_byte = (USEC_PER_SEC * 8) / speed;

	val = ocelot_port_readl(ocelot_port, DEV_MAC_MAXLEN_CFG);
	/* MAXLEN_CFG accounts automatically for VLAN. We need to include it
	 * manually in the bit time calculation, plus the preamble and SFD.
	 */
	maxlen = val + 2 * VLAN_HLEN;
	/* Consider the standard Ethernet overhead of 8 octets preamble+SFD,
	 * 4 octets FCS, 12 octets IFG. Widen to 64 bits before multiplying,
	 * since the product can exceed 32 bits at low link speeds.
	 */
	needed_bit_time_ps = (u64)(maxlen + 24) * picos_per_byte;

	dev_dbg(ocelot->dev,
		"port %d: max frame size %d needs %llu ps at speed %d\n",
		port, maxlen, needed_bit_time_ps, speed);

2022-09-28 09:51:59 +00:00
	vsc9959_tas_min_gate_lengths(taprio, min_gate_len);
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
2022-09-05 17:01:24 +00:00
	mutex_lock(&ocelot->fwd_domain_lock);

net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
	for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
2022-09-28 09:51:59 +00:00
		u32 requested_max_sdu = vsc9959_tas_tc_max_sdu(taprio, tc);
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
The blamed commit broke tc-taprio schedules such as this one:
tc qdisc replace dev $swp1 root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 0x7f 990000 \
sched-entry S 0x80 10000 \
flags 0x2
because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.
Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
1518, and at gigabit this introduces a static guard band (independent
of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
time of 20 octets => 160 ns). But this is larger than the time window
itself, of 10000 ns. So, the queue system never considers a frame with
TC 7 as eligible for transmission, since the gate practically never
opens, and these frames are forever stuck in the TX queues and hang
the port.
2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
enabling oversized frame dropping, we make an effort to set
QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
one more role, which we did not take into account: per-tc static guard
band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
There is a discrepancy between what the driver thinks (that there is
no guard band, and 100% of min_gate_len[tc] is available for egress
scheduling) and what the hardware actually does (crops the equivalent
of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 ns)
leaves 0 ns for scheduling in the queue system proper.
Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.
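The arithmetic in this message can be checked with a small standalone sketch. It is not the driver's code: `guard_band_ns()`, `scheduling_time_ns()` and `max_sdu_for_gate()` are hypothetical helpers, the 20-octet L1 overhead and the 33 ns minimum are the figures quoted above, and clamping the result to 1 is an assumption made here because a value of 0 would disable oversized frame dropping.

```c
#include <assert.h>
#include <stdint.h>

#define L1_OVERHEAD_OCTETS 20 /* FRM_ADJ figure quoted in this message */
#define MIN_GATE_OPEN_NS   33 /* minimum 'gate open' time quoted above */

/* Static guard band for a given max SDU, including the L1 overhead */
static uint64_t guard_band_ns(uint32_t max_sdu, uint32_t ns_per_octet)
{
	return ((uint64_t)max_sdu + L1_OVERHEAD_OCTETS) * ns_per_octet;
}

/* Time left for scheduling; zero or negative means the gate never
 * effectively opens for this traffic class.
 */
static int64_t scheduling_time_ns(uint64_t gate_len_ns, uint32_t max_sdu,
				  uint32_t ns_per_octet)
{
	return (int64_t)gate_len_ns -
	       (int64_t)guard_band_ns(max_sdu, ns_per_octet);
}

/* Largest max SDU whose guard band still leaves reserve_ns of open time */
static uint32_t max_sdu_for_gate(uint64_t gate_len_ns, uint32_t ns_per_octet,
				 uint32_t reserve_ns)
{
	uint64_t octets;

	assert(ns_per_octet > 0 && gate_len_ns > reserve_ns);
	octets = (gate_len_ns - reserve_ns) / ns_per_octet;

	/* Assumption: clamp to 1, since 0 would disable oversized dropping */
	if (octets <= L1_OVERHEAD_OCTETS)
		return 1;

	return (uint32_t)(octets - L1_OVERHEAD_OCTETS);
}
```

At 8 ns per octet and a 10000 ns gate, a max SDU of 1518 gives a guard band of 12304 ns, which exceeds the window. A reserve of 0 ns reproduces the value 1230 and leaves exactly 0 ns for scheduling, while a reserve of `MIN_GATE_OPEN_NS` gives 1225 and leaves 40 ns.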
Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:23 +00:00
		u64 remaining_gate_len_ps;
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
		u32 max_sdu;

net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
The blamed commit broke tc-taprio schedules such as this one:
tc qdisc replace dev $swp1 root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 0x7f 990000 \
sched-entry S 0x80 10000 \
flags 0x2
because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.
Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
1518, and at gigabit this introduces a static guard band (independent
of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
time of 20 octets => 160 ns). But this is larger than the time window
itself, of 10000 ns. So, the queue system never considers a frame with
TC 7 as eligible for transmission, since the gate practically never
opens, and these frames are forever stuck in the TX queues and hang
the port.
2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
enabling oversized frame dropping, we make an effort to set
QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
one more role, which we did not take into account: per-tc static guard
band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
There is a discrepancy between what the driver thinks (that there is
no guard band, and 100% of min_gate_len[tc] is available for egress
scheduling) and what the hardware actually does (crops the equivalent
of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 ns)
leaves 0 ns for scheduling in the queue system proper.
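The arithmetic for both cases can be checked with a small user-space sketch. It is not driver code: the helper names are hypothetical, and the model simply assumes the guard band equals the configured max SDU plus the 20 octets of FRM_ADJ, converted at 8 ns per octet as the text does for gigabit.

```c
#include <assert.h>
#include <stdint.h>

/* L1 overhead that the text attributes to QSYS::HSCH_MISC_CFG.FRM_ADJ. */
#define FRM_ADJ_OCTETS 20u

/* Hypothetical helper: static guard band in ns for a given max SDU
 * (octets, FCS included) at a given byte time.
 */
static uint64_t guard_band_ns(uint32_t max_sdu_octets, uint32_t ns_per_octet)
{
	assert(ns_per_octet > 0);
	return (uint64_t)(max_sdu_octets + FRM_ADJ_OCTETS) * ns_per_octet;
}

/* Hypothetical helper: gate time left for scheduling, clamped at 0 when
 * the guard band is at least as long as the gate interval.
 */
static uint64_t usable_gate_ns(uint64_t gate_len_ns, uint32_t max_sdu_octets,
			       uint32_t ns_per_octet)
{
	uint64_t gb = guard_band_ns(max_sdu_octets, ns_per_octet);

	return gate_len_ns > gb ? gate_len_ns - gb : 0;
}
```

With a 10000 ns gate, both the port default of 1518 octets and the per-tc value of 1230 octets leave no usable time under this model, which matches the two cases described above.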
Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.
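A minimal sketch of the amended calculation, under the same assumptions as the text (33 ns reserved, 20 octets of L1 overhead). It works in whole nanoseconds per octet for readability, whereas the driver works in picoseconds; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_GATE_OPEN_NS   33u /* minimum 'gate open' time, per the text */
#define L1_OVERHEAD_OCTETS 20u /* L1 overhead added by FRM_ADJ */

/* Hypothetical helper: largest per-tc max SDU (octets, FCS included) whose
 * guard band still leaves MIN_GATE_OPEN_NS of the gate interval free.
 */
static uint32_t max_sdu_leaving_min_gate(uint64_t gate_len_ns,
					 uint32_t ns_per_octet)
{
	uint64_t octets;

	assert(ns_per_octet > 0);

	/* Window shorter than the reserved time: treat every frame as oversized. */
	if (gate_len_ns < MIN_GATE_OPEN_NS)
		return 1;

	octets = (gate_len_ns - MIN_GATE_OPEN_NS) / ns_per_octet;
	if (octets == 0)
		return 1;

	/* Account for L1 overhead without letting the result reach 0. */
	if (octets > L1_OVERHEAD_OCTETS)
		octets -= L1_OVERHEAD_OCTETS;

	return (uint32_t)octets;
}
```

For the 10000 ns gate at gigabit this gives 1225 octets instead of 1230. The resulting guard band is (1225 + 20) * 8 = 9960 ns, so 40 ns of the window remain, which is above the 33 ns threshold.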
Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:23 +00:00
|
|
|
remaining_gate_len_ps =
|
|
|
|
vsc9959_tas_remaining_gate_len_ps(min_gate_len[tc]);
|
|
|
|
|
|
|
|
if (remaining_gate_len_ps > needed_bit_time_ps) {
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit a4ae997adcbd ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
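The two roles of the per-port-tc register can be modeled as follows. This is a sketch of the documented behavior as summarized above, not hardware or driver code; the function names are hypothetical, and the strict "larger than" comparison is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Role 1: SDU used for the guard band. A per-queue value of 0 falls back
 * to the port-wide value.
 */
static uint32_t guard_band_sdu(uint32_t q_max_sdu, uint32_t port_max_sdu)
{
	return q_max_sdu ? q_max_sdu : port_max_sdu;
}

/* Role 2: oversized frame dropping, active only when the per-queue value
 * is non-zero.
 */
static bool frame_is_oversized(uint32_t q_max_sdu, uint32_t frame_len)
{
	return q_max_sdu != 0 && frame_len > q_max_sdu;
}
```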
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
Frames are not dropped on egress due to a queue system oversize
condition; instead, that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
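A toy model of that behavior, for illustration only. The function name, the per-port array of limits, and the counter argument are all hypothetical; the real decision is made in hardware.

```c
#include <stdint.h>

/* Hypothetical model: remove from dest_mask every port whose per-tc limit
 * (q_max_sdu[port], 0 meaning "no limit") is exceeded by frame_len. If no
 * destination is left, count the frame in the ingress drop_tail counter.
 * num_ports must not exceed 32.
 */
static uint32_t prune_oversized_dests(uint32_t dest_mask,
				      const uint32_t *q_max_sdu,
				      unsigned int num_ports,
				      uint32_t frame_len,
				      uint64_t *drop_tail)
{
	unsigned int port;

	for (port = 0; port < num_ports; port++) {
		uint32_t bit = (uint32_t)1 << port;

		if ((dest_mask & bit) && q_max_sdu[port] &&
		    frame_len > q_max_sdu[port])
			dest_mask &= ~bit;
	}

	if (!dest_mask)
		(*drop_tail)++;

	return dest_mask;
}
```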
The issue has existed in various forms since the tc-taprio offload was introduced.
Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-28 14:52:37 +00:00
|
|
|
/* Setting QMAXSDU_CFG to 0 disables oversized frame
|
|
|
|
* dropping.
|
|
|
|
*/
|
2022-09-28 09:51:59 +00:00
|
|
|
max_sdu = requested_max_sdu;
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
|
|
|
dev_dbg(ocelot->dev,
|
|
|
|
"port %d tc %d min gate len %llu"
|
|
|
|
", sending all frames\n",
|
|
|
|
port, tc, min_gate_len[tc]);
|
|
|
|
} else {
|
|
|
|
/* If traffic class doesn't support a full MTU sized
|
|
|
|
* frame, make sure to enable oversize frame dropping
|
|
|
|
* for frames larger than the smallest that would fit.
|
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
2022-09-05 17:01:23 +00:00
|
|
|
*
|
|
|
|
* However, the exact same register, QSYS_QMAXSDU_CFG_*,
|
|
|
|
* controls not only oversized frame dropping, but also
|
|
|
|
* per-tc static guard band lengths, so it reduces the
|
|
|
|
* useful gate interval length. Therefore, be careful
|
|
|
|
* to calculate a guard band (and therefore max_sdu)
|
|
|
|
* that still leaves 33 ns available in the time slot.
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
|
|
|
*/
|
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
2022-09-05 17:01:23 +00:00
|
|
|
max_sdu = div_u64(remaining_gate_len_ps, picos_per_byte);
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
|
|
|
/* A TC gate may be completely closed, which is a
|
|
|
|
* special case where all packets are oversized.
|
|
|
|
* Any limit smaller than 64 octets accomplishes this
|
|
|
|
*/
|
|
|
|
if (!max_sdu)
|
|
|
|
max_sdu = 1;
|
|
|
|
/* Take L1 overhead into account, but just don't allow
|
|
|
|
* max_sdu to go negative or to 0. Here we use 20
|
|
|
|
* because QSYS_QMAXSDU_CFG_* already counts the 4 FCS
|
|
|
|
* octets as part of packet size.
|
|
|
|
*/
|
|
|
|
if (max_sdu > 20)
|
|
|
|
max_sdu -= 20;
|
2022-09-28 09:51:59 +00:00
|
|
|
|
|
|
|
if (requested_max_sdu && requested_max_sdu < max_sdu)
|
|
|
|
max_sdu = requested_max_sdu;
|
|
|
|
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
|
|
|
dev_info(ocelot->dev,
|
|
|
|
"port %d tc %d min gate length %llu"
|
|
|
|
" ns not enough for max frame size %d at %d"
|
|
|
|
" Mbps, dropping frames over %d"
|
|
|
|
" octets including FCS\n",
|
|
|
|
port, tc, min_gate_len[tc], maxlen, speed,
|
|
|
|
max_sdu);
|
|
|
|
}
|
|
|
|
|
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames even way larger than 601 octets are transmitted even though these
should be considered as oversized, according to the documentation, and
dropped.
Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.
Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.
There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose reading back from registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that support
neither cut-through nor tc-taprio, and we don't have a felix
specific extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.
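The read-back strategy amounts to deriving a per-port bitmask of traffic classes from the register values. The sketch below stands in for the register reads with a plain array; the function name is hypothetical, and any other eligibility conditions the driver applies to cut-through are left out.

```c
#include <stdint.h>

#define NUM_TC 8

/* Hypothetical helper: given the eight QSYS_QMAXSDU_CFG_* values read back
 * for one port, return the mask of traffic classes that may remain
 * cut-through, i.e. those with oversized frame dropping disabled (value 0).
 */
static uint8_t cut_through_tc_mask(const uint32_t q_max_sdu[NUM_TC])
{
	uint8_t mask = 0;
	int tc;

	for (tc = 0; tc < NUM_TC; tc++) {
		if (q_max_sdu[tc] == 0)
			mask |= (uint8_t)(1u << tc);
	}

	return mask;
}
```

With QSYS_QMAXSDU_CFG_7 set to 605 and the other seven registers at 0, the mask is 0x7f: only tc 7 falls back to store-and-forward.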
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:24 +00:00
|
|
|
vsc9959_port_qmaxsdu_set(ocelot, port, tc, max_sdu);
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
ocelot_write_rix(ocelot, maxlen, QSYS_PORT_MAX_SDU, port);
|
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
2022-09-05 17:01:24 +00:00
|
|
|
|
|
|
|
ocelot->ops->cut_through_fwd(ocelot);
|
|
|
|
|
|
|
|
mutex_unlock(&ocelot->fwd_domain_lock);
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
2022-06-28 14:52:37 +00:00
|
|
|
}
|
|
|
|
|
2020-05-13 02:25:09 +00:00
|
|
|
static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
|
|
|
|
u32 speed)
|
|
|
|
{
|
2022-06-28 14:52:37 +00:00
|
|
|
struct ocelot_port *ocelot_port = ocelot->ports[port];
|
2020-09-24 01:57:46 +00:00
|
|
|
u8 tas_speed;
|
|
|
|
|
|
|
|
switch (speed) {
|
|
|
|
case SPEED_10:
|
|
|
|
tas_speed = OCELOT_SPEED_10;
|
|
|
|
break;
|
|
|
|
case SPEED_100:
|
|
|
|
tas_speed = OCELOT_SPEED_100;
|
|
|
|
break;
|
|
|
|
case SPEED_1000:
|
|
|
|
tas_speed = OCELOT_SPEED_1000;
|
|
|
|
break;
|
|
|
|
case SPEED_2500:
|
|
|
|
tas_speed = OCELOT_SPEED_2500;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
tas_speed = OCELOT_SPEED_1000;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2022-09-05 17:01:25 +00:00
|
|
|
mutex_lock(&ocelot->tas_lock);
|
|
|
|
|
2020-05-13 02:25:09 +00:00
|
|
|
ocelot_rmw_rix(ocelot,
|
2020-09-24 01:57:46 +00:00
|
|
|
QSYS_TAG_CONFIG_LINK_SPEED(tas_speed),
|
2020-05-13 02:25:09 +00:00
|
|
|
QSYS_TAG_CONFIG_LINK_SPEED_M,
|
|
|
|
QSYS_TAG_CONFIG, port);
|
2022-06-28 14:52:37 +00:00
|
|
|
|
|
|
|
if (ocelot_port->taprio)
|
|
|
|
vsc9959_tas_guard_bands_update(ocelot, port);
|
|
|
|
|
|
|
|
mutex_unlock(&ocelot->tas_lock);
|
2020-05-13 02:25:09 +00:00
|
|
|
}
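The register updates above go through ocelot_rmw() and ocelot_rmw_rix(), which take a value, a mask, and a register. As a rough illustration only, assuming these helpers behave like a conventional masked read-modify-write (bits outside the mask are preserved, bits inside the mask are taken from the value), the update can be sketched in user space as follows. The function name rmw_bits is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a masked register update.
 * Assumption: only the bits selected by 'mask' change, and they take
 * the corresponding bits of 'val'; all other bits of 'old' are kept.
 */
static uint32_t rmw_bits(uint32_t old, uint32_t val, uint32_t mask)
{
	return (old & ~mask) | (val & mask);
}
```

Under that assumption, passing a value of 0 with a single-bit mask clears just that bit, which is the shape of the call used to disable QSYS_TAG_CONFIG_ENABLE.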
|
|
|
|
|
|
|
|
static void vsc9959_new_base_time(struct ocelot *ocelot, ktime_t base_time,
|
|
|
|
u64 cycle_time,
|
|
|
|
struct timespec64 *new_base_ts)
|
|
|
|
{
|
|
|
|
struct timespec64 ts;
|
|
|
|
ktime_t new_base_time;
|
|
|
|
ktime_t current_time;
|
|
|
|
|
|
|
|
ocelot_ptp_gettime64(&ocelot->ptp_info, &ts);
|
|
|
|
current_time = timespec64_to_ktime(ts);
|
|
|
|
new_base_time = base_time;
|
|
|
|
|
|
|
|
if (base_time < current_time) {
|
|
|
|
u64 nr_of_cycles = current_time - base_time;
|
|
|
|
|
|
|
|
do_div(nr_of_cycles, cycle_time);
|
|
|
|
new_base_time += cycle_time * (nr_of_cycles + 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
*new_base_ts = ktime_to_timespec64(new_base_time);
|
|
|
|
}
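The arithmetic in vsc9959_new_base_time() can be checked in isolation. The following is a minimal user-space sketch, assuming plain 64-bit nanosecond values in place of ktime_t and ordinary division in place of do_div(). The name advance_base_time is hypothetical and does not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of the base time adjustment.
 * If base_time is already in the past, advance it by a whole number of
 * cycles so that the result is strictly after current_time while staying
 * phase-aligned with the requested base_time. All values are nanoseconds.
 */
static int64_t advance_base_time(int64_t base_time, int64_t current_time,
				 uint64_t cycle_time)
{
	uint64_t nr_of_cycles;

	assert(cycle_time != 0);

	/* A base time that is not in the past is used unchanged */
	if (base_time >= current_time)
		return base_time;

	/* Number of complete cycles elapsed since base_time */
	nr_of_cycles = (uint64_t)(current_time - base_time) / cycle_time;

	/* Skip those cycles plus one more to land in the future */
	return base_time + (int64_t)(cycle_time * (nr_of_cycles + 1));
}
```

For example, with a base time of 0, a current time of 25 and a cycle time of 10, two full cycles have elapsed, so the schedule is started at 30.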
|
|
|
|
|
|
|
|
static u32 vsc9959_tas_read_cfg_status(struct ocelot *ocelot)
|
|
|
|
{
|
|
|
|
return ocelot_read(ocelot, QSYS_TAS_PARAM_CFG_CTRL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vsc9959_tas_gcl_set(struct ocelot *ocelot, const u32 gcl_ix,
|
|
|
|
struct tc_taprio_sched_entry *entry)
|
|
|
|
{
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM(gcl_ix) |
|
|
|
|
QSYS_GCL_CFG_REG_1_GATE_STATE(entry->gate_mask),
|
|
|
|
QSYS_GCL_CFG_REG_1);
|
|
|
|
ocelot_write(ocelot, entry->interval, QSYS_GCL_CFG_REG_2);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
|
|
|
|
struct tc_taprio_qopt_offload *taprio)
|
|
|
|
{
|
2022-06-17 03:24:23 +00:00
|
|
|
struct ocelot_port *ocelot_port = ocelot->ports[port];
|
2020-05-13 02:25:09 +00:00
|
|
|
struct timespec64 base_ts;
|
|
|
|
int ret, i;
|
|
|
|
u32 val;
|
|
|
|
|
2022-06-17 03:24:23 +00:00
|
|
|
mutex_lock(&ocelot->tas_lock);
|
|
|
|
|
2020-05-13 02:25:09 +00:00
|
|
|
if (!taprio->enable) {
|
2022-06-28 14:52:36 +00:00
|
|
|
ocelot_rmw_rix(ocelot, 0, QSYS_TAG_CONFIG_ENABLE,
|
2020-05-13 02:25:09 +00:00
|
|
|
QSYS_TAG_CONFIG, port);
|
|
|
|
|
2022-06-28 14:52:35 +00:00
|
|
|
taprio_offload_free(ocelot_port->taprio);
|
|
|
|
ocelot_port->taprio = NULL;
|
|
|
|
|
2022-06-28 14:52:37 +00:00
|
|
|
vsc9959_tas_guard_bands_update(ocelot, port);
|
|
|
|
|
2022-06-17 03:24:23 +00:00
|
|
|
mutex_unlock(&ocelot->tas_lock);
|
2020-05-13 02:25:09 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (taprio->cycle_time > NSEC_PER_SEC ||
|
2022-06-17 03:24:23 +00:00
|
|
|
taprio->cycle_time_extension >= NSEC_PER_SEC) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto err;
|
|
|
|
}
|
2020-05-13 02:25:09 +00:00
|
|
|
|
2022-06-17 03:24:23 +00:00
|
|
|
if (taprio->num_entries > VSC9959_TAS_GCL_ENTRY_MAX) {
|
|
|
|
ret = -ERANGE;
|
|
|
|
goto err;
|
|
|
|
}
|
2020-05-13 02:25:09 +00:00
|
|
|
|
2021-05-10 11:07:08 +00:00
|
|
|
/* Enable guard band. The switch will schedule frames without taking
|
|
|
|
* their length into account. Thus we'll always need to enable the
|
|
|
|
* guard band which reserves the time of a maximum sized frame at the
|
|
|
|
* end of the time window.
|
|
|
|
*
|
|
|
|
* Although the ALWAYS_GUARD_BAND_SCH_Q bit is global for all ports, we
|
|
|
|
* need to set PORT_NUM, because subsequent writes to PARAM_CFG_REG_n
|
|
|
|
* operate on the port number.
|
2021-04-19 10:25:30 +00:00
|
|
|
*/
|
2021-05-10 11:07:08 +00:00
|
|
|
ocelot_rmw(ocelot, QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM(port) |
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_ALWAYS_GUARD_BAND_SCH_Q,
|
2020-05-13 02:25:09 +00:00
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM_M |
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_ALWAYS_GUARD_BAND_SCH_Q,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL);
|
|
|
|
|
|
|
|
/* Hardware errata - the admin config cannot be overwritten while a
|
|
|
|
* config change is pending; the TAS module needs to be reset
|
|
|
|
*/
|
|
|
|
val = ocelot_read(ocelot, QSYS_PARAM_STATUS_REG_8);
|
2022-06-17 03:24:23 +00:00
|
|
|
if (val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING) {
|
|
|
|
ret = -EBUSY;
|
|
|
|
goto err;
|
|
|
|
}
|
2020-05-13 02:25:09 +00:00
|
|
|
|
|
|
|
ocelot_rmw_rix(ocelot,
|
|
|
|
QSYS_TAG_CONFIG_ENABLE |
|
|
|
|
QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) |
|
|
|
|
QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES(0xFF),
|
|
|
|
QSYS_TAG_CONFIG_ENABLE |
|
|
|
|
QSYS_TAG_CONFIG_INIT_GATE_STATE_M |
|
|
|
|
QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_M,
|
|
|
|
QSYS_TAG_CONFIG, port);
|
|
|
|
|
|
|
|
vsc9959_new_base_time(ocelot, taprio->base_time,
|
|
|
|
taprio->cycle_time, &base_ts);
|
|
|
|
ocelot_write(ocelot, base_ts.tv_nsec, QSYS_PARAM_CFG_REG_1);
|
|
|
|
ocelot_write(ocelot, lower_32_bits(base_ts.tv_sec), QSYS_PARAM_CFG_REG_2);
|
|
|
|
val = upper_32_bits(base_ts.tv_sec);
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB(val) |
|
|
|
|
QSYS_PARAM_CFG_REG_3_LIST_LENGTH(taprio->num_entries),
|
|
|
|
QSYS_PARAM_CFG_REG_3);
|
|
|
|
ocelot_write(ocelot, taprio->cycle_time, QSYS_PARAM_CFG_REG_4);
|
|
|
|
ocelot_write(ocelot, taprio->cycle_time_extension, QSYS_PARAM_CFG_REG_5);
|
|
|
|
|
|
|
|
for (i = 0; i < taprio->num_entries; i++)
|
|
|
|
vsc9959_tas_gcl_set(ocelot, i, &taprio->entries[i]);
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot, QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL);
|
|
|
|
|
|
|
|
ret = readx_poll_timeout(vsc9959_tas_read_cfg_status, ocelot, val,
|
|
|
|
!(val & QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE),
|
|
|
|
10, 100000);
|
2022-06-28 14:52:35 +00:00
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
ocelot_port->taprio = taprio_offload_get(taprio);
|
2022-06-28 14:52:37 +00:00
|
|
|
vsc9959_tas_guard_bands_update(ocelot, port);
|
2020-05-13 02:25:09 +00:00
|
|
|
|
2022-06-17 03:24:23 +00:00
|
|
|
err:
|
|
|
|
mutex_unlock(&ocelot->tas_lock);
|
|
|
|
|
2020-05-13 02:25:09 +00:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-06-17 03:24:23 +00:00
|
|
|
static void vsc9959_tas_clock_adjust(struct ocelot *ocelot)
|
|
|
|
{
|
2022-06-28 14:52:35 +00:00
|
|
|
struct tc_taprio_qopt_offload *taprio;
|
2022-06-17 03:24:23 +00:00
|
|
|
struct ocelot_port *ocelot_port;
|
|
|
|
struct timespec64 base_ts;
|
|
|
|
int port;
|
|
|
|
u32 val;
|
|
|
|
|
|
|
|
mutex_lock(&ocelot->tas_lock);
|
|
|
|
|
|
|
|
for (port = 0; port < ocelot->num_phys_ports; port++) {
|
2022-06-28 14:52:35 +00:00
|
|
|
ocelot_port = ocelot->ports[port];
|
|
|
|
taprio = ocelot_port->taprio;
|
|
|
|
if (!taprio)
|
2022-06-17 03:24:23 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM(port),
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM_M,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL);
|
|
|
|
|
2022-06-28 14:52:36 +00:00
|
|
|
/* Disable time-aware shaper */
|
|
|
|
ocelot_rmw_rix(ocelot, 0, QSYS_TAG_CONFIG_ENABLE,
|
2022-06-17 03:24:23 +00:00
|
|
|
QSYS_TAG_CONFIG, port);
|
|
|
|
|
2022-06-28 14:52:35 +00:00
|
|
|
vsc9959_new_base_time(ocelot, taprio->base_time,
|
|
|
|
taprio->cycle_time, &base_ts);
|
2022-06-17 03:24:23 +00:00
|
|
|
|
|
|
|
ocelot_write(ocelot, base_ts.tv_nsec, QSYS_PARAM_CFG_REG_1);
|
|
|
|
ocelot_write(ocelot, lower_32_bits(base_ts.tv_sec),
|
|
|
|
QSYS_PARAM_CFG_REG_2);
|
|
|
|
val = upper_32_bits(base_ts.tv_sec);
|
|
|
|
ocelot_rmw(ocelot,
|
|
|
|
QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB(val),
|
|
|
|
QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB_M,
|
|
|
|
QSYS_PARAM_CFG_REG_3);
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot, QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE,
|
|
|
|
QSYS_TAS_PARAM_CFG_CTRL);
|
|
|
|
|
2022-06-28 14:52:36 +00:00
|
|
|
/* Re-enable time-aware shaper */
|
|
|
|
ocelot_rmw_rix(ocelot, QSYS_TAG_CONFIG_ENABLE,
|
2022-06-17 03:24:23 +00:00
|
|
|
QSYS_TAG_CONFIG_ENABLE,
|
|
|
|
QSYS_TAG_CONFIG, port);
|
|
|
|
}
|
|
|
|
mutex_unlock(&ocelot->tas_lock);
|
|
|
|
}
|
|
|
|
|
2020-05-13 02:25:10 +00:00
|
|
|
static int vsc9959_qos_port_cbs_set(struct dsa_switch *ds, int port,
|
|
|
|
struct tc_cbs_qopt_offload *cbs_qopt)
|
|
|
|
{
|
|
|
|
struct ocelot *ocelot = ds->priv;
|
|
|
|
int port_ix = port * 8 + cbs_qopt->queue;
|
|
|
|
u32 rate, burst;
|
|
|
|
|
|
|
|
if (cbs_qopt->queue >= ds->num_tx_queues)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!cbs_qopt->enable) {
|
|
|
|
ocelot_write_gix(ocelot, QSYS_CIR_CFG_CIR_RATE(0) |
|
|
|
|
QSYS_CIR_CFG_CIR_BURST(0),
|
|
|
|
QSYS_CIR_CFG, port_ix);
|
|
|
|
|
|
|
|
ocelot_rmw_gix(ocelot, 0, QSYS_SE_CFG_SE_AVB_ENA,
|
|
|
|
QSYS_SE_CFG, port_ix);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Rate unit is 100 kbps */
|
|
|
|
rate = DIV_ROUND_UP(cbs_qopt->idleslope, 100);
|
|
|
|
/* Avoid using zero rate */
|
|
|
|
rate = clamp_t(u32, rate, 1, GENMASK(14, 0));
|
|
|
|
/* Burst unit is 4kB */
|
|
|
|
burst = DIV_ROUND_UP(cbs_qopt->hicredit, 4096);
|
|
|
|
/* Avoid using zero burst size */
|
2020-05-14 18:33:02 +00:00
|
|
|
burst = clamp_t(u32, burst, 1, GENMASK(5, 0));
|
2020-05-13 02:25:10 +00:00
|
|
|
ocelot_write_gix(ocelot,
|
|
|
|
QSYS_CIR_CFG_CIR_RATE(rate) |
|
|
|
|
QSYS_CIR_CFG_CIR_BURST(burst),
|
|
|
|
QSYS_CIR_CFG,
|
|
|
|
port_ix);
|
|
|
|
|
|
|
|
ocelot_rmw_gix(ocelot,
|
|
|
|
QSYS_SE_CFG_SE_FRM_MODE(0) |
|
|
|
|
QSYS_SE_CFG_SE_AVB_ENA,
|
|
|
|
QSYS_SE_CFG_SE_AVB_ENA |
|
|
|
|
QSYS_SE_CFG_SE_FRM_MODE_M,
|
|
|
|
QSYS_SE_CFG,
|
|
|
|
port_ix);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
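The unit conversion in vsc9959_qos_port_cbs_set() can be sketched on its own. The example below is a user-space approximation that takes the units from the comments above (rate in steps of 100 kbps, burst in steps of 4 kB) and the field widths from the GENMASK() arguments. The helper names, and the use of unsigned inputs, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CIR_RATE_MAX	0x7fffu	/* 15-bit field, as in GENMASK(14, 0) */
#define CIR_BURST_MAX	0x3fu	/* 6-bit field, as in GENMASK(5, 0) */

/* Division rounding up, written to avoid overflow in n + d - 1 */
static uint32_t div_round_up_u32(uint32_t n, uint32_t d)
{
	return n / d + (n % d != 0);
}

static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: idleslope in kbps to hardware rate units */
static uint32_t cbs_rate_units(uint32_t idleslope_kbps)
{
	/* Never program a zero rate; saturate at the field maximum */
	return clamp_u32(div_round_up_u32(idleslope_kbps, 100), 1,
			 CIR_RATE_MAX);
}

/* Hypothetical helper: hicredit in bytes to hardware burst units */
static uint32_t cbs_burst_units(uint32_t hicredit_bytes)
{
	/* Never program a zero burst; saturate at the field maximum */
	return clamp_u32(div_round_up_u32(hicredit_bytes, 4096), 1,
			 CIR_BURST_MAX);
}
```

Rounding up and clamping to a minimum of 1 means that a small but non-zero request is never silently turned into a zero rate or zero burst, while oversized requests saturate at the largest value the register field can hold.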
|
|
|
|
|
2022-09-28 09:51:59 +00:00
|
|
|
static int vsc9959_qos_query_caps(struct tc_query_caps_base *base)
|
|
|
|
{
|
|
|
|
switch (base->type) {
|
|
|
|
case TC_SETUP_QDISC_TAPRIO: {
|
|
|
|
struct tc_taprio_caps *caps = base->caps;
|
|
|
|
|
|
|
|
caps->supports_queue_max_sdu = true;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-05-13 02:25:09 +00:00
|
|
|
static int vsc9959_port_setup_tc(struct dsa_switch *ds, int port,
|
|
|
|
enum tc_setup_type type,
|
|
|
|
void *type_data)
|
|
|
|
{
|
|
|
|
struct ocelot *ocelot = ds->priv;
|
|
|
|
|
|
|
|
switch (type) {
|
2022-09-28 09:51:59 +00:00
|
|
|
case TC_QUERY_CAPS:
|
|
|
|
return vsc9959_qos_query_caps(type_data);
|
2020-05-13 02:25:09 +00:00
|
|
|
case TC_SETUP_QDISC_TAPRIO:
|
|
|
|
return vsc9959_qos_port_tas_set(ocelot, port, type_data);
|
2020-05-13 02:25:10 +00:00
|
|
|
case TC_SETUP_QDISC_CBS:
|
|
|
|
return vsc9959_qos_port_cbs_set(ds, port, type_data);
|
2020-05-13 02:25:09 +00:00
|
|
|
default:
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
#define VSC9959_PSFP_SFID_MAX 175
|
|
|
|
#define VSC9959_PSFP_GATE_ID_MAX 183
|
2021-11-18 10:12:03 +00:00
|
|
|
#define VSC9959_PSFP_POLICER_BASE 63
|
2021-11-18 10:12:00 +00:00
|
|
|
#define VSC9959_PSFP_POLICER_MAX 383
|
2021-11-18 10:12:01 +00:00
|
|
|
#define VSC9959_PSFP_GATE_LIST_NUM 4
|
|
|
|
#define VSC9959_PSFP_GATE_CYCLETIME_MIN 5000
|
2021-11-18 10:12:00 +00:00
|
|
|
|
|
|
|
struct felix_stream {
|
|
|
|
struct list_head list;
|
|
|
|
unsigned long id;
|
2021-11-18 10:12:04 +00:00
|
|
|
bool dummy;
|
|
|
|
int ports;
|
|
|
|
int port;
|
2021-11-18 10:12:00 +00:00
|
|
|
u8 dmac[ETH_ALEN];
|
|
|
|
u16 vid;
|
|
|
|
s8 prio;
|
|
|
|
u8 sfid_valid;
|
|
|
|
u8 ssid_valid;
|
|
|
|
u32 sfid;
|
|
|
|
u32 ssid;
|
|
|
|
};
|
|
|
|
|
net: dsa: felix: check the 32-bit PSFP stats against overflow
The Felix PSFP counters suffer from the same problem as the ocelot
ndo_get_stats64 ones - they are 32-bit, so they can easily overflow and
this can easily go undetected.
Add a custom hook in ocelot_check_stats_work() through which driver
specific actions can be taken, and update the stats for the existing
PSFP filters from that hook.
Previously, vsc9959_psfp_filter_add() and vsc9959_psfp_filter_del() were
serialized with respect to each other via rtnl_lock(). However, with the
new entry point into &psfp->sfi_list coming from the periodic worker, we
now need an explicit mutex to serialize access to these lists.
We used to keep a struct felix_stream_filter_counters on stack, through
which vsc9959_psfp_stats_get() - a FLOW_CLS_STATS callback - would
retrieve data from vsc9959_psfp_counters_get(). We need to become
smarter about that in 3 ways:
- we need to keep a persistent set of counters for each stream instead
of keeping them on stack
- we need to promote those counters from u32 to u64, and create a
procedure that properly keeps 64-bit counters. Since we clear the
hardware counters anyway, and we poll every 2 seconds, a simple
increment of a u64 counter with a u32 value will perfectly do the job.
- FLOW_CLS_STATS also expect incremental counters, so we also need to
zeroize our u64 counters every time sch_flower calls us
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-08 16:48:05 +00:00
|
|
|
struct felix_stream_filter_counters {
|
|
|
|
u64 match;
|
|
|
|
u64 not_pass_gate;
|
|
|
|
u64 not_pass_sdu;
|
|
|
|
u64 red;
|
|
|
|
};
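The commit message above describes keeping persistent 64-bit counters that are fed from 32-bit hardware counters which are cleared on each poll, and zeroed whenever they are reported. A minimal user-space sketch of that bookkeeping follows. The struct mirrors felix_stream_filter_counters, but the type and function names are hypothetical and the hardware read is replaced by plain arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Software copy of the per-stream counters, widened to 64 bits */
struct stream_counters {
	uint64_t match;
	uint64_t not_pass_gate;
	uint64_t not_pass_sdu;
	uint64_t red;
};

/* Periodic poll: add the 32-bit values just read from hardware.
 * Assumption: the hardware counters were cleared as part of the read, so
 * each value is the delta since the previous poll.
 */
static void counters_accumulate(struct stream_counters *sw, uint32_t match,
				uint32_t not_pass_gate,
				uint32_t not_pass_sdu, uint32_t red)
{
	sw->match += match;
	sw->not_pass_gate += not_pass_gate;
	sw->not_pass_sdu += not_pass_sdu;
	sw->red += red;
}

/* Stats request: hand out the accumulated deltas and start over, since
 * the caller expects incremental values.
 */
static struct stream_counters counters_report(struct stream_counters *sw)
{
	struct stream_counters out = *sw;

	memset(sw, 0, sizeof(*sw));

	return out;
}
```

Because each poll contributes at most a 32-bit delta, the 64-bit sum does not wrap in practice as long as the hardware counter cannot overflow within one polling interval.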
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
struct felix_stream_filter {
|
2022-09-08 16:48:05 +00:00
|
|
|
struct felix_stream_filter_counters stats;
|
2021-11-18 10:12:00 +00:00
|
|
|
struct list_head list;
|
|
|
|
refcount_t refcount;
|
|
|
|
u32 index;
|
|
|
|
u8 enable;
|
2021-11-18 10:12:04 +00:00
|
|
|
int portmask;
|
2021-11-18 10:12:00 +00:00
|
|
|
u8 sg_valid;
|
|
|
|
u32 sgid;
|
|
|
|
u8 fm_valid;
|
|
|
|
u32 fmid;
|
|
|
|
u8 prio_valid;
|
|
|
|
u8 prio;
|
|
|
|
u32 maxsdu;
|
|
|
|
};
|
|
|
|
|
2021-11-18 10:12:01 +00:00
|
|
|
struct felix_stream_gate {
|
|
|
|
u32 index;
|
|
|
|
u8 enable;
|
|
|
|
u8 ipv_valid;
|
|
|
|
u8 init_ipv;
|
|
|
|
u64 basetime;
|
|
|
|
u64 cycletime;
|
|
|
|
u64 cycletime_ext;
|
|
|
|
u32 num_entries;
|
2021-11-27 18:03:20 +00:00
|
|
|
struct action_gate_entry entries[];
|
2021-11-18 10:12:01 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
struct felix_stream_gate_entry {
|
|
|
|
struct list_head list;
|
|
|
|
refcount_t refcount;
|
|
|
|
u32 index;
|
|
|
|
};
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
static int vsc9959_stream_identify(struct flow_cls_offload *f,
|
|
|
|
struct felix_stream *stream)
|
|
|
|
{
|
|
|
|
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
|
|
|
|
struct flow_dissector *dissector = rule->match.dissector;
|
|
|
|
|
|
|
|
if (dissector->used_keys &
|
|
|
|
~(BIT(FLOW_DISSECTOR_KEY_CONTROL) |
|
|
|
|
BIT(FLOW_DISSECTOR_KEY_BASIC) |
|
|
|
|
BIT(FLOW_DISSECTOR_KEY_VLAN) |
|
|
|
|
BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS)))
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
|
|
|
|
struct flow_match_eth_addrs match;
|
|
|
|
|
|
|
|
flow_rule_match_eth_addrs(rule, &match);
|
|
|
|
ether_addr_copy(stream->dmac, match.key->dst);
|
|
|
|
if (!is_zero_ether_addr(match.mask->src))
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
} else {
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
|
|
|
|
struct flow_match_vlan match;
|
|
|
|
|
|
|
|
flow_rule_match_vlan(rule, &match);
|
|
|
|
if (match.mask->vlan_priority)
|
|
|
|
stream->prio = match.key->vlan_priority;
|
|
|
|
else
|
|
|
|
stream->prio = -1;
|
|
|
|
|
|
|
|
if (!match.mask->vlan_id)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
stream->vid = match.key->vlan_id;
|
|
|
|
} else {
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
stream->id = f->cookie;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vsc9959_mact_stream_set(struct ocelot *ocelot,
|
|
|
|
struct felix_stream *stream,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
enum macaccess_entry_type type;
|
|
|
|
int ret, sfid, ssid;
|
|
|
|
u32 vid, dst_idx;
|
|
|
|
u8 mac[ETH_ALEN];
|
|
|
|
|
|
|
|
ether_addr_copy(mac, stream->dmac);
|
|
|
|
vid = stream->vid;
|
|
|
|
|
|
|
|
/* Stream identification does not support adding a stream whose MAC
|
|
|
|
* address does not exist (has not been learned) in the MAC table.
|
|
|
|
*/
|
|
|
|
ret = ocelot_mact_lookup(ocelot, &dst_idx, mac, vid, &type);
|
|
|
|
if (ret) {
|
|
|
|
if (extack)
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Stream is not learned in MAC table");
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((stream->sfid_valid || stream->ssid_valid) &&
|
|
|
|
type == ENTRYTYPE_NORMAL)
|
|
|
|
type = ENTRYTYPE_LOCKED;
|
|
|
|
|
|
|
|
sfid = stream->sfid_valid ? stream->sfid : -1;
|
|
|
|
ssid = stream->ssid_valid ? stream->ssid : -1;
|
|
|
|
|
|
|
|
ret = ocelot_mact_learn_streamdata(ocelot, dst_idx, mac, vid, type,
|
|
|
|
sfid, ssid);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct felix_stream *
|
|
|
|
vsc9959_stream_table_lookup(struct list_head *stream_list,
|
|
|
|
struct felix_stream *stream)
|
|
|
|
{
|
|
|
|
struct felix_stream *tmp;
|
|
|
|
|
|
|
|
list_for_each_entry(tmp, stream_list, list)
|
|
|
|
if (ether_addr_equal(tmp->dmac, stream->dmac) &&
|
|
|
|
tmp->vid == stream->vid)
|
|
|
|
return tmp;
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vsc9959_stream_table_add(struct ocelot *ocelot,
|
|
|
|
struct list_head *stream_list,
|
|
|
|
struct felix_stream *stream,
|
|
|
|
struct netlink_ext_ack *extack)
|
|
|
|
{
|
|
|
|
struct felix_stream *stream_entry;
|
|
|
|
int ret;
|
|
|
|
|
2021-12-07 06:44:18 +00:00
|
|
|
stream_entry = kmemdup(stream, sizeof(*stream_entry), GFP_KERNEL);
|
2021-11-18 10:12:00 +00:00
|
|
|
if (!stream_entry)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
if (!stream->dummy) {
|
|
|
|
ret = vsc9959_mact_stream_set(ocelot, stream_entry, extack);
|
|
|
|
if (ret) {
|
|
|
|
kfree(stream_entry);
|
|
|
|
return ret;
|
|
|
|
}
|
2021-11-18 10:12:00 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
list_add_tail(&stream_entry->list, stream_list);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct felix_stream *
|
|
|
|
vsc9959_stream_table_get(struct list_head *stream_list, unsigned long id)
|
|
|
|
{
|
|
|
|
struct felix_stream *tmp;
|
|
|
|
|
|
|
|
list_for_each_entry(tmp, stream_list, list)
|
|
|
|
if (tmp->id == id)
|
|
|
|
return tmp;
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vsc9959_stream_table_del(struct ocelot *ocelot,
|
|
|
|
struct felix_stream *stream)
|
|
|
|
{
|
2021-11-18 10:12:04 +00:00
|
|
|
if (!stream->dummy)
|
|
|
|
vsc9959_mact_stream_set(ocelot, stream, NULL);
|
2021-11-18 10:12:00 +00:00
|
|
|
|
|
|
|
list_del(&stream->list);
|
|
|
|
kfree(stream);
|
|
|
|
}
|
|
|
|
|
|
|
|
static u32 vsc9959_sfi_access_status(struct ocelot *ocelot)
|
|
|
|
{
|
|
|
|
return ocelot_read(ocelot, ANA_TABLES_SFIDACCESS);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vsc9959_psfp_sfi_set(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_filter *sfi)
|
|
|
|
{
|
|
|
|
u32 val;
|
|
|
|
|
|
|
|
if (sfi->index > VSC9959_PSFP_SFID_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!sfi->enable) {
|
|
|
|
ocelot_write(ocelot, ANA_TABLES_SFIDTIDX_SFID_INDEX(sfi->index),
|
|
|
|
ANA_TABLES_SFIDTIDX);
|
|
|
|
|
|
|
|
val = ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(SFIDACCESS_CMD_WRITE);
|
|
|
|
ocelot_write(ocelot, val, ANA_TABLES_SFIDACCESS);
|
|
|
|
|
|
|
|
return readx_poll_timeout(vsc9959_sfi_access_status, ocelot, val,
|
|
|
|
(!ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(val)),
|
|
|
|
10, 100000);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sfi->sgid > VSC9959_PSFP_GATE_ID_MAX ||
|
|
|
|
sfi->fmid > VSC9959_PSFP_POLICER_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
(sfi->sg_valid ? ANA_TABLES_SFIDTIDX_SGID_VALID : 0) |
|
|
|
|
ANA_TABLES_SFIDTIDX_SGID(sfi->sgid) |
|
|
|
|
(sfi->fm_valid ? ANA_TABLES_SFIDTIDX_POL_ENA : 0) |
|
|
|
|
ANA_TABLES_SFIDTIDX_POL_IDX(sfi->fmid) |
|
|
|
|
ANA_TABLES_SFIDTIDX_SFID_INDEX(sfi->index),
|
|
|
|
ANA_TABLES_SFIDTIDX);
|
|
|
|
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
(sfi->prio_valid ? ANA_TABLES_SFIDACCESS_IGR_PRIO_MATCH_ENA : 0) |
|
|
|
|
ANA_TABLES_SFIDACCESS_IGR_PRIO(sfi->prio) |
|
|
|
|
ANA_TABLES_SFIDACCESS_MAX_SDU_LEN(sfi->maxsdu) |
|
|
|
|
ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(SFIDACCESS_CMD_WRITE),
|
|
|
|
ANA_TABLES_SFIDACCESS);
|
|
|
|
|
|
|
|
return readx_poll_timeout(vsc9959_sfi_access_status, ocelot, val,
|
|
|
|
(!ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(val)),
|
|
|
|
10, 100000);
|
|
|
|
}
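The function above finishes by polling the SFID access register until the command field reads back as zero, checking every 10 us for up to 100 ms. The following is a minimal userspace sketch of that poll-until-clear pattern, not the kernel's readx_poll_timeout() itself. The names fake_reg, fake_reg_read, poll_until_clear and the CMD_MASK width are hypothetical, and time is simulated rather than slept.

```c
#include <errno.h>
#include <stdint.h>

#define CMD_MASK 0x3u	/* assumed width of the command field */

/* Hypothetical stand-in for a register whose command field self-clears
 * after a number of reads. Not part of the driver.
 */
struct fake_reg {
	uint32_t value;
	unsigned int reads_until_clear;
};

static uint32_t fake_reg_read(struct fake_reg *r)
{
	if (r->reads_until_clear == 0)
		r->value &= ~CMD_MASK;
	else
		r->reads_until_clear--;

	return r->value;
}

/* Poll until the command field is zero or timeout_us of simulated time has
 * elapsed. Returns 0 on success and -ETIMEDOUT otherwise.
 */
static int poll_until_clear(struct fake_reg *r, unsigned int sleep_us,
			    unsigned int timeout_us)
{
	unsigned int elapsed_us = 0;

	for (;;) {
		uint32_t val = fake_reg_read(r);

		if (!(val & CMD_MASK))
			return 0;
		if (elapsed_us >= timeout_us)
			return -ETIMEDOUT;

		elapsed_us += sleep_us;	/* a real loop would sleep here */
	}
}
```

A caller that ignores the return value cannot tell a completed table write from a timed-out one, so propagating the result is usually the safer choice.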
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
static int vsc9959_psfp_sfidmask_set(struct ocelot *ocelot, u32 sfid, int ports)
|
|
|
|
{
|
|
|
|
u32 val;
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot,
|
|
|
|
ANA_TABLES_SFIDTIDX_SFID_INDEX(sfid),
|
|
|
|
ANA_TABLES_SFIDTIDX_SFID_INDEX_M,
|
|
|
|
ANA_TABLES_SFIDTIDX);
|
|
|
|
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
ANA_TABLES_SFID_MASK_IGR_PORT_MASK(ports) |
|
|
|
|
ANA_TABLES_SFID_MASK_IGR_SRCPORT_MATCH_ENA,
|
|
|
|
ANA_TABLES_SFID_MASK);
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot,
|
|
|
|
ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(SFIDACCESS_CMD_WRITE),
|
|
|
|
ANA_TABLES_SFIDACCESS_SFID_TBL_CMD_M,
|
|
|
|
ANA_TABLES_SFIDACCESS);
|
|
|
|
|
|
|
|
return readx_poll_timeout(vsc9959_sfi_access_status, ocelot, val,
|
|
|
|
(!ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(val)),
|
|
|
|
10, 100000);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vsc9959_psfp_sfi_list_add(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_filter *sfi,
|
|
|
|
struct list_head *pos)
|
|
|
|
{
|
|
|
|
struct felix_stream_filter *sfi_entry;
|
|
|
|
int ret;
|
|
|
|
|
2021-12-07 06:44:18 +00:00
|
|
|
sfi_entry = kmemdup(sfi, sizeof(*sfi_entry), GFP_KERNEL);
|
2021-11-18 10:12:04 +00:00
|
|
|
if (!sfi_entry)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
refcount_set(&sfi_entry->refcount, 1);
|
|
|
|
|
|
|
|
ret = vsc9959_psfp_sfi_set(ocelot, sfi_entry);
|
|
|
|
if (ret) {
|
|
|
|
kfree(sfi_entry);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
vsc9959_psfp_sfidmask_set(ocelot, sfi->index, sfi->portmask);
|
|
|
|
|
|
|
|
list_add(&sfi_entry->list, pos);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
static int vsc9959_psfp_sfi_table_add(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_filter *sfi)
|
|
|
|
{
|
|
|
|
struct list_head *pos, *q, *last;
|
2021-11-18 10:12:04 +00:00
|
|
|
struct felix_stream_filter *tmp;
|
2021-11-18 10:12:00 +00:00
|
|
|
struct ocelot_psfp_list *psfp;
|
|
|
|
u32 insert = 0;
|
|
|
|
|
|
|
|
psfp = &ocelot->psfp;
|
|
|
|
last = &psfp->sfi_list;
|
|
|
|
|
|
|
|
list_for_each_safe(pos, q, &psfp->sfi_list) {
|
|
|
|
tmp = list_entry(pos, struct felix_stream_filter, list);
|
|
|
|
if (sfi->sg_valid == tmp->sg_valid &&
|
|
|
|
sfi->fm_valid == tmp->fm_valid &&
|
2021-11-18 10:12:04 +00:00
|
|
|
sfi->portmask == tmp->portmask &&
|
2021-11-18 10:12:00 +00:00
|
|
|
tmp->sgid == sfi->sgid &&
|
|
|
|
tmp->fmid == sfi->fmid) {
|
|
|
|
sfi->index = tmp->index;
|
|
|
|
refcount_inc(&tmp->refcount);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
/* Make sure that the index is increasing in order. */
|
|
|
|
if (tmp->index == insert) {
|
|
|
|
last = pos;
|
|
|
|
insert++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
sfi->index = insert;
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
return vsc9959_psfp_sfi_list_add(ocelot, sfi, last);
|
|
|
|
}
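The loop in vsc9959_psfp_sfi_table_add() picks the lowest unused stream filter index by walking a list that is kept sorted by index. A minimal sketch of that search follows, using a plain sorted array instead of a struct list_head. The function name lowest_free_index and the slot output parameter are hypothetical, and the array is assumed to hold no duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the lowest index that does not appear in `used`, which is assumed
 * to be sorted in increasing order without duplicates. *slot receives the
 * array position at which the new index keeps the order intact.
 */
static uint32_t lowest_free_index(const uint32_t *used, size_t n, size_t *slot)
{
	uint32_t insert = 0;
	size_t pos = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Entry occupies the candidate: move past it. */
		if (used[i] == insert) {
			pos = i + 1;
			insert++;
		}
	}

	*slot = pos;
	return insert;
}
```

For used indices {0, 1, 3, 4} the search settles on index 2, to be placed between the entries holding 1 and 3.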
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
static int vsc9959_psfp_sfi_table_add2(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_filter *sfi,
|
|
|
|
struct felix_stream_filter *sfi2)
|
|
|
|
{
|
|
|
|
struct felix_stream_filter *tmp;
|
|
|
|
struct list_head *pos, *q, *last;
|
|
|
|
struct ocelot_psfp_list *psfp;
|
|
|
|
u32 insert = 0;
|
|
|
|
int ret;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
psfp = &ocelot->psfp;
|
|
|
|
last = &psfp->sfi_list;
|
|
|
|
|
|
|
|
list_for_each_safe(pos, q, &psfp->sfi_list) {
|
|
|
|
tmp = list_entry(pos, struct felix_stream_filter, list);
|
|
|
|
/* Make sure that the index is increasing in order. */
|
|
|
|
if (tmp->index >= insert + 2)
|
|
|
|
break;
|
|
|
|
|
|
|
|
insert = tmp->index + 1;
|
|
|
|
last = pos;
|
2021-11-18 10:12:00 +00:00
|
|
|
}
|
2021-11-18 10:12:04 +00:00
|
|
|
sfi->index = insert;
|
|
|
|
|
|
|
|
ret = vsc9959_psfp_sfi_list_add(ocelot, sfi, last);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
sfi2->index = insert + 1;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
return vsc9959_psfp_sfi_list_add(ocelot, sfi2, last->next);
|
2021-11-18 10:12:00 +00:00
|
|
|
}
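vsc9959_psfp_sfi_table_add2() needs two adjacent free indices, so its loop stops at the first gap of at least two between consecutive entries. A minimal sketch of that search over a sorted array follows. The name lowest_free_pair and the slot output parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the lowest index `i` such that both `i` and `i + 1` are absent
 * from `used`, which is assumed to be sorted in increasing order. *slot
 * receives the array position at which the pair would be inserted.
 */
static uint32_t lowest_free_pair(const uint32_t *used, size_t n, size_t *slot)
{
	uint32_t insert = 0;
	size_t pos = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* A gap of two or more before this entry: use it. */
		if (used[i] >= insert + 2)
			break;

		insert = used[i] + 1;
		pos = i + 1;
	}

	*slot = pos;
	return insert;
}
```

With used indices {0, 3} the pair (1, 2) is chosen. With {1} the pair is (2, 3), because index 0 alone cannot hold two entries.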
|
|
|
|
|
2021-11-18 10:12:01 +00:00
|
|
|
static struct felix_stream_filter *
|
|
|
|
vsc9959_psfp_sfi_table_get(struct list_head *sfi_list, u32 index)
|
|
|
|
{
|
|
|
|
struct felix_stream_filter *tmp;
|
|
|
|
|
|
|
|
list_for_each_entry(tmp, sfi_list, list)
|
|
|
|
if (tmp->index == index)
|
|
|
|
return tmp;
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
static void vsc9959_psfp_sfi_table_del(struct ocelot *ocelot, u32 index)
|
|
|
|
{
|
|
|
|
struct felix_stream_filter *tmp, *n;
|
|
|
|
struct ocelot_psfp_list *psfp;
|
|
|
|
u8 z;
|
|
|
|
|
|
|
|
psfp = &ocelot->psfp;
|
|
|
|
|
|
|
|
list_for_each_entry_safe(tmp, n, &psfp->sfi_list, list)
|
|
|
|
if (tmp->index == index) {
|
|
|
|
z = refcount_dec_and_test(&tmp->refcount);
|
|
|
|
if (z) {
|
|
|
|
tmp->enable = 0;
|
|
|
|
vsc9959_psfp_sfi_set(ocelot, tmp);
|
|
|
|
list_del(&tmp->list);
|
|
|
|
kfree(tmp);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:01 +00:00
|
|
|
static void vsc9959_psfp_parse_gate(const struct flow_action_entry *entry,
|
|
|
|
struct felix_stream_gate *sgi)
|
|
|
|
{
|
2021-12-17 18:16:19 +00:00
|
|
|
sgi->index = entry->hw_index;
|
2021-11-18 10:12:01 +00:00
|
|
|
sgi->ipv_valid = (entry->gate.prio < 0) ? 0 : 1;
|
|
|
|
sgi->init_ipv = (sgi->ipv_valid) ? entry->gate.prio : 0;
|
|
|
|
sgi->basetime = entry->gate.basetime;
|
|
|
|
sgi->cycletime = entry->gate.cycletime;
|
|
|
|
sgi->num_entries = entry->gate.num_entries;
|
|
|
|
sgi->enable = 1;
|
|
|
|
|
|
|
|
memcpy(sgi->entries, entry->gate.entries,
|
|
|
|
entry->gate.num_entries * sizeof(struct action_gate_entry));
|
|
|
|
}
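vsc9959_psfp_parse_gate() copies a variable number of gate entries into a structure that ends in a flexible array, and treats a negative priority as "no IPV". A minimal userspace sketch of that shape follows. The types gate and gate_entry, their fields, and gate_from_action() are simplified, hypothetical stand-ins for the kernel structures, and the explicit overflow check stands in for what struct_size() is meant to guard against.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct gate_entry {
	uint8_t gate_state;
	uint32_t interval;
	int32_t ipv;
};

struct gate {
	int ipv_valid;
	uint32_t init_ipv;
	uint32_t num_entries;
	struct gate_entry entries[];	/* flexible array member */
};

/* Allocate a gate sized for n entries and fill it from the inputs.
 * Returns NULL on overflow or allocation failure; the caller frees it.
 */
static struct gate *gate_from_action(int prio, const struct gate_entry *src,
				     uint32_t n)
{
	struct gate *g;

	/* Reject sizes whose byte count would wrap around. */
	if (n > (SIZE_MAX - sizeof(*g)) / sizeof(g->entries[0]))
		return NULL;

	g = calloc(1, sizeof(*g) + n * sizeof(g->entries[0]));
	if (!g)
		return NULL;

	g->ipv_valid = prio >= 0;
	g->init_ipv = g->ipv_valid ? (uint32_t)prio : 0;
	g->num_entries = n;
	if (n)
		memcpy(g->entries, src, n * sizeof(src[0]));

	return g;
}
```

The destination must have been allocated for at least num_entries entries before the copy, which is why the size computation and the memcpy() length use the same count.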
|
|
|
|
|
|
|
|
static u32 vsc9959_sgi_cfg_status(struct ocelot *ocelot)
|
|
|
|
{
|
|
|
|
return ocelot_read(ocelot, ANA_SG_ACCESS_CTRL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vsc9959_psfp_sgi_set(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_gate *sgi)
|
|
|
|
{
|
|
|
|
struct action_gate_entry *e;
|
|
|
|
struct timespec64 base_ts;
|
|
|
|
u32 interval_sum = 0;
|
|
|
|
u32 val;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (sgi->index > VSC9959_PSFP_GATE_ID_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
ocelot_write(ocelot, ANA_SG_ACCESS_CTRL_SGID(sgi->index),
|
|
|
|
ANA_SG_ACCESS_CTRL);
|
|
|
|
|
|
|
|
if (!sgi->enable) {
|
|
|
|
ocelot_rmw(ocelot, ANA_SG_CONFIG_REG_3_INIT_GATE_STATE,
|
|
|
|
ANA_SG_CONFIG_REG_3_INIT_GATE_STATE |
|
|
|
|
ANA_SG_CONFIG_REG_3_GATE_ENABLE,
|
|
|
|
ANA_SG_CONFIG_REG_3);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sgi->cycletime < VSC9959_PSFP_GATE_CYCLETIME_MIN ||
|
|
|
|
sgi->cycletime > NSEC_PER_SEC)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (sgi->num_entries > VSC9959_PSFP_GATE_LIST_NUM)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
vsc9959_new_base_time(ocelot, sgi->basetime, sgi->cycletime, &base_ts);
|
|
|
|
ocelot_write(ocelot, base_ts.tv_nsec, ANA_SG_CONFIG_REG_1);
|
|
|
|
val = lower_32_bits(base_ts.tv_sec);
|
|
|
|
ocelot_write(ocelot, val, ANA_SG_CONFIG_REG_2);
|
|
|
|
|
|
|
|
val = upper_32_bits(base_ts.tv_sec);
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
(sgi->ipv_valid ? ANA_SG_CONFIG_REG_3_IPV_VALID : 0) |
|
|
|
|
ANA_SG_CONFIG_REG_3_INIT_IPV(sgi->init_ipv) |
|
|
|
|
ANA_SG_CONFIG_REG_3_GATE_ENABLE |
|
|
|
|
ANA_SG_CONFIG_REG_3_LIST_LENGTH(sgi->num_entries) |
|
|
|
|
ANA_SG_CONFIG_REG_3_INIT_GATE_STATE |
|
|
|
|
ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB(val),
|
|
|
|
ANA_SG_CONFIG_REG_3);
|
|
|
|
|
|
|
|
ocelot_write(ocelot, sgi->cycletime, ANA_SG_CONFIG_REG_4);
|
|
|
|
|
|
|
|
e = sgi->entries;
|
|
|
|
for (i = 0; i < sgi->num_entries; i++) {
|
|
|
|
u32 ips = (e[i].ipv < 0) ? 0 : (e[i].ipv + 8);
|
|
|
|
|
|
|
|
ocelot_write_rix(ocelot, ANA_SG_GCL_GS_CONFIG_IPS(ips) |
|
|
|
|
(e[i].gate_state ?
|
|
|
|
ANA_SG_GCL_GS_CONFIG_GATE_STATE : 0),
|
|
|
|
ANA_SG_GCL_GS_CONFIG, i);
|
|
|
|
|
|
|
|
interval_sum += e[i].interval;
|
|
|
|
ocelot_write_rix(ocelot, interval_sum, ANA_SG_GCL_TI_CONFIG, i);
|
|
|
|
}
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot, ANA_SG_ACCESS_CTRL_CONFIG_CHANGE,
|
|
|
|
ANA_SG_ACCESS_CTRL_CONFIG_CHANGE,
|
|
|
|
ANA_SG_ACCESS_CTRL);
|
|
|
|
|
|
|
|
return readx_poll_timeout(vsc9959_sgi_cfg_status, ocelot, val,
|
|
|
|
(!(ANA_SG_ACCESS_CTRL_CONFIG_CHANGE & val)),
|
|
|
|
10, 100000);
|
|
|
|
}
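When programming the gate control list, vsc9959_psfp_sgi_set() does not write each entry's own duration. It writes the running sum of the intervals, so every entry carries the offset at which it ends within the cycle. A minimal sketch of that conversion and of the IPS encoding follows. The function names are hypothetical, and reading the "+ 8" as setting a valid flag above a 3-bit priority is an assumption, not something the source states.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert per-entry durations into cumulative end offsets, mirroring how
 * interval_sum is accumulated before each time-interval register write.
 */
static void gcl_cumulative(const uint32_t *interval_ns, uint32_t *end_ns,
			   size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += interval_ns[i];
		end_ns[i] = sum;
	}
}

/* Encode an internal priority value the way the loop above does: a negative
 * IPV maps to 0, anything else to ipv + 8 (assumed to be a valid flag).
 */
static uint32_t gcl_ips(int ipv)
{
	return ipv < 0 ? 0 : (uint32_t)ipv + 8;
}
```

For intervals of 1000, 2500 and 500 ns the written values would be 1000, 3500 and 4000, so the last one is the total length covered by the list.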
|
|
|
|
|
|
|
|
static int vsc9959_psfp_sgi_table_add(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_gate *sgi)
|
|
|
|
{
|
|
|
|
struct felix_stream_gate_entry *tmp;
|
|
|
|
struct ocelot_psfp_list *psfp;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
psfp = &ocelot->psfp;
|
|
|
|
|
|
|
|
list_for_each_entry(tmp, &psfp->sgi_list, list)
|
|
|
|
if (tmp->index == sgi->index) {
|
|
|
|
refcount_inc(&tmp->refcount);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
|
|
|
|
if (!tmp)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
ret = vsc9959_psfp_sgi_set(ocelot, sgi);
|
|
|
|
if (ret) {
|
|
|
|
kfree(tmp);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
tmp->index = sgi->index;
|
|
|
|
refcount_set(&tmp->refcount, 1);
|
|
|
|
list_add_tail(&tmp->list, &psfp->sgi_list);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vsc9959_psfp_sgi_table_del(struct ocelot *ocelot,
|
|
|
|
u32 index)
|
|
|
|
{
|
|
|
|
struct felix_stream_gate_entry *tmp, *n;
|
|
|
|
struct felix_stream_gate sgi = {0};
|
|
|
|
struct ocelot_psfp_list *psfp;
|
|
|
|
u8 z;
|
|
|
|
|
|
|
|
psfp = &ocelot->psfp;
|
|
|
|
|
|
|
|
list_for_each_entry_safe(tmp, n, &psfp->sgi_list, list)
|
|
|
|
if (tmp->index == index) {
|
|
|
|
z = refcount_dec_and_test(&tmp->refcount);
|
|
|
|
if (z) {
|
|
|
|
sgi.index = index;
|
|
|
|
sgi.enable = 0;
|
|
|
|
vsc9959_psfp_sgi_set(ocelot, &sgi);
|
|
|
|
list_del(&tmp->list);
|
|
|
|
kfree(tmp);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
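The stream gate table add and delete helpers above share one hardware gate between several filters: the gate is programmed on first use, later users only take a reference, and the gate is disabled when the last reference goes away. A minimal sketch of that get/put pattern follows, with a fixed array instead of a list. All names here (gate_slot, gate_get, gate_put, the counters) and the table size are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GATES 4	/* assumed table size, for the sketch only */

struct gate_slot {
	bool in_use;
	uint32_t index;
	unsigned int refcount;
};

static struct gate_slot gates[MAX_GATES];
static unsigned int hw_programs;	/* simulated "program gate" writes */
static unsigned int hw_disables;	/* simulated "disable gate" writes */

/* Take a reference on gate `index`; touch "hardware" on first use only. */
static int gate_get(uint32_t index)
{
	struct gate_slot *free_slot = NULL;
	size_t i;

	for (i = 0; i < MAX_GATES; i++) {
		if (gates[i].in_use && gates[i].index == index) {
			gates[i].refcount++;
			return 0;
		}
		if (!gates[i].in_use && !free_slot)
			free_slot = &gates[i];
	}

	if (!free_slot)
		return -ENOSPC;

	hw_programs++;
	free_slot->in_use = true;
	free_slot->index = index;
	free_slot->refcount = 1;

	return 0;
}

/* Drop a reference; disable the gate when the last user is gone. */
static void gate_put(uint32_t index)
{
	size_t i;

	for (i = 0; i < MAX_GATES; i++) {
		if (!gates[i].in_use || gates[i].index != index)
			continue;

		if (--gates[i].refcount == 0) {
			hw_disables++;
			gates[i].in_use = false;
		}
		return;
	}
}
```

One consequence of this scheme is that a second user of an existing gate index does not reprogram it, so the first user's schedule stays in effect.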
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
static int vsc9959_psfp_filter_add(struct ocelot *ocelot, int port,
|
2021-11-18 10:12:00 +00:00
|
|
|
struct flow_cls_offload *f)
|
|
|
|
{
|
|
|
|
struct netlink_ext_ack *extack = f->common.extack;
|
2021-11-18 10:12:04 +00:00
|
|
|
struct felix_stream_filter old_sfi, *sfi_entry;
|
2021-11-18 10:12:00 +00:00
|
|
|
struct felix_stream_filter sfi = {0};
|
|
|
|
const struct flow_action_entry *a;
|
|
|
|
struct felix_stream *stream_entry;
|
|
|
|
struct felix_stream stream = {0};
|
2021-11-18 10:12:01 +00:00
|
|
|
struct felix_stream_gate *sgi;
|
2021-11-18 10:12:00 +00:00
|
|
|
struct ocelot_psfp_list *psfp;
|
2021-11-18 10:12:03 +00:00
|
|
|
struct ocelot_policer pol;
|
2021-11-18 10:12:01 +00:00
|
|
|
int ret, i, size;
|
2021-11-18 10:12:03 +00:00
|
|
|
u64 rate, burst;
|
|
|
|
u32 index;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
|
|
|
psfp = &ocelot->psfp;
|
|
|
|
|
|
|
|
ret = vsc9959_stream_identify(f, &stream);
|
|
|
|
if (ret) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Can only match on VID, PCP, and dest MAC");
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
net: dsa: felix: check the 32-bit PSFP stats against overflow
The Felix PSFP counters suffer from the same problem as the ocelot
ndo_get_stats64 ones - they are 32-bit, so they can easily overflow and
this can easily go undetected.
Add a custom hook in ocelot_check_stats_work() through which driver
specific actions can be taken, and update the stats for the existing
PSFP filters from that hook.
Previously, vsc9959_psfp_filter_add() and vsc9959_psfp_filter_del() were
serialized with respect to each other via rtnl_lock(). However, with the
new entry point into &psfp->sfi_list coming from the periodic worker, we
now need an explicit mutex to serialize access to these lists.
We used to keep a struct felix_stream_filter_counters on stack, through
which vsc9959_psfp_stats_get() - a FLOW_CLS_STATS callback - would
retrieve data from vsc9959_psfp_counters_get(). We need to become
smarter about that in 3 ways:
- we need to keep a persistent set of counters for each stream instead
of keeping them on stack
- we need to promote those counters from u32 to u64, and create a
procedure that properly keeps 64-bit counters. Since we clear the
hardware counters anyway, and we poll every 2 seconds, a simple
increment of a u64 counter with a u32 value will perfectly do the job.
- FLOW_CLS_STATS also expects incremental counters, so we also need to
zeroize our u64 counters every time sch_flower calls us
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-08 16:48:05 +00:00
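The counter scheme described in the commit message above can be sketched as follows: a periodic poll folds a 32-bit hardware value into a 64-bit software total, and the stats callback hands out the total and zeroes it. This is a minimal illustration, not the driver's code. The hw_counter type and all function names are hypothetical, and modelling the hardware clear as clear-on-read is an assumption.

```c
#include <stdint.h>

/* Hypothetical 32-bit hardware counter that is cleared when read. */
struct hw_counter {
	uint32_t value;
};

static uint32_t hw_read_and_clear(struct hw_counter *hw)
{
	uint32_t val = hw->value;

	hw->value = 0;
	return val;
}

/* Periodic worker: since the hardware counter restarts from zero after
 * each read, a plain addition into the u64 total is enough.
 */
static void stats_poll(struct hw_counter *hw, uint64_t *total)
{
	*total += hw_read_and_clear(hw);
}

/* Stats callback: report the increment since the last call and reset. */
static uint64_t stats_report(uint64_t *total)
{
	uint64_t delta = *total;

	*total = 0;
	return delta;
}
```

The poll period only has to be short enough that the 32-bit hardware value cannot wrap between two polls.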
|
|
|
mutex_lock(&psfp->lock);
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
flow_action_for_each(i, a, &f->rule->action) {
|
|
|
|
switch (a->id) {
|
|
|
|
case FLOW_ACTION_GATE:
|
2021-11-18 10:12:01 +00:00
|
|
|
size = struct_size(sgi, entries, a->gate.num_entries);
|
|
|
|
sgi = kzalloc(size, GFP_KERNEL);
|
2022-03-29 09:08:00 +00:00
|
|
|
if (!sgi) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto err;
|
|
|
|
}
|
2021-11-18 10:12:01 +00:00
|
|
|
vsc9959_psfp_parse_gate(a, sgi);
|
|
|
|
ret = vsc9959_psfp_sgi_table_add(ocelot, sgi);
|
|
|
|
if (ret) {
|
|
|
|
kfree(sgi);
|
2021-11-18 10:12:03 +00:00
|
|
|
goto err;
|
2021-11-18 10:12:01 +00:00
|
|
|
}
|
|
|
|
sfi.sg_valid = 1;
|
|
|
|
sfi.sgid = sgi->index;
|
|
|
|
kfree(sgi);
|
|
|
|
break;
|
2021-11-18 10:12:00 +00:00
|
|
|
case FLOW_ACTION_POLICE:
|
2021-12-17 18:16:19 +00:00
|
|
|
index = a->hw_index + VSC9959_PSFP_POLICER_BASE;
|
2021-11-18 10:12:03 +00:00
|
|
|
if (index > VSC9959_PSFP_POLICER_MAX) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
rate = a->police.rate_bytes_ps;
|
|
|
|
burst = rate * PSCHED_NS2TICKS(a->police.burst);
|
|
|
|
pol = (struct ocelot_policer) {
|
|
|
|
.burst = div_u64(burst, PSCHED_TICKS_PER_SEC),
|
|
|
|
.rate = div_u64(rate, 1000) * 8,
|
|
|
|
};
|
|
|
|
ret = ocelot_vcap_policer_add(ocelot, index, &pol);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
sfi.fm_valid = 1;
|
|
|
|
sfi.fmid = index;
|
|
|
|
sfi.maxsdu = a->police.mtu;
|
|
|
|
break;
|
2021-11-18 10:12:00 +00:00
|
|
|
default:
|
2022-09-08 16:48:05 +00:00
|
|
|
ret = -EOPNOTSUPP;
|
2021-11-18 10:12:00 +00:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
stream.ports = BIT(port);
|
|
|
|
stream.port = port;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
sfi.portmask = stream.ports;
|
2021-11-18 10:12:00 +00:00
|
|
|
sfi.prio_valid = (stream.prio < 0 ? 0 : 1);
|
|
|
|
sfi.prio = (sfi.prio_valid ? stream.prio : 0);
|
|
|
|
sfi.enable = 1;
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
/* Check if stream is set. */
|
|
|
|
stream_entry = vsc9959_stream_table_lookup(&psfp->stream_list, &stream);
|
|
|
|
if (stream_entry) {
|
|
|
|
if (stream_entry->ports & BIT(port)) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack,
|
|
|
|
"The stream is already added on this port");
|
|
|
|
ret = -EEXIST;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (stream_entry->ports != BIT(stream_entry->port)) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack,
|
|
|
|
"The stream is already added on two ports");
|
|
|
|
ret = -EEXIST;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
stream_entry->ports |= BIT(port);
|
|
|
|
stream.ports = stream_entry->ports;
|
|
|
|
|
|
|
|
sfi_entry = vsc9959_psfp_sfi_table_get(&psfp->sfi_list,
|
|
|
|
stream_entry->sfid);
|
|
|
|
memcpy(&old_sfi, sfi_entry, sizeof(old_sfi));
|
|
|
|
|
|
|
|
vsc9959_psfp_sfi_table_del(ocelot, stream_entry->sfid);
|
|
|
|
|
|
|
|
old_sfi.portmask = stream_entry->ports;
|
|
|
|
sfi.portmask = stream.ports;
|
|
|
|
|
|
|
|
if (stream_entry->port > port) {
|
|
|
|
ret = vsc9959_psfp_sfi_table_add2(ocelot, &sfi,
|
|
|
|
&old_sfi);
|
|
|
|
stream_entry->dummy = true;
|
|
|
|
} else {
|
|
|
|
ret = vsc9959_psfp_sfi_table_add2(ocelot, &old_sfi,
|
|
|
|
&sfi);
|
|
|
|
stream.dummy = true;
|
|
|
|
}
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
stream_entry->sfid = old_sfi.index;
|
|
|
|
} else {
|
|
|
|
ret = vsc9959_psfp_sfi_table_add(ocelot, &sfi);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
}
|
2021-11-18 10:12:00 +00:00
|
|
|
|
|
|
|
stream.sfid = sfi.index;
|
|
|
|
stream.sfid_valid = 1;
|
|
|
|
ret = vsc9959_stream_table_add(ocelot, &psfp->stream_list,
|
|
|
|
&stream, extack);
|
2021-11-18 10:12:01 +00:00
|
|
|
if (ret) {
|
2021-11-18 10:12:00 +00:00
|
|
|
vsc9959_psfp_sfi_table_del(ocelot, stream.sfid);
|
2021-11-18 10:12:01 +00:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
mutex_unlock(&psfp->lock);
|
|
|
|
|
2021-11-18 10:12:01 +00:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
err:
|
|
|
|
if (sfi.sg_valid)
|
|
|
|
vsc9959_psfp_sgi_table_del(ocelot, sfi.sgid);
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:03 +00:00
|
|
|
if (sfi.fm_valid)
|
|
|
|
ocelot_vcap_policer_del(ocelot, sfi.fmid);
|
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
mutex_unlock(&psfp->lock);
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
return ret;
|
|
|
|
}
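The FLOW_ACTION_POLICE case in the function above converts the offloaded rate and burst into the policer's units. The arithmetic suggests the rate arrives in bytes per second and the burst as a duration in nanoseconds, and that the policer takes kilobits per second and bytes. A minimal sketch under those assumptions follows. The 64 ns tick (shift of 6) mirrors what PSCHED_NS2TICKS() is assumed to do, and the policer type and function name are hypothetical.

```c
#include <stdint.h>

#define TICK_SHIFT 6			/* assumed: one scheduler tick = 64 ns */
#define NSEC_PER_SEC_U64 1000000000ULL

struct policer {
	uint32_t rate_kbps;
	uint32_t burst_bytes;
};

/* Convert a byte rate and a burst duration into policer parameters. */
static struct policer policer_from_action(uint64_t rate_bytes_ps,
					  uint64_t burst_ns)
{
	uint64_t ticks = burst_ns >> TICK_SHIFT;
	uint64_t ticks_per_sec = NSEC_PER_SEC_U64 >> TICK_SHIFT;
	struct policer pol;

	/* Bytes that can be sent at the given rate during the burst time. */
	pol.burst_bytes = (uint32_t)(rate_bytes_ps * ticks / ticks_per_sec);
	/* Bytes per second to kilobits per second. */
	pol.rate_kbps = (uint32_t)(rate_bytes_ps / 1000 * 8);

	return pol;
}
```

For example, 1,000,000 bytes/s with a 64 ms burst time gives 8000 kbit/s and a 64,000 byte burst. Dividing by 1000 before multiplying by 8 truncates any rate that is not a multiple of 1000 bytes/s.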
|
|
|
|
|
|
|
|
static int vsc9959_psfp_filter_del(struct ocelot *ocelot,
|
|
|
|
struct flow_cls_offload *f)
|
|
|
|
{
|
2021-11-18 10:12:04 +00:00
|
|
|
struct felix_stream *stream, tmp, *stream_entry;
|
2022-09-08 16:48:05 +00:00
|
|
|
struct ocelot_psfp_list *psfp = &ocelot->psfp;
|
2021-11-18 10:12:01 +00:00
|
|
|
struct felix_stream_filter *sfi;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
mutex_lock(&psfp->lock);
|
2021-11-18 10:12:00 +00:00
|
|
|
|
|
|
|
stream = vsc9959_stream_table_get(&psfp->stream_list, f->cookie);
|
2022-09-08 16:48:05 +00:00
|
|
|
if (!stream) {
|
|
|
|
mutex_unlock(&psfp->lock);
|
2021-11-18 10:12:00 +00:00
|
|
|
return -ENOENT;
|
2022-09-08 16:48:05 +00:00
|
|
|
}
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2021-11-18 10:12:01 +00:00
|
|
|
sfi = vsc9959_psfp_sfi_table_get(&psfp->sfi_list, stream->sfid);
|
2022-09-08 16:48:05 +00:00
|
|
|
if (!sfi) {
|
|
|
|
mutex_unlock(&psfp->lock);
|
2021-11-18 10:12:01 +00:00
|
|
|
return -ENOENT;
|
2022-09-08 16:48:05 +00:00
|
|
|
}
|
2021-11-18 10:12:01 +00:00
|
|
|
|
|
|
|
if (sfi->sg_valid)
|
|
|
|
vsc9959_psfp_sgi_table_del(ocelot, sfi->sgid);
|
|
|
|
|
2021-11-18 10:12:03 +00:00
|
|
|
if (sfi->fm_valid)
|
|
|
|
ocelot_vcap_policer_del(ocelot, sfi->fmid);
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
vsc9959_psfp_sfi_table_del(ocelot, stream->sfid);
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
memcpy(&tmp, stream, sizeof(tmp));
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
stream->sfid_valid = 0;
|
|
|
|
vsc9959_stream_table_del(ocelot, stream);
|
|
|
|
|
2021-11-18 10:12:04 +00:00
|
|
|
stream_entry = vsc9959_stream_table_lookup(&psfp->stream_list, &tmp);
|
|
|
|
if (stream_entry) {
|
|
|
|
stream_entry->ports = BIT(stream_entry->port);
|
|
|
|
if (stream_entry->dummy) {
|
|
|
|
stream_entry->dummy = false;
|
|
|
|
vsc9959_mact_stream_set(ocelot, stream_entry, NULL);
|
|
|
|
}
|
|
|
|
vsc9959_psfp_sfidmask_set(ocelot, stream_entry->sfid,
|
|
|
|
stream_entry->ports);
|
|
|
|
}
|
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
mutex_unlock(&psfp->lock);
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
static void vsc9959_update_sfid_stats(struct ocelot *ocelot,
|
|
|
|
struct felix_stream_filter *sfi)
|
|
|
|
{
|
|
|
|
struct felix_stream_filter_counters *s = &sfi->stats;
|
|
|
|
u32 match, not_pass_gate, not_pass_sdu, red;
|
|
|
|
u32 sfid = sfi->index;
|
|
|
|
|
|
|
|
lockdep_assert_held(&ocelot->stat_view_lock);
|
|
|
|
|
|
|
|
ocelot_rmw(ocelot, SYS_STAT_CFG_STAT_VIEW(sfid),
|
|
|
|
SYS_STAT_CFG_STAT_VIEW_M,
|
|
|
|
SYS_STAT_CFG);
|
|
|
|
|
|
|
|
match = ocelot_read(ocelot, SYS_COUNT_SF_MATCHING_FRAMES);
|
|
|
|
not_pass_gate = ocelot_read(ocelot, SYS_COUNT_SF_NOT_PASSING_FRAMES);
|
|
|
|
not_pass_sdu = ocelot_read(ocelot, SYS_COUNT_SF_NOT_PASSING_SDU);
|
|
|
|
red = ocelot_read(ocelot, SYS_COUNT_SF_RED_FRAMES);
|
|
|
|
|
|
|
|
/* Clear the PSFP counter. */
|
|
|
|
ocelot_write(ocelot,
|
|
|
|
SYS_STAT_CFG_STAT_VIEW(sfid) |
|
|
|
|
SYS_STAT_CFG_STAT_CLEAR_SHOT(0x10),
|
|
|
|
SYS_STAT_CFG);
|
|
|
|
|
|
|
|
s->match += match;
|
|
|
|
s->not_pass_gate += not_pass_gate;
|
|
|
|
s->not_pass_sdu += not_pass_sdu;
|
|
|
|
s->red += red;
|
|
|
|
}
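The accumulation pattern used by vsc9959_update_sfid_stats() above, together with the zero-on-query behaviour of vsc9959_psfp_stats_get(), can be sketched in isolation. This is a minimal userspace illustration, not the driver code: struct sf_counters, struct hw_counter, hw_read_and_clear(), sf_poll() and sf_report() are hypothetical names, and the hardware counter is modelled as a plain variable rather than a register accessed through SYS_STAT_CFG.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the persistent per-stream 64-bit counters. */
struct sf_counters {
	uint64_t match;
	uint64_t drops;
};

/* Hypothetical model of a 32-bit hardware counter that software clears. */
struct hw_counter {
	uint32_t value;
};

/* Read the 32-bit counter, then clear it so the next read is a delta. */
static uint32_t hw_read_and_clear(struct hw_counter *hw)
{
	uint32_t v = hw->value;

	hw->value = 0;
	return v;
}

/* Periodic poll: fold each 32-bit delta into the 64-bit software total. */
static void sf_poll(struct sf_counters *s, struct hw_counter *match,
		    struct hw_counter *drops)
{
	s->match += hw_read_and_clear(match);
	s->drops += hw_read_and_clear(drops);
}

/* Stats query: hand back what accumulated since the last query, then
 * zero the software totals so the next query is incremental again.
 */
static struct sf_counters sf_report(struct sf_counters *s)
{
	struct sf_counters out = *s;

	memset(s, 0, sizeof(*s));
	return out;
}
```

Because the hardware value is cleared on every poll, the 64-bit total only loses counts if the 32-bit counter wraps within a single polling interval, which is the assumption the commit message relies on.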
|
|
|
|
|
|
|
|
/* Caller must hold &ocelot->stat_view_lock */
|
|
|
|
static void vsc9959_update_stats(struct ocelot *ocelot)
|
|
|
|
{
|
|
|
|
struct ocelot_psfp_list *psfp = &ocelot->psfp;
|
|
|
|
struct felix_stream_filter *sfi;
|
|
|
|
|
|
|
|
mutex_lock(&psfp->lock);
|
|
|
|
|
|
|
|
list_for_each_entry(sfi, &psfp->sfi_list, list)
|
|
|
|
vsc9959_update_sfid_stats(ocelot, sfi);
|
|
|
|
|
|
|
|
mutex_unlock(&psfp->lock);
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
static int vsc9959_psfp_stats_get(struct ocelot *ocelot,
|
|
|
|
struct flow_cls_offload *f,
|
|
|
|
struct flow_stats *stats)
|
|
|
|
{
|
2022-09-08 16:48:05 +00:00
|
|
|
struct ocelot_psfp_list *psfp = &ocelot->psfp;
|
|
|
|
struct felix_stream_filter_counters *s;
|
|
|
|
static struct felix_stream_filter *sfi;
|
2021-11-18 10:12:00 +00:00
|
|
|
struct felix_stream *stream;
|
|
|
|
|
|
|
|
stream = vsc9959_stream_table_get(&psfp->stream_list, f->cookie);
|
|
|
|
if (!stream)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
sfi = vsc9959_psfp_sfi_table_get(&psfp->sfi_list, stream->sfid);
|
|
|
|
if (!sfi)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
mutex_lock(&ocelot->stat_view_lock);
|
|
|
|
|
|
|
|
vsc9959_update_sfid_stats(ocelot, sfi);
|
|
|
|
|
|
|
|
s = &sfi->stats;
|
|
|
|
stats->pkts = s->match;
|
|
|
|
stats->drops = s->not_pass_gate + s->not_pass_sdu + s->red;
|
2021-11-18 10:12:00 +00:00
|
|
|
|
2022-09-08 16:48:05 +00:00
|
|
|
memset(s, 0, sizeof(*s));
|
|
|
|
|
|
|
|
mutex_unlock(&ocelot->stat_view_lock);
|
2021-11-18 10:12:00 +00:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void vsc9959_psfp_init(struct ocelot *ocelot)
|
|
|
|
{
|
|
|
|
struct ocelot_psfp_list *psfp = &ocelot->psfp;
|
|
|
|
|
|
|
|
INIT_LIST_HEAD(&psfp->stream_list);
|
|
|
|
INIT_LIST_HEAD(&psfp->sfi_list);
|
|
|
|
INIT_LIST_HEAD(&psfp->sgi_list);
|
2022-09-08 16:48:05 +00:00
|
|
|
mutex_init(&psfp->lock);
|
2021-11-18 10:12:00 +00:00
|
|
|
}
|
|
|
|
|
net: dsa: felix: enable cut-through forwarding between ports by default
The VSC9959 switch embedded within NXP LS1028A (and that version of
Ocelot switches only) supports cut-through forwarding - meaning it can
start the process of looking up the destination ports for a packet, and
forward towards those ports, before the entire packet has been received
(as opposed to the store-and-forward mode).
The up side is having lower forwarding latency for large packets. The
down side is that frames with FCS errors are forwarded instead of being
dropped. However, erroneous frames do not result in incorrect updates of
the FDB or incorrect policer updates, since these processes are deferred
inside the switch to the end of frame. Since the switch starts the
cut-through forwarding process after all packet headers (including IP,
if any) have been processed, packets with large headers and small
payload do not see the benefit of lower forwarding latency.
There are two cases that need special attention.
The first is when a packet is multicast (or flooded) to multiple
destinations, one of which doesn't have cut-through forwarding enabled.
The switch deals with this automatically by disabling cut-through
forwarding for the frame towards all destination ports.
The second is when a packet is forwarded from a port of lower link speed
towards a port of higher link speed. This is not handled by the hardware
and needs software intervention.
Since we practically need to update the cut-through forwarding domain
from paths that aren't serialized by the rtnl_mutex (phylink
mac_link_down/mac_link_up ops), this means we need to serialize physical
link events with user space updates of bonding/bridging domains.
Enabling cut-through forwarding is done per {egress port, traffic class}.
I don't see any reason why this would be a configurable option as long
as it works without issues, and there doesn't appear to be any user
space configuration tool to toggle this on/off, so this patch enables
cut-through forwarding on all eligible ports and traffic classes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-25 12:58:08 +00:00
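The speed rule from the commit message (a port may egress in cut-through mode only if it runs at the lowest link speed among the ports that are up in its forwarding domain) can be sketched as a standalone helper. This is an illustration under assumptions, not the driver implementation: cut_through_tc_mask() is a hypothetical name, speeds are plain integers in Mbps with values <= 0 meaning "link down", and the forwarding domain is passed in as a precomputed bitmask that excludes the port itself.

```c
#include <stdint.h>

/* Return a per-traffic-class enable mask for one egress port:
 * 0xff (all 8 classes) if the port may use cut-through, 0 otherwise.
 *
 * speed[]   link speed of each port, <= 0 if the link is down
 * num_ports number of entries in speed[], at most 32 in this sketch
 * port      the port being evaluated
 * fwd_mask  bitmask of the other ports in this port's forwarding domain
 */
static uint8_t cut_through_tc_mask(const int speed[], int num_ports,
				   int port, uint32_t fwd_mask)
{
	int min_speed = speed[port];
	int other;

	/* Ports that are down never get cut-through. */
	if (speed[port] <= 0)
		return 0;

	/* Find the minimum speed among the domain's ports that are up. */
	for (other = 0; other < num_ports; other++) {
		if (!(fwd_mask & (UINT32_C(1) << other)))
			continue;
		if (speed[other] <= 0)
			continue;
		if (speed[other] < min_speed)
			min_speed = speed[other];
	}

	/* Only the slowest ports of the domain are eligible. */
	return speed[port] == min_speed ? 0xff : 0;
}
```

For example, with ports at 1000, 100 and 2500 Mbps in one domain, only the 100 Mbps port would be eligible, since anything forwarded to it arrives at least as fast as it is transmitted.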
|
|
|
/* When using cut-through forwarding and the egress port runs at a higher data
|
|
|
|
* rate than the ingress port, the packet currently under transmission would
|
|
|
|
* suffer an underrun since it would be transmitted faster than it is received.
|
|
|
|
* The Felix switch implementation of cut-through forwarding does not check in
|
|
|
|
* hardware whether this condition is satisfied or not, so we must restrict the
|
|
|
|
* list of ports that have cut-through forwarding enabled on egress to only be
|
|
|
|
* the ports operating at the lowest link speed within their respective
|
|
|
|
* forwarding domain.
|
|
|
|
*/
|
|
|
|
static void vsc9959_cut_through_fwd(struct ocelot *ocelot)
|
|
|
|
{
|
|
|
|
struct felix *felix = ocelot_to_felix(ocelot);
|
|
|
|
struct dsa_switch *ds = felix->ds;
|
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames even way larger than 601 octets are transmitted even though these
should be considered as oversized, according to the documentation, and
dropped.
Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.
Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.
There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose reading back from registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that support
neither cut-through nor tc-taprio, and we don't have a felix
specific extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:24 +00:00
|
|
|
int tc, port, other_port;
|
2021-11-25 12:58:08 +00:00
|
|
|
|
|
|
|
lockdep_assert_held(&ocelot->fwd_domain_lock);
|
|
|
|
|
|
|
|
for (port = 0; port < ocelot->num_phys_ports; port++) {
|
|
|
|
struct ocelot_port *ocelot_port = ocelot->ports[port];
|
|
|
|
int min_speed = ocelot_port->speed;
|
|
|
|
unsigned long mask = 0;
|
|
|
|
u32 tmp, val = 0;
|
|
|
|
|
|
|
|
/* Disable cut-through on ports that are down */
|
|
|
|
if (ocelot_port->speed <= 0)
|
|
|
|
goto set;
|
|
|
|
|
|
|
|
if (dsa_is_cpu_port(ds, port)) {
|
|
|
|
/* Ocelot switches forward from the NPI port towards
|
|
|
|
* any port, regardless of it being in the NPI port's
|
|
|
|
* forwarding domain or not.
|
|
|
|
*/
|
|
|
|
mask = dsa_user_ports(ds);
|
|
|
|
} else {
|
|
|
|
mask = ocelot_get_bridge_fwd_mask(ocelot, port);
|
|
|
|
mask &= ~BIT(port);
|
|
|
|
if (ocelot->npi >= 0)
|
|
|
|
mask |= BIT(ocelot->npi);
|
|
|
|
else
|
2022-05-21 21:37:42 +00:00
|
|
|
mask |= ocelot_port_assigned_dsa_8021q_cpu_mask(ocelot,
|
|
|
|
port);
|
2021-11-25 12:58:08 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Calculate the minimum link speed, among the ports that are
|
|
|
|
* up, of this source port's forwarding domain.
|
|
|
|
*/
|
|
|
|
for_each_set_bit(other_port, &mask, ocelot->num_phys_ports) {
|
|
|
|
struct ocelot_port *other_ocelot_port;
|
|
|
|
|
|
|
|
other_ocelot_port = ocelot->ports[other_port];
|
|
|
|
if (other_ocelot_port->speed <= 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (min_speed > other_ocelot_port->speed)
|
|
|
|
min_speed = other_ocelot_port->speed;
|
|
|
|
}
|
|
|
|
|
2022-09-05 17:01:24 +00:00
|
|
|
/* Enable cut-through forwarding for all traffic classes that
|
|
|
|
* don't have oversized dropping enabled, since this check is
|
|
|
|
* bypassed in cut-through mode.
|
|
|
|
*/
|
|
|
|
if (ocelot_port->speed == min_speed) {
|
2021-11-25 12:58:08 +00:00
|
|
|
val = GENMASK(7, 0);
|
|
|
|
|
2022-09-05 17:01:24 +00:00
|
|
|
for (tc = 0; tc < OCELOT_NUM_TC; tc++)
|
|
|
|
if (vsc9959_port_qmaxsdu_get(ocelot, port, tc))
|
|
|
|
val &= ~BIT(tc);
|
|
|
|
}
|
|
|
|
|
2021-11-25 12:58:08 +00:00
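The speed rule described in the commit message above can be sketched in standalone C. This is a simplified illustration under stated assumptions, not the driver code: `struct port_state`, `cut_through_tc_mask()`, the port count and the way link-down ports are skipped are all hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS	6	/* hypothetical port count for this sketch */
#define NUM_TC		8	/* traffic classes per port */

/* Hypothetical per-port state; not the driver's struct ocelot_port. */
struct port_state {
	int speed;		/* link speed in Mbps, 0 when the link is down */
	uint32_t fwd_mask;	/* one bit per port in the same forwarding domain */
};

/*
 * Return the mask of traffic classes on which cut-through may be enabled
 * for egress port @port. Cut-through is only safe when no port that can
 * forward to @port runs at a lower speed, so it is allowed only when @port
 * itself has the minimum speed of its forwarding domain.
 */
uint32_t cut_through_tc_mask(const struct port_state *ports, int port)
{
	int min_speed, other;

	assert(port >= 0 && port < NUM_PORTS);

	min_speed = ports[port].speed;
	if (min_speed <= 0)
		return 0;	/* link down: nothing to enable */

	for (other = 0; other < NUM_PORTS; other++) {
		if (other == port)
			continue;
		if (!(ports[port].fwd_mask & (UINT32_C(1) << other)))
			continue;
		/* Assumption: a port without link cannot send towards us. */
		if (ports[other].speed <= 0)
			continue;
		if (ports[other].speed < min_speed)
			min_speed = ports[other].speed;
	}

	if (ports[port].speed != min_speed)
		return 0;	/* a slower ingress port exists: store-and-forward */

	return (UINT32_C(1) << NUM_TC) - 1;	/* all traffic classes */
}
```

With three ports in one domain at 1000, 100 and 1000 Mbps, only the 100 Mbps port gets a non-zero mask. Because link events change the result, a real implementation would have to recompute the mask whenever a link goes up or down or the bridging domain changes, which is the serialization problem the commit message mentions.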
|
|
|
set:
|
|
|
|
tmp = ocelot_read_rix(ocelot, ANA_CUT_THRU_CFG, port);
|
|
|
|
if (tmp == val)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
dev_dbg(ocelot->dev,
|
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames much larger than 601 octets are still transmitted, even though,
according to the documentation, they should be considered oversized and
dropped.
Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.
Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.
There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose reading back from registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that
support neither cut-through nor tc-taprio, and we don't have a
felix-specific extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 17:01:24 +00:00
|
|
|
"port %d fwd mask 0x%lx speed %d min_speed %d, %s cut-through forwarding on TC mask 0x%x\n",
|
2021-11-25 12:58:08 +00:00
|
|
|
port, mask, ocelot_port->speed, min_speed,
|
2022-09-05 17:01:24 +00:00
|
|
|
val ? "enabling" : "disabling", val);
|
2021-11-25 12:58:08 +00:00
|
|
|
|
|
|
|
ocelot_write_rix(ocelot, val, ANA_CUT_THRU_CFG, port);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-11-18 10:12:00 +00:00
|
|
|
static const struct ocelot_ops vsc9959_ops = {
	.reset = vsc9959_reset,
	.wm_enc = vsc9959_wm_enc,
	.wm_dec = vsc9959_wm_dec,
	.wm_stat = vsc9959_wm_stat,
	.port_to_netdev = felix_port_to_netdev,
	.netdev_to_port = felix_netdev_to_port,
	.psfp_init = vsc9959_psfp_init,
	.psfp_filter_add = vsc9959_psfp_filter_add,
	.psfp_filter_del = vsc9959_psfp_filter_del,
	.psfp_stats_get = vsc9959_psfp_stats_get,
|
2021-11-25 12:58:08 +00:00
|
|
|
.cut_through_fwd = vsc9959_cut_through_fwd,
|
2022-06-17 03:24:23 +00:00
|
|
|
.tas_clock_adjust = vsc9959_tas_clock_adjust,
|
net: dsa: felix: check the 32-bit PSFP stats against overflow
The Felix PSFP counters suffer from the same problem as the ocelot
ndo_get_stats64 ones - they are 32-bit, so they can easily overflow and
this can easily go undetected.
Add a custom hook in ocelot_check_stats_work() through which driver
specific actions can be taken, and update the stats for the existing
PSFP filters from that hook.
Previously, vsc9959_psfp_filter_add() and vsc9959_psfp_filter_del() were
serialized with respect to each other via rtnl_lock(). However, with the
new entry point into &psfp->sfi_list coming from the periodic worker, we
now need an explicit mutex to serialize access to these lists.
We used to keep a struct felix_stream_filter_counters on stack, through
which vsc9959_psfp_stats_get() - a FLOW_CLS_STATS callback - would
retrieve data from vsc9959_psfp_counters_get(). We need to become
smarter about that in 3 ways:
- we need to keep a persistent set of counters for each stream instead
of keeping them on stack
- we need to promote those counters from u32 to u64, and create a
procedure that properly keeps 64-bit counters. Since we clear the
hardware counters anyway, and we poll every 2 seconds, a simple
increment of a u64 counter with a u32 value will perfectly do the job.
- FLOW_CLS_STATS also expects incremental counters, so we also need to
zeroize our u64 counters every time sch_flower calls us
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-08 16:48:05 +00:00
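The counter handling described in the commit message above can be sketched as follows. This is a minimal standalone illustration, not the driver's code: `struct stream_counters` and both function names are hypothetical, only two counters are shown, and the mutex that the commit message says is needed around the filter list is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical persistent software copy of one stream filter's counters. */
struct stream_counters {
	uint64_t match;
	uint64_t red;
};

/*
 * Periodic poll: the hardware counters are 32-bit and are cleared after
 * each read, so every reading is a delta that can simply be added to the
 * 64-bit software total.
 */
void stream_counters_accumulate(struct stream_counters *sw,
				uint32_t hw_match, uint32_t hw_red)
{
	assert(sw);
	sw->match += hw_match;
	sw->red += hw_red;
}

/*
 * Stats request: the caller expects incremental values, so report what
 * accumulated since the previous request and zeroize the software totals.
 */
struct stream_counters stream_counters_report(struct stream_counters *sw)
{
	struct stream_counters delta;

	assert(sw);
	delta = *sw;
	sw->match = 0;
	sw->red = 0;

	return delta;
}
```

Two polls that each read the maximum 32-bit value leave a total that no longer fits in 32 bits, which is the case the u64 promotion is meant to cover.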
|
|
|
.update_stats = vsc9959_update_stats,
|
2021-11-18 10:12:00 +00:00
|
|
|
};
|
|
|
|
|
2020-07-13 16:57:09 +00:00
|
|
|
static const struct felix_info felix_info_vsc9959 = {
|
net: dsa: felix: update regmap requests to be string-based
Existing felix DSA drivers (vsc9959, vsc9953) are all switches that were
integrated in NXP SoCs, which makes them a bit unusual compared to the
usual Microchip branded Ocelot switches.
To be precise, looking at
Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml, one can
see 21 memory regions for the "switch" node, and these correspond to the
"targets" of the switch IP, which are spread throughout the guts of that
SoC's memory space.
In NXP integrations, those targets still exist, but they were condensed
within a single memory region, with no other peripheral in between them,
so it made more sense for the driver to ioremap the entire memory space
of the switch, and then find the targets within that memory space via
some offsets hardcoded in the driver.
The effect of this design decision is that now, the felix driver expects
hardware instantiations to provide their own resource definitions, which
is kind of odd when considering a typical device (those are retrieved
from 'reg' properties in the device tree, using platform_get_resource()
or similar).
Allow other hardware instantiations that share the felix driver to not
provide a hardcoded array of resources in the future. Instead, make the
common denominator based on which regmaps are created be just the
resource "names". Each instantiation comes with its own array of names
that are mandatory for it, and with an optional array of resources.
So we split the resources in 2 arrays, one is what's requested and the
other is what's provided. There is one pool of provided resources, in
felix->info->resources (of length felix->info->num_resources). There are
2 different ways of requesting a resource. One is by enum ocelot_target
(this handles the global regmaps), and one is by int port (this handles
the per-port ones).
For the existing vsc9959 and vsc9953, it would be a bit stupid to
request something that's not provided, given that the 2 arrays are both
defined in the same place.
The advantage is that we can now modify felix_request_regmap_by_name()
to make felix->info->resources[] optional, and if absent, the
implementation can call dev_get_regmap() and this is something that is
compatible with MFD.
Co-developed-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27 19:15:20 +00:00
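The split between requested names and provided resources described above can be sketched in standalone C. This is an illustration only: `struct named_resource` and `find_resource_by_name()` are hypothetical stand-ins and do not reproduce the kernel's `struct resource` or the body of `felix_request_regmap_by_name()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a named memory resource. */
struct named_resource {
	const char *name;
	unsigned long start;
	unsigned long end;
};

/*
 * Look up a requested resource by name in the pool of provided resources.
 * Returns NULL when the pool is absent or has no entry with that name; per
 * the commit message, that is the point where an implementation could fall
 * back to another source such as dev_get_regmap().
 */
const struct named_resource *
find_resource_by_name(const struct named_resource *pool, size_t num,
		      const char *name)
{
	size_t i;

	assert(name);

	for (i = 0; i < num; i++)
		if (pool[i].name && strcmp(pool[i].name, name) == 0)
			return &pool[i];

	return NULL;
}
```

Keying the lookup on the name rather than on an array index is what lets one instantiation supply a hardcoded pool while another supplies none at all.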
|
|
|
.resources = vsc9959_resources,
|
|
|
|
.num_resources = ARRAY_SIZE(vsc9959_resources),
|
|
|
|
.resource_names = vsc9959_resource_names,
|
net: dsa: ocelot: add driver for Felix switch family
This supports an Ethernet switching core from Vitesse / Microsemi /
Microchip (VSC9959) which is part of the Ocelot family (a brand name),
and whose code name is Felix. The switch can be (and is) integrated on
different SoCs as a PCIe endpoint device.
The functionality is provided by the core of the Ocelot switch driver
(drivers/net/ethernet/mscc). In this regard, the current driver is an
instance of Microsemi's Ocelot core driver, with a DSA front-end. It
inherits its name from VSC9959's code name, to distinguish itself from
the switchdev ocelot driver.
The patch adds the logic for probing a PCI device and defines the
register map for the VSC9959 switch core, since it has some differences
in register addresses and bitfield mappings compared to the other Ocelot
switches (VSC7511, VSC7512, VSC7513, VSC7514).
The Felix driver declares the register map as part of the "instance
table". Currently the VSC9959 inside NXP LS1028A is the only instance,
but presumably it can support other switches in the Ocelot family, when
used in DSA mode (Linux running on the external CPU, and not on the
embedded MIPS).
In a few cases, some h/w operations have to be done differently on
VSC9959 due to missing bitfields. This is the case for the switch core
reset and init. Because for this operation Ocelot uses some bits that
are not present on Felix, the latter has to use a register from the
global registers block (GCB) instead.
Although it is a PCI driver, it relies on DT bindings for compatibility
with DSA (CPU port link, PHY library). It does not have any custom
device tree bindings, though, since we would like to minimize its
dependency on device tree.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14 15:03:30 +00:00
|
|
|
.regfields = vsc9959_regfields,
|
|
|
|
.map = vsc9959_regmap,
|
|
|
|
.ops = &vsc9959_ops,
|
2020-02-29 14:31:14 +00:00
|
|
|
.vcap = vsc9959_vcap_props,
|
2021-11-18 10:12:02 +00:00
|
|
|
.vcap_pol_base = VSC9959_VCAP_POLICER_BASE,
|
|
|
|
.vcap_pol_max = VSC9959_VCAP_POLICER_MAX,
|
|
|
|
.vcap_pol_base2 = 0,
|
|
|
|
.vcap_pol_max2 = 0,
|
2020-05-03 22:20:26 +00:00
|
|
|
.num_mact_rows = 2048,
|
2022-02-26 22:36:50 +00:00
|
|
|
.num_ports = VSC9959_NUM_PORTS,
|
2021-01-15 02:11:16 +00:00
|
|
|
.num_tx_queues = OCELOT_NUM_TC,
|
2023-01-27 19:35:52 +00:00
|
|
|
.quirks = FELIX_MAC_QUIRKS,
|
net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021q
Since the tag_8021q tagger is software-defined, it has no means by
itself for retrieving hardware timestamps of PTP event messages.
Because we do want to support PTP on ocelot even with tag_8021q, we need
to use the CPU port module for that. The RX timestamp is present in the
Extraction Frame Header. And because we can't use NPI mode which redirects
the CPU queues to an "external CPU" (meaning the ARM CPU running Linux),
then we need to poll the CPU port module through the MMIO registers to
retrieve TX and RX timestamps.
Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC
without wiring the extraction IRQ line to the ARM GIC. So, if we want to
be notified of any PTP packets received on the CPU port module, we have
a problem.
There is a possible workaround, which is to use the Ethernet CPU port as
a notification channel that packets are available on the CPU port module
as well. When a PTP packet is received by the DSA tagger (without timestamp,
of course), we go to the CPU extraction queues, poll for it there, then
we drop the original Ethernet packet and masquerade the packet retrieved
over MMIO (plus the timestamp) as the original when we inject it up the
stack.
Create a quirk in struct felix that is selected by the Felix driver (but not
by Seville, since that doesn't support PTP at all). We want to do this
such that the workaround is minimally invasive for future switches that
don't require this workaround.
The only traffic for which we need timestamps is PTP traffic, so add a
redirection rule to the CPU port module for this. Currently we only have
the need for PTP over L2, so redirection rules for UDP ports 319 and 320
are TBD for now.
Note that for the workaround of matching of PTP-over-Ethernet-port with
PTP-over-MMIO queues to work properly, both channels need to be
absolutely lossless. There are two parts to achieving that:
- We keep flow control enabled on the tag_8021q CPU port
- We put the DSA master interface in promiscuous mode, so it will never
drop a PTP frame (for the profiles we are interested in, these are
sent to the multicast MAC addresses of 01-80-c2-00-00-0e and
01-1b-19-00-00-00).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13 22:38:00 +00:00
|
|
|
.quirk_no_xtr_irq = true,
|
2020-09-18 10:57:52 +00:00
|
|
|
.ptp_caps = &vsc9959_ptp_caps,
|
net: dsa: felix: Add PCS operations for PHYLINK
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
| +--------+ GMII (typically disabled via RCW) |
| ENETC PCI | ENETC |--------------------------+ |
| Root Complex | port 3 |-----------------------+ | |
| Integrated +--------+ | | |
| Endpoint | | |
| +--------+ 2.5G GMII | | |
| | ENETC |--------------+ | | |
| | port 2 |-----------+ | | | |
| +--------+ | | | | |
| +--------+ +--------+ |
| | Felix | | Felix | |
| | port 4 | | port 5 | |
| +--------+ +--------+ |
| |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | |
| | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | |
+------------------------------------------------------------------------+
| |||| SerDes | |||| |||| |||| |||| |
| +--------+block | +--------------------------------------------+ |
| | ENETC | | | ENETC port 2 internal MDIO bus | |
| | port 0 | | | PCS PCS PCS PCS | |
| | PCS | | | 0 1 2 3 | |
+-----------------|------------------------------------------------------+
v v v v v v
SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
USXGMII/ (bypasses
1000Base-X/ SerDes)
2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map borrowed from ENETC port 2
(PF2), for accessing its associated PCS's.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not treated
in any way (and I don't think it can be treated).
All of this is so SoC-specific, that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree for 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one who
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06 01:34:17 +00:00
|
|
|
.mdio_bus_alloc = vsc9959_mdio_bus_alloc,
|
|
|
|
.mdio_bus_free = vsc9959_mdio_bus_free,
|
2022-02-26 22:36:50 +00:00
|
|
|
.port_modes = vsc9959_port_modes,
|
2020-09-18 10:57:49 +00:00
|
|
|
.port_setup_tc = vsc9959_port_setup_tc,
|
|
|
|
.port_sched_speed_set = vsc9959_sched_speed_set,
|
net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port
Currently, sending a packet into a time gate too small for it (or always
closed) causes the queue system to hold the frame forever. Even worse,
this frame isn't subject to aging either, because for that to happen, it
needs to be scheduled for transmission in the first place. But the frame
will consume buffer memory and frame references while it is forever held
in the queue system.
Before commit a4ae997adcbd ("net: mscc: ocelot: initialize watermarks to
sane defaults"), this behavior was somewhat subtle, as the switch had a
more intricately tuned default watermark configuration out of reset,
which did not allow any single port and tc to consume the entire switch
buffer space. Nonetheless, the held frames are still there, and they
reduce the total backplane capacity of the switch.
However, after the aforementioned commit, the behavior can be very
clearly seen, since we deliberately allow each {port, tc} to consume the
entire shared buffer of the switch minus the reservations (and we
disable all reservations by default). That is to say, we allow a
permanently closed tc-taprio gate to hang the entire switch.
A careful inspection of the documentation shows that the QSYS:Q_MAX_SDU
per-port-tc registers serve 2 purposes: one is for guard band calculation
(when zero, this falls back to QSYS:PORT_MAX_SDU), and the other is to
enable oversized frame dropping (when non-zero).
Currently the QSYS:Q_MAX_SDU registers are all zero, so oversized frame
dropping is disabled. The goal of the change is to enable it seamlessly.
For that, we need to hook into the MTU change, tc-taprio change, and
port link speed change procedures, since we depend on these variables.
Frames are not dropped on egress due to a queue system oversize
condition, instead that egress port is simply excluded from the mask of
valid destination ports for the packet. If there are no destination
ports at all, the ingress counter that increments is the generic
"drop_tail" in ethtool -S.
The issue exists in various forms since the tc-taprio offload was introduced.
Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-28 14:52:37 +00:00
|
|
|
.tas_guard_bands_update = vsc9959_tas_guard_bands_update,
|
2019-11-14 15:03:30 +00:00
|
|
|
};
|
2020-07-13 16:57:09 +00:00
|
|
|
|
2023-01-19 12:27:04 +00:00
|
|
|
/* The INTB interrupt is shared between PTP TX timestamp availability
|
|
|
|
* notification and MAC Merge status change on each port.
|
|
|
|
*/
|
2020-07-13 16:57:09 +00:00
|
|
|
static irqreturn_t felix_irq_handler(int irq, void *data)
{
	struct ocelot *ocelot = (struct ocelot *)data;
|
2023-01-19 12:27:04 +00:00
|
|
|
int port;
|
2020-07-13 16:57:09 +00:00
|
|
|
|
|
|
|
ocelot_get_txtstamp(ocelot);
|
|
|
|
|
2023-01-19 12:27:04 +00:00
|
|
|
for (port = 0; port < ocelot->num_phys_ports; port++)
|
|
|
|
ocelot_port_mm_irq(ocelot, port);
|
|
|
|
|
2020-07-13 16:57:09 +00:00
|
|
|
return IRQ_HANDLED;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int felix_pci_probe(struct pci_dev *pdev,
			   const struct pci_device_id *id)
{
	struct dsa_switch *ds;
	struct ocelot *ocelot;
	struct felix *felix;
	int err;

	if (pdev->dev.of_node && !of_device_is_available(pdev->dev.of_node)) {
		dev_info(&pdev->dev, "device is disabled, skipping\n");
		return -ENODEV;
	}

	err = pci_enable_device(pdev);
	if (err) {
		dev_err(&pdev->dev, "device enable failed\n");
		goto err_pci_enable;
	}

	felix = kzalloc(sizeof(struct felix), GFP_KERNEL);
	if (!felix) {
		err = -ENOMEM;
		dev_err(&pdev->dev, "Failed to allocate driver memory\n");
		goto err_alloc_felix;
	}

	pci_set_drvdata(pdev, felix);
	ocelot = &felix->ocelot;
	ocelot->dev = &pdev->dev;
|
2021-01-15 02:11:16 +00:00
|
|
|
ocelot->num_flooding_pgids = OCELOT_NUM_TC;
|
2020-07-13 16:57:09 +00:00
|
|
|
felix->info = &felix_info_vsc9959;
|
2021-12-07 17:00:27 +00:00
|
|
|
felix->switch_base = pci_resource_start(pdev, VSC9959_SWITCH_PCI_BAR);
|
2020-07-13 16:57:09 +00:00
|
|
|
|
|
|
|
pci_set_master(pdev);
|
|
|
|
|
|
|
|
err = devm_request_threaded_irq(&pdev->dev, pdev->irq, NULL,
|
|
|
|
&felix_irq_handler, IRQF_ONESHOT,
|
|
|
|
"felix-intb", ocelot);
|
|
|
|
if (err) {
|
|
|
|
dev_err(&pdev->dev, "Failed to request irq\n");
|
|
|
|
goto err_alloc_irq;
|
|
|
|
}
|
|
|
|
|
|
|
|
ocelot->ptp = 1;
|
net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959
The Felix VSC9959 switch supports frame preemption and has a MAC Merge
layer. In addition to the structured stats that exist for the eMAC,
export the counters associated with its pMAC (pause, RMON, MAC, PHY,
control) plus the high-level MAC Merge layer stats. The unstructured
ethtool counters, as well as the rtnl_link_stats64 were left to report
only the eMAC counters.
Because statistics processing is quite self-contained in ocelot_stats.c
now, I've opted for introducing an ocelot->mm_supported bool, based on
which the common switch lib does everything, rather than pushing the
TSN-specific code in felix_vsc9959.c, as happens for other TSN stuff.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-19 12:27:03 +00:00
|
|
|
ocelot->mm_supported = true;
|
2020-07-13 16:57:09 +00:00
|
|
|
|
|
|
|
ds = kzalloc(sizeof(struct dsa_switch), GFP_KERNEL);
|
|
|
|
if (!ds) {
|
|
|
|
err = -ENOMEM;
|
|
|
|
dev_err(&pdev->dev, "Failed to allocate DSA switch\n");
|
|
|
|
goto err_alloc_ds;
|
|
|
|
}
|
|
|
|
|
|
|
|
ds->dev = &pdev->dev;
|
|
|
|
ds->num_ports = felix->info->num_ports;
|
|
|
|
ds->num_tx_queues = felix->info->num_tx_queues;
|
|
|
|
ds->ops = &felix_switch_ops;
|
|
|
|
ds->priv = ocelot;
|
|
|
|
felix->ds = ds;
|
net: dsa: felix: convert to the new .change_tag_protocol DSA API
In expectation of the new tag_ocelot_8021q tagger implementation, we
need to be able to do runtime switchover between one tagger and another.
So we must structure the existing code for the current NPI-based tagger
in a certain way.
We move the felix_npi_port_init function in expectation of the future
driver configuration necessary for tag_ocelot_8021q: we would like to
not have the NPI-related bits interspersed with the tag_8021q bits.
The conversion from this:
ocelot_write_rix(ocelot,
ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)),
ANA_PGID_PGID, PGID_UC);
to this:
cpu_flood = ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports));
ocelot_rmw_rix(ocelot, cpu_flood, cpu_flood, ANA_PGID_PGID, PGID_UC);
is perhaps non-trivial, but is nonetheless not a functional change. The PGID_UC
(replicator for unknown unicast) is already configured out of hardware
reset to flood to all ports except ocelot->num_phys_ports (the CPU port
module). All we change is that we use a read-modify-write to only add
the CPU port module to the unknown unicast replicator, as opposed to
doing a full write to the register.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29 01:00:07 +00:00
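The equivalence claimed in the message above can be checked with a small userspace model. This is a sketch under stated assumptions: `BIT()` and `GENMASK()` are re-implemented locally for 32-bit values, the `ANA_PGID_PGID_PGID()` field wrapper is omitted (the field is assumed to start at bit 0), and `model_rmw()` is a hypothetical stand-in for the register accessor, not the real `ocelot_rmw_rix()`. The reset value is taken from the message: flood to every port except the CPU port module.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK() helpers. */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))

/* Hypothetical read-modify-write: only the bits in mask are changed. */
uint32_t model_rmw(uint32_t old, uint32_t val, uint32_t mask)
{
	return (old & ~mask) | (val & mask);
}

/* Old code: a full write of all front ports plus the CPU port module. */
uint32_t pgid_uc_full_write(unsigned int num_phys_ports)
{
	assert(num_phys_ports >= 1 && num_phys_ports <= 31);
	return GENMASK(num_phys_ports, 0);
}

/* New code: start from the reset value and add only the CPU port bit. */
uint32_t pgid_uc_rmw(unsigned int num_phys_ports)
{
	uint32_t reset_val, cpu_flood;

	assert(num_phys_ports >= 1 && num_phys_ports <= 31);
	reset_val = GENMASK(num_phys_ports - 1, 0);	/* all front ports */
	cpu_flood = BIT(num_phys_ports);		/* CPU port module */

	return model_rmw(reset_val, cpu_flood, cpu_flood);
}
```

Both helpers return the same mask for any port count, which is why the conversion does not change behavior as long as the register still holds its reset value when the read-modify-write runs.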
|
|
|
felix->tag_proto = DSA_TAG_PROTO_OCELOT;
|
2020-07-13 16:57:09 +00:00
|
|
|
|
|
|
|
err = dsa_register_switch(ds);
|
|
|
|
if (err) {
|
2022-04-08 10:15:21 +00:00
|
|
|
dev_err_probe(&pdev->dev, err, "Failed to register DSA switch\n");
|
2020-07-13 16:57:09 +00:00
|
|
|
goto err_register_ds;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_register_ds:
|
|
|
|
kfree(ds);
|
|
|
|
err_alloc_ds:
|
|
|
|
err_alloc_irq:
|
|
|
|
kfree(felix);
|
2021-01-09 20:34:15 +00:00
|
|
|
err_alloc_felix:
|
2020-07-13 16:57:09 +00:00
|
|
|
pci_disable_device(pdev);
|
|
|
|
err_pci_enable:
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void felix_pci_remove(struct pci_dev *pdev)
|
|
|
|
{
|
net: dsa: be compatible with masters which unregister on shutdown
Lino reports that on his system with bcmgenet as DSA master and KSZ9897
as a switch, rebooting or shutting down never works properly.
What does the bcmgenet driver have special to trigger this, that other
DSA masters do not? It has an implementation of ->shutdown which simply
calls its ->remove implementation. Otherwise said, it unregisters its
network interface on shutdown.
This message can be seen in a loop, and it hangs the reboot process there:
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
So why 3?
A usage count of 1 is normal for a registered network interface, and any
virtual interface which links itself as an upper of that will increment
it via dev_hold. In the case of DSA, this is the call path:
dsa_slave_create
-> netdev_upper_dev_link
-> __netdev_upper_dev_link
-> __netdev_adjacent_dev_insert
-> dev_hold
So a DSA switch with 3 interfaces will result in a usage count elevated
by two, and netdev_wait_allrefs will wait until they have gone away.
Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
delete themselves, but DSA cannot just vanish and go poof, at most it
can unbind itself from the switch devices, but that must happen strictly
earlier compared to when the DSA master unregisters its net_device, so
reacting on the NETDEV_UNREGISTER event is way too late.
It seems that it is a pretty established pattern to have a driver's
->shutdown hook redirect to its ->remove hook, so the same code is
executed regardless of whether the driver is unbound from the device, or
the system is just shutting down. As Florian puts it, it is quite a big
hammer for bcmgenet to unregister its net_device during shutdown, but
having a common code path with the driver unbind helps ensure it is well
tested.
So DSA, for better or for worse, has to live with that and engage in an
arms race of implementing the ->shutdown hook too, from all individual
drivers, and do something sane when paired with masters that unregister
their net_device there. The only sane thing to do, of course, is to
unlink from the master.
However, complications arise really quickly.
The pattern of redirecting ->shutdown to ->remove is not unique to
bcmgenet or even to net_device drivers. In fact, SPI controllers do it
too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
and MDIO controllers do it too (this is something I have not researched
too deeply, but even if this is not the case today, it is certainly
plausible to happen in the future, and must be taken into consideration).
Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
insane implication is that for the exact same DSA switch device, we
might have both ->shutdown and ->remove getting called.
So we need to do something with that insane environment. The pattern
I've come up with is "if this, then not that", so if either ->shutdown
or ->remove gets called, we set the device's drvdata to NULL, and in the
other hook, we check whether the drvdata is NULL and just do nothing.
This is probably not necessary for platform devices, just for devices on
buses, but I would really insist for consistency among drivers, because
when code is copy-pasted, it is not always copy-pasted from the best
sources.
So depending on whether the DSA switch's ->remove or ->shutdown will get
called first, we cannot really guarantee even for the same driver if
rebooting will result in the same code path on all platforms. But
nonetheless, we need to do something minimally reasonable on ->shutdown
too to fix the bug. Of course, the ->remove will do more (a full
teardown of the tree, with all data structures freed, and this is why
the bug was not caught for so long). The new ->shutdown method is kept
separate from dsa_unregister_switch not because we couldn't have
unregistered the switch, but simply in the interest of doing something
quick and to the point.
The big question is: does the DSA switch's ->shutdown get called earlier
than the DSA master's ->shutdown? If not, there is still a risk that we
might still trigger the WARN_ON in unregister_netdevice that says we are
attempting to unregister a net_device which has uppers. That's no good.
Although the reference to the master net_device won't physically go away
even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
on it.
The answer to that question lies in this comment above device_link_add:
* A side effect of the link creation is re-ordering of dpm_list and the
* devices_kset list by moving the consumer device and all devices depending
* on it to the ends of these lists (that does not happen to devices that have
* not been registered when this function is called).
so the fact that DSA uses device_link_add towards its master is not
exactly for nothing. device_shutdown() walks devices_kset from the back,
so this is our guarantee that DSA's shutdown happens before the master's
shutdown.
Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17 13:34:33 +00:00
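The "if this, then not that" pattern from the message above can be modeled in plain userspace C. This is a minimal sketch, not the driver code: `struct fake_dev`, `struct fake_priv`, and both hooks are hypothetical stand-ins, and the `teardown_calls` counter exists only so the effect can be observed. It follows the description in the message, where whichever hook runs first clears the driver data and the other hook then returns early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private driver state. */
struct fake_priv {
	int id;
};

/* Hypothetical stand-in for a bus device carrying driver data. */
struct fake_dev {
	void *drvdata;
	int teardown_calls;	/* how many hooks actually did work */
};

/* Model of ->remove: full teardown, skipped if ->shutdown ran first. */
void fake_remove(struct fake_dev *dev)
{
	struct fake_priv *priv;

	assert(dev != NULL);
	priv = dev->drvdata;
	if (!priv)
		return;

	dev->teardown_calls++;	/* unregister and free would go here */
	dev->drvdata = NULL;
}

/* Model of ->shutdown: quick unlink, skipped if ->remove ran first. */
void fake_shutdown(struct fake_dev *dev)
{
	struct fake_priv *priv;

	assert(dev != NULL);
	priv = dev->drvdata;
	if (!priv)
		return;

	dev->teardown_calls++;	/* unlinking from the master would go here */
	dev->drvdata = NULL;
}
```

Calling `fake_shutdown()` and then `fake_remove()` on the same device, in either order, performs the teardown work exactly once.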
|
|
|
struct felix *felix = pci_get_drvdata(pdev);
|
2020-07-13 16:57:09 +00:00
|
|
|
|
net: dsa: be compatible with masters which unregister on shutdown
2021-09-17 13:34:33 +00:00
|
|
|
if (!felix)
|
|
|
|
return;
|
2020-07-13 16:57:09 +00:00
|
|
|
|
|
|
|
dsa_unregister_switch(felix->ds);
|
|
|
|
|
|
|
|
kfree(felix->ds);
|
|
|
|
kfree(felix);
|
|
|
|
|
|
|
|
pci_disable_device(pdev);
|
net: dsa: be compatible with masters which unregister on shutdown
2021-09-17 13:34:33 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void felix_pci_shutdown(struct pci_dev *pdev)
|
|
|
|
{
|
|
|
|
struct felix *felix = pci_get_drvdata(pdev);
|
|
|
|
|
|
|
|
if (!felix)
|
|
|
|
return;
|
|
|
|
|
|
|
|
dsa_switch_shutdown(felix->ds);
|
|
|
|
|
|
|
|
pci_set_drvdata(pdev, NULL);
|
2020-07-13 16:57:09 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct pci_device_id felix_ids[] = {
|
|
|
|
{
|
|
|
|
/* NXP LS1028A */
|
|
|
|
PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0xEEF0),
|
|
|
|
},
|
|
|
|
{ 0, }
|
|
|
|
};
|
|
|
|
MODULE_DEVICE_TABLE(pci, felix_ids);
|
|
|
|
|
2020-09-18 10:57:53 +00:00
|
|
|
static struct pci_driver felix_vsc9959_pci_driver = {
|
2020-07-13 16:57:09 +00:00
|
|
|
.name = "mscc_felix",
|
|
|
|
.id_table = felix_ids,
|
|
|
|
.probe = felix_pci_probe,
|
|
|
|
.remove = felix_pci_remove,
|
net: dsa: be compatible with masters which unregister on shutdown
2021-09-17 13:34:33 +00:00
|
|
|
.shutdown = felix_pci_shutdown,
|
2020-07-13 16:57:09 +00:00
|
|
|
};
|
2020-09-18 10:57:53 +00:00
|
|
|
module_pci_driver(felix_vsc9959_pci_driver);
|
|
|
|
|
|
|
|
MODULE_DESCRIPTION("Felix Switch driver");
|
|
|
|
MODULE_LICENSE("GPL v2");
|