768998 commits

Author SHA1 Message Date
Jiri Pirko
f71e0ca4db net: sched: Avoid implicit chain 0 creation
Currently, chain 0 is implicitly created during block creation. However,
that does not align with the chain object exposure, creation and destruction
API introduced later on. So make chain 0 behave the same way as any
other chain and only create it when it is needed. Since chain 0 is
somewhat special, as the qdiscs need to hold a pointer to the first chain
tp, this requires moving the chain head change callback infra to the
block structure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:12 -07:00
Jiri Pirko
f34e8bff58 net: sched: push ops lookup bits into tcf_proto_lookup_ops()
Push all bits that take care of ops lookup, including module loading,
out of the tcf_proto_create() function and into tcf_proto_lookup_ops().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:12 -07:00
David S. Miller
d3585edf87 Merge branch 'cpsw-add-MQPRIO-and-CBS-Qdisc-offload'
Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: add MQPRIO and CBS Qdisc offload

This series adds MQPRIO and CBS Qdisc offload for the TI cpsw driver.
It can potentially be used in audio video bridging (AVB) and time
sensitive networking (TSN).

The patchset was tested on AM572x EVM and BBB boards. The last patch in this
series adds a detailed description of the configuration with examples. For
consistency, the talker and listener tools from the
patchset "TSN: Add qdisc based config interface for CBS" were used and
can be seen here: https://www.spinics.net/lists/netdev/msg460869.html

Based on net-next/master

v5..v4:
- corrected typo of "am57xx" board name, no functional changes

v4..v3:
- nothing, just rebase

v3..v2:
- corrected typo of "shaper" word, no functional changes

v2..v1:
- renamed cpsw.txt to ti-cpsw.txt
- renamed cpsw_set_tc() to cpsw_set_mqprio()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
Ivan Khoronzhuk
ae62372f27 Documentation: networking: cpsw: add MQPRIO & CBS offload examples
This document describes MQPRIO and CBS Qdisc offload configuration
for cpsw driver based on examples. It potentially can be used in
audio video bridging (AVB) and time sensitive networking (TSN).

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
Ivan Khoronzhuk
4b4255ed06 net: ethernet: ti: cpsw: restore shaper configuration while down/up
The shaper configuration needs to be restored after the interface goes
down and up, as the corresponding configuration is still kept in the
kernel settings. Only the shaper context is restored, so the vlan
configuration should be restored by the user if needed, especially for
devices with one port where vlan frames are sent via ALE.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
Ivan Khoronzhuk
57d9014825 net: ethernet: ti: cpsw: add CBS Qdisc offload
The cpsw has up to 4 FIFOs per port, and the upper 3 FIFOs can feed a rate
limited queue with shaping. In order to set and enable shaping for
those 3 FIFO queues, a network device with the CBS qdisc attached is
needed. The CBS configuration is added for dual-emac/single port mode
only, but can potentially be used in switch mode also, based on
switchdev for instance.

Although the FIFO shapers can work without cpdma level shapers, the basic
usage must combine them with cpdma level shapers as described in the TRM;
those are set as maximum rates for interface queues with sysfs.

One of the possible configurations with txq shapers and CBS shapers:

                      Configured with echo RATE >
                  /sys/class/net/eth0/queues/tx-0/tx_maxrate
             /---------------------------------------------------
            /
           /            cpdma level shapers
        +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+
        | c7 | | c6 | | c5 | | c4 | | c3 | | c2 | | c1 | | c0 |
        \    / \    / \    / \    / \    / \    / \    / \    /
         \  /   \  /   \  /   \  /   \  /   \  /   \  /   \  /
          \/     \/     \/     \/     \/     \/     \/     \/
+---------|------|------|------|-------------------------------------+
|    +----+      |      |  +---+                                     |
|    |      +----+      |  |                                         |
|    v      v           v  v                                         |
| +----+ +----+ +----+ +----+ p        p+----+ +----+ +----+ +----+  |
| |    | |    | |    | |    | o        o|    | |    | |    | |    |  |
| | f3 | | f2 | | f1 | | f0 | r  CPSW  r| f3 | | f2 | | f1 | | f0 |  |
| |    | |    | |    | |    | t        t|    | |    | |    | |    |  |
| \    / \    / \    / \    / 0        1\    / \    / \    / \    /  |
|  \  X   \  /   \  /   \  /             \  /   \  /   \  /   \  /   |
|   \/ \   \/     \/     \/               \/     \/     \/     \/    |
+-------\------------------------------------------------------------+
         \
          \ FIFO shaper, set with CBS offload added in this patch,
           \ FIFO0 cannot be rate limited
            ------------------------------------------------------

The CBS shaper configuration is supposed to be used with root MQPRIO Qdisc
offload, which allows adding sk_prio->tc->txq maps that direct traffic to
the appropriate tx queue and map L2 priority to the FIFO shaper.

The CBS shaper is intended to be used for AVB, where L2 priority
(pcp field) is used to differentiate classes of traffic. So additionally
a vlan needs to be created with an appropriate egress sk_prio->l2 prio map.

If CBS has several tx queues assigned to it, the sum of their
bandwidths must not exceed the bandwidth set for CBS. It's recommended that
the CBS bandwidth be a little higher.

The CBS shaper is configured with the CBS qdisc offload interface using the tc
tool from the iproute2 package.

For instance:

$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1

$ tc -g class show dev eth0
+---(100:ffe2) mqprio
|    +---(100:3) mqprio
|    +---(100:4) mqprio
|    
+---(100:ffe1) mqprio
|    +---(100:2) mqprio
|    
+---(100:ffe0) mqprio
     +---(100:1) mqprio

$ tc qdisc add dev eth0 parent 100:1 cbs locredit -1440 \
hicredit 60 sendslope -960000 idleslope 40000 offload 1

$ tc qdisc add dev eth0 parent 100:2 cbs locredit -1470 \
hicredit 62 sendslope -980000 idleslope 20000 offload 1

The above commands set CBS shapers for tc0 and tc1, for which txq0 and
txq1 are used. Note that the actual bandwidth set can differ a bit
due to the discreteness of the configuration parameters.

Here, parameters like locredit, hicredit and sendslope are ignored
internally and are supposed to be set assuming a maximum
frame size of 1500.
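
The relation between the values in the two cbs commands can be checked with a small Python sketch. It assumes a 1 Gbit/s link (a port rate of 1000000 kbit/s, which is consistent with the numbers above but not stated in this message) and uses the formulas given in the tc-cbs(8) man page. The function name cbs_params and its defaults are made up for illustration.

```python
# Sketch of the CBS parameter arithmetic described in tc-cbs(8).
# Assumptions: rates are in kbit/s, the link runs at 1 Gbit/s, and the
# maximum frame size is 1500 bytes as stated in the commit message.

def cbs_params(idleslope, port_rate=1_000_000, max_frame=1500,
               max_interference=1500):
    """Return (sendslope, locredit, hicredit) for one CBS class."""
    sendslope = idleslope - port_rate
    # Credits are in bytes. For the example values the divisions are exact.
    locredit = max_frame * sendslope // port_rate
    hicredit = max_interference * idleslope // port_rate
    return sendslope, locredit, hicredit


# tc0 example: idleslope 40000
print(cbs_params(40000))
# tc1 example: idleslope 20000 (only sendslope and locredit are compared here,
# since hicredit depends on the interference size chosen for that class)
print(cbs_params(20000)[:2])
```

Since the driver ignores everything except idleslope, this only helps to produce a self-consistent command line; it says nothing about what the hardware ends up programming.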

It's assumed that the interface speed does not change on reconnection,
which is not always true, so inform the user if the interface speed has
changed, as it can affect the dependent shaper configuration.

For more examples see Documentation.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
Ivan Khoronzhuk
7929a66871 net: ethernet: ti: cpsw: add MQPRIO Qdisc offload
It's possible to offload the vlan to tc priority mapping,
assuming sk_prio == L2 prio.

Example:
$ ethtool -L eth0 rx 1 tx 4

$ tc qdisc replace dev eth0 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1

$ tc -g class show dev eth0
+---(100:ffe2) mqprio
|    +---(100:3) mqprio
|    +---(100:4) mqprio
|    
+---(100:ffe1) mqprio
|    +---(100:2) mqprio
|    
+---(100:ffe0) mqprio
     +---(100:1) mqprio

Here, 100:1 is txq0, 100:2 is txq1, 100:3 is txq2 and 100:4 is txq3;
txq0 belongs to tc0, txq1 to tc1, and txq2 and txq3 to tc2.
The offload part only maps L2 prio to traffic classes, not
to transmit queues, so to direct traffic to a traffic class a vlan has
to be created with an appropriate egress map.
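
To make the map and queues arguments of the example easier to read, here is a small Python sketch that expands them into a priority-to-tc table and a tc-to-txq table. The helper name expand_mqprio is hypothetical; it only interprets the command-line strings and does not talk to the kernel.

```python
# Expand the mqprio "map" and "queues" arguments from the example above.
# "map" lists the traffic class for priorities 0..15; each "queues" entry is
# count@offset for tc0, tc1, ... in order.

def expand_mqprio(prio_map, queues):
    prio_to_tc = [int(tc) for tc in prio_map.split()]
    tc_to_txqs = []
    for spec in queues.split():
        count, offset = (int(part) for part in spec.split("@"))
        tc_to_txqs.append(list(range(offset, offset + count)))
    return prio_to_tc, tc_to_txqs


prio_to_tc, tc_to_txqs = expand_mqprio(
    "2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2", "1@0 1@1 2@2")

for tc, txqs in enumerate(tc_to_txqs):
    prios = [p for p, t in enumerate(prio_to_tc) if t == tc]
    print(f"tc{tc}: txqs {txqs}, priorities {prios}")
```

With these inputs, priority 3 lands in tc0 (txq0), priority 2 in tc1 (txq1), and every other priority in tc2 (txq2 and txq3), which matches the class tree shown by tc.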

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
Ivan Khoronzhuk
4bb6c356a0 net: ethernet: ti: cpdma: fit rated channels in backward order
According to TRM tx rated channels should be in 7..0 order,
so correct it.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
Ivan Khoronzhuk
79b3325d0d net: ethernet: ti: cpsw: use cpdma channels in backward order for txq
cpdma channel priority goes from the highest to the lowest channel number.
The driver has a limited number of descriptors that are shared between
the cpdma channels. The number of queues can be tuned with ethtool,
which avoids spending descriptors on unneeded cpdma channels.
In AVB, 2 rate limited tx queues are usually enough.
Rate limitation can be used only for high priority queues. Thus, to
use only 2 queues, all 8 have to be created, which is wasteful.

So, in order to use only the needed number of rate limited
tx queues, save resources, and still be able to set rate limitation for
them, let's assign tx cpdma channels to queues in backward order.
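
The effect of the backward assignment can be illustrated with a short Python sketch. It assumes the 8 tx cpdma channels mentioned above and that tx queue i simply takes channel 7 - i; the driver's actual bookkeeping is not shown in this message, and the function name is hypothetical.

```python
# Illustration of assigning tx cpdma channels to tx queues in backward order.
# Assumption: 8 tx channels, numbered 0..7, with 7 having the highest priority.

NUM_TX_CPDMA_CHANNELS = 8


def txq_to_cpdma_channel(txq):
    if not 0 <= txq < NUM_TX_CPDMA_CHANNELS:
        raise ValueError(f"tx queue {txq} out of range")
    return NUM_TX_CPDMA_CHANNELS - 1 - txq


# With only 2 tx queues configured, both get the highest priority channels,
# so both can be rate limited without creating the other 6 channels.
print([txq_to_cpdma_channel(q) for q in range(2)])
```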

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:34:36 -07:00
David S. Miller
b19c7bb1ac mlx5e-updates-2018-07-18
Merge tag 'mlx5e-updates-2018-07-18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-07-18

This series includes update for mlx5e net device driver.

1) From Feras Daoud, added support for firmware log tracing,
first by introducing the firmware API needed for the task and then,
for each PF, doing the following:
    1- Allocate memory for the tracer strings database and read it from the FW to the SW.
    2- Allocate and dma map tracer buffers.

    Traces that will be written into the buffer will be parsed as a group
    of one or more traces, referred to as a trace message. The trace message
    represents a C-like printf string.
Once a new trace is available, FW will generate an event indicating that new traces are
available, and the driver will parse them and dump them using tracepoints
event tracing.

Enable mlx5 fw tracing by:
echo 1 > /sys/kernel/debug/tracing/events/mlx5/mlx5_fw/enable

Read traces by:
cat /sys/kernel/debug/tracing/trace

2) From Roi Dayan, Remove redundant WARN when we cannot find neigh entry

3) From Jianbo Liu, TC double vlan support
- Support offloading tc double vlan headers match
- Support offloading double vlan push/pop tc actions

4) From Boris, re-visit UDP GSO, remove the splitting of UDP_GSO_L4 packets
in the driver, and expose UDP_GSO_L4 as a PARTIAL_GSO feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:22:33 -07:00
Boris Pismenny
3f44899ef2 net/mlx5e: Use PARTIAL_GSO for UDP segmentation
This patch removes the splitting of UDP_GSO_L4 packets in the driver,
and exposes UDP_GSO_L4 as a PARTIAL_GSO feature. Thus, the network stack
is now responsible for splitting the packet into two.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Jianbo Liu
cc495188a8 net/mlx5e: Support offloading double vlan push/pop tc actions
As we can configure two push/pop actions in one flow table entry,
add support to offload those double vlan actions in a rule to HW.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Jianbo Liu
1482bd3d50 net/mlx5e: Refactor tc vlan push/pop actions offloading
Extract actions offloading code to a new function, and also extend data
structures for double vlan actions.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Jianbo Liu
699e96ddf4 net/mlx5e: Support offloading tc double vlan headers match
We can match on both outer and inner vlan tags, add support for
offloading that.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Roi Dayan
c7f7ba8df8 net/mlx5e: Remove redundant WARN when we cannot find neigh entry
It is possible for neigh entry not to exist if it was cleaned already.
When we bring down an interface the neigh gets deleted but it could be
that our listener for neigh event to clear the encap valid bit didn't
start yet and the neigh update last used work is started first.
In this scenario the encap entry has valid bit set but the neigh entry
doesn't exist.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Saeed Mahameed
3101d1fc6b net/mlx5: FW tracer, Add debug prints
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
244069532f net/mlx5: FW tracer, Enable tracing
Add the tracer file to the makefile and add the init
function to the load one flow.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
70dd6fdb89 net/mlx5: FW tracer, parse traces and kernel tracing support
For each message the driver should do the following:
1- Find the message string in the strings database
2- Count the number of params in each message
3- Wait for the param events and accumulate them
4- Calculate the event timestamp using the local event timestamp
and the first timestamp event following it.
5- Print message to trace log
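
The per-message steps above can be sketched in Python as follows. This is only a model under stated assumptions: the strings database is a plain dict keyed by a string pointer, params arrive one at a time, and the timestamp calculation (step 4) is left out. The class and function names are hypothetical and are not the driver's.

```python
import re

# Matches "%%" or a simple printf conversion such as %d, %x, %08x, %lu, %s.
_SPEC = re.compile(r"%%|%[-+ #0]*\d*(?:\.\d+)?[hl]?[diouxXcs]")


def count_params(fmt):
    """Step 2: count the conversions in a printf-like string ("%%" is literal)."""
    return sum(1 for spec in _SPEC.findall(fmt) if spec != "%%")


class MessageAssembler:
    def __init__(self, strings_db):
        self.strings_db = strings_db
        self.fmt = None
        self.params = []

    def start(self, string_ptr):
        """Step 1: look up the message string for the first trace."""
        self.fmt = self.strings_db[string_ptr]
        self.params = []
        return self._finish_if_complete()

    def add_param(self, value):
        """Step 3: accumulate one param event."""
        self.params.append(value)
        return self._finish_if_complete()

    def _finish_if_complete(self):
        """Step 5: return the formatted text once all params have arrived."""
        if len(self.params) < count_params(self.fmt):
            return None
        return self.fmt % tuple(self.params)


assembler = MessageAssembler({0x10: "temp=%d limit=0x%x"})
assembler.start(0x10)
assembler.add_param(42)
print(assembler.add_param(255))
```

Python's % operator stands in for the kernel's printf-style formatting here; it accepts only a subset of C conversions, which is why the pattern is kept deliberately narrow.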

Enable the tracing by:
echo 1 > /sys/kernel/debug/tracing/events/mlx5/mlx5_fw/enable

Read traces by:
cat /sys/kernel/debug/tracing/trace

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
c71ad41ccb net/mlx5: FW tracer, events handling
The tracer has one event, event 0x26, with two subtypes:
- Subtype 0: Ownership change
- Subtype 1: Traces available

An ownership change occurs in the following cases:
1- The owner releases its ownership; in this case, an event will be
sent to inform others to reattempt acquiring ownership.
2- Ownership was taken by a higher priority tool; in this case
the owner should understand that it lost ownership, and go through
the tear down flow.

The second subtype indicates that there are traces in the trace buffer;
in this case, the driver polls the tracer buffer for new traces, parses
them and prepares the messages for printing.

The HW starts tracing from the first address in the tracer buffer.
Driver receives an event notifying that new trace block exists.
HW posts a timestamp event at the last 8B of every 256B block.
Comparing the timestamp to the last handled timestamp would indicate
that this is a new trace block. Once the new timestamp is detected,
the entire block is considered valid.

Block validation and parsing should be done after copying the current
block to a different location, in order to avoid the block being
overwritten during processing.
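
The copy-then-validate idea can be sketched in Python. The 256B block size and the 8B trailing timestamp come from the text above; the big-endian layout, the "strictly newer than the last handled timestamp" test, and the function name are assumptions made for this illustration.

```python
import struct

BLOCK_SIZE = 256      # bytes per trace block, from the description above
TIMESTAMP_SIZE = 8    # the last 8 bytes of each block hold a timestamp


def fetch_new_block(trace_buffer, block_index, last_timestamp):
    """Copy one block out of the shared buffer, then validate the copy.

    Returns (timestamp, payload) for a new block, or None if the block
    carries no timestamp newer than the last handled one.
    """
    start = block_index * BLOCK_SIZE
    # Take a private snapshot first so that later writes to the shared
    # buffer cannot change the data while it is being parsed.
    snapshot = bytes(trace_buffer[start:start + BLOCK_SIZE])
    if len(snapshot) != BLOCK_SIZE:
        raise ValueError("block index outside of the trace buffer")
    # Assumption: the timestamp is stored big-endian.
    (timestamp,) = struct.unpack(">Q", snapshot[-TIMESTAMP_SIZE:])
    if timestamp <= last_timestamp:
        return None
    return timestamp, snapshot[:-TIMESTAMP_SIZE]


buf = bytearray(2 * BLOCK_SIZE)
buf[BLOCK_SIZE - TIMESTAMP_SIZE:BLOCK_SIZE] = struct.pack(">Q", 5)
print(fetch_new_block(buf, 0, last_timestamp=4) is not None)
print(fetch_new_block(buf, 0, last_timestamp=5))
```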

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Saeed Mahameed
e9cad2cea7 net/mlx5: FW tracer, register log buffer memory key
Create a memory key and protection domain for the tracer log buffer.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
48967ffdeb net/mlx5: FW tracer, create trace buffer and copy strings database
For each PF do the following:
1- Allocate memory for the tracer strings database and read the
strings from the FW to the SW. These strings will be used later for
parsing traces.
2- Allocate and dma map tracer buffers.

Traces that will be written into the buffer will be parsed as a group
of one or more traces, referred to as trace message. The trace message
represents a C-like printf string.
The first trace of a message holds the pointer to the correct string in
the strings database. The following traces hold the variables of the
message.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
f53aaa31cc net/mlx5: FW tracer, implement tracer logic
Implement FW tracer logic and registers access, initialization and
cleanup flows.

Initializing the tracer will be part of load one flow, as multiple
PFs will try to acquire ownership but only one will succeed and will
be the tracer owner.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Saeed Mahameed
7854ac44fe Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 core infrastructure updates and fixes.

From Eran:
 - Add MPEGC (Management PCIe General Configuration) registers and bits
 - Fix tristate and description for MLX5 module

From Feras:
 - Add hardware structures for the firmware tracer

From Jianbo:
 - Core support for double vlan push/pop steering action

From Max:
 - Add XRQ commands definitions

From Noa:
 - Add missing SET_DRIVER_VERSION command translation

From Roi:
 - Use ERR_CAST() instead of coding it

From Tariq:
 - Better return types for CQE API

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 14:58:46 -07:00
David S. Miller
c9eaaa1773 Merge branch 'lan743x-Add-features-to-lan743x-driver'
Bryan Whitehead says:

====================
lan743x: Add features to lan743x driver

This patch series adds extra features to the lan743x driver.

Updates for v4:
Patch 6/8 - Modified get/set_wol to use super set of
	    MAC and PHY driver support.
Patch 7/9 - In set_eee, return the return value from phy_ethtool_set_eee.

Updates for v3:
Removed patch 9 from this series, regarding PTP support
Patch 6/8 - Add call to phy_ethtool_get_wol to lan743x_ethtool_get_wol
Patch 7/8 - Add call to phy_ethtool_set_eee on (!eee->eee_enabled)

Updates for v2:
Patch 3/9 - Used ARRAY_SIZE macro in lan743x_ethtool_get_ethtool_stats.
Patch 5/9 - Used MAX_EEPROM_SIZE in lan743x_ethtool_set_eeprom.
Patch 6/9 - Removed unnecessary read of PMT_CTL.
	    Used CRC algorithm from lib.
	    Removed PHY interrupt settings from lan743x_pm_suspend
	    Change "#if CONFIG_PM" to "#ifdef CONFIG_PM"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:19 -07:00
Bryan Whitehead
43e8fe9b84 lan743x: Add RSS support
Implement RSS support

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:19 -07:00
Bryan Whitehead
c9cf96bb5f lan743x: Add EEE support
Implement EEE support

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:19 -07:00
Bryan Whitehead
4d94282afd lan743x: Add power management support
Implement power management
Supports suspend, resume, and Wake on LAN

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:18 -07:00
Bryan Whitehead
695846047a lan743x: Add support for ethtool eeprom access
Implement ethtool eeprom access
Also provides access to OTP (One Time Programming)

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:18 -07:00
Bryan Whitehead
2958337d68 lan743x: Add support for ethtool message level
Implement ethtool message level

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:18 -07:00
Bryan Whitehead
8114e8a2f1 lan743x: Add support for ethtool statistics
Implement ethtool statistics

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:18 -07:00
Bryan Whitehead
63b92a91a4 lan743x: Add support for ethtool link settings
Use default link setting functions

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:18 -07:00
Bryan Whitehead
0cf632265d lan743x: Add support for ethtool get_drvinfo
Implement ethtool get_drvinfo

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 14:09:18 -07:00
David S. Miller
8760c4d6d5 Merge branch 'sh_eth-clean-up-the-TSU-register-accessors'
Sergei Shtylyov says:

====================
sh_eth: clean up the TSU register accessors

Here's a set of 5 patches against DaveM's 'net-next.git' repo. They do
a final clean up of the TSU register accessors...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 12:34:51 -07:00
Sergei Shtylyov
51459d4c76 sh_eth: make sh_eth_tsu_{read|write}_entry() prototypes symmetric
sh_eth_tsu_read_entry() is still asymmetric with sh_eth_tsu_write_entry()
WRT their prototypes -- make them symmetric by passing to the former a TSU
register offset instead of its address and also adding the (now necessary)
'ndev' parameter...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 12:34:50 -07:00
Sergei Shtylyov
7a54c867ba sh_eth: make sh_eth_tsu_write_entry() take 'offset' parameter
We can add the TSU register base address to a TSU register offset right
in sh_eth_tsu_write_entry(), no need to do it in its callers...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 12:34:50 -07:00
Sergei Shtylyov
ecbecb0a90 sh_eth: call sh_eth_tsu_get_offset() from TSU register accessors
With sh_eth_tsu_get_offset() now actually returning TSU register's offset,
we can at last use it in sh_eth_tsu_{read|write}(). Somehow this saves 248
bytes of object code with AArch64 gcc 4.8.5... :-)

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 12:34:50 -07:00
Sergei Shtylyov
41414f0a85 sh_eth: make sh_eth_tsu_get_offset() match its name
sh_eth_tsu_get_offset(), despite its name, returns a TSU register's address,
not its offset. Make this function match its name and return a register's
offset from the TSU registers base address instead.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 12:34:50 -07:00
Sergei Shtylyov
388c4bb4dc sh_eth: uninline sh_eth_tsu_get_offset()
sh_eth_tsu_get_offset() is called several times by the driver, remove
*inline* and move that function from the header to the driver itself
to let gcc decide whether to expand it inline or not...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 12:34:50 -07:00
YueHaibing
fd800f6464 wan/fsl_ucc_hdlc: use IS_ERR_VALUE() to check return value of qe_muram_alloc
qe_muram_alloc() returns an unsigned long integer, which should not be
compared with zero. Check it using IS_ERR_VALUE() to fix this.
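
Why a comparison with zero cannot catch the error can be shown with a small Python model of unsigned long arithmetic. It assumes a 64-bit unsigned long and mirrors the kernel's IS_ERR_VALUE() rule (a value is an error if it is at or above (unsigned long)-MAX_ERRNO, with MAX_ERRNO being 4095). The helper names are illustrative, and this is not kernel code.

```python
# Model of checking an unsigned long return value for an encoded errno.
# Assumption: unsigned long is 64 bits wide.

MAX_ERRNO = 4095
ULONG_BITS = 64
ULONG_MASK = (1 << ULONG_BITS) - 1
ENOMEM = 12


def as_unsigned_long(value):
    """Reinterpret a (possibly negative) integer as an unsigned long."""
    return value & ULONG_MASK


def is_err_value(x):
    """Mirror of IS_ERR_VALUE(): true for the top MAX_ERRNO values."""
    return as_unsigned_long(x) >= as_unsigned_long(-MAX_ERRNO)


# An allocator that reports -ENOMEM through an unsigned long return value:
ret = as_unsigned_long(-ENOMEM)

print(ret < 0)            # an unsigned value is never negative
print(is_err_value(ret))  # the range check does detect the error
```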

Fixes: c19b6d246a ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 11:07:10 -07:00
David S. Miller
6a525818b1 Merge branch 'smc-next'
Ursula Braun says:

====================
net/smc: patches 2018-07-23

Here are some small patches for SMC: only the first patch contains a
functional change. It allows distinguishing between the SMCR and SMCD modes
on s390 when monitoring SMC sockets. The remaining patches are cleanups
without functional changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 10:57:14 -07:00
Ursula Braun
48bf523177 net/smc: remove local variable page in smc_rx_splice()
The page map address is already stored in the RMB descriptor.
There is no need to derive it from the cpu_addr value.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 10:57:14 -07:00
Ursula Braun
144ce4b9b5 net/smc: use DECLARE_BITMAP for rtokens_used_mask
Link group field rtokens_used_mask is a bitmap. Use the macro
DECLARE_BITMAP for its definition.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 10:57:14 -07:00
Stefan Raspl
00e5fb263f net/smc: add function to get link group from link
Replace a frequently used construct with a more readable variant,
reducing the code. It might also come in handy when we start to support
more than a single link per link group.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 10:57:14 -07:00
Stefan Raspl
bac6de7b63 net/smc: eliminate cursor read and write calls
The functions to read and write cursors are exclusively used to copy
cursors. Therefore switch to a respective function instead.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 10:57:14 -07:00
Karsten Graul
c601171d7a net/smc: provide smc mode in smc_diag.c
Rename field diag_fallback into diag_mode and set the smc mode of a
connection explicitly.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 10:57:14 -07:00
Petr Machata
9a2ad36238 selftests: forwarding: gre_multipath: Drop IPv6 tests
Support for device-only IPv6 multipath next hops was dropped in
commit 33bd5ac54d ("net/ipv6: Revert attempt to simplify route replace
and append") and as of commit b5d2d75e07 ("net/ipv6: Do not allow
device only routes via the multipath API"), attempts to add a next hop
like that yield an explicit diagnostic.

Correspondingly, drop the IPv6 parts of GRE multipath test that are
supposed to test that code.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 09:46:52 -07:00
YueHaibing
7fa41efac1 ipv6: sr: Use kmemdup instead of duplicating it in parse_nla_srh
Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 09:39:07 -07:00
David S. Miller
f8b2990fd9 Merge branch 'net-bridge-add-support-for-backup-port'
Nikolay Aleksandrov says:

====================
net: bridge: add support for backup port

This set introduces a new bridge port option that allows any port to have
any other port (in the same bridge of course) as its backup and traffic
will be forwarded to the backup port when the primary goes down. This is
mainly used in MLAG and EVPN setups where we have peerlink path which is
a backup of many (or even all) ports and is a participating bridge port
itself. There's more detailed information in patch 02. Patch 01 just
prepares the port sysfs code for options that take raw value. The main
issues that this set solves are scalability and fallback latency.

We have used similar code for over 6 months now to bring the fallback
latency of the backup peerlink down and avoid fdb notification storms.
Also, due to the nature of master devices, such a setup is currently not
possible, and last but not least, having tens of thousands of fdbs requires
thousands of calls to the switch.

I've also CCed our MLAG experts that have been using similar option.

Roopa also adds:

"Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
      |
      -- switch-peerlink

switch-peerlink becomes the failover/backup port when, say, team0 to
the server goes down.
Today, when team0 goes down, the control plane has to withdraw all the fdb
entries pointing to team0
and re-install the fdb entries pointing to switch-peerlink... and
restore the fdb entries when team0 comes back up again,
and this is the problem we are trying to solve.

This also becomes necessary when multihoming is implemented by a
standard like E-VPN https://tools.ietf.org/html/rfc8365#section-8
where the 'switch-peerlink' is an overlay vxlan port (like nikolay
mentions in his patch commit). In these implementations, the fdb scale
can be much larger.

On why bond failover cannot be used here: the point that Nikolay was
alluding to is that switch-peerlink in the above example is a bridge port
and is a failover/backup port for more than one or all ports in the
bridge br0. And you cannot enslave switch-peerlink into a second level
team with other bridge ports. Hence a multi-layered team device is not an
option (FWIW, switch-peerlink is also a teamed interface to the peer
switch)."

v3: Added Roopa's explanation and diagram
v2: In patch 01 use kstrdup/kfree to avoid casting the const buf. In order
to avoid using GFP_ATOMIC or always allocating I kept the spinlock inside
each branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 09:32:15 -07:00
Nikolay Aleksandrov
2756f68c31 net: bridge: add support for backup port
This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows setting a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL; a counter is maintained so that when deleting a port we
know how many other ports reference it as a backup and we remove it from
all of them. Also, the pointer is in the first cache line, which is hot at
the time of the check, and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding, just send a 0/non-existing backup port.

This is used to avoid numerous scalability problems when using MLAG, most
notably that with thousands of fdbs one would need to change all of them
when the port carrier goes down, which takes too long and causes a storm of
fdb notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup, hosts are usually connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called the peerlink, which might be
used for communication in case a host loses connectivity to one of them.
We need a fast way to fail over in case a host port goes down, and currently
none of the solutions (like bond) can fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted, it's normal practice
in routing, called fast re-route, where a precalculated backup path is used
when the main one is down.

Another use case of this is with EVPN, having a single vxlan device which
is the backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is a master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

Further explanation and a diagram by Roopa:
Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
      |
      -- switch-peerlink
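
The behaviour described in this message can be modelled in a few lines of Python. This is only an illustration of the rules stated above (redirect known unicast when the port has no carrier, count references, clear them when the backup port is deleted); it ignores RCU and RTNL locking, and all names except team0 and switch-peerlink are hypothetical.

```python
# Toy model of the bridge backup port semantics described above.

class Port:
    def __init__(self, name):
        self.name = name
        self.carrier_ok = True
        self.backup = None        # port used when this one loses carrier
        self.backup_refcnt = 0    # how many ports use this one as backup


def set_backup(port, backup):
    """Set or clear (backup=None) the backup of a port, keeping the counter."""
    if port.backup is not None:
        port.backup.backup_refcnt -= 1
    port.backup = backup
    if backup is not None:
        backup.backup_refcnt += 1


def egress_for_known_unicast(port):
    """Pick the egress port for a frame whose fdb entry points at `port`."""
    if not port.carrier_ok and port.backup is not None:
        return port.backup
    return port


def delete_port(ports, victim):
    """Remove a port and drop every backup reference to it."""
    if victim.backup_refcnt:
        for port in ports:
            if port.backup is victim:
                set_backup(port, None)
    set_backup(victim, None)
    ports.remove(victim)


team0, peerlink = Port("team0"), Port("switch-peerlink")
ports = [team0, peerlink]
set_backup(team0, peerlink)
team0.carrier_ok = False
print(egress_for_known_unicast(team0).name)
```

The point of the design is visible in the model: when team0 loses carrier, no fdb entry has to change, since the redirect is a single extra check on the destination port.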

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 09:32:15 -07:00
Nikolay Aleksandrov
a5f3ea54f3 net: bridge: add support for raw sysfs port options
This patch adds a new alternative store callback for port sysfs options
which takes a raw value (buf) and can use it directly. It is needed for the
backup port sysfs support since we have to pass the device by its name.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 09:32:15 -07:00