Commit graph

20651 commits

Author SHA1 Message Date
Lipeng
7036d26f32 net: hns3: fix the bug of hns3_set_txbd_baseinfo
The SC bits of TX BD mean switch control. For this area, value 0
indicates no switch control, the packet is routed according to the
forwarding table. Value 1 indicates that the packet is transmitted
to the network bypassing the forwarding table.

As HNS3 driver need support VF later, VF conmunicate with its own
PF need forwarding table. This patch sets SC bits of TX BD 0 and use
forwarding table.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 17:25:35 +09:00
Doug Berger
6c97f010ce net: bcmgenet: use dev->phydev instead of priv->phydev
Now that the software reset of the PHY has been removed it is no
longer necessary to retain a private pointer to the phydev for
use when the PHY is detached (which isn't generally safe anyway).

The driver now uses the phydev member attached to the net_device.

For ethtool commands that have a PHY component, an explicit check
is made to prevent accessing an invalid phydev pointer when one
is not attached (e.g. interface is down).

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
484bfa1507 Revert "net: bcmgenet: Software reset EPHY after power on"
With commit f7d72996e222 ("net: bcmgenet: enable loopback during
UniMAC sw_reset") it is no longer necessary to force the software
reset of the internal EPHY before resetting the UniMAC to ensure a
clean reset.

Therefore this commit reverts commit 5dbebbb44a ("net: bcmgenet:
Software reset EPHY after power on").

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
b0447ecb53 net: bcmgenet: relax lock constraints to reduce IRQ latency
Since the ring locks are not used in a hard IRQ context it is often
not necessary to disable global IRQs while waiting on a lock.

Using less restrictive lock and unlock calls improves the real-time
responsiveness of the system.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
d215dbac48 net: bcmgenet: rework bcmgenet_netif_start and bcmgenet_netif_stop
This commit consolidates more common functionality from
bcmgenet_close and bcmgenet_suspend into bcmgenet_netif_stop and
modifies the start and stop sequences to better suit the design
of the GENET hardware.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
fbf557d9d1 net: bcmgenet: cleanup ring interrupt masking and unmasking
Since the NAPI interrupts are basically ignored when NAPI is
disabled we don't need to mask them within the functions
bcmgenet_disable_tx_napi() and bcmgenet_disable_rx_napi().
So wait until all NAPI instances are disabled and mask all of the
bcmgenet driver interrupts together in bcmgenet_netif_stop().

The interrupts can still be enabled in the functions
bcmgenet_enable_tx_napi() and bcmgenet_enable_rx_napi(), but use
the ring context int_enable() method to keep the functionality
consistent and the code cleaner.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
7587935cfa net: bcmgenet: move NAPI initialization to ring initialization
Since each ring has its own NAPI instance it might as well be
initialized along with the other ring context.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
28c2d1a7a0 net: bcmgenet: enable loopback during UniMAC sw_reset
It is necessary for the UniMAC to be clocked at least 5 cycles
while the sw_reset is asserted to ensure a clean reset.

It was discovered that this condition was not being met when
connected to an external RGMII PHY that disabled the Rx clock in
the Power Save state.

This commit modifies the reset_umac function to place the (RG)MII
interface into a local loopback mode where the Rx clock comes
from the GENET sourced Tx clk during the sw_reset to ensure the
presence and stability of the clock.

In addition, it turns out that the sw_reset of the UniMAC is not
self clearing, but this was masked by a bug in the timeout code.

The sw_reset is now explicitly cleared by zeroing the UMAC_CMD
register before returning from reset_umac which makes it no
longer necessary to do so in init_umac and makes the clearing of
CMD_TX_EN and CMD_RX_EN by umac_enable_set redundant. The
timeout code (and its associated bug) are removed so reset_umac
no longer needs to return a result, and that means init_umac
that calls reset_umac does not need to as well.

Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
4fd6dc98c1 net: bcmgenet: prevent duplicate calls of bcmgenet_dma_teardown
When bcmgenet_dma_teardown is called from bcmgenet_fini_dma it ends
up getting called twice from the bcmgenet_close and bcmgenet_suspend
functions (once directly and once inside the bcmgenet_fini_dma call).

This commit removes the call from bcmgenet_fini_dma and ensures that
bcmgenet_dma_teardown is called before bcmgenet_fini_dma in all paths
of execution.

Fixes: 4a0c081eff ("net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Doug Berger
0d314502bb net: bcmgenet: correct bad merge
As noted in the net-next submission for GENETv5 support [1], there
were merge conflicts with an earlier net submission [2] that had not
yet found its way to the net-next repository.

Unfortunately, when the branches were merged the conflicts were not
correctly resolved.  This commit attempts to correct that.

[1] https://lkml.org/lkml/2017/3/13/1145
[2] https://lkml.org/lkml/2017/3/9/890

Fixes: 101c431492 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:14:54 +09:00
Steven J. Hill
1769af432a ethernet: cavium: octeon: Switch to using netdev_info().
Signed-off-by: Steven J. Hill <Steven.Hill@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-26 10:03:35 +09:00
Kees Cook
fd71e13bc7 drivers/net: sis: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Daniele Venzano <venza@brownhat.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniele Venzano <venza@brownhat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 13:09:47 +09:00
Kees Cook
7aa1402e2e net: ethernet/sfc: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 12:57:33 +09:00
Yotam Gigi
ea00aa3a27 mlxsw: spectrum: mr_tcam: Include the mr_tcam header file
Make the spectrum_mr_tcam.c include the spectrum_mr_tcam.h header file.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_tcam_ops' was not declared. Should it be static?

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
6a30dc29a4 mlxsw: spectrum: mr: Make the function mlxsw_sp_mr_dev_vif_lookup static
The function is only used internally in spectrum_mr.c and is not declared
in the header file, thus make it static.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_dev_vif_lookup' was not declared. Should it be static?

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
de3872cd18 mlxsw: spectrum: mr: Fix various endianness issues
Fix various endianness issues in comparisons and assignments. The fix is
entirely cosmetic as all the values fixed are endianness-agnostic.

Cleans up sparse warnings:
spectrum_mr.c:156:49: warning: restricted __be32 degrades to integer
spectrum_mr.c:206:26: warning: restricted __be32 degrades to integer
spectrum_mr.c:212:31: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:212:31:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:212:31:    got unsigned int
spectrum_mr.c:214:32: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:214:32:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:214:32:    got unsigned int
spectrum_mr.c:461:16: warning: restricted __be32 degrades to integer
spectrum_mr.c:461:49: warning: restricted __be32 degrades to integer

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Arkadi Sharshevsky
69715dd50d mlxsw: spectrum_dpipe: Fix entries dump of the adjacency table
During the dump the per netlink packet entry counter should be zeroed out
when new packet is created.

Fixes: 190d38a52a ("mlxsw: spectrum_dpipe: Add support for adjacency table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:02:02 +09:00
Veerasenareddy Burru
907aaa6bab liquidio: pass date and time info to NIC firmware
Pass date and time information to NIC at the time of loading
firmware and periodically update the host time to NIC firmware.
This is to make NIC firmware use the same time reference as Host,
so that it is easy to correlate logs from firmware and host for debugging.

Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 18:57:10 +09:00
Jakub Kicinski
9f16c8abcd nfp: bpf: optimize mov64 a little
Loading 64bit constants require up to 4 load immediates, since
we can only load 16 bits at a time.  If the 32bit halves of
the 64bit constant are the same, however, we can save a cycle
by doing a register move instead of two loads of 16 bits.

Note that we don't optimize the normal ALU64 load because even
though it's a 64 bit load the upper half of the register is
a coming from sign extension so we can load it in one cycle
anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
b14157eeed nfp: bpf: support stack accesses via non-constant pointers
If stack pointer has a different value on different paths
but the alignment to words (4B) remains the same, we can
set a new LMEM access pointer to the calculated value and
access whichever word it's pointing to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
2df03a50f1 nfp: bpf: support accessing the stack beyond 64 bytes
To access beyond 64th byte of the stack we need to set a new
stack pointer register (LMEM is accessed indirectly through
those pointers).  Add a function for encoding local CSR access
instruction.  Use stack pointer number 3.

Note that stack pointer registers allow us to index into 32
bytes of LMEM (with shift operations i.e. when operands are
restricted).  This means if access is crossing 32 byte boundary
we must not use offsetting, we have to set the pointer to the
exact address and move it with post-increments.

We depend on the datapath placing the stack base address in
GPR A22 for our use.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
d348848063 nfp: bpf: allow stack accesses via modified stack registers
As long as the verifier tells us the stack offset exactly we
can render the LMEM reads quite easily.  Simply make sure that
the offset is constant for a given instruction and add it to
the instruction's offset.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
9a90c83c09 nfp: bpf: optimize the RMW for stack accesses
When we are performing unaligned stack accesses in the 32-64B window
we have to do a read-modify-write cycle.  E.g. for reading 8 bytes
from address 17:

0:  tmp    = stack[16]
1:  gprLo  = tmp >> 8
2:  tmp    = stack[20]
3:  gprLo |= tmp << 24
4:  tmp    = stack[20]
5:  gprHi  = tmp >> 8
6:  tmp    = stack[24]
7:  gprHi |= tmp << 24

The load on line 4 is unnecessary, because tmp already contains data
from stack[20].

For write we can optimize both loads and writebacks away.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
a82b23fb38 nfp: bpf: add stack read support
Add simple stack read support, similar to write in every aspect,
but data flowing the other way.  Note that unlike write which can
be done in smaller than word quantities, if registers are loaded
with less-than-word of stack contents - the values have to be
zero extended.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
ee9133a845 nfp: bpf: add stack write support
Stack is implemented by the LMEM register file.  Unaligned accesses
to LMEM are not allowed.  Accesses also have to be 4B wide.

To support stack we need to make sure offsets of pointers are known
at translation time (for now) and perform correct load/mask/shift
operations.

Since we can access first 64B of LMEM without much effort support
only stacks not bigger than 64B.  Following commits will extend
the possible sizes beyond that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
70c78fc138 nfp: bpf: refactor nfp_bpf_check_ptr()
nfp_bpf_check_ptr() mostly looks at the pointer register.
Add a temporary variable to shorten the code.

While at it make sure we print error messages if translation
fails to help users identify the problem (to be carried in
ext_ack in due course).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Jakub Kicinski
ff42bb9fe3 nfp: bpf: add helper for emitting nops
The need to emitting a few nops will become more common soon
as we add stack and map support.  Add a helper.  This allows
for code to be shorter but also may be handy for marking the
nops with a "reason" to ease applying optimizations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 17:38:37 +09:00
Lipeng
24e750c410 net: hns3: fix a bug about hns3_clean_tx_ring
The return value of hns3_clean_tx_ring means tx ring clean result.
Return true means clean complete and there is no more pakcet need
clean. Retrun false means there is packets need clean and napi need
poll again. The last return of hns3_clean_tx_ring is
"return !!budget" as budget will decrease when clean a buffer.

If there is no valid BD in TX ring, return 0 for hns3_clean_tx_ring
will cause napi poll again and never complete the napi poll. This
patch fixes the bug.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:42 +01:00
Lipeng
51145dae27 net: hns3: remove redundant memset when alloc buffer
HW will use packet length to write packets to buffer or read
packets from buffer. There is a redundant memset when alloc buffer,
the memset have no sense and will increase time-consuming.
This patch removes it.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:42 +01:00
Lipeng
66b447301a net: hns3: fix the TX/RX ring.queue_index in hns3_ring_get_cfg
The interface hns3_ring_get_cfg only update TX ring queue_index,
but do not update RX ring queue_index. This patch fixes it.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:42 +01:00
Lipeng
709eb41ad8 net: hns3: get vf count by pci_sriov_get_totalvfs
This patch gets vf count by standard function pci_sriov_get_totalvfs,
instead of info from NIC HW.

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:41 +01:00
Lipeng
7410343eab net: hns3: fix the ops check in hns3_get_rxnfc
1# patch: 07d2995 net: hns3: add support for ETHTOOL_GRXFH.
2# patch: 5668abd net: hns3: add support for set_ringparam.

1# patch adds ae_algo->ops->get_rss_tuple to hns3_get_rxnfc
and 2# patch delete ae_algo->ops->get_tc_size
from hns3_get_rxnfc.This patch fix the ops check in hns3_get_rxnfc.

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:41 +01:00
Lipeng
564883bb4d net: hns3: fix the bug when map buffer fail
If one buffer had been recieved to stack, driver will alloc a new buffer,
map the buffer to device and replace the old buffer. When map fail, should
only free the new alloced buffer, but not free all buffers in the ring.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:41 +01:00
Lipeng
b9077428ec net: hns3: fix a bug when alloc new buffer
When alloce new buffer to HW, should unmap the old buffer first.
This old code map the old buffer but not unmap the old buffer,
this patch fixes it.

Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 01:16:41 +01:00
Florian Fainelli
e83b171568 net: systemport: Guard against unmapped TX ring
Because SYSTEMPORT is a (semi) normal network device, the stack may attempt to
queue packets on it oustide of the DSA slave transmit path.  When that happens,
the DSA layer has not had a chance to tag packets with the appropriate per-port
and per-queue information, and if that happens and we don't have a port 0 queue
0 available (e.g: on boards where this does not exist), we will hit a NULL
pointer de-reference in bcm_sysport_select_queue().

Guard against such cases by testing for the TX ring validity.

Fixes: 84ff33eeb23d ("net: systemport: Establish DSA network device queue mapping")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:28:40 +01:00
Ido Schimmel
330e2cc65d mlxsw: spectrum: Add another partition to KVD linear
The KVD linear is currently partitioned into two partitions. One for
single entries and another for groups of 32 entries.

Add another partition consisting of groups of 512 entries which will
allow us to more accurately represent the nexthop weights in non-equal
cost multi-path routing.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
f11fbaf8b5 mlxsw: spectrum: Increase number of linear entries
The memory region where adjacency entries (nexthops) are stored is
called the KVD linear and is configured during initialization with a
size of 64K.

Extend this area with 32K more entries, that will be partitioned into 64
groups of 0.5K entries, thereby allowing us to support weighted nexthops
with high accuracy.

Change the ratio between both types of hash entries, so as to prevent
reduction in the number of double hash entries, which are used for IPv6
neighbours and routes with a prefix length greater than 64.

Note that the user will be able to control all these sizes once the
devlink resource manager is introduced.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
eb789980d0 mlxsw: spectrum_router: Populate adjacency entries according to weights
Up until now the driver assumed all the nexthops have an equal weight
and wrote each to a single adjacency entry.

This patch takes the `weight` parameter into account and populates the
adjacency group according to the relative weight of each nexthop.

Specifically, the weights of all the nexthops that should be offloaded
are first normalized and then used to calculate the upper adjacency
index of each nexthop. This is done according to the hash-threshold
algorithm used by the kernel for IPv4 multi-path routing.

Adjacency groups are currently limited to 32 entries which limits the
weights that can be used, but follow-up patches will introduce groups of
512 entries.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
425a08c673 mlxsw: spectrum_router: Prepare for large adjacency groups
The device has certain restrictions regarding the size of an adjacency
group.

Have the router determine the size of the adjacency group according to
available KVDL allocation sizes and these restrictions.

This was not needed until now since only allocations of up 32 entries
were supported and these are all valid sizes for an adjacency group.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
408bd946bf mlxsw: spectrum_router: Store weight in nexthop struct
As the first step towards non-equal-cost multi-path support, store each
nexthop's weight.

For IPv6 nexthops always set the weight to 1, as it only supports ECMP.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
d672aec45f mlxsw: spectrum: Add ability to query KVDL allocation size
The current KVDL allocation API allows the user to specify the requested
number of entries, but the user has no way of knowing how many entries
were actually allocated.

This works because existing users (e.g., router) request the exact
number they end up using. With the introduction of large adjacency
groups, this will change, as the router will have the ability to choose
from several allocation sizes, where larger allocations provide higher
accuracy with respect to requested weights and better resilience against
nexthop failures.

One option is to have the router try several allocations of descending
size until one succeeds, but a better way is to simply allow it to query
the actual allocation size and then size its request accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
a875a2ee2d mlxsw: spectrum: Better represent KVDL partitions
The KVD linear (KVDL) allocator currently consists of a very large
bitmap that reflects the KVDL's usage. The boundaries of each partition
as well as their allocation size are represented using defines.

This representation requires us to patch all the functions that act on a
partition whenever the partitioning scheme is changed. In addition, it
does not enable the dynamic configuration of the KVDL using the
up-coming resource manager.

Add objects to represent these partitions as well as the accompanying
code that acts on them to perform allocations and de-allocations.

In the following patches, this will allow us to easily add another
partition as well as new operations to act on these partitions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
e69cd9d75e mlxsw: spectrum_dpipe: Add adjacency group size
The adjacency group size is part of the match on the adjacency group and
should therefore be exposed using dpipe.

When non-equal-cost multi-path support will be introduced, the group's
size will help users understand the exact number of adjacency entries
each nexthop occupies, as a nexthop will no longer correspond to a
single entry.

The output for a multi-path route with two nexthops, one with weight 255
and the second 1 will be:

Example:

$ devlink dpipe table dump pci/0000:01:00.0 name mlxsw_adj
pci/0000:01:00.0:
  index 0
  match_value:
    type field_exact header mlxsw_meta field adj_index value 65536
    type field_exact header mlxsw_meta field adj_size value 512
    type field_exact header mlxsw_meta field adj_hash_index value 0
  action_value:
    type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:64
    type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 3 value 1

  index 1
  match_value:
    type field_exact header mlxsw_meta field adj_index value 65536
    type field_exact header mlxsw_meta field adj_size value 512
    type field_exact header mlxsw_meta field adj_hash_index value 510
  action_value:
    type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:65
    type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 4 value 2

Thus, the first nexthop occupies 510 adjacency entries and the second 2,
which leads to a ratio of 255 to 1.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:05 +01:00
David S. Miller
f8ddadc4db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
There were quite a few overlapping sets of changes here.

Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.

Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly.  If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.

In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().

Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.

The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 13:39:14 +01:00
Bernd Edlinger
8d5f4b0717 stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
This is the possible reason for different hard to reproduce
problems on my ARMv7-SMP test system.

The symptoms are in recent kernels imprecise external aborts,
and in older kernels various kinds of network stalls and
unexpected page allocation failures.

My testing indicates that the trouble started between v4.5 and v4.6
and prevails up to v4.14.

Using the dirty_tx before acquiring the spin lock is clearly
wrong and was first introduced with v4.6.

Fixes: e3ad57c967 ("stmmac: review RX/TX ring management")

Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:24:43 +01:00
Pieter Jansen van Vuuren
62d3f60b4d nfp: use struct fields for 8 bit-wide access
Use direct access struct fields rather than PREP_FIELD()
macros to manipulate the jump ID and length, both of which
are exactly 8-bits wide. This simplifies the code somewhat.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:09:32 +01:00
Jose Abreu
9454360dec net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
Prevent infinite loop by correctly setting the loop condition to
break when i == 10.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:50:40 +01:00
Jose Abreu
98870943a5 net: stmmac: Fix stmmac_get_rx_hwtstamp()
When using GMAC4 the valid timestamp is from CTX next desc but
we are passing the previous desc to get_rx_timestamp_status()
callback.

Fix this and while at it rework a little bit the function logic.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:50:40 +01:00
Jose Abreu
9c8080d068 net: stmmac: Add missing call to dev_kfree_skb()
When RX HW timestamp is enabled and a frame is discarded we are
not freeing the skb but instead only setting to NULL the entry.

Add a call to dev_kfree_skb_any() so that skb entry is correctly
freed.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:50:40 +01:00
Elena Reshetova
dd8e19456d drivers, net, mlx5: convert fs_node.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable fs_node.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:39 +01:00