Commit graph

650580 commits

Author SHA1 Message Date
Arnd Bergmann
321fa4ffd9 net/mlx5e: fix another maybe-uninitialized false-positive
In commit abeffce ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed a
gcc warning for the ipv4 offload handling. Now we get the same warning for the
added ipv6 support:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]

We can apply the same workaround here as well.

Fixes: ce99f6b97f ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:35:12 -05:00
Parav Pandit
d0d7b10b05 net-next: treewide use is_vlan_dev() helper function.
This patch makes use of is_vlan_dev() function instead of flag
comparison which is exactly done by is_vlan_dev() helper function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:33:29 -05:00
Dan Carpenter
73cfb2a2e4 net/mlx4_en: fix a condition
There is a "||" vs "|" typo here so we test 0x1 instead of 0x6.

Fixes: 1f8176f735 ("net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flow")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 12:01:06 -05:00
Bert Kenward
f820c0ac6c sfc: don't rearm interrupts if busy polling
Since commit 364b605573 ("net: busy-poll: return busypolling status
to drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.

Testing with a 7142 shows a small latency improvement of ~100 ns.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:59:36 -05:00
Xin Long
d15c9ede61 sctp: process fwd tsn chunk only when prsctp is enabled
This patch is to check if asoc->peer.prsctp_capable is set before
processing fwd tsn chunk, if not, it will return an ERROR to the
peer, just as rfc3758 section 3.3.1 demands.

Reported-by: Julian Cordes <julian.cordes@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:57:15 -05:00
David S. Miller
3bc32d0396 Merge branch 'mlxsw-cleanup-neigh-handling'
Jiri Pirko says:

====================
mlxsw: cleanup neigh handling

Ido says:

This series addresses long standing issues in the mlxsw driver
concerning neighbour reflection. It also prepares the code for follow-up
changes dealing with proper resource cleanup and nexthop reflection.

The first two patches convert the neighbour reflection code to use an
ordered workqueue, to prevent re-ordering of NEIGH_UPDATE events that
may happen following subsequent patches.

The third to fifth patches remove the ndo_neigh_{construct,destroy}
entry points from the driver, thereby relying only on NEIGH_UPDATE
events for neighbour reflection. This simplifies the code considerably.

Last patches are fallout and adjust nits in the code I noticed while
going over it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:58 -05:00
Ido Schimmel
fd76d9105b mlxsw: spectrum_router: Fix typo in comment
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
01b1aa359d mlxsw: spectrum_router: Don't read 'nud_state' without lock
We periodically ask the neighbouring system to try and resolve
neighbours that are used for nexthops, but aren't currently resolved.

However, 'nud_state' is protected by the neighbour lock, so we shouldn't
access it without taking it. Instead, we can simply check the
'connected' field of the neighbour entry, which we update upon
NEIGH_UPDATE events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
8a0b727526 mlxsw: spectrum_router: Remove redundant check
We only add neighbour entries that are also used for nexthops to
'nexthop_neighs_list', so when iterating over this list there's no need
to check that the entry is indeed used for nexthops.

Remove the redundant check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
a8eca32615 net: remove ndo_neigh_{construct, destroy} from stacked devices
In commit 18bfb924f0 ("net: introduce default neigh_construct/destroy
ndo calls for L2 upper devices") we added these ndos to stacked devices
such as team and bond, so that calls will be propagated to mlxsw.

However, previous commit removed the reliance on these ndos and no new
users of these ndos have appeared since above mentioned commit. We can
therefore safely remove this dead code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
5c8802f14a mlxsw: spectrum_router: Simplify neighbour reflection
Up until now we had two interfaces for neighbour related configuration:
ndo_neigh_{construct,destroy} and NEIGH_UPDATE netevents. The ndos were
used to add and remove neighbours from the driver's cache, whereas the
netevent was used to reflect the neighbours into the device's tables.

However, if the NUD state of a neighbour isn't NUD_VALID or if the
neighbour is dead, then there's really no reason for us to keep it
inside our cache. The only exception to this rule are neighbours that
are also used for nexthops, which we periodically refresh to get them
resolved.

We can therefore eliminate the ndo entry point into the driver and
simplify the code, making it similar to the FIB reflection, which is
based solely on events. This also helps us avoid a locking issue, in
which the RIF cache was traversed without proper locking during
insertion into the neigh entry cache.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel
de04b6a358 mlxsw: spectrum_router: Remove unused variable
Since commit 33b1341cd1 ("mlxsw: spectrum_router: Fix handling of
neighbour structure") we no longer use destination IP for neighbour
lookup, so remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel
e60234ddb5 mlxsw: spectrum_router: Use ordered workqueue for neigh updates
We currently associate each neighbour entry with a work item, so it's
not possible to have multiple events queued for the same neighbour
entry. However, this is about to be changed so that the neighbour entry
is only resolved when the work item is scheduled.

The above can result in a mismatch between the kernel's and the device's
neighbour table, unless the associated work items are processed in the
order in which they were submitted.

Do that by migrating the NEIGH_UPDATE work items to be processed in the
ordered workqueue which was recently introduced in mlxsw in commit
a3832b3189 ("mlxsw: core: Create an ordered workqueue for FIB
offload").

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel
a0e4761d9b mlxsw: core: Queue work immediately instead of delaying it
We always use zero delay before queueing a work on the ordered workqueue
('mlxsw_owq'), so use work_struct directly instead of delayable work.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:55 -05:00
David S. Miller
fcdc103dac linux-can-next-for-4.11-20170206
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAliYiW8THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9jdWCACU/2SxD8WcIcP5HZgJMxv9K9LyRC++
 qQb1SQKUiedAF0173IQAVawe105LEdZS08o8ovNCzcU0gLHmfAKysjXEnWVqpyLA
 9zOBkl5XzjFsi9KI1TVsn/pUlk3yCq8w2azmt3mi/G5yLoAE0G46pzjbpOYyz67d
 Gyoh4i9OhsQOmgpefkwtyPxlh5Xd28ohdPv8NCBb17No+1Dfs4qji1TTyVY9sqNr
 8AwRl9BmTqDvZMUPRsdXk+XqZyITGLt+kMa80hd+XFmNkoWy6EKMbFiuVjkpZqtL
 kX/Tfe5/mEiBegBvIUVkeOrA7MryNLOf0ZO+bLdNnSTJZJyIro1kAyIc
 =kgvC
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-4.11-20170206' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-02-06

this is a pull request of 16 patches for net-next/master.

The first two patches by David Jander and me add the rx-offload
framework for CAN devices to the kernel. The remaining 14 patches
convert the flexcan driver to make use of it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:08:51 -05:00
Elad Raz
e158e5ef24 mlxsw: reg: Fix HTGT register length
HTGT register length is limited to 32 bytes and not 256 bytes.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:07:21 -05:00
Jingju Hou
b60a00f9c5 net: mvneta: implement .set_wol and .get_wol
The mvneta itself does not support WOL, but the PHY might.
So pass the calls to the PHY

Signed-off-by: Jingju Hou <houjingj@marvell.com>
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 10:54:02 -05:00
Marc Kleine-Budde
096de07f1d can: flexcan: switch imx6 and vf610 to timestamp based offloading
This patch switches the imx6 and vf610 based SoCs from the hardware FIFO
to the timestamp based rx offloading.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:45 +01:00
Marc Kleine-Budde
b3cf53e988 can: flexcan: add support for timestamp based rx-offload
The flexcan IP core has 64 mailboxes. For now they are configured for
RX as a hardware FIFO. This FIFO has a fixed depth of 6 CAN frames. In
some high load scenarios it turns out thas this buffer is too small.

In order to have a buffer larger than the 6 frames FIFO, this patch adds
support for timestamp based offloading via the generic rx-offload
infrastructure.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:44 +01:00
Marc Kleine-Budde
9eb7aa8911 can: flexcan: add quirk FLEXCAN_QUIRK_ENABLE_EACEN_RRS
In order to receive RTR frames in the non HW FIFO mode the RSS and EACEN bits
of the reg_ctrl2 have to be activated. As this has no side effect in the FIFO
mode, we do this unconditionally on cores with the reg_ctrl2.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:42 +01:00
Marc Kleine-Budde
4bd888a80b can: flexcan: activate individual RX masking and initialize reg_rximr
Modern flexcan IP cores support two RX modes. One is using the 6 fames deep
hardware FIFO, the other is using up to 64 mailboxes (in non FIFO mode). For
now only the HW FIFO mode is activated.

In order to make use of the RX mailboxes the individual RX masking feature has
to be activated, otherwise matching mailboxes are overwritten during the
reception process. This however switches on the individual RX masking, which
uses reg_rximr registers for masking.

This patch activates the individual RX masking feature unconditionally and
initializes the mask registers (reg_rximr) with 0x0 == "don't care", which
switches off any filtering.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:41 +01:00
Marc Kleine-Budde
30164759db can: flexcan: make use of rx-offload's irq_offload_fifo
This patch converts the flexcan driver to make use of the rx-offload
can_rx_offload_irq_offload_fifo() helper function. The idea is to read
the CAN frames already in the interrupt context, as the depth of the
flexcan HW FIFO is too shallow, resulting in too many missed frames.
During a normal NAPI poll the frames are the pushed into the upper
layers.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:39 +01:00
Marc Kleine-Budde
b93917c370 can: flexcan: make TX mailbox selectable during runtime
This patch makes the TX mailbox selectable duing runtime. This is a preparation
patch to use of the hardware FIFO selectable via runtime. As the TX mailbox
number is different in HW FIFO and normal mode.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:37 +01:00
Marc Kleine-Budde
28ac7dcd5b can: flexcan: calculate default value for imask1 during runtime
This patch converts the define FLEXCAN_IFLAG_DEFAULT into the runtime
calculated value priv->reg_imask1_default. This is a preparation patch to make
the TX mailbox selectable during runtime, too.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:36 +01:00
Marc Kleine-Budde
dd2f122a96 can: flexcan: flexcan_irq(): don't unconditionally return IRQ_HANDLED
This patch changes the flexcan_irq() function to only return
IRQ_HANDLED, if the interrupt really has been handled, otherwise
IRQ_NONE is returned.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:35 +01:00
Marc Kleine-Budde
a5c02f668c can: flexcan: flexcan_poll_bus_err(): fold in do_bus_err()
This patch folds in the do_bus_err() function into
flexcan_poll_bus_err().

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:34 +01:00
Marc Kleine-Budde
238443df81 can: flexcan: flexcan_poll_state(): no need to initialize new_state, rx_state, tx_state
This patch removed the not needed initialisation from the new_state,
rx_state, tx_state variabled.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:32 +01:00
Marc Kleine-Budde
d166f56bf5 can: flexcan: do_bus_err(): convert rx_,tx_errors into bool
This patch converts the rx_errors and tx_errors from int into bool
values, to reflect their actual meaning.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:31 +01:00
Marc Kleine-Budde
a3c11a7ac6 can: flexcan: make declaration of devtype_data const
This patch changes the declaration of the devtype data to const.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:30 +01:00
Marc Kleine-Budde
1c10feee3e can: flexcan: remove write-only member pdata of struct flexcan_priv
This patch removes the write only member pdata from the struct
flexcan_priv.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:28 +01:00
Marc Kleine-Budde
62d1086e87 can: flexcan: add missing register definitions
This patch adds some missing register definitions, which are needed in an
upcoming patch.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:27 +01:00
Marc Kleine-Budde
3abbac0b5d can: rx-offload: Add support for timestamp based irq offloading
Some CAN controllers don't implement a FIFO in hardware, but fill their
mailboxes in a particular order (from lowest to highest or highest to lowest).
This makes problems to read the frames in the correct order from the hardware,
as new frames might be filled into just read (low) mailboxes. This gets worse,
when following new frames are received into not read (higher) mailboxes.

On the bright side some these CAN controllers put a timestamp on each received
CAN frame. This patch adds support to offload CAN frames in interrupt context,
order them by timestamp and then transmitted in a NAPI context.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:24 +01:00
David Jander
d254586c34 can: rx-offload: Add support for HW fifo based irq offloading
Some CAN controllers have a usable FIFO already but can still benefit
from off-loading the CAN controller FIFO. The CAN frames of the FIFO are
read and put into a skb queue during interrupt and then transmitted in a
NAPI context.

Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-02-06 15:13:23 +01:00
David S. Miller
bd092ad146 Merge branch 'remove-__napi_complete_done'
Eric Dumazet says:

====================
net: get rid of __napi_complete()

This patch series removes __napi_complete() calls, in an effort
to make NAPI API simpler and generalize GRO and napi_complete_done()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:59 -05:00
Eric Dumazet
02c1602ee7 net: remove __napi_complete()
All __napi_complete() callers have been converted to
use the more standard napi_complete_done(),
we can now remove this NAPI method for good.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
32e19300a4 aeroflex/greth: use napi_complete_done()
We plan to remove __napi_complete() soon,
this driver is the last user.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
3d1a6333d9 ibm/emac: use napi_complete_done()
Use napi_complete_done() instead of __napi_complete()

We plan to remove __napi_complete() to reduce NAPI complexity.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
0eb7b85c96 qla3xxx: add GRO support
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
7ea4007757 ks8695net: add GRO support
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.

Note that rx_lock seems to be useless, NAPI logic should
not need this extra care.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
135844ef9f skge: use napi_complete_done()
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API and get rid of napi_gro_flush()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
a396178972 ep93xx_eth: add GRO support
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.
4) get rid of baroque code and ease maintenance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
5b2ec6f2be pcnet32: use napi_complete_done()
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
c46e9907d4 amd8111e: add GRO support
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.
4) get rid of baroque code and ease maintenance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
1fa8c5f33a epic100: use napi_complete_done()
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.
4) get rid of baroque code and ease maintenance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
ab1e7e1d26 8139cp: use napi_complete_done()
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.
4) Eventually get rid of napi_gro_flush() in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
Eric Dumazet
617f01211b 8139too: use napi_complete_done()
Use napi_complete_done() instead of __napi_complete() to :

1) Get support of gro_flush_timeout if opt-in
2) Not rearm interrupts for busy-polling users.
3) use standard NAPI API.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 16:11:57 -05:00
David S. Miller
3976001c9d Merge branch 'ipv6-Improve-user-experience-with-multipath-routes'
David Ahern says:

====================
net: ipv6: Improve user experience with multipath routes

This series closes a couple of gaps between IPv4 and IPv6 with respect
to multipath routes:

1. IPv4 allows all nexthops of multipath routes to be deleted using just
   the prefix and length; IPv6 only deletes the first nexthop for the
   route if only the prefix and length are given.

2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
   IPv6 returns a series of routes with the same prefix and length - one
   for each nexthop. This happens for both dumps and notifications.

IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
series of routes.

Patch 1 addresses the first item by allowing IPv6 multipath routes to be
deleted using just the prefix and length. Patch 2 addresses the second
allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.

Patches 3 and 4 upate the RTM_{NEW,DEL}ROUTE notifications to generate
1 notification with RTA_MULTIPATH where applicable.

Patch 5 prints IPv6 addresses in compressed format when showing route
replace errors. This was noticed testing REPLACE failures.

The end result for multipath routes:
1. Dump
   - RTA_MULTIPATH used for multipath routes

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1
    ...

2. Route Add
   - one notification with RTA_MULTIPATH attribute

    $ ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 2001:db8:2::2

    $ ip mon route
    2001:db8:200::/120 table red metric 1024
	nexthop via 2001:db8:1::2  dev eth1 weight 1
	nexthop via 2001:db8:2::2  dev eth2 weight 1

2. Route Replace
   - one notification with RTA_MULTIPATH attribute

    $ ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 nexthop via 2001:db8:2::16

    $ ip mon route
    Replaced 2001:db8:200::/120 table red metric 1024
	    nexthop via 2001:db8:1::16  dev eth1 weight 1
	    nexthop via 2001:db8:2::16  dev eth2 weight 1

   - on a failure after the insertion of the first nexthop (which means
     the original route has been replaced in the FIB), a notification is
     sent with the successful nexthops and then the nexthops are deleted
     with one notification per hop. This is consistent with how it works
     today except the successful additions are coalesced into 1
     notification.

3. Route Delete
   - delete of entire multipath route using prefix/length only 1
     notification is generated:
    $ ip -6 ro del vrf red 2001:db8:200::/120

    $ ip mon route
    Deleted 2001:db8:200::/120 table red metric 1024
	    nexthop via 2001:db8:1::16  dev eth1 weight 1
	    nexthop via 2001:db8:2::16  dev eth2 weight 1

   - if a delete request contains nexthops one notification is
     generated per nexthop deleted. This is unavoidable since IPv6
     alllows a single nexthop to be deleted within a multipath route

4. Route Appends
   - IPv6 allows nexthops to be appended to an existing route. In this
     case one notification is sent for the new route with the append
     flag set.

    $ ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop via 2001:db8:1::20

    $ ip mon route
    Append 2001:db8:200::/120 table red metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1
	    nexthop via 2001:db8:2::20  dev eth2 weight 1
	    nexthop via 2001:db8:1::20  dev eth1 weight 1

  - on failure of an append, a notification is sent with the route
    containing all of the nexthops successfully added, and it is
    followed by delete notifications as the hops are removed
    returning the route to its prior state. This is consistent with
    how it works today except the successful additions are coalesced
    into 1 notification.

Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf

v4
- changed series to do encoding in 1 patch and updating notificatons
  in separate patches to make it easier to review and understand

- 1 notification for delete when using prefix/length; 1 notification for
  append

- handle delete of a single nexthop without RTA_MULTIPATH in delete request

- upated commit messages and cover letter

v3
- removed the need for a user API to opt-in to change. Requiring an
  API just shifts the difference from same API with different
  behavior to different API to achieve equivalent behavior
- route notifications changed to use RTA_MULTIPATH for add and replace
- upated commit messages and cover letter

v2
- fixed locking in patch 1 as noted by DaveM
- changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
  set in rtm_flags
- revamped explanation of patch 2 and cover letter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04 19:58:15 -05:00
David Ahern
7d4d5065ec net: ipv6: Use compressed IPv6 addresses showing route replace error
ip6_print_replace_route_err logs an error if a route replace fails with
IPv6 addresses in the full format. e.g,:

IPv6: IPV6: multipath route replace failed (check consistency of installed routes): 2001:0db8:0200:0000:0000:0000:0000:0000 nexthop 2001:0db8:0001:0000:0000:0000:0000:0016 ifi 0

Change the message to dump the addresses in the compressed format.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04 19:58:14 -05:00
David Ahern
16a16cd35e net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH
If an entire multipath route is deleted using prefix and len (without any
nexthops), send a single RTM_DELROUTE notification with the full route
using RTA_MULTIPATH. This is done by generating the skb before the route
delete when all of the sibling routes are still present but sending it
after the route has been removed from the FIB. The skip_notify flag
is used to tell the lower fib code not to send notifications for the
individual nexthop routes.

If a route is deleted using RTA_MULTIPATH for any nexthops or a single
nexthop entry is deleted, then the nexthops are deleted one at a time with
notifications sent as each hop is deleted. This is necessary given that
IPv6 allows individual hops within a route to be deleted.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04 19:58:14 -05:00
David Ahern
3b1137fe74 net: ipv6: Change notifications for multipath add to RTA_MULTIPATH
Change ip6_route_multipath_add to send one notifciation with the full
route encoded with RTA_MULTIPATH instead of a series of individual routes.
This is done by adding a skip_notify flag to the nl_info struct. The
flag is used to skip sending of the notification in the fib code that
actually inserts the route. Once the full route has been added, a
notification is generated with all nexthops.

ip6_route_multipath_add handles 3 use cases: new routes, route replace,
and route append. The multipath notification generated needs to be
consistent with the order of the nexthops and it should be consistent
with the order in a FIB dump which means the route with the first nexthop
needs to be used as the route reference. For the first 2 cases (new and
replace), a reference to the route used to send the notification is
obtained by saving the first route added. For the append case, the last
route added is used to loop back to its first sibling route which is
the first nexthop in the multipath route.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04 19:58:14 -05:00