linux-stable/net
Vladimir Oltean d5f19486ce net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors
Some DSA switches (and not only) cannot learn source MAC addresses from
packets injected from the CPU. They only perform hardware address
learning from inbound traffic.

This can be problematic when we have a bridge spanning some DSA switch
ports and some non-DSA ports (which we'll call "foreign interfaces" from
DSA's perspective).

There are 2 classes of problems created by the lack of learning on
CPU-injected traffic:
- excessive flooding, due to the fact that DSA treats those addresses as
  unknown
- the risk of stale routes, which can lead to temporary packet loss

To illustrate the second class, consider the following situation, which
is common in production equipment (wireless access points, where there
is a WLAN interface and an Ethernet switch, and these form a single
bridging domain).

 AP 1:
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
       |                                                       ^        ^
       |                                                       |        |
       |                                                       |        |
       |                                                    Client A  Client B
       |
       |
       |
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 AP 2

- br0 of AP 1 will know that Clients A and B are reachable via wlan0
- the hardware fdb of a DSA switch driver today is not kept in sync with
  the software entries on other bridge ports, so it will not know that
  clients A and B are reachable via the CPU port UNLESS the hardware
  switch itself performs SA learning from traffic injected from the CPU.
  Nonetheless, a substantial number of switches don't.
- the hardware fdb of the DSA switch on AP 2 may autonomously learn that
  Client A and B are reachable through swp0. Therefore, the software br0
  of AP 2 also may or may not learn this. In the example we're
  illustrating, some Ethernet traffic has been going on, and br0 from AP
  2 has indeed learnt that it can reach Client B through swp0.

One of the wireless clients, say Client B, disconnects from AP 1 and
roams to AP 2. The topology now looks like this:

 AP 1:
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
       |                                                            ^
       |                                                            |
       |                                                         Client A
       |
       |
       |                                                         Client B
       |                                                            |
       |                                                            v
 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
 +------------+ +------------+ +------------+ +------------+ +------------+
 +------------------------------------------------------------------------+
 |                                          br0                           |
 +------------------------------------------------------------------------+
 AP 2

- br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
- br0 of AP 1 will (possibly) know that Client B has left wlan0. There
  are cases where it might never find out though. Either way, DSA today
  does not process that notification in any way.
- the hardware FDB of the DSA switch on AP 1 may learn autonomously that
  Client B can be reached via swp0, if it receives any packet with
  Client 1's source MAC address over Ethernet.
- the hardware FDB of the DSA switch on AP 2 still thinks that Client B
  can be reached via swp0. It does not know that it has roamed to wlan0,
  because it doesn't perform SA learning from the CPU port.

Now Client A contacts Client B.
AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
segment.
AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
Hairpinning is disabled => drop.

This problem comes from the fact that these switches have a 'blind spot'
for addresses coming from software bridging. The generic solution is not
to assume that hardware learning can be enabled somehow, but to listen
to more bridge learning events. It turns out that the bridge driver does
learn in software from all inbound frames, in __br_handle_local_finish.
A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
addresses serviced by the bridge on 'foreign' interfaces. The software
bridge also does the right thing on migration, by notifying that the old
entry is deleted, so that does not need to be special-cased in DSA. When
it is deleted, we just need to delete our static FDB entry towards the
CPU too, and wait.

The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
events received on its own interfaces, such as static FDB entries.

Luckily we can change that, and DSA can listen to all switchdev FDB
add/del events in the system and figure out if those events were emitted
by a bridge that spans at least one of DSA's own ports. In case that is
true, DSA will also offload that address towards its own CPU port, in
the eventuality that there might be bridge clients attached to the DSA
switch who want to talk to the station connected to the foreign
interface.

In terms of implementation, we need to keep the fdb_info->added_by_user
check for the case where the switchdev event was targeted directly at a
DSA switch port. But we don't need to look at that flag for snooped
events. So the check is currently too late, we need to move it earlier.
This also simplifies the code a bit, since we avoid uselessly allocating
and freeing switchdev_work.

We could probably do some improvements in the future. For example,
multi-bridge support is rudimentary at the moment. If there are two
bridges spanning a DSA switch's ports, and both of them need to service
the same MAC address, then what will happen is that the migration of one
of those stations will trigger the deletion of the FDB entry from the
CPU port while it is still used by other bridge. That could be improved
with reference counting but is left for another time.

This behavior needs to be enabled at driver level by setting
ds->assisted_learning_on_cpu_port = true. This is because we don't want
to inflict a potential performance penalty (accesses through
MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
because address learning on the CPU port works there.

Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:34:46 -08:00
..
6lowpan
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
802
8021q net: vlan: Fixed signedness in vlan_group_prealloc_vid() 2020-09-28 00:51:39 -07:00
appletalk net: appletalk: fix kerneldoc warnings 2020-10-30 11:48:17 -07:00
atm atm: nicstar: Replace in_interrupt() usage 2020-11-18 16:43:55 -08:00
ax25
batman-adv batman-adv: Drop unused soft-interface.h include in fragmentation.c 2020-12-04 08:41:16 +01:00
bluetooth Bluetooth: Increment management interface revision 2020-12-07 17:02:00 +02:00
bpf bpf: fix raw_tp test run in preempt kernel 2020-09-30 08:34:08 -07:00
bpfilter Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" 2020-10-15 12:33:24 -07:00
bridge net: bridge: notify switchdev of disappearance of old FDB entry upon migration 2021-01-07 15:34:45 -08:00
caif caif: Remove duplicate macro SRVL_CTRL_PKT_SIZE 2020-09-05 15:57:05 -07:00
can can: raw: return -ERANGE when filterset does not fit into user space buffer 2021-01-06 15:15:41 +01:00
ceph libceph: align session_key and con_secret to 16 bytes 2020-12-28 20:34:33 +01:00
core udp_tunnel: hard-wire NDOs to udp_tunnel_nic_*_port() helpers 2021-01-07 12:53:29 -08:00
dcb net: dcb: Validate netlink message in DCB handler 2020-12-23 12:19:48 -08:00
dccp selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
decnet treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
dns_resolver
dsa net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors 2021-01-07 15:34:46 -08:00
ethernet net: datagram: fix some kernel-doc markups 2020-11-17 14:15:03 -08:00
ethtool ethtool: fix error paths in ethnl_set_channels() 2020-12-16 13:27:17 -08:00
hsr genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
ieee802154 treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
ife
ipv4 udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checks 2021-01-07 12:53:29 -08:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2020-12-18 18:07:14 -08:00
iucv net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr 2020-12-08 15:56:53 -08:00
kcm net: kcm: Replace fput with sockfd_put 2021-01-05 16:02:32 -08:00
key
l2tp lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2020-11-23 18:36:21 -05:00
l3mdev net: l3mdev: Fix kerneldoc warning 2020-10-30 11:43:42 -07:00
lapb net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event 2021-01-04 13:42:41 -08:00
llc net: llc: Fix kerneldoc warnings 2020-10-30 11:34:09 -07:00
mac80211 A new set of wireless changes: 2020-12-12 10:07:56 -08:00
mac802154 net: mac802154: convert tasklets to use new tasklet_setup() API 2020-11-07 10:40:56 -08:00
mpls mpls: drop skb's dst in mpls_forward() 2020-11-03 12:55:53 -08:00
mptcp net: mptcp: cap forward allocation to 1M 2020-12-28 13:53:57 -08:00
ncsi net/ncsi: Use real net-device for response handler 2020-12-23 12:22:23 -08:00
netfilter netfilter: nftables: add set expression flags 2020-12-28 10:50:26 +01:00
netlabel Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-11-19 19:08:46 -08:00
netlink netlink: export policy in extended ACK 2020-10-09 20:22:32 -07:00
netrom
nfc net: nfc: nci: Change the NCI close sequence 2021-01-05 16:06:34 -08:00
nsh
openvswitch net: openvswitch: fix TTL decrement exception action execution 2020-12-14 17:18:25 -08:00
packet net: af_packet: fix procfs header for 64-bit pointers 2020-12-18 12:17:23 -08:00
phonet
psample genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
qrtr wireless-drivers-next patches for v5.11 2020-12-04 10:56:37 -08:00
rds rds: stop using dmapool 2020-11-17 15:22:06 -04:00
rfkill rfkill: add a reason to the HW rfkill state 2020-12-11 12:47:17 +01:00
rose rose: Fix Null pointer dereference in rose_send_frame() 2020-11-20 10:04:58 -08:00
rxrpc net: rxrpc: convert comma to semicolon 2020-12-09 16:23:07 -08:00
sched net: sched: prevent invalid Scell_log shift count 2020-12-28 14:52:54 -08:00
sctp sctp: Fix some typo 2020-11-23 17:44:11 -08:00
smc net/smc: fix access to parent of an ib device 2020-12-16 13:33:47 -08:00
strparser
sunrpc NFS client updates for Linux 5.11 2020-12-17 12:15:03 -08:00
switchdev net: switchdev: Fixed kerneldoc warning 2020-09-23 17:46:31 -07:00
tipc net: tipc: Replace expression with offsetof() 2021-01-05 15:43:41 -08:00
tls net: fix proc_fs init handling in af_packet and tls 2020-12-14 19:39:30 -08:00
unix networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
vmw_vsock af_vsock: Assign the vsock transport considering the vsock address flags 2020-12-14 19:33:39 -08:00
wireless A new set of wireless changes: 2020-12-12 10:07:56 -08:00
x25 net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xdp xsk: Rollback reservation at NETDEV_TX_BUSY 2020-12-18 16:10:21 +01:00
xfrm selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
compat.c iov_iter: transparently handle compat iovecs in import_iovec 2020-10-03 00:02:13 -04:00
devres.c
Kconfig wimax: move out to staging 2020-10-29 19:27:45 +01:00
Makefile wimax: move out to staging 2020-10-29 19:27:45 +01:00
socket.c for-5.11/io_uring-2020-12-14 2020-12-16 12:44:05 -08:00
sysctl_net.c