Commit Graph

452 Commits

Author SHA1 Message Date
Linus Lüssing 04a553d8ac bridge: mcast: fix disabled snooping after long uptime
[ Upstream commit f5c3eb4b72 ]

The original idea of the delay_time check was to not apply multicast
snooping too early when an MLD querier appears. And to instead wait at
least for MLD reports to arrive before switching from flooding to group
based, MLD snooped forwarding, to avoid temporary packet loss.

However in a batman-adv mesh network it was noticed that after 248 days of
uptime 32bit MIPS based devices would start to signal that they had
stopped applying multicast snooping due to missing queriers - even though
they were the elected querier and still sending MLD queries themselves.

While time_is_before_jiffies() generally is safe against jiffies
wrap-arounds, like the code comments in jiffies.h explain, it won't
be able to track a difference larger than ULONG_MAX/2. With a 32bit
large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS
devices running OpenWrt this would result in a difference larger than
ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and
time_is_before_jiffies() would then start to return false instead of
true. Leading to multicast snooping not being applied to multicast
packets anymore.

Fix this issue by using a proper timer_list object which won't have this
ULONG_MAX/2 difference limitation.

Fixes: b00589af3b ("bridge: disable snooping if there is no querier")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240127175033.9640-1-linus.luessing@c0d3.blue
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:13:01 +00:00
Alaa Mohamed ca4567f1e6 rtnetlink: add extack support in fdb del handlers
Add extack support to .ndo_fdb_del in netdevice.h and
all related methods.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-09 11:58:20 +01:00
Nikolay Aleksandrov 564445fb4f net: bridge: fdb: add support for flush filtering based on ndm flags and state
Add support for fdb flush filtering based on ndm flags and state. NDM
state and flags are mapped to bridge-specific flags and matched
according to the specified masks. NTF_USE is used to represent
added_by_user flag since it sets it on fdb add and we don't have a 1:1
mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are
ignored.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov 1f78ee14ee net: bridge: fdb: add support for fine-grained flushing
Add the ability to specify exactly which fdbs to be flushed. They are
described by a new structure - net_bridge_fdb_flush_desc. Currently it
can match on port/bridge ifindex, vlan id and fdb flags. It is used to
describe the existing dynamic fdb flush operation. Note that this flush
operation doesn't treat permanent entries in a special way (fdb_delete vs
fdb_delete_local), it will delete them regardless if any port is using
them, so currently it can't directly replace deletes which need to handle
that case, although we can extend it later for that too.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov edaef19172 net: bridge: fdb: add ndo_fdb_del_bulk
Add a minimal ndo_fdb_del_bulk implementation which flushes all entries.
Support for more fine-grained filtering will be added in the following
patches.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13 12:46:26 +01:00
Tobias Waldekranz 122c29486e net: bridge: mst: Support setting and reporting MST port states
Make it possible to change the port state in a given MSTI by extending
the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The
proposed iproute2 interface would be:

    bridge mst set dev <PORT> msti <MSTI> state <STATE>

Current states in all applicable MSTIs can also be dumped via a
corresponding RTM_GETLINK. The proposed iproute interface looks like
this:

$ bridge mst
port              msti
vb1               0
		    state forwarding
		  100
		    state disabled
vb2               0
		    state forwarding
		  100
		    state forwarding

The preexisting per-VLAN states are still valid in the MST
mode (although they are read-only), and can be queried as usual if one
is interested in knowing a particular VLAN's state without having to
care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
bound to MSTI 100):

$ bridge -d vlan
port              vlan-id
vb1               10
		    state forwarding mcast_router 1
		  20
		    state disabled mcast_router 1
		  30
		    state disabled mcast_router 1
		  40
		    state forwarding mcast_router 1
vb2               10
		    state forwarding mcast_router 1
		  20
		    state forwarding mcast_router 1
		  30
		    state forwarding mcast_router 1
		  40
		    state forwarding mcast_router 1

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 16:49:57 -07:00
Tobias Waldekranz 8c678d6056 net: bridge: mst: Allow changing a VLAN's MSTI
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree.

The user manages the VID to MSTI mappings via a global VLAN
setting. The proposed iproute2 interface would be:

    bridge vlan global set dev br0 vid <VID> msti <MSTI>

Changing the state in non-zero MSTIs is still not supported, but will
be addressed in upcoming changes.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 16:49:57 -07:00
Tobias Waldekranz ec7328b591 net: bridge: mst: Multiple Spanning Tree (MST) mode
Allow the user to switch from the current per-VLAN STP mode to an MST
mode.

Up to this point, per-VLAN STP states where always isolated from each
other. This is in contrast to the MSTP standard (802.1Q-2018, Clause
13.5), where VLANs are grouped into MST instances (MSTIs), and the
state is managed on a per-MSTI level, rather that at the per-VLAN
level.

Perhaps due to the prevalence of the standard, many switching ASICs
are built after the same model. Therefore, add a corresponding MST
mode to the bridge, which we can later add offloading support for in a
straight-forward way.

For now, all VLANs are fixed to MSTI 0, also called the Common
Spanning Tree (CST). That is, all VLANs will follow the port-global
state.

Upcoming changes will make this actually useful by allowing VLANs to
be mapped to arbitrary MSTIs and allow individual MSTI states to be
changed.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 16:49:57 -07:00
Vladimir Oltean 8d23a54f5b net: bridge: switchdev: differentiate new VLANs from changed ones
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD
event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:

- a struct net_bridge_vlan got created
- an existing struct net_bridge_vlan was modified

This makes it impossible for switchdev drivers to properly balance
PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to
happen, we must provide a way for drivers to distinguish between a
VLAN with changed flags and a new one.

Annotate struct switchdev_obj_port_vlan with a "bool changed" that
distinguishes the 2 cases above.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-16 11:21:04 +00:00
Jakub Kicinski aec53e60e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
  commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/

net/smc/smc_wr.c
  commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
  commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
  bitmap_zero()/memset() is removed by the fix

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-30 12:12:12 -08:00
Nikolay Aleksandrov 168fed986b net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper
We need to first check if the context is a vlan one, then we need to
check the global bridge multicast vlan snooping flag, and finally the
vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast
processing (e.g. querier timers).

Fixes: 7b54aaaf53 ("net: bridge: multicast: add vlan state initialization and control")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20211228153142.536969-1-nikolay@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29 17:49:45 -08:00
Nikolay Aleksandrov f83a112bd9 net: bridge: mcast: add and enforce startup query interval minimum
As reported[1] if startup query interval is set too low in combination with
large number of startup queries and we have multiple bridges or even a
single bridge with multiple querier vlans configured we can crash the
machine. Add a 1 second minimum which must be enforced by overwriting the
value if set lower (i.e. without returning an error) to avoid breaking
user-space. If that happens a log message is emitted to let the admin know
that the startup interval has been set to the minimum. It doesn't make
sense to make the startup interval lower than the normal query interval
so use the same value of 1 second. The issue has been present since these
intervals could be user-controlled.

[1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/

Fixes: d902eee43f ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29 12:59:38 -08:00
Nikolay Aleksandrov 99b4061095 net: bridge: mcast: add and enforce query interval minimum
As reported[1] if query interval is set too low and we have multiple
bridges or even a single bridge with multiple querier vlans configured
we can crash the machine. Add a 1 second minimum which must be enforced
by overwriting the value if set lower (i.e. without returning an error) to
avoid breaking user-space. If that happens a log message is emitted to let
the administrator know that the interval has been set to the minimum.
The issue has been present since these intervals could be user-controlled.

[1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/

Fixes: d902eee43f ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29 12:59:37 -08:00
Eric Dumazet b2dcdc7f73 net: bridge: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Jakub Kicinski 8a33dcc2f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in the fixes we had queued in case there was another -rc.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 20:05:14 -07:00
Vladimir Oltean ae0393500e net: bridge: switchdev: fix shim definition for br_switchdev_mdb_notify
br_switchdev_mdb_notify() is conditionally compiled only when
CONFIG_NET_SWITCHDEV=y and CONFIG_BRIDGE_IGMP_SNOOPING=y. It is called
from br_mdb.c, which is conditionally compiled only when
CONFIG_BRIDGE_IGMP_SNOOPING=y.

The shim definition of br_switchdev_mdb_notify() is therefore needed for
the case where CONFIG_NET_SWITCHDEV=n, however we mistakenly put it
there for the case where CONFIG_BRIDGE_IGMP_SNOOPING=n. This results in
build failures when CONFIG_BRIDGE_IGMP_SNOOPING=y and
CONFIG_NET_SWITCHDEV=n.

To fix this, put the shim definition right next to
br_switchdev_fdb_notify(), which is properly guarded by NET_SWITCHDEV=n.
Since this is called only from br_mdb.c, we need not take any extra
safety precautions, when NET_SWITCHDEV=n and BRIDGE_IGMP_SNOOPING=n,
this shim definition will be absent but nobody will be needing it.

Fixes: 9776457c78 ("net: bridge: mdb: move all switchdev logic to br_switchdev.c")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211029223606.3450523-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29 21:32:39 -07:00
Ivan Vecera 829e050eea net: bridge: fix uninitialized variables when BRIDGE_CFM is disabled
Function br_get_link_af_size_filtered() calls br_cfm_{,peer}_mep_count()
that return a count. When BRIDGE_CFM is not enabled these functions
simply return -EOPNOTSUPP but do not modify count parameter and
calling function then works with uninitialized variables.
Modify these inline functions to return zero in count parameter.

Fixes: b6d0425b81 ("bridge: cfm: Netlink Notifications.")
Cc: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 13:31:32 +01:00
Vladimir Oltean 9776457c78 net: bridge: mdb: move all switchdev logic to br_switchdev.c
The following functions:

br_mdb_complete
br_switchdev_mdb_populate
br_mdb_replay_one
br_mdb_queue_one
br_mdb_replay
br_mdb_switchdev_host_port
br_mdb_switchdev_host
br_switchdev_mdb_notify

are only accessible from code paths where CONFIG_NET_SWITCHDEV is
enabled. So move them to br_switchdev.c, in order for that code to be
compiled out if that config option is disabled.

Note that br_switchdev.c gets build regardless of whether
CONFIG_BRIDGE_IGMP_SNOOPING is enabled or not, whereas br_mdb.c only got
built when CONFIG_BRIDGE_IGMP_SNOOPING was enabled. So to preserve
correct compilation with CONFIG_BRIDGE_IGMP_SNOOPING being disabled, we
must now place an #ifdef around these functions in br_switchdev.c.
The offending bridge data structures that need this are
br->multicast_lock and br->mdb_list, these are also compiled out of
struct net_bridge when CONFIG_BRIDGE_IGMP_SNOOPING is turned off.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 20:05:57 -07:00
Vladimir Oltean 4a6849e461 net: bridge: move br_vlan_replay to br_switchdev.c
br_vlan_replay() is relevant only if CONFIG_NET_SWITCHDEV is enabled, so
move it to br_switchdev.c.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 20:05:57 -07:00
Vladimir Oltean c5f6e5ebc2 net: bridge: provide shim definition for br_vlan_flags
br_vlan_replay() needs this, and we're preparing to move it to
br_switchdev.c, which will be compiled regardless of whether or not
CONFIG_BRIDGE_VLAN_FILTERING is enabled.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 20:05:57 -07:00
Vladimir Oltean 5cda5272a4 net: bridge: move br_fdb_replay inside br_switchdev.c
br_fdb_replay is only called from switchdev code paths, so it makes
sense to be disabled if switchdev is not enabled in the first place.

As opposed to br_mdb_replay and br_vlan_replay which might be turned off
depending on bridge support for multicast and VLANs, FDB support is
always on. So moving br_mdb_replay and br_vlan_replay inside
br_switchdev.c would mean adding some #ifdef's in br_switchdev.c, so we
keep those where they are.

The reason for the movement is that in future changes there will be some
code reuse between br_switchdev_fdb_notify and br_fdb_replay.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Vladimir Oltean f6814fdcfe net: bridge: rename br_fdb_insert to br_fdb_add_local
br_fdb_insert() is a wrapper over fdb_insert() that also takes the
bridge hash_lock.

With fdb_insert() being renamed to fdb_add_local(), rename
br_fdb_insert() to br_fdb_add_local().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Nikolay Aleksandrov fac3cb82a5 net: bridge: mcast: use multicast_membership_interval for IGMPv3
When I added IGMPv3 support I decided to follow the RFC for computing
the GMI dynamically:
" 8.4. Group Membership Interval

   The Group Membership Interval is the amount of time that must pass
   before a multicast router decides there are no more members of a
   group or a particular source on a network.

   This value MUST be ((the Robustness Variable) times (the Query
   Interval)) plus (one Query Response Interval)."

But that actually is inconsistent with how the bridge used to compute it
for IGMPv2, where it was user-configurable that has a correct default value
but it is up to user-space to maintain it. This would make it consistent
with the other timer values which are also maintained correct by the user
instead of being dynamically computed. It also changes back to the previous
user-expected GMI behaviour for IGMPv3 queries which were supported before
IGMPv3 was added. Note that to properly compute it dynamically we would
need to add support for "Robustness Variable" which is currently missing.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: 0436862e41 ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16 15:05:58 +01:00
Thomas Gleixner f936bb42ae net: bridge: mcast: Associate the seqcount with its protecting lock.
The sequence count bridge_mcast_querier::seq is protected by
net_bridge::multicast_lock but seqcount_init() does not associate the
seqcount with the lock. This leads to a warning on PREEMPT_RT because
preemption is still enabled.

Let seqcount_init() associate the seqcount with lock that protects the
write section. Remove lockdep_assert_held_once() because lockdep already checks
whether the associated lock is held.

Fixes: 67b746f94f ("net: bridge: mcast: make sure querier port/address updates are consistent")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-28 17:30:36 -07:00
Nikolay Aleksandrov 2796d846d7 net: bridge: vlan: convert mcast router global option to per-vlan entry
The per-vlan router option controls the port/vlan and host vlan entries'
mcast router config. The global option controlled only the host vlan
config, but that is unnecessary and incosistent as it's not really a
global vlan option, but rather bridge option to control host router
config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to
BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host
vlan and port vlan mcast router config.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20 15:00:35 +01:00
Nikolay Aleksandrov a53581d555 net: bridge: mcast: br_multicast_set_port_router takes multicast context as argument
Change br_multicast_set_port_router to take port multicast context as
its first argument so we can later use it to control port/vlan mcast
router option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20 15:00:35 +01:00
Nikolay Aleksandrov affce9a774 net: bridge: mcast: toggle also host vlan state in br_multicast_toggle_vlan
When changing vlan mcast state by br_multicast_toggle_vlan it iterates
over all ports and enables/disables the port mcast ctx based on the new
state, but I forgot to update the host vlan (bridge master vlan entry)
with the new state so it will be left out. Also that function is not
used outside of br_multicast.c, so make it static.

Fixes: f4b7002a70 ("net: bridge: add vlan mcast snooping knob")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:37:29 +01:00
Nikolay Aleksandrov 05d6f38ec0 net: bridge: vlan: account for router port lists when notifying
When sending a global vlan notification we should account for the number
of router ports when allocating the skb, otherwise we might end up
losing notifications.

Fixes: dc002875c2 ("net: bridge: vlan: use br_rports_fill_info() to export mcast router ports")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:37:29 +01:00
Nikolay Aleksandrov c7fa1d9b1f net: bridge: mcast: dump ipv4 querier state
Add support for dumping global IPv4 querier state, we dump the state
only if our own querier is enabled or there has been another external
querier which has won the election. For the bridge global state we use
a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside.
The structure is:
 [IFLA_BR_MCAST_QUERIER_STATE]
  `[BRIDGE_QUERIER_IP_ADDRESS] - ip address of the querier
  `[BRIDGE_QUERIER_IP_PORT]    - bridge port ifindex where the querier was
                                 seen (set only if external querier)
  `[BRIDGE_QUERIER_IP_OTHER_TIMER]   -  other querier timeout

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov 67b746f94f net: bridge: mcast: make sure querier port/address updates are consistent
Use a sequence counter to make sure port/address updates can be read
consistently without requiring the bridge multicast_lock. We need to
zero out the port and address when the other querier has expired and
we're about to select ourselves as querier. br_multicast_read_querier
will be used later when dumping querier state. Updates are done only
with the multicast spinlock and softirqs disabled, while reads are done
from process context and from softirqs (due to notifications).

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov bb18ef8e7e net: bridge: mcast: record querier port device ifindex instead of pointer
Currently when a querier port is detected its net_bridge_port pointer is
recorded, but it's used only for comparisons so it's fine to have stale
pointer, in order to dereference and use the port pointer a proper
accounting of its usage must be implemented adding unnecessary
complexity. To solve the problem we can just store the netdevice ifindex
instead of the port pointer and retrieve the bridge port. It is a best
effort and the device needs to be validated that is still part of that
bridge before use, but that is small price to pay for avoiding querier
reference counting for each port/vlan.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14 14:02:43 +01:00
Jakub Kicinski f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f2580 ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda65 ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557d ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a376 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Nikolay Aleksandrov dc002875c2 net: bridge: vlan: use br_rports_fill_info() to export mcast router ports
Embed the standard multicast router port export by br_rports_fill_info()
into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS.
In order to have the same format for the global bridge mcast context and
the per-vlan mcast context we need a double-nesting:
 - BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS
   - MDBA_ROUTER

Currently we don't compare router lists, if any router port exists in
the bridge mcast contexts we consider their option sets as different and
export them separately.

In addition we export the router port vlan id when dumping similar to
the router port notification format.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov a97df080b6 net: bridge: vlan: add support for mcast router global option
Add support to change and retrieve global vlan multicast router state
which is used for the bridge itself. We just need to pass multicast context
to br_multicast_set_router instead of bridge device and the rest of the
logic remains the same.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 62938182c3 net: bridge: vlan: add support for mcast querier global option
Add support to change and retrieve global vlan multicast querier state.
We just need to pass multicast context to br_multicast_set_querier
instead of bridge device and the rest of the logic remains the same.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov cb486ce995 net: bridge: mcast: querier and query state affect only current context type
It is a minor optimization and better behaviour to make sure querier and
query sending routines affect only the matching multicast context
depending if vlan snooping is enabled (vlan ctx vs bridge ctx).
It also avoids sending unnecessary extra query packets.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 4d5b4e84c7 net: bridge: mcast: move querier state to the multicast context
We need to have the querier state per multicast context in order to have
per-vlan control, so remove the internal option bit and move it to the
multicast context. Also annotate the lockless reads of the new variable.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 941121ee22 net: bridge: vlan: add support for mcast startup query interval global option
Add support to change and retrieve global vlan multicast startup query
interval option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 425214508b net: bridge: vlan: add support for mcast query response interval global option
Add support to change and retrieve global vlan multicast query response
interval option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov d6c08aba4f net: bridge: vlan: add support for mcast query interval global option
Add support to change and retrieve global vlan multicast query interval
option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov cd9269d463 net: bridge: vlan: add support for mcast querier interval global option
Add support to change and retrieve global vlan multicast querier interval
option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 2da0aea21f net: bridge: vlan: add support for mcast membership interval global option
Add support to change and retrieve global vlan multicast membership
interval option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 77f6ababa2 net: bridge: vlan: add support for mcast last member interval global option
Add support to change and retrieve global vlan multicast last member
interval option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 50725f6e6b net: bridge: vlan: add support for mcast startup query count global option
Add support to change and retrieve global vlan multicast startup query
count option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 931ba87d20 net: bridge: vlan: add support for mcast last member count global option
Add support to change and retrieve global vlan multicast last member
count option.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov df271cd641 net: bridge: vlan: add support for mcast igmp/mld version global options
Add support to change and retrieve global vlan IGMP/MLD versions.

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov 45a687879b net: bridge: fix flags interpretation for extern learn fdb entries
Ignore fdb flags when adding port extern learn entries and always set
BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
closest to the behaviour we had before and avoids breaking any use cases
which were allowed.

This patch fixes iproute2 calls which assume NUD_PERMANENT and were
allowed before, example:
$ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn

Extern learn entries are allowed to roam, but do not expire, so static
or dynamic flags make no sense for them.

Also add a comment for future reference.

Fixes: eb100e0e24 ("net: bridge: allow to add externally learned entries from user-space")
Fixes: 0541a62932 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10 11:29:39 -07:00
Jakub Kicinski 0ca8d3ca45 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).

Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
  589918df93 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
  0fac6aa098 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")

Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df93 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-05 15:08:47 -07:00
Vladimir Oltean 957e2235e5 net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit
2f5dc00f7a ("net: bridge: switchdev: let drivers inform which bridge
ports are offloaded"), we started having Ethernet switch drivers calling
directly into a function exported by net/bridge/br_switchdev.c, which is
a function exported by the bridge driver.

This means that drivers that did not have an explicit dependency on the
bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
possible to call a symbol exported by a driver that can be built as
module unless you are a module too.

There was an attempt to solve the dependency issue in the form of commit
b0e8181762 ("net: build all switchdev drivers as modules when the
bridge is a module"). Grygorii Strashko, however, says about it:

| In my opinion, the problem is a bit bigger here than just fixing the
| build :(
|
| In case, of ^cpsw the switchdev mode is kinda optional and in many
| cases (especially for testing purposes, NFS) the multi-mac mode is
| still preferable mode.
|
| There were no such tight dependency between switchdev drivers and
| bridge core before and switchdev serviced as independent, notification
| based layer between them, so ^cpsw still can be "Y" and bridge can be
| "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
| will need to be set as "Y", or we will have to update drivers to
| support build with BRIDGE=n and maintain separate builds for
| networking vs non-networking testing.  But is this enough?  Wouldn't
| it cause 'chain reaction' required to add more and more "Y" options
| (like CONFIG_VLAN_8021Q)?
|
| PS. Just to be sure we on the same page - ARM builds will be forced
| (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
| automation testing will just fail with omap2plus_defconfig.

In the light of this, it would be desirable for some configurations to
avoid dependencies between switchdev drivers and the bridge, and have
the switchdev mode as completely optional within the driver.

Arnd Bergmann also tried to write a patch which better expressed the
build time dependency for Ethernet switch drivers where the switchdev
support is optional, like cpsw/am65-cpsw, and this made the drivers
follow the bridge (compile as module if the bridge is a module) only if
the optional switchdev support in the driver was enabled in the first
place:
https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/

but this still did not solve the fact that cpsw and am65-cpsw now must
be built as modules when the bridge is a module - it just expressed
correctly that optional dependency. But the new behavior is an apparent
regression from Grygorii's perspective.

So to support the use case where the Ethernet driver is built-in,
NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
need a framework that can handle the possible absence of the bridge from
the running system, i.e. runtime bloatware as opposed to build-time
bloatware.

Luckily we already have this framework, since switchdev has been using
it extensively. Events from the bridge side are transmitted to the
driver side using notifier chains - this was originally done so that
unrelated drivers could snoop for events emitted by the bridge towards
ports that are implemented by other drivers (think of a switch driver
with LAG offload that listens for switchdev events on a bonding/team
interface that it offloads).

There are also events which are transmitted from the driver side to the
bridge side, which again are modeled using notifiers.
SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
notifying the bridge that a MAC address has been dynamically learned.
So there is a precedent we can use for modeling the new framework.

The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
that the bridge needs to do when a port becomes offloaded is blocking in
its nature: replay VLANs, MDBs etc. The calling context is indeed
blocking (we are under rtnl_mutex), but the existing switchdev
notification chain that the bridge is subscribed to is only the atomic
one. So we need to subscribe the bridge to the blocking switchdev
notification chain too.

This patch:
- keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
  unchanged
- moves the implementation of switchdev_bridge_port_{,un}offload from
  the bridge module into the switchdev module.
- makes everybody that is subscribed to the switchdev blocking notifier
  chain "hear" offload & unoffload events
- makes the bridge driver subscribe and handle those events
- moves the bridge driver's handling of those events into 2 new
  functions called br_switchdev_port_{,un}offload. These functions
  contain in fact the core of the logic that was previously in
  switchdev_bridge_port_{,un}offload, just that now we go through an
  extra indirection layer to reach them.

Unlike all the other switchdev notification structures, the structure
used to carry the bridge port information, struct
switchdev_notifier_brport_info, does not contain a "bool handled".
This is because in the current usage pattern, we always know that a
switchdev bridge port offloading event will be handled by the bridge,
because the switchdev_bridge_port_offload() call was initiated by a
NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
couldn't have happened.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-04 12:35:07 +01:00
Vladimir Oltean 0541a62932 net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry
Currently it is possible to add broken extern_learn FDB entries to the
bridge in two ways:

1. Entries pointing towards the bridge device that are not local/permanent:

ip link add br0 type bridge
bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static

2. Entries pointing towards the bridge device or towards a port that
are marked as local/permanent, however the bridge does not process the
'permanent' bit in any way, therefore they are recorded as though they
aren't permanent:

ip link add br0 type bridge
bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent

Since commit 52e4bec155 ("net: bridge: switchdev: treat local FDBs the
same as entries towards the bridge"), these incorrect FDB entries can
even trigger NULL pointer dereferences inside the kernel.

This is because that commit made the assumption that all FDB entries
that are not local/permanent have a valid destination port. For context,
local / permanent FDB entries either have fdb->dst == NULL, and these
point towards the bridge device and are therefore local and not to be
used for forwarding, or have fdb->dst == a net_bridge_port structure
(but are to be treated in the same way, i.e. not for forwarding).

That assumption _is_ correct as long as things are working correctly in
the bridge driver, i.e. we cannot logically have fdb->dst == NULL under
any circumstance for FDB entries that are not local. However, the
extern_learn code path where FDB entries are managed by a user space
controller show that it is possible for the bridge kernel driver to
misinterpret the NUD flags of an entry transmitted by user space, and
end up having fdb->dst == NULL while not being a local entry. This is
invalid and should be rejected.

Before, the two commands listed above both crashed the kernel in this
check from br_switchdev_fdb_notify:

	struct net_device *dev = info.is_local ? br->dev : dst->dev;

info.is_local == false, dst == NULL.

After this patch, the invalid entry added by the first command is
rejected:

ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static; ip link del br0
Error: bridge: FDB entry towards bridge must be permanent.

and the valid entry added by the second command is properly treated as a
local address and does not crash br_switchdev_fdb_notify anymore:

ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent; ip link del br0

Fixes: eb100e0e24 ("net: bridge: allow to add externally learned entries from user-space")
Reported-by: syzbot+9ba1174359adba5a5b7c@syzkaller.appspotmail.com
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210801231730.7493-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02 15:00:48 -07:00