linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-26 20:38:12 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	272ac32f56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 18:21:16 -07:00
Benjamin Poirier	9b134b1694	bridge: Do not send empty IFLA_AF_SPEC attribute After commit `b6c02ef549` ("bridge: Netlink interface fix."), br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a bridge vlan dump is requested but an interface does not have any vlans configured. iproute2 ignores such an empty attribute since commit b262a9becbcb ("bridge: Fix output with empty vlan lists") but older iproute2 versions as well as other utilities have their output changed by the cited kernel commit, resulting in failed test cases. Regardless, emitting an empty attribute is pointless and inefficient. Avoid this change by canceling the attribute if no AF_SPEC data was added. Fixes: `b6c02ef549` ("bridge: Netlink interface fix.") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-26 15:35:53 +02:00
Florian Westphal	7278b3c1e4	netfilter: nf_tables: add and use BE register load-store helpers Same as the existing ones, no conversions. This is just for sparse sake only so that we no longer mix be16/u16 and be32/u32 types. Alternative is to add __force __beX in various places, but this seems nicer. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Jakub Kicinski	0d8730f07c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c `9c5de246c1` ("net: sparx5: mdb add/del handle non-sparx5 devices") `fbb89d02e3` ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-30 16:31:00 -07:00
Florian Westphal	c2577862ee	netfilter: br_netfilter: do not skip all hooks with 0 priority When br_netfilter module is loaded, skbs may be diverted to the ipv4/ipv6 hooks, just like as if we were routing. Unfortunately, bridge filter hooks with priority 0 may be skipped in this case. Example: 1. an nftables bridge ruleset is loaded, with a prerouting hook that has priority 0. 2. interface is added to the bridge. 3. no tcp packet is ever seen by the bridge prerouting hook. 4. flush the ruleset 5. load the bridge ruleset again. 6. tcp packets are processed as expected. After 1) the only registered hook is the bridge prerouting hook, but its not called yet because the bridge hasn't been brought up yet. After 2), hook order is: 0 br_nf_pre_routing // br_netfilter internal hook 0 chain bridge f prerouting // nftables bridge ruleset The packet is diverted to br_nf_pre_routing. If call-iptables is off, the nftables bridge ruleset is called as expected. But if its enabled, br_nf_hook_thresh() will skip it because it assumes that all 0-priority hooks had been called previously in bridge context. To avoid this, check for the br_nf_pre_routing hook itself, we need to resume directly after it, even if this hook has a priority of 0. Unfortunately, this still results in different packet flow. With this fix, the eval order after in 3) is: 1. br_nf_pre_routing 2. ip(6)tables (if enabled) 3. nftables bridge but after 5 its the much saner: 1. nftables bridge 2. br_nf_pre_routing 3. ip(6)tables (if enabled) Unfortunately I don't see a solution here: It would be possible to move br_nf_pre_routing to a higher priority so that it will be called later in the pipeline, but this also impacts ebtables evaluation order, and would still result in this very ordering problem for all nftables-bridge hooks with the same priority as the br_nf_pre_routing one. Searching back through the git history I don't think this has ever behaved in any other way, hence, no fixes-tag. Reported-by: Radim Hrazdil <rhrazdil@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-06-27 19:23:27 +02:00
Casper Andersson	2aa4abed37	net: bridge: allow add/remove permanent mdb entries on disabled ports Adding mdb entries on disabled ports allows you to do setup before accepting any traffic, avoiding any time where the port is not in the multicast group. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-15 09:35:21 +01:00
Eric Dumazet	9962acefbc	net: adopt u64_stats_t in struct pcpu_sw_netstats As explained in commit `316580b69d` ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-09 21:53:11 -07:00
Jakub Kicinski	d62607c3fe	net: rename reference+tracking helpers Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-09 21:52:55 -07:00
Jakub Kicinski	d7e6f58360	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/mellanox/mlx5/core/main.c `b33886971d` ("net/mlx5: Initialize flow steering during driver probe") `40379a0084` ("net/mlx5_fpga: Drop INNOVA TLS support") `f2b41b32cd` ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ `16d42d3133` ("net/mlx5: Drain fw_reset when removing device") `8324a02c34` ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh `e274f71540` ("selftests: mptcp: add subflow limits test-cases") `b6e074e171` ("selftests: mptcp: add infinite map testcase") `5ac1d2d634` ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/ net/mptcp/options.c `ba2c89e0ea` ("mptcp: fix checksum byte order") `1e39e5a32a` ("mptcp: infinite mapping sending") `ea66758c17` ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/ net/mptcp/pm.c `95d6865178` ("mptcp: fix subflow accounting on close") `4d25247d3a` ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/ net/mptcp/subflow.c `ae66fb2ba6` ("mptcp: Do TCP fallback on early DSS checksum failure") `0348c690ed` ("mptcp: add the fallback check") `f8d4bcacff` ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-19 11:23:59 -07:00
Andrew Lunn	fbb3abdf22	net: bridge: Clear offload_fwd_mark when passing frame up bridge interface. It is possible to stack bridges on top of each other. Consider the following which makes use of an Ethernet switch: br1 / \ / \ / \ br0.11 wlan0 \| br0 / \| \ p1 p2 p3 br0 is offloaded to the switch. Above br0 is a vlan interface, for vlan 11. This vlan interface is then a slave of br1. br1 also has a wireless interface as a slave. This setup trunks wireless lan traffic over the copper network inside a VLAN. A frame received on p1 which is passed up to the bridge has the skb->offload_fwd_mark flag set to true, indicating that the switch has dealt with forwarding the frame out ports p2 and p3 as needed. This flag instructs the software bridge it does not need to pass the frame back down again. However, the flag is not getting reset when the frame is passed upwards. As a result br1 sees the flag, wrongly interprets it, and fails to forward the frame to wlan0. When passing a frame upwards, clear the flag. This is the Rx equivalent of br_switchdev_frame_unmark() in br_dev_xmit(). Fixes: `f1c2eddf4c` ("bridge: switchdev: Use an helper to clear forward mark") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-19 09:20:44 +02:00
Alaa Mohamed	ca4567f1e6	rtnetlink: add extack support in fdb del handlers Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-09 11:58:20 +01:00
Jakub Kicinski	ee8b7a1156	net: make drivers set the TSO limit not the GSO limit Drivers should call the TSO setting helper, GSO is controllable by user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-06 12:07:56 +01:00
Jakub Kicinski	0e55546b18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/netdevice.h net/core/dev.c `6510ea973d` ("net: Use this_cpu_inc() to increment net->core_stats") `794c24e992` ("net-core: rx_otherhost_dropped to core_stats") https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/ drivers/net/wan/cosa.c `d48fea8401` ("net: cosa: fix error check return value of register_chrdev()") `89fbca3307` ("net: wan: remove support for COSA and SRP synchronous serial boards") https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-28 13:02:01 -07:00
Clément Léger	7f40ea2145	net: bridge: switchdev: check br_vlan_group() return value br_vlan_group() can return NULL and thus return value must be checked to avoid dereferencing a NULL pointer. Fixes: `6284c723d9` ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220421101247.121896-1-clement.leger@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-22 15:12:18 -07:00
Nikolay Aleksandrov	0dbe886a4d	net: bridge: fdb: add support for flush filtering based on ifindex and vlan Add support for fdb flush filtering based on destination ifindex and vlan id. The ifindex must either match a port's device ifindex or the bridge's. The vlan support is trivial since it's already validated by rtnl_fdb_del, we just need to fill it in. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov	564445fb4f	net: bridge: fdb: add support for flush filtering based on ndm flags and state Add support for fdb flush filtering based on ndm flags and state. NDM state and flags are mapped to bridge-specific flags and matched according to the specified masks. NTF_USE is used to represent added_by_user flag since it sets it on fdb add and we don't have a 1:1 mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are ignored. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov	1f78ee14ee	net: bridge: fdb: add support for fine-grained flushing Add the ability to specify exactly which fdbs to be flushed. They are described by a new structure - net_bridge_fdb_flush_desc. Currently it can match on port/bridge ifindex, vlan id and fdb flags. It is used to describe the existing dynamic fdb flush operation. Note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them, so currently it can't directly replace deletes which need to handle that case, although we can extend it later for that too. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov	edaef19172	net: bridge: fdb: add ndo_fdb_del_bulk Add a minimal ndo_fdb_del_bulk implementation which flushes all entries. Support for more fine-grained filtering will be added in the following patches. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Joachim Wiberg	e65693b017	net: bridge: add support for host l2 mdb entries This patch expands on the earlier work on layer-2 mdb entries by adding support for host entries. Due to the fact that host joined entries do not have any flag field, we infer the permanent flag when reporting the entries to userspace, which otherwise would be listed as 'temp'. Before patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent Error: bridge: Flags are not allowed for host groups. ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee Error: bridge: Only permanent L2 entries allowed. After patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent ~# bridge mdb show dev br0 port br0 grp 01:00:00:c0:ff:ee permanent vid 1 Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-12 10:06:53 +02:00
Arınç ÜNAL	c3976a3f84	net: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICAST Add BR_HAIRPIN_MODE, BR_ISOLATED and BR_MULTICAST_TO_UNICAST port flags to BR_PORT_FLAGS_HW_OFFLOAD so that switchdev drivers which have an offloaded data plane have a chance to reject these bridge port flags if they don't support them yet. It makes the code path go through the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS driver handlers, which return -EINVAL for everything they don't recognize. For drivers that don't catch SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS at all, switchdev will return -EOPNOTSUPP for those which is then ignored, but those are in the minority. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220410134227.18810-1-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-11 20:52:38 -07:00
Tobias Waldekranz	a911ad18a5	net: bridge: mst: Restrict info size queries to bridge ports Ensure that no bridge masters are ever considered for MST info dumping. MST states are only supported on bridge ports, not bridge masters - which br_mst_info_size relies on. Fixes: `122c29486e` ("net: bridge: mst: Support setting and reporting MST port states") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20220322133001.16181-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-23 10:32:44 -07:00
Eric Dumazet	cde3fc244b	net: bridge: mst: prevent NULL deref in br_mst_info_size() Call br_mst_info_size() only if vg pointer is not NULL. general protection fault, probably for non-canonical address 0xdffffc0000000058: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002c0-0x00000000000002c7] CPU: 0 PID: 975 Comm: syz-executor.0 Tainted: G W 5.17.0-next-20220321-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:br_mst_info_size+0x97/0x270 net/bridge/br_mst.c:242 Code: 00 00 31 c0 e8 ba 10 53 f9 31 c0 b9 40 00 00 00 4c 8d 6c 24 30 4c 89 ef f3 48 ab 48 8d 83 c0 02 00 00 48 89 04 24 48 c1 e8 03 <80> 3c 28 00 0f 85 ae 01 00 00 48 8b 83 c0 02 00 00 41 bf 04 00 00 RSP: 0018:ffffc900153770a8 EFLAGS: 00010202 RAX: 0000000000000058 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff88259876 RDI: ffffc900153772d8 RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffffff8db68957 R10: ffffffff881f737b R11: 0000000000000000 R12: 0000000000000000 R13: ffffc900153770d8 R14: 00000000000002a0 R15: 00000000ffffffff FS: 00007f18bbb6f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001a80 CR3: 000000001a7d9000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 00000000000000d8 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> br_get_link_af_size_filtered+0x6e9/0xc00 net/bridge/br_netlink.c:123 rtnl_link_get_af_size net/core/rtnetlink.c:598 [inline] if_nlmsg_size+0x40c/0xa50 net/core/rtnetlink.c:1040 rtnl_calcit.isra.0+0x25f/0x460 net/core/rtnetlink.c:3780 rtnetlink_rcv_msg+0xa65/0xb80 net/core/rtnetlink.c:5937 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f18baa89049 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f18bbb6f168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f18bab9bf60 RCX: 00007f18baa89049 RDX: 0000000000000000 RSI: 0000000020001a80 RDI: 0000000000000004 RBP: 00007f18baae308d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffeedb2be2f R14: 00007f18bbb6f300 R15: 0000000000022000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:br_mst_info_size+0x97/0x270 net/bridge/br_mst.c:242 Code: 00 00 31 c0 e8 ba 10 53 f9 31 c0 b9 40 00 00 00 4c 8d 6c 24 30 4c 89 ef f3 48 ab 48 8d 83 c0 02 00 00 48 89 04 24 48 c1 e8 03 <80> 3c 28 00 0f 85 ae 01 00 00 48 8b 83 c0 02 00 00 41 bf 04 00 00 RSP: 0018:ffffc900153770a8 EFLAGS: 00010202 RAX: 0000000000000058 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff88259876 RDI: ffffc900153772d8 RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffffff8db68957 R10: ffffffff881f737b R11: 0000000000000000 R12: 0000000000000000 R13: ffffc900153770d8 R14: 00000000000002a0 R15: 00000000ffffffff FS: 00007f18bbb6f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ca22000 CR3: 000000001a7d9000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 00000000000000d8 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: `122c29486e` ("net: bridge: mst: Support setting and reporting MST port states") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tobias Waldekranz <tobias@waldekranz.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220322012314.795187-1-eric.dumazet@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-03-22 12:47:51 +01:00
Florian Westphal	aaa7b20bd4	netfilter: nft_meta: extend reduce support to bridge family its enough to export the meta get reduce helper and then call it from nft_meta_bridge too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-03-20 00:29:46 +01:00
Pablo Neira Ayuso	34cc9e5288	netfilter: nf_tables: cancel tracking for clobbered destination registers Output of expressions might be larger than one single register, this might clobber existing data. Reset tracking for all destination registers that required to store the expression output. This patch adds three new helper functions: - nft_reg_track_update: cancel previous register tracking and update it. - nft_reg_track_cancel: cancel any previous register tracking info. - __nft_reg_track_cancel: cancel only one single register tracking info. Partial register clobbering detection is also supported by checking the .num_reg field which describes the number of register that are used. This patch updates the following expressions: - meta_bridge - bitwise - byteorder - meta - payload to use these helper functions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-03-20 00:29:46 +01:00
Pablo Neira Ayuso	b2d306542f	netfilter: nf_tables: do not reduce read-only expressions Skip register tracking for expressions that perform read-only operations on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to avoid defining stubs for these expressions. This patch re-enables register tracking which was disabled in `ed5f85d422` ("netfilter: nf_tables: disable register tracking"). Follow up patches add remaining register tracking for existing expressions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-03-20 00:29:46 +01:00
Tobias Waldekranz	f54fd0e163	net: bridge: mst: Add helper to query a port's MST state This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:58 -07:00
Tobias Waldekranz	48d57b2e5f	net: bridge: mst: Add helper to check if MST is enabled This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:58 -07:00
Tobias Waldekranz	cceac97afa	net: bridge: mst: Add helper to map an MSTI to a VID set br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:58 -07:00
Tobias Waldekranz	7ae9147f43	net: bridge: mst: Notify switchdev drivers of MST state changes Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:58 -07:00
Tobias Waldekranz	6284c723d9	net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:58 -07:00
Tobias Waldekranz	87c167bb94	net: bridge: mst: Notify switchdev drivers of MST mode changes Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Tobias Waldekranz	122c29486e	net: bridge: mst: Support setting and reporting MST port states Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Tobias Waldekranz	8c678d6056	net: bridge: mst: Allow changing a VLAN's MSTI Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Tobias Waldekranz	ec7328b591	net: bridge: mst: Multiple Spanning Tree (MST) mode Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Jakub Kicinski	abe2fec8ee	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack. 2) Reject unsupported families when creating tables, from Phil Sutter. 3) GRE support for the flowtable, from Toshiaki Makita. 4) Add GRE offload support for act_ct, also from Toshiaki. 5) Update mlx5 driver to support for GRE flowtable offload, from Toshiaki Makita. 6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge, from Jiapeng Chong. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bridge: clean up some inconsistent indenting net/mlx5: Support GRE conntrack offload act_ct: Support GRE offload netfilter: flowtable: Support GRE netfilter: nf_tables: Reject tables of unsupported family Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" ==================== Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-15 11:52:25 -07:00
Jiapeng Chong	334ff12284	netfilter: bridge: clean up some inconsistent indenting Eliminate the follow smatch warning: net/bridge/netfilter/nf_conntrack_bridge.c:385 nf_ct_bridge_confirm() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-03-07 12:42:37 +01:00
Sebastian Andrzej Siewior	2e83bdd5d6	net: bridge: Use netif_rx(). Since commit `baebdf48c3` ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 12:02:19 +00:00
Martin KaFai Lau	de79910151	net: Add skb_clear_tstamp() to keep the mono delivery_time Right now, skb->tstamp is reset to 0 whenever the skb is forwarded. If skb->tstamp has the mono delivery_time, clearing it can hurt the performance when it finally transmits out to fq@phy-dev. The earlier patch added a skb->mono_delivery_time bit to flag the skb->tstamp carrying the mono delivery_time. This patch adds skb_clear_tstamp() helper which keeps the mono delivery_time and clears everything else. The delivery_time clearing will be postponed until the stack knows the skb will be delivered locally. It will be done in a latter patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:48 +00:00
Martin KaFai Lau	a1ac9c8ace	net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp skb->tstamp was first used as the (rcv) timestamp. The major usage is to report it to the user (e.g. SO_TIMESTAMP). Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP) during egress and used by the qdisc (e.g. sch_fq) to make decision on when the skb can be passed to the dev. Currently, there is no way to tell skb->tstamp having the (rcv) timestamp or the delivery_time, so it is always reset to 0 whenever forwarded between egress and ingress. While it makes sense to always clear the (rcv) timestamp in skb->tstamp to avoid confusing sch_fq that expects the delivery_time, it is a performance issue [0] to clear the delivery_time if the skb finally egress to a fq@phy-dev. For example, when forwarding from egress to ingress and then finally back to egress: tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns ^ ^ reset rest This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp is storing the mono delivery_time (EDT) instead of the (rcv) timestamp. The current use case is to keep the TCP mono delivery_time (EDT) and to be used with sch_fq. A latter patch will also allow tc-bpf@ingress to read and change the mono delivery_time. In the future, another bit (e.g. skb->user_delivery_time) can be added for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid. [ This patch is a prep work. The following patches will get the other parts of the stack ready first. Then another patch after that will finally set the skb->mono_delivery_time. ] skb_set_delivery_time() function is added. It is used by the tcp_output.c and during ip[6] fragmentation to assign the delivery_time to the skb->tstamp and also set the skb->mono_delivery_time. A note on the change in ip_send_unicast_reply() in ip_output.c. It is only used by TCP to send reset/ack out of a ctl_sk. Like the new skb_set_delivery_time(), this patch sets the skb->mono_delivery_time to 0 for now as a place holder. It will be enabled in a latter patch. A similar case in tcp_ipv6 can be done with skb_set_delivery_time() in tcp_v6_send_response(). [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:48 +00:00
Hans Schultz	fa1c833429	net: bridge: Add support for offloading of locked port flag Various switchcores support setting ports in locked mode, so that clients behind locked ports cannot send traffic through the port unless a fdb entry is added with the clients MAC address. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-23 12:52:34 +00:00
Hans Schultz	a21d9a670d	net: bridge: Add support for bridge port in locked mode In a 802.1X scenario, clients connected to a bridge port shall not be allowed to have traffic forwarded until fully authenticated. A static fdb entry of the clients MAC address for the bridge port unlocks the client and allows bidirectional communication. This scenario is facilitated with setting the bridge port in locked mode, which is also supported by various switchcore chipsets. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-23 12:52:34 +00:00
Eric Dumazet	36a29fb6b2	bridge: switch br_net_exit to batch mode cleanup_net() is competing with other rtnl users. Instead of calling br_net_exit() for each netns, call br_net_exit_batch() once. This gives cleanup_net() ability to group more devices and call unregister_netdevice_many() only once for all bridge devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-19 16:20:12 +00:00
Jakub Kicinski	6b5567b1b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 11:44:20 -08:00
Oleksandr Mazur	c832962ac9	net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled Whenever bridge driver hits the max capacity of MDBs, it disables the MC processing (by setting corresponding bridge option), but never notifies switchdev about such change (the notifiers are called only upon explicit setting of this option, through the registered netlink interface). This could lead to situation when Software MDB processing gets disabled, but this event never gets offloaded to the underlying Hardware. Fix this by adding a notify message in such case. Fixes: `147c1e9b90` ("switchdev: bridge: Offload multicast disabled") Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-16 20:35:00 -08:00
Vladimir Oltean	b28d580e29	net: bridge: switchdev: replay all VLAN groups The major user of replayed switchdev objects is DSA, and so far it hasn't needed information about anything other than bridge port VLANs, so this is all that br_switchdev_vlan_replay() knows to handle. DSA has managed to get by through replicating every VLAN addition on a user port such that the same VLAN is also added on all DSA and CPU ports, but there is a corner case where this does not work. The mv88e6xxx DSA driver currently prints this error message as soon as the first port of a switch joins a bridge: mv88e6085 0x0000000008b96000:00: port 0 failed to add a6:ef:77:c8:5f:3d vid 1 to fdb: -95 where a6:ef:77:c8:5f:3d vid 1 is a local FDB entry corresponding to the bridge MAC address in the default_pvid. The -EOPNOTSUPP is returned by mv88e6xxx_port_db_load_purge() because it tries to map VID 1 to a FID (the ATU is indexed by FID not VID), but fails to do so. This is because ->port_fdb_add() is called before ->port_vlan_add() for VID 1. The abridged timeline of the calls is: br_add_if -> netdev_master_upper_dev_link -> dsa_port_bridge_join -> switchdev_bridge_port_offload -> br_switchdev_vlan_replay () -> br_switchdev_fdb_replay -> mv88e6xxx_port_fdb_add -> nbp_vlan_init -> nbp_vlan_add -> mv88e6xxx_port_vlan_add and the issue is that at the time of (), the bridge port isn't in VID 1 (nbp_vlan_init hasn't been called), therefore br_switchdev_vlan_replay() won't have anything to replay, therefore VID 1 won't be in the VTU by the time mv88e6xxx_port_fdb_add() is called. This happens only when the first port of a switch joins. For further ports, the initial mv88e6xxx_port_vlan_add() is sufficient for VID 1 to be loaded in the VTU (which is switch-wide, not per port). The problem is somewhat unique to mv88e6xxx by chance, because most other drivers offload an FDB entry by VID, so FDBs and VLANs can be added asynchronously with respect to each other, but addressing the issue at the bridge layer makes sense, since what mv88e6xxx requires isn't absurd. To fix this problem, we need to recognize that it isn't the VLAN group of the port that we're interested in, but the VLAN group of the bridge itself (so it isn't a timing issue, but rather insufficient information being passed from switchdev to drivers). As mentioned, currently nbp_switchdev_sync_objs() only calls br_switchdev_vlan_replay() for VLANs corresponding to the port, but the VLANs corresponding to the bridge itself, for local termination, also need to be replayed. In this case, VID 1 is not (yet) present in the port's VLAN group but is present in the bridge's VLAN group. So to fix this bug, DSA is now obligated to explicitly handle VLANs pointing towards the bridge in order to "close this race" (which isn't really a race). As Tobias Waldekranz notices, this also implies that it must explicitly handle port VLANs on foreign interfaces, something that worked implicitly before: https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260 So in the end, br_switchdev_vlan_replay() must replay all VLANs from all VLAN groups: all the ports, and the bridge itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:04 +00:00
Vladimir Oltean	263029ae31	net: bridge: make nbp_switchdev_unsync_objs() follow reverse order of sync() There may be switchdev drivers that can add/remove a FDB or MDB entry only as long as the VLAN it's in has been notified and offloaded first. The nbp_switchdev_sync_objs() method satisfies this requirement on addition, but nbp_switchdev_unsync_objs() first deletes VLANs, then deletes MDBs and FDBs. Reverse the order of the function calls to cater to this requirement. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:04 +00:00
Vladimir Oltean	8d23a54f5b	net: bridge: switchdev: differentiate new VLANs from changed ones br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases: - a struct net_bridge_vlan got created - an existing struct net_bridge_vlan was modified This makes it impossible for switchdev drivers to properly balance PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to happen, we must provide a way for drivers to distinguish between a VLAN with changed flags and a new one. Annotate struct switchdev_obj_port_vlan with a "bool changed" that distinguishes the 2 cases above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:04 +00:00
Vladimir Oltean	27c5f74c7b	net: bridge: vlan: notify switchdev only when something changed Currently, when a VLAN entry is added multiple times in a row to a bridge port, nbp_vlan_add() calls br_switchdev_port_vlan_add() each time, even if the VLAN already exists and nothing about it has changed: bridge vlan add dev lan12 vid 100 master static Similarly, when a VLAN is added multiple times in a row to a bridge, br_vlan_add_existing() doesn't filter at all the calls to br_switchdev_port_vlan_add(): bridge vlan add dev br0 vid 100 self This behavior makes driver-level accounting of VLANs impossible, since it is enough for a single deletion event to remove a VLAN, but the addition event can be emitted an unlimited number of times. The cause for this can be identified as follows: we rely on __vlan_add_flags() to retroactively tell us whether it has changed anything about the VLAN flags or VLAN group pvid. So we'd first have to call __vlan_add_flags() before calling br_switchdev_port_vlan_add(), in order to have access to the "bool *changed" information. But we don't want to change the event ordering, because we'd have to revert the struct net_bridge_vlan changes we've made if switchdev returns an error. So to solve this, we need another function that tells us whether any change is going to occur in the VLAN or VLAN group, _prior_ to calling __vlan_add_flags(). Split __vlan_add_flags() into a precommit and a commit stage, and rename it to __vlan_flags_update(). The precommit stage, __vlan_flags_would_change(), will determine whether there is any reason to notify switchdev due to a change of flags (note: the BRENTRY flag transition from false to true is treated separately: as a new switchdev entry, because we skipped notifying the master VLAN when it wasn't a brentry yet, and therefore not as a change of flags). With this lookahead/precommit function in place, we can avoid notifying switchdev if nothing changed for the VLAN and VLAN group. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:04 +00:00
Vladimir Oltean	cab2cd7700	net: bridge: vlan: make __vlan_add_flags react only to PVID and UNTAGGED Currently there is a very subtle aspect to the behavior of __vlan_add_flags(): it changes the struct net_bridge_vlan flags and pvid, yet it returns true ("changed") even if none of those changed, just a transition of br_vlan_is_brentry(v) took place from false to true. This can be seen in br_vlan_add_existing(), however we do not actually rely on this subtle behavior, since the "if" condition that checks that the vlan wasn't a brentry before had a useless (until now) assignment: *changed = true; Make things more obvious by actually making __vlan_add_flags() do what's written on the box, and be more specific about what is actually written on the box. This is needed because further transformations will be done to __vlan_add_flags(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:04 +00:00
Vladimir Oltean	3116ad0696	net: bridge: vlan: don't notify to switchdev master VLANs without BRENTRY flag When a VLAN is added to a bridge port and it doesn't exist on the bridge device yet, it gets created for the multicast context, but it is 'hidden', since it doesn't have the BRENTRY flag yet: ip link add br0 type bridge && ip link set swp0 master br0 bridge vlan add dev swp0 vid 100 # the master VLAN 100 gets created bridge vlan add dev br0 vid 100 self # that VLAN becomes brentry just now All switchdev drivers ignore switchdev notifiers for VLAN entries which have the BRENTRY unset, and for good reason: these are merely private data structures used by the bridge driver. So we might just as well not notify those at all. Cleanup in the switchdev drivers that check for the BRENTRY flag is now possible, and will be handled separately, since those checks just became dead code. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:04 +00:00

1 2 3 4 5 ...

2395 commits