linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-15 15:15:47 +00:00

Author	SHA1	Message	Date
Nikolay Aleksandrov	7d0888d52f	net: bridge: mcast: drop hosts limit sysfs support We decided to stop adding new sysfs bridge options and continue with netlink only, so remove hosts limit sysfs support. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:33:02 -08:00
Jakub Kicinski	56435d9145	Merge branch 'tag_8021q-for-ocelot-switches' Vladimir Oltean says: ==================== tag_8021q for Ocelot switches The Felix switch inside LS1028A has an issue. It has a 2.5G CPU port, and the external ports, in the majority of use cases, run at 1G. This means that, when the CPU injects traffic into the switch, it is very easy to run into congestion. This is not to say that it is impossible to enter congestion even with all ports running at the same speed, just that the default configuration is already very prone to that by design. Normally, the way to deal with that is using Ethernet flow control (PAUSE frames). However, this functionality is not working today with the ENETC - Felix switch pair. The hardware issue is undergoing documentation right now as an erratum within NXP, but several customers have been requesting a reasonable workaround for it. In truth, the LS1028A has 2 internal port pairs. The lack of flow control is an issue only when NPI mode (Node Processor Interface, aka the mode where the "CPU port module", which carries DSA-style tagged packets, is connected to a regular Ethernet port) is used, and NPI mode is supported by Felix on a single port. In past BSPs, we have had setups where both internal port pairs were enabled. We were advertising the following setup: "data port" "control port" (2.5G) (1G) eno2 eno3 ^ ^ \| \| \| regular \| DSA-tagged \| frames \| frames \| \| v v swp4 swp5 This works but is highly unpractical, due to NXP shifting the task of designing a functional system (choosing which port to use, depending on type of traffic required) up to the end user. The swpN interfaces would have to be bridged with swp4, in order for the eno2 "data port" to have access to the outside network. And the swpN interfaces would still be capable of IP networking. So running a DHCP client would give us two IP interfaces from the same subnet, one assigned to eno2, and the other to swpN (0, 1, 2, 3). Also, the dual port design doesn't scale. When attaching another DSA switch to a Felix port, the end result is that the "data port" cannot carry any meaningful data to the external world, since it lacks the DSA tags required to traverse the sja1105 switches below. All that traffic needs to go through the "control port". So in newer BSPs there was a desire to simplify that setup, and only have one internal port pair: eno2 eno3 ^ \| \| DSA-tagged x disabled \| frames \| v swp4 swp5 However, this setup only exacerbates the issue of not having flow control on the NPI port, since that is the only port now. Also, there are use cases that still require the "data port", such as IEEE 802.1CB (TSN stream identification doesn't work over an NPI port), source MAC address learning over NPI, etc. Again, there is a desire to keep the simplicity of the single internal port setup, while regaining the benefits of having a dedicated data port as well. And this series attempts to deliver just that. So the NPI functionality is disabled conditionally. Its purpose was: - To ensure individually addressable ports on TX. This can be replaced by using some designated VLAN tags which are pushed by the DSA tagger code, then removed by the switch (so they are invisible to the outside world and to the user). - To ensure source port identification on RX. Again, this can be replaced by using some designated VLAN tags to encapsulate all RX traffic (each VLAN uniquely identifies a source port). The DSA tagger determines which port it was based on the VLAN number, then removes that header. - To deliver PTP timestamps. This cannot be obtained through VLAN headers, so we need to take a step back and see how else we can do that. The Microchip Ocelot-1 (VSC7514 MIPS) driver performs manual injection/extraction from the CPU port module using register-based MMIO, and not over Ethernet. We will need to do the same from DSA, which makes this tagger a sort of hybrid between DSA and pure switchdev. ==================== Link: https://lore.kernel.org/r/20210129010009.3959398-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:29:32 -08:00
Vladimir Oltean	e21268efbe	net: dsa: felix: perform switch setup for tag_8021q Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:25:27 -08:00
Vladimir Oltean	7c83a7c539	net: dsa: add a second tagger for Ocelot switches based on tag_8021q There are use cases for which the existing tagger, based on the NPI (Node Processor Interface) functionality, is insufficient. Namely: - Frames injected through the NPI port bypass the frame analyzer, so no source address learning is performed, no TSN stream classification, etc. - Flow control is not functional over an NPI port (PAUSE frames are encapsulated in the same Extraction Frame Header as all other frames) - There can be at most one NPI port configured for an Ocelot switch. But in NXP LS1028A and T1040 there are two Ethernet CPU ports. The non-NPI port is currently either disabled, or operated as a plain user port (albeit an internally-facing one). Having the ability to configure the two CPU ports symmetrically could pave the way for e.g. creating a LAG between them, to increase bandwidth seamlessly for the system. So there is a desire to have an alternative to the NPI mode. This change keeps the default tagger for the Seville and Felix switches as "ocelot", but it can be changed via the following device attribute: echo ocelot-8021q > /sys/class/<dsa-master>/dsa/tagging Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:25:27 -08:00
Vladimir Oltean	adb3dccf09	net: dsa: felix: convert to the new .change_tag_protocol DSA API In expectation of the new tag_ocelot_8021q tagger implementation, we need to be able to do runtime switchover between one tagger and another. So we must structure the existing code for the current NPI-based tagger in a certain way. We move the felix_npi_port_init function in expectation of the future driver configuration necessary for tag_ocelot_8021q: we would like to not have the NPI-related bits interspersed with the tag_8021q bits. The conversion from this: ocelot_write_rix(ocelot, ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)), ANA_PGID_PGID, PGID_UC); to this: cpu_flood = ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)); ocelot_rmw_rix(ocelot, cpu_flood, cpu_flood, ANA_PGID_PGID, PGID_UC); is perhaps non-trivial, but is nonetheless non-functional. The PGID_UC (replicator for unknown unicast) is already configured out of hardware reset to flood to all ports except ocelot->num_phys_ports (the CPU port module). All we change is that we use a read-modify-write to only add the CPU port module to the unknown unicast replicator, as opposed to doing a full write to the register. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:25:27 -08:00
Vladimir Oltean	53da0ebaad	net: dsa: allow changing the tag protocol via the "tagging" device attribute Currently DSA exposes the following sysfs: $ cat /sys/class/net/eno2/dsa/tagging ocelot which is a read-only device attribute, introduced in the kernel as commit `98cdb48071` ("net: dsa: Expose tagging protocol to user-space"), and used by libpcap since its commit 993db3800d7d ("Add support for DSA link-layer types"). It would be nice if we could extend this device attribute by making it writable: $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging This is useful with DSA switches that can make use of more than one tagging protocol. It may be useful in dsa_loop in the future too, to perform offline testing of various taggers, or for changing between dsa and edsa on Marvell switches, if that is desirable. In terms of implementation, drivers can support this feature by implementing .change_tag_protocol, which should always leave the switch in a consistent state: either with the new protocol if things went well, or with the old one if something failed. Teardown of the old protocol, if necessary, must be handled by the driver. Some things remain as before: - The .get_tag_protocol is currently only called at probe time, to load the initial tagging protocol driver. Nonetheless, new drivers should report the tagging protocol in current use now. - The driver should manage by itself the initial setup of tagging protocol, no later than the .setup() method, as well as destroying resources used by the last tagger in use, no earlier than the .teardown() method. For multi-switch DSA trees, error handling is a bit more complicated, since e.g. the 5th out of 7 switches may fail to change the tag protocol. When that happens, a revert to the original tag protocol is attempted, but that may fail too, leaving the tree in an inconsistent state despite each individual switch implementing .change_tag_protocol transactionally. Since the intersection between drivers that implement .change_tag_protocol and drivers that support D in DSA is currently the empty set, the possibility for this error to happen is ignored for now. Testing: $ insmod mscc_felix.ko [ 79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14 [ 79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517 $ insmod tag_ocelot.ko $ rmmod mscc_felix.ko $ insmod mscc_felix.ko [ 97.261724] libphy: VSC9959 internal MDIO bus: probed [ 97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0 [ 97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1 [ 97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2 [ 97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3 [ 97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode [ 97.964278] 8021q: adding VLAN 0 to HW filter on device swp0 [ 98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode [ 98.251845] 8021q: adding VLAN 0 to HW filter on device swp1 [ 98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL) [ 98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode [ 98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx [ 98.527948] device eno2 entered promiscuous mode [ 98.544755] DSA: tree 0 setup $ ping 10.0.0.1 PING 10.0.0.1 (10.0.0.1): 56 data bytes 64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms 64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms ^C - 10.0.0.1 ping statistics - 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.754/1.545/2.337 ms $ cat /sys/class/net/eno2/dsa/tagging ocelot $ cat ./test_ocelot_8021q.sh #!/bin/bash ip link set swp0 down ip link set swp1 down ip link set swp2 down ip link set swp3 down ip link set swp5 down ip link set eno2 down echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging ip link set eno2 up ip link set swp0 up ip link set swp1 up ip link set swp2 up ip link set swp3 up ip link set swp5 up $ ./test_ocelot_8021q.sh ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available $ rmmod tag_ocelot.ko rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable $ insmod tag_ocelot_8021q.ko $ ./test_ocelot_8021q.sh $ cat /sys/class/net/eno2/dsa/tagging ocelot-8021q $ rmmod tag_ocelot.ko $ rmmod tag_ocelot_8021q.ko rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable $ ping 10.0.0.1 PING 10.0.0.1 (10.0.0.1): 56 data bytes 64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms 64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms 64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms $ rmmod mscc_felix.ko [ 645.544426] mscc_felix 0000:00:00.5: Link is Down [ 645.838608] DSA: tree 0 torn down $ rmmod tag_ocelot_8021q.ko Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:39 -08:00
Vladimir Oltean	357f203bb3	net: dsa: keep a copy of the tagging protocol in the DSA switch tree Cascading DSA switches can be done multiple ways. There is the brute force approach / tag stacking, where one upstream switch, located between leaf switches and the host Ethernet controller, will just happily transport the DSA header of those leaf switches as payload. For this kind of setups, DSA works without any special kind of treatment compared to a single switch - they just aren't aware of each other. Then there's the approach where the upstream switch understands the tags it transports from its leaves below, as it doesn't push a tag of its own, but it routes based on the source port & switch id information present in that tag (as opposed to DMAC & VID) and it strips the tag when egressing a front-facing port. Currently only Marvell implements the latter, and Marvell DSA trees contain only Marvell switches. So it is safe to say that DSA trees already have a single tag protocol shared by all switches, and in fact this is what makes the switches able to understand each other. This fact is also implied by the fact that currently, the tagging protocol is reported as part of a sysfs installed on the DSA master and not per port, so it must be the same for all the ports connected to that DSA master regardless of the switch that they belong to. It's time to make this official and enforce it (yes, this also means we won't have any "switch understands tag to some extent but is not able to speak it" hardware oddities that we'll support in the future). This is needed due to the imminent introduction of the dsa_switch_ops:: change_tag_protocol driver API. When that is introduced, we'll have to notify switches of the tagging protocol that they're configured to use. Currently the tag_ops structure pointer is held only for CPU ports. But there are switches which don't have CPU ports and nonetheless still need to be configured. These would be Marvell leaf switches whose upstream port is just a DSA link. How do we inform these of their tagging protocol setup/deletion? One answer to the above would be: iterate through the DSA switch tree's ports once, list the CPU ports, get their tag_ops, then iterate again now that we have it, and notify everybody of that tag_ops. But what to do if conflicts appear between one cpu_dp->tag_ops and another? There's no escaping the fact that conflict resolution needs to be done, so we can be upfront about it. Ease our work and just keep the master copy of the tag_ops inside the struct dsa_switch_tree. Reference counting is now moved to be per-tree too, instead of per-CPU port. There are many places in the data path that access master->dsa_ptr->tag_ops and we would introduce unnecessary performance penalty going through yet another indirection, so keep those right where they are. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:31 -08:00
Vladimir Oltean	886f8e26f5	net: dsa: document the existing switch tree notifiers and add a new one The existence of dsa_broadcast has generated some confusion in the past: https://www.mail-archive.com/netdev@vger.kernel.org/msg365042.html So let's document the existing dsa_port_notify and dsa_broadcast functions and explain when each of them should be used. Also, in fact, the in-between function has always been there but was lacking a name, and is the main reason for this patch: dsa_tree_notify. Refactor dsa_broadcast to use it. This patch also moves dsa_broadcast (a top-level function) to dsa2.c, where it really belonged in the first place, but had no companion so it stood with dsa_port_notify. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:30 -08:00
Vladimir Oltean	cacea62fcd	net: mscc: ocelot: don't use NPI tag prefix for the CPU port module Context: Ocelot switches put the injection/extraction frame header in front of the Ethernet header. When used in NPI mode, a DSA master would see junk instead of the destination MAC address, and it would most likely drop the packets. So the Ocelot frame header can have an optional prefix, which is just "ff:ff:ff:ff:ff:fe > ff:ff:ff:ff:ff:ff" padding put before the actual tag (still before the real Ethernet header) such that the DSA master thinks it's looking at a broadcast frame with a strange EtherType. Unfortunately, a lesson learned in commit `69df578c5f` ("net: mscc: ocelot: eliminate confusion between CPU and NPI port") seems to have been forgotten in the meanwhile. The CPU port module and the NPI port have independent settings for the length of the tag prefix. However, the driver is using the same variable to program both of them. There is no reason really to use any tag prefix with the CPU port module, since that is not connected to any Ethernet port. So this patch makes the inj_prefix and xtr_prefix variables apply only to the NPI port (which the switchdev ocelot_vsc7514 driver does not use). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:30 -08:00
Vladimir Oltean	9b521250bf	net: mscc: ocelot: reapply bridge forwarding mask on bonding join/leave Applying the bridge forwarding mask currently is done only on the STP state changes for any port. But it depends on both STP state changes, and bonding interface state changes. Export the bit that recalculates the forwarding mask so that it could be reused, and call it when a port starts and stops offloading a bonding interface. Now that the logic is split into a separate function, we can rename "p" into "port", since the "port" variable was already taken in ocelot_bridge_stp_state_set. Also, we can rename "i" into "lag", to make it more clear what is it that we're iterating through. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:30 -08:00
Vladimir Oltean	50c6cc5b92	net: mscc: ocelot: store a namespaced VCAP filter ID We will be adding some private VCAP filters that should not interfere in any way with the filters added using tc-flower. So we need to allocate some IDs which will not be used by tc. Currently ocelot uses an u32 id derived from the flow cookie, which in itself is an unsigned long. This is a problem in itself, since on 64 bit systems, sizeof(unsigned long)=8, so the driver is already truncating these. Create a struct ocelot_vcap_id which contains the full unsigned long cookie from tc, as well as a boolean that is supposed to namespace the filters added by tc with the ones that aren't. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:30 -08:00
Vladimir Oltean	0e9bb4e9d9	net: mscc: ocelot: export VCAP structures to include/soc/mscc The Felix driver will need to preinstall some VCAP filters for its tag_8021q implementation (outside of the tc-flower offload logic), so these need to be exported to the common includes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:30 -08:00
Vladimir Oltean	9c7caf2806	net: dsa: tag_8021q: add helpers to deduce whether a VLAN ID is RX or TX VLAN The sja1105 implementation can be blind about this, but the felix driver doesn't do exactly what it's being told, so it needs to know whether it is a TX or an RX VLAN, so it can install the appropriate type of TCAM rule. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:24:30 -08:00
Ronak Doshi	de1da8bcf4	vmxnet3: Remove buf_info from device accessible structures buf_info structures in RX & TX queues are private driver data that do not need to be visible to the device. Although there is physical address and length in the queue descriptor that points to these structures, their layout is not standardized, and device never looks at them. So lets allocate these structures in non-DMA-able memory, and fill physical address as all-ones and length as zero in the queue descriptor. That should alleviate worries brought by Martin Radev in https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210104/022829.html that malicious vmxnet3 device could subvert SVM/TDX guarantees. Signed-off-by: Petr Vandrovec <petr@vmware.com> Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:07:03 -08:00
Kurt Kanzenbach	6c13d75bee	net: dsa: hellcreek: Add missing TAPRIO dependency Add missing dependency to TAPRIO to avoid build failures such as: \|ERROR: modpost: "taprio_offload_get" [drivers/net/dsa/hirschmann/hellcreek_sw.ko] undefined! \|ERROR: modpost: "taprio_offload_free" [drivers/net/dsa/hirschmann/hellcreek_sw.ko] undefined! Fixes: `24dfc6eb39` ("net: dsa: hellcreek: Add TAPRIO offloading support") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20210128163338.22665-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:02:31 -08:00
Eric Dumazet	0d6cd689f9	net: proc: speedup /proc/net/netstat Use cache friendly helpers to better use cpu caches while reading /proc/net/netstat Tested on a platform with 256 threads (AMD Rome) Before: 305 usec spent in netstat_seq_show() After: 130 usec spent in netstat_seq_show() Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210128162145.1703601-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:59:53 -08:00
Kuniyuki Iwashima	df610cd916	net: Remove redundant calls of sk_tx_queue_clear(). The commit `41b14fb872` ("net: Do not clear the sock TX queue in sk_set_socket()") removes sk_tx_queue_clear() from sk_set_socket() and adds it instead in sk_alloc() and sk_clone_lock() to fix an issue introduced in the commit `e022f0b4a0` ("net: Introduce sk_tx_queue_mapping"). On the other hand, the original commit had already put sk_tx_queue_clear() in sk_prot_alloc(): the callee of sk_alloc() and sk_clone_lock(). Thus sk_tx_queue_clear() is called twice in each path. If we remove sk_tx_queue_clear() in sk_alloc() and sk_clone_lock(), it currently works well because (i) sk_tx_queue_mapping is defined between sk_dontcopy_begin and sk_dontcopy_end, and (ii) sock_copy() called after sk_prot_alloc() in sk_clone_lock() does not overwrite sk_tx_queue_mapping. However, if we move sk_tx_queue_mapping out of the no copy area, it introduces a bug unintentionally. Therefore, this patch adds a compile-time check to take care of the order of sock_copy() and sk_tx_queue_clear() and removes sk_tx_queue_clear() from sk_prot_alloc() so that it does the only allocation and its callers initialize fields. CC: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20210128150217.6060-1-kuniyu@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:57:25 -08:00
Jakub Kicinski	77609b1db2	Merge branch 'net-hns3-updates-for-next' Huazhong Tan says: ==================== net: hns3: updates for -next This patchset adds dump tm info of nodes, priority and qset in debugfs. Three debugfs files tm_nodes, tm_priority and tm_qset are created in new tm directory, and use cat command to dump their info, for examples: $ cat tm_nodes BASE_ID MAX_NUM PG 0 8 PRI 0 8 QSET 0 8 QUEUE 0 1024 $ cat tm_priority ID MODE DWRR C_IR_B C_IR_U C_IR_S C_BS_B C_BS_S C_FLAG C_RATE(Mbps) P_IR_B P_IR_U P_IR_S P_BS_B P_BS_S P_FLAG P_RATE(Mbps) 0000 dwrr 100 0 0 0 5 20 0 0 150 7 0 5 20 0 0 0001 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0002 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0003 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0004 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0005 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0006 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0007 sp 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 $ cat tm_qset ID MAP_PRI LINK_VLD MODE DWRR 0000 0 1 dwrr 100 0001 0 0 sp 0 0002 0 0 sp 0 0003 0 0 sp 0 0004 0 0 sp 0 0005 0 0 sp 0 0006 0 0 sp 0 change log: V2: add readonly files for dump all nodes, priority and qset info suggested by Jakub Kicinski. previous version: V1: https://patchwork.kernel.org/project/netdevbpf/patch/1610694569-43099-1-git-send-email-tanhuazhong@huawei.com/ ==================== Link: https://lore.kernel.org/r/1611834696-56207-1-git-send-email-tanhuazhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:42:26 -08:00
Guangbin Huang	04987ca1b9	net: hns3: add debugfs support for tm nodes, priority and qset info In order to query tm info of nodes, priority and qset for debugging, adds three debugfs files tm_nodes, tm_priority and tm_qset in newly created tm directory. Unlike previous debugfs commands, these three files just support read ops, so they only support to use cat command to dump their info. The new tm file style is acccording to suggestion from Jakub Kicinski's opinion as link https://lkml.org/lkml/2020/9/29/2101. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:42:24 -08:00
Guangbin Huang	2bbad0aa40	net: hns3: add interfaces to query information of tm priority/qset Add some interfaces to get information of tm priority and qset, then they can be used by debugfs. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:42:23 -08:00
Jakub Kicinski	2d88296a80	Merge branch 'net-add-support-for-ip-generic-checksum-offload-for-gre' Xin Long says: ==================== net: add support for ip generic checksum offload for gre This patchset it to add ip generic csum processing first in skb_csum_hwoffload_help() in Patch 1/2 and then add csum offload support for GRE header in Patch 2/2. ==================== Link: https://lore.kernel.org/r/cover.1611825446.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:39:16 -08:00
Xin Long	efa1a65c7e	ip_gre: add csum offload support for gre header This patch is to add csum offload support for gre header: On the TX path in gre_build_header(), when CHECKSUM_PARTIAL's set for inner proto, it will calculate the csum for outer proto, and inner csum will be offloaded later. Otherwise, CHECKSUM_PARTIAL and csum_start/offset will be set for outer proto, and the outer csum will be offloaded later. On the GSO path in gre_gso_segment(), when CHECKSUM_PARTIAL is not set for inner proto and the hardware supports csum offload, CHECKSUM_PARTIAL and csum_start/offset will be set for outer proto, and outer csum will be offloaded later. Otherwise, it will do csum for outer proto by calling gso_make_checksum(). Note that SCTP has to do the csum by itself for non GSO path in sctp_packet_pack(), as gre_build_header() can't handle the csum with CHECKSUM_PARTIAL set for SCTP CRC csum offload. v1->v2: - remove the SCTP part, as GRE dev doesn't support SCTP CRC CSUM and it will always do checksum for SCTP in sctp_packet_pack() when it's not a GSO packet. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:39:14 -08:00
Xin Long	62fafcd631	net: support ip generic csum processing in skb_csum_hwoffload_help NETIF_F_IP\|IPV6_CSUM feature flag indicates UDP and TCP csum offload while NETIF_F_HW_CSUM feature flag indicates ip generic csum offload for HW, which includes not only for TCP/UDP csum, but also for other protocols' csum like GRE's. However, in skb_csum_hwoffload_help() it only checks features against NETIF_F_CSUM_MASK(NETIF_F_HW\|IP\|IPV6_CSUM). So if it's a non TCP/UDP packet and the features doesn't support NETIF_F_HW_CSUM, but supports NETIF_F_IP\|IPV6_CSUM only, it would still return 0 and leave the HW to do csum. This patch is to support ip generic csum processing by checking NETIF_F_HW_CSUM for all protocols, and check (NETIF_F_IP_CSUM \| NETIF_F_IPV6_CSUM) only for TCP and UDP. Note that we're using skb->csum_offset to check if it's a TCP/UDP proctol, this might be fragile. However, as Alex said, for now we only have a few L4 protocols that are requesting Tx csum offload, we'd better fix this until a new protocol comes with a same csum offset. v1->v2: - not extend skb->csum_not_inet, but use skb->csum_offset to tell if it's an UDP/TCP csum packet. v2->v3: - add a note in the changelog, as Willem suggested. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 20:39:14 -08:00
Jakub Kicinski	fd3d37551c	linux-can-next-for-5.12-20210129 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmATyC0THG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqUMNB/4s7WkirDS+fMDpdB+vRjqguvJ/R4Ef loplrX5NN8437xZUrA0x4d/jQ59jXOOT7F3MjEN2kU389FPikVmpw6owUSbxQhOD cZ8BK1WhBNJBezOITfO4TWGowKsBN3guEBolcqBrCtMt97mxQ5u9rB0Ke0uFyGZ7 2ws1BpGK/wQBeORkyHRS7hY8dlyZbFJkUl4J9oKYuhvIwHc0x26+LZFFWvs8GFFu NSnDqwhg4ImI27/wamyHGSWukk2IdUgKVwtNzUpSzEWmeEtp2EHFTgY56ErizCce 8EihbIB15brChkMO2J8mrHoJxzWR99kMqWuJ/EKdiycxtMx4+fj3wz91 =QfkK -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.12-20210129' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-5.12-20210129 All patches are by me and target the mcp251xfd driver. The first 4 patches update the information regarding the "85% of (FSYSCLK/2)" errata. The other 4 are misc cleanups, unitfy error messages, add missing postfix to a macro, simplify the return of a function, and make use of dev_err_probe() in the mcp251xfd_probe() function. ==================== Link: https://lore.kernel.org/r/20210129084302.3040284-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 19:45:21 -08:00
Loic Poulain	6e10785ee1	net: mhi: Get rid of local rx queue count Use the new mhi_get_free_desc_count helper to track queue usage instead of relying on the locally maintained rx_queued count. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 19:42:06 -08:00
Loic Poulain	e6ec3ccd4e	net: mhi: Get RX queue size from MHI core The RX queue size can be determined at runtime by retrieving the number of available transfer descriptors. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 19:41:59 -08:00
Jakub Kicinski	2bca263cda	Merge branch 'mhi-net-immutable' of https://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi Needed by mhi-net patches. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 19:40:52 -08:00
Jan Luebbe	5daf83846c	docs: networking: timestamping: fix section title markup This section was missed during the conversion to ReST, so convert it in the same style as the surrounding section titles. Signed-off-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lore.kernel.org/r/20210128111930.29473-1-jlu@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 19:14:23 -08:00
dingsenjie	afa4f675aa	net/ethernet: convert to use module_platform_driver in octeon_mgmt.c Simplify the code by using module_platform_driver macro for octeon_mgmt. Signed-off-by: dingsenjie <dingsenjie@yulong.com> Link: https://lore.kernel.org/r/20210128035330.17676-1-dingsenjie@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 19:02:43 -08:00
Emil Renner Berthing	a58745979c	net: atm: pppoatm: use new API for wakeup tasklet This converts the driver to use the new tasklet API introduced in commit `12cc923f1c` ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Link: https://lore.kernel.org/r/20210127173256.13954-2-kernel@esmil.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:24:05 -08:00
Emil Renner Berthing	a5b88632fc	net: atm: pppoatm: use tasklet_init to initialize wakeup tasklet Previously a temporary tasklet structure was initialized on the stack using DECLARE_TASKLET_OLD() and then copied over and modified. Nothing else in the kernel seems to use this pattern, so let's just call tasklet_init() like everyone else. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Link: https://lore.kernel.org/r/20210127173256.13954-1-kernel@esmil.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:24:05 -08:00
Jakub Kicinski	810e754c7b	Merge branch 'net-sched-cls_flower-add-support-for-matching-on-ct_state-reply-flag' Paul Blakey says: ==================== net/sched: cls_flower: Add support for matching on ct_state reply flag This patchset adds software match support and offload of flower match ct_state reply flag (+/-rpl). The first patch adds the definition for the flag and match to flower. Second patch gives the direction of the connection to the offloading drivers via ct_metadata flow offload action. The last patch does offload of this new ct_state by using the supplied connection's direction. ==================== Link: https://lore.kernel.org/r/1611757967-18236-1-git-send-email-paulb@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:05:32 -08:00
Paul Blakey	6895cb3a95	net/mlx5: CT: Add support for matching on ct_state reply flag Add support for matching on ct_state reply flag. Example: $ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est+rpl \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est-rpl \ action mirred egress redirect dev ens1f0_0 Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:05:30 -08:00
Paul Blakey	941eff5aea	net: flow_offload: Add original direction flag to ct_metadata Give offloading drivers the direction of the offloaded ct flow, this will be used for matches on direction (ct_state +/-rpl). Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:05:30 -08:00
Paul Blakey	8c85d18ce6	net/sched: cls_flower: Add match on the ct_state reply flag Add match on the ct_state reply flag. Example: $ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est+rpl \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est-rpl \ action mirred egress redirect dev ens1f0_0 Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:05:30 -08:00
Jakub Kicinski	cf3c7c7b37	Merge branch 'add-nci-suit-and-virtual-nci-device-driver' Bongsu Jeon says: ==================== Add nci suit and virtual nci device driver 1/2 is the Virtual NCI device driver. 2/2 is the NCI selftest suite ==================== Link: https://lore.kernel.org/r/20210127130829.4026-1-bongsu.jeon@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:03:37 -08:00
Bongsu Jeon	f595cf1242	selftests: Add nci suite This is the NCI test suite. It tests the NFC/NCI module using virtual NCI device. Test cases consist of making the virtual NCI device on/off and controlling the device's polling for NCI1.0 and NCI2.0 version. Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:03:33 -08:00
Bongsu Jeon	e624e6c3e7	nfc: Add a virtual nci device driver NCI virtual device simulates a NCI device to the user. It can be used to validate the NCI module and applications. This driver supports communication between the virtual NCI device and NCI module. Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 18:03:33 -08:00
Menglong Dong	8c22475148	net: packet: make pkt_sk() inline It's better make 'pkt_sk()' inline here, as non-inline function shouldn't occur in headers. Besides, this function is simple enough to be inline. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Link: https://lore.kernel.org/r/20210127123302.29842-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 17:38:55 -08:00
Andrea Parri (Microsoft)	0ba35fe91c	hv_netvsc: Copy packets sent by Hyper-V out of the receive buffer Pointers to receive-buffer packets sent by Hyper-V are used within the guest VM. Hyper-V can send packets with erroneous values or modify packet fields after they are processed by the guest. To defend against these scenarios, copy (sections of) the incoming packet after validating their length and offset fields in netvsc_filter_receive(). In this way, the packet can no longer be modified by the host. Reported-by: Juan Vazquez <juvazq@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20210126162907.21056-1-parri.andrea@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 16:44:07 -08:00
Marc Kleine-Budde	cf8ee6de25	can: mcp251xfd: mcp251xfd_probe(): use dev_err_probe() to simplify error handling dev_err_probe() can reduce code size, uniform error handling and record the defer probe reason etc., use it to simplify the code. Link: https://lore.kernel.org/r/20210128104644.2982125-9-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:58 +01:00
Marc Kleine-Budde	dfe99ba29e	can: mcp251xfd: mcp251xfd_chip_clock_enable(): simplify return This patch simplifies the return of the mcp251xfd_chip_clock_enable() function by direct returning the error. Link: https://lore.kernel.org/r/20210128104644.2982125-8-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:58 +01:00
Marc Kleine-Budde	49ffacbc4c	can: mcp251xfd: add missing _MASK postfix to MCP251XFD_OBJ_FLAGS_DLC As MCP251XFD_OBJ_FLAGS_DLC is a mask, add the missing _MASK postfix, that all other masks in the driver have. Link: https://lore.kernel.org/r/20210128104644.2982125-7-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:57 +01:00
Marc Kleine-Budde	f93486a79a	can: mcp251xfd: unify error messages and commets This patch unifies the error messages: - have a "." and the end of each message - write controller with a small "c", if not the first word of an error message. Link: https://lore.kernel.org/r/20210128104644.2982125-6-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:57 +01:00
Marc Kleine-Budde	9f1fbc1c9c	can: mcp251xfd: mcp251xfd_probe(): add imx6 to errata table This patch adds an imx6 as known good to the errata table. Link: https://lore.kernel.org/r/20210128104644.2982125-5-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:57 +01:00
Marc Kleine-Budde	01b2a0e5a0	can: mcp251xfd: mcp251xfd_probe(): remove known bad combinations from errata tabe The published errata specify the maximum allowed SPI frequency to be max 85% of (FSYSCLK/2). So there's no need to track known bad clock settings in the driver. As the setup of known good values is a bit tricky, keep them. Link: https://lore.kernel.org/r/20210128104644.2982125-4-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:57 +01:00
Marc Kleine-Budde	b98e68e91c	can: mcp251xfd: mcp251xfd_probe(): sort errata table alphabetically, fix indention This patch sorts the errata table alphabetically and fixes the indention. Link: https://lore.kernel.org/r/20210128104644.2982125-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:56 +01:00
Marc Kleine-Budde	28eb119c04	can: mcp251xfd: mcp251xfd_probe(): fix errata reference This patch fixes the reference to the errata for both the mcp2517fd and the mcp2518fd. Fixes: `f5b84dedf7` ("can: mcp25xxfd: mcp25xxfd_probe(): add SPI clk limit related errata information") Link: https://lore.kernel.org/r/20210128104644.2982125-2-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-29 09:31:56 +01:00
Bjorn Helgaas	46eb3c108f	octeontx2-af: Fix 'physical' typos Fix misspellings of "physical". Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20210127181359.3008316-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 21:24:47 -08:00
dingsenjie	1d3f9bb1be	linux/qed: fix spelling typo in qed_chain.h allocted -> allocated Signed-off-by: dingsenjie <dingsenjie@yulong.com> Link: https://lore.kernel.org/r/20210127022801.8028-1-dingsenjie@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 21:24:40 -08:00

1 2 3 4 5 ...

984367 commits