linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-30 14:19:16 +00:00

Author	SHA1	Message	Date
Oleksij Rempel	46e31db55d	net: macb: fix negative max_mtu size for sama5d3 JML register on probe will return zero. This register is configured later on macb_init_hw() which is called on open. Since we have zero, after header and FCS length subtraction we will get negative max_mtu size. This issue was affecting DSA drivers with MTU support (for example KSZ9477). Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 11:42:35 +01:00
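The arithmetic behind the macb fix can be sketched in a few lines of C. This is only an illustration of the subtraction described in the commit message, not the driver's code: `max_mtu_from_frame_len()` is a hypothetical helper, and the constants mirror the usual Ethernet header (14 bytes) and FCS (4 bytes) lengths.

```c
#include <assert.h>

#define ETH_HLEN    14 /* Ethernet header length in bytes */
#define ETH_FCS_LEN  4 /* frame check sequence length in bytes */

/*
 * Hypothetical helper (not from the macb driver): derive the largest MTU
 * from a maximum frame length by subtracting the header and FCS lengths.
 */
static int max_mtu_from_frame_len(int frame_len)
{
    assert(frame_len >= 0); /* a register read cannot be negative */
    return frame_len - ETH_HLEN - ETH_FCS_LEN;
}
```

With a frame length of zero, as read from a register that has not been configured yet, the result is -18, which is the negative max_mtu the commit describes.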
Kees Cook	2c0ab32b73	hinic: Replace memcpy() with direct assignment Under CONFIG_FORTIFY_SOURCE=y and CONFIG_UBSAN_BOUNDS=y, Clang is bugged here for calculating the size of the destination buffer (0x10 instead of 0x14). This copy is a fixed size (sizeof(struct fw_section_info_st)), with the source and dest being struct fw_section_info_st, so the memcpy should be safe, assuming the index is within bounds, which is UBSAN_BOUNDS's responsibility to figure out. Avoid the whole thing and just do a direct assignment. This results in no change to the executable code. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Jiri Pirko <jiri@nvidia.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Simon Horman <simon.horman@corigine.com> Cc: netdev@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://github.com/ClangBuiltLinux/linux/issues/1592 Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 11:31:18 +01:00
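The change in the hinic commit, replacing a fixed-size memcpy() between two objects of the same struct type with a plain assignment, might look like the sketch below. `struct section_info` is a hypothetical stand-in for `struct fw_section_info_st` (its real layout is not shown in this log), and `MAX_SECTIONS` is an assumed bound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct fw_section_info_st: five 32-bit
 * fields, i.e. 0x14 bytes where unsigned int is 4 bytes wide. */
struct section_info {
    unsigned int crc;
    unsigned int len;
    unsigned int offset;
    unsigned int version;
    unsigned int type;
};

#define MAX_SECTIONS 4 /* assumed array bound, for illustration only */

/* Before: copy one element with memcpy() and an explicit size. */
static void copy_with_memcpy(struct section_info *dst,
                             const struct section_info *src, size_t idx)
{
    assert(idx < MAX_SECTIONS);
    memcpy(&dst[idx], &src[idx], sizeof(dst[idx]));
}

/* After: let the compiler copy the struct; no size argument to get wrong. */
static void copy_with_assignment(struct section_info *dst,
                                 const struct section_info *src, size_t idx)
{
    assert(idx < MAX_SECTIONS);
    dst[idx] = src[idx];
}
```

Both functions leave the destination element with the same contents; the assignment form simply gives a fortified memcpy() nothing to size-check.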
Oleksij Rempel	225b0ed27e	net: ag71xx: fix discards 'const' qualifier warning Current kernel will compile this driver with warnings. This patch will fix it. drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_fast_reset': drivers/net/ethernet/atheros/ag71xx.c:996:31: warning: passing argument 2 of 'ag71xx_hw_set_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] 996 | ag71xx_hw_set_macaddr(ag, dev->dev_addr); | ~~~^~~~~~~~~~ drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument is of type 'const unsigned char *' 951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac) | ~~~~~~~~~~~~~~~^~~ drivers/net/ethernet/atheros/ag71xx.c: In function 'ag71xx_open': drivers/net/ethernet/atheros/ag71xx.c:1441:32: warning: passing argument 2 of 'ag71xx_hw_set_macaddr' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] 1441 | ag71xx_hw_set_macaddr(ag, ndev->dev_addr); | ~~~~^~~~~~~~~~ drivers/net/ethernet/atheros/ag71xx.c:951:69: note: expected 'unsigned char *' but argument is of type 'const unsigned char *' 951 | static void ag71xx_hw_set_macaddr(struct ag71xx *ag, unsigned char *mac) | ~~~~~~~~~~~~~~~^~~ Fixes: `adeef3e321` ("net: constify netdev->dev_addr") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 11:04:45 +01:00
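The warning quoted in the ag71xx commit typically goes away once a function that only reads its pointer argument takes it as pointer-to-const. A minimal sketch follows; `mac_low_word()` and its byte packing are hypothetical and chosen only to give the function something to read.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_ALEN 6 /* length of an Ethernet MAC address in bytes */

/*
 * Hypothetical reader of a MAC address. Because the parameter is
 * 'const unsigned char *', callers may pass a const buffer without
 * triggering -Wdiscarded-qualifiers.
 */
static unsigned int mac_low_word(const unsigned char *mac)
{
    assert(mac != NULL);
    return ((unsigned int)mac[5] << 24) |
           ((unsigned int)mac[4] << 16) |
           ((unsigned int)mac[3] << 8)  |
            (unsigned int)mac[2];
}
```

Declaring the parameter as plain `unsigned char *` and passing a `const unsigned char` array would reproduce the diagnostic shown above.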
David S. Miller	fd8b330ce1	tcp: fix build... Remove accidental dup of tcp_wmem_schedule. Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:57:22 +01:00
David S. Miller	47cfd06192	Merge branch 'pcs-xpcs-stmmac-add-1000BASE-X-AN-for-network-switch' Ong Boon Leong says: ==================== pcs-xpcs, stmmac: add 1000BASE-X AN for network switch Thanks for v4 review feedback in [1] and [2]. I have changed the v5 implementation as follow. v5 changes: 1/5 - No change from v4. 2/5 - No change from v4. 3/5 - [Fix] make xpcs_modify_changed() static and use mdiodev_modify_changed() for cleaner code as suggested by Russell King. 4/5 - [Fix] Use fwnode_get_phy_mode() as recommended by Andrew Lunn. 5/5 - [Fix] Make fwnode = of_fwnode_handle(priv->plat->phylink_node) order after priv = netdev_priv(dev). v4 changes: 1/5 - Squash v3:1/7 & 2/7 patches into v4:1/6 so that it passes build. 2/5 - [No change] same as v3:3/7 3/5 - [Fix] Fix issues identified by Russell in [1] 4/5 - [Fix] Drop v3:5/7 patch per input by Russell in [2] and make dwmac-intel clear the ovr_an_inband flag if fixed-link is used in ACPI _DSD. 5/5 - [No change] same as v3:7/7 For the steps to setup ACPI _DSD and checking, they are the same as in [3] Reference: [1] https://patchwork.kernel.org/comment/24894239/ [2] https://patchwork.kernel.org/comment/24895330/ [3] https://patchwork.kernel.org/project/netdevbpf/cover/20220610033610.114084-1-boon.leong.ong@intel.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:55:35 +01:00
Ong Boon Leong	ab21cf9209	net: stmmac: make mdio register skips PHY scanning for fixed-link stmmac_mdio_register() lacks fixed-link consideration and only skip PHY scanning if it has done DT style PHY discovery. So, for DT or ACPI _DSD setting of fixed-link, the PHY scanning should not happen. v2: fix incorrect order related to fwnode that is not caught in non-DT platform. Tested-by: Emilio Riva <emilio.riva@ericsson.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:55:35 +01:00
Ong Boon Leong	72edaf39fc	stmmac: intel: add phy-mode and fixed-link ACPI _DSD setting support Currently, phy_interface for TSN controller instance is set based on its PCI Device ID. For SGMII PHY interface, phy_interface default to PHY_INTERFACE_MODE_SGMII. As C37 AN supports both SGMII and 1000BASE-X mode, we add support for 'phy-mode' ACPI _DSD for port-specific and customer platform specific customization. v3: use fwnode_get_phy_mode() as suggested by Andrew Lunn in https://patchwork.kernel.org/comment/24895330/ v2: For platform that sets 'fixed-link' using ACPI _DSD, we will unset xpcs_an_inband within stmmac. Thanks to Russell King for his comment in https://patchwork.kernel.org/comment/24890222/ v1: Thanks to Andrew Lunn's guidance in https://patchwork.kernel.org/comment/24827101/ Tested-by: Emilio Riva <emilio.riva@ericsson.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:55:35 +01:00
Ong Boon Leong	b47aec885b	net: pcs: xpcs: add CL37 1000BASE-X AN support For CL37 1000BASE-X AN, DW xPCS does not support C22 method but offers C45 vendor-specific MII MMD for programming. We also add the ability to disable Autoneg (through ethtool for certain network switch that supports 1000BASE-X (1000Mbps and Full-Duplex) but not Autoneg capability. v4: Fixes to comment from Russell King. Thanks! https://patchwork.kernel.org/comment/24894239/ Make xpcs_modify_changed() as private, change to use mdiodev_modify_changed() for cleaner code. v3: Fixes to issues spotted by Russell King. Thanks! https://patchwork.kernel.org/comment/24890210/ Use phylink_mii_c22_pcs_decode_state(), remove unnecessary interrupt clearing and skip speed & duplex setting if AN is enabled. v2: Fixes to issues spotted by Russell King in v1. Thanks! https://patchwork.kernel.org/comment/24826650/ Use phylink_mii_c22_pcs_encode_advertisement() and implement C45 MII ADV handling since IP only support C45 access. Tested-by: Emilio Riva <emilio.riva@ericsson.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:55:35 +01:00
Ong Boon Leong	c82386310d	stmmac: intel: prepare to support 1000BASE-X phy interface setting Currently, intel_speed_mode_2500() redundantly fix-up phy_interface to PHY_INTERFACE_MODE_SGMII if the underlying controller is in 1000Mbps SGMII mode. The value of phy_interface has been initialized earlier. This patch removes such redundancy to prepare for setting 1000BASE-X mode for certain hardware platform configuration. Also update the intel_mgbe_common_data() to include 1000BASE-X setup. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:55:35 +01:00
Ong Boon Leong	fa9c562f97	net: make xpcs_do_config to accept advertising for pcs-xpcs and sja1105 xpcs_config() has 'advertising' input that is required for C37 1000BASE-X AN in later patch series. So, we prepare xpcs_do_config() for it. For sja1105, xpcs_do_config() is used for xpcs configuration without depending on advertising input, so set to NULL. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:55:35 +01:00
David S. Miller	982c3e2948	Merge branch 'mlxsw-L3-HW-stats-improvements' Ido Schimmel says: ==================== mlxsw: L3 HW stats improvements While testing L3 HW stats [1] on top of mlxsw, two issues were found: 1. Stats cannot be enabled for more than 205 netdevs. This was fixed in commit `4b7a632ac4` ("mlxsw: spectrum_cnt: Reorder counter pools"). 2. ARP packets are counted as errors. Patch #1 takes care of that. See the commit message for details. The goal of the majority of the rest of the patches is to add selftests that would have discovered that only about 205 netdevs can have L3 HW stats supported, despite the HW supporting much more. The obvious place to plug this in is the scale test framework. The scale tests are currently testing two things: that some number of instances of a given resource can actually be created; and that when an attempt is made to create more than the supported amount, the failures are noted and handled gracefully. However the ability to allocate the resource does not mean that the resource actually works when passing traffic. For that, make it possible for a given scale to also test traffic. To that end, this patchset adds traffic tests. The goal of these is to run traffic and observe whether a sample of the allocated resource instances actually perform their task. Traffic tests are only run on the positive leg of the scale test (no point trying to pass traffic when the expected outcome is that the resource will not be allocated). They are opt-in, if a given test does not expose it, it is not run. The patchset proceeds as follows: - Patches #2 and #3 add to "devlink resource" support for number of allocated RIFs, and the capacity. This is necessary, because when evaluating how many L3 HW stats instances it should be possible to allocate, the limiting resource on Spectrum-2 and above currently is not the counters themselves, but actually the RIFs. 
- Patch #6 adds support for invocation of a traffic test, if a given scale tests exposes it. - Patch #7 adds support for skipping a given scale test. Because on Spectrum-2 and above, the limiting factor to L3 HW stats instances is actually the number of RIFs, there is no point in running the failing leg of a scale tests, because it would test exhaustion of RIFs, not of RIF counters. - With patch #8, the scale tests drivers pass the target number to the cleanup function of a scale test. - In patch #9, add a traffic test to the tc_flower selftests. This makes sure that the flow counters installed with the ACLs actually do count as they are supposed to. - In patch #10, add a new scale selftest for RIF counter scale, including a traffic test. - In patch #11, the scale target for the tc_flower selftest is dynamically set instead of being hard coded. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca0a53dcec9495d1dc5bbc369c810c520d728373 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:34 +01:00
Ido Schimmel	ed62af4546	selftests: spectrum-2: tc_flower_scale: Dynamically set scale target Instead of hard coding the scale target in the test, dynamically set it based on the maximum number of flow counters and their current occupancy. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Petr Machata	be00853bfd	selftests: mlxsw: Add a RIF counter scale test This tests creates as many RIFs as possible, ideally more than there can be RIF counters (though that is currently only possible on Spectrum-1). It then tries to enable L3 HW stats on each of the RIFs. It also contains the traffic test, which tries to run traffic through a log2 of those counters and checks that the traffic is shown in the counter values. Like with tc_flower traffic test, take a log2 subset of rules. The logic behind picking log2 rules is that then every bit of the instantiated item's number is exercised. This should catch issues whether they happen at the high end, low end, or somewhere in between. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Petr Machata	dd5d20e17c	selftests: mlxsw: tc_flower_scale: Add a traffic test Add a test that checks that the created filters do actually trigger on matching traffic. Exercising all the rules would be a very lengthy process. Instead, take a log2 subset of rules. The logic behind picking log2 rules is that then every bit of the instantiated item's number is exercised. This should catch issues whether they happen at the high end, low end, or somewhere in between. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
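One way to read the "log2 subset" idea from the two traffic-test commits is to sample the items whose numbers are powers of two, so that each bit position of the item number is set in exactly one sample. The sketch below follows that reading; the selftests themselves are shell scripts and may pick the subset differently, and `pick_log2_subset()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sampler: for items numbered 1..count, store the numbers
 * 1, 2, 4, ... that do not exceed count into out[] (at most cap entries)
 * and return how many were stored. About log2(count) items are chosen.
 */
static size_t pick_log2_subset(size_t count, size_t *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 1; i <= count && n < cap; i *= 2) {
        out[n++] = i;
        if (i > count / 2)
            break; /* the next doubling would exceed count or overflow */
    }
    return n;
}
```

For 10 items this selects items 1, 2, 4 and 8, so traffic is run through four instances instead of ten.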
Petr Machata	35d5829e86	selftests: mlxsw: resource_scale: Pass target count to cleanup The scale tests are verifying behavior of mlxsw when number of instances of some resource reaches the ASIC capacity. The number of instances is referred to as "target" number. No scale tests so far needed to know this target number to clean up. E.g. the tc_flower simply removes the clsact qdisc that all the tested filters are hooked onto, and that takes care of collecting all the filters. However, for the RIF counter test, which is being added in a future patch, VLAN netdevices are created. These are created as part of the test, but of course the cleanup needs to undo them again. For that it needs to know how many there were. To support this usage, pass the target number to the cleanup callback. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Petr Machata	8cad339db3	selftests: mlxsw: resource_scale: Allow skipping a test The scale tests are currently testing two things: that some number of instances of a given resource can actually be created; and that when an attempt is made to create more than the supported amount, the failures are noted and handled gracefully. Sometimes the scale test depends on more than one resource. In particular, a following patch will add a RIF counter scale test, which depends on the number of RIF counters that can be bound, and also on the number of RIFs that can be created. When the test is limited by the auxiliary resource and not by the primary one, there's no point trying to run the overflow test, because it would be testing exhaustion of the wrong resource. To support this use case, when the $test_get_target yields 0, skip the test instead. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Petr Machata	3128b9f51e	selftests: mlxsw: resource_scale: Introduce traffic tests The scale tests are currently testing two things: that some number of instances of a given resource can actually be created; and that when an attempt is made to create more than the supported amount, the failures are noted and handled gracefully. However the ability to allocate the resource does not mean that the resource actually works when passing traffic. For that, make it possible for a given scale to also test traffic. Traffic test is only run on the positive leg of the scale test (no point trying to pass traffic when the expected outcome is that the resource will not be allocated). Traffic tests are opt-in, if a given test does not expose it, it is not run. To this end, delay the test cleanup until after the traffic test is run. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Ido Schimmel	d3ffeb2dba	selftests: mlxsw: resource_scale: Update scale target after test setup The scale of each resource is tested in the following manner: 1. The scale target is queried. 2. The test setup is prepared. 3. The test is invoked. In some cases, the occupancy of a resource changes as part of the second step, requiring the test to return a scale target that takes this change into account. Make this more robust by re-querying the scale target after the second step. Another possible solution is to swap the first and second steps, but when a test needs to be skipped (i.e., scale target is zero), the setup would have been in vain. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
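The resource_scale commits above describe a flow: query the target, skip if it is zero, set up, query again, run, then clean up with the target. A minimal C sketch of that flow is given below. The real selftests are shell scripts; every name here (`struct scale_test`, `run_scale_test()`, the `demo_*` callbacks) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum scale_result { SCALE_SKIPPED, SCALE_PASSED, SCALE_FAILED };

/* Hypothetical callback table for one scale test. */
struct scale_test {
    int  (*get_target)(void *ctx);          /* 0 means "skip this test" */
    void (*setup)(void *ctx);
    int  (*run)(void *ctx, int target);     /* returns 0 on success */
    void (*cleanup)(void *ctx, int target); /* needs target to undo setup */
};

static enum scale_result run_scale_test(const struct scale_test *t, void *ctx)
{
    int target, err;

    assert(t && t->get_target && t->setup && t->run && t->cleanup);

    target = t->get_target(ctx);
    if (target == 0)
        return SCALE_SKIPPED;    /* decided before any setup work is done */

    t->setup(ctx);
    target = t->get_target(ctx); /* setup may have changed the occupancy */

    err = t->run(ctx, target);
    t->cleanup(ctx, target);
    return err ? SCALE_FAILED : SCALE_PASSED;
}

/* Demo resource: setup itself consumes two instances of the resource. */
struct demo_ctx { int capacity, used, ran_with, cleaned_with; };

static int demo_get_target(void *p)
{
    struct demo_ctx *c = p;
    return c->capacity - c->used;
}

static void demo_setup(void *p)
{
    struct demo_ctx *c = p;
    c->used += 2;
}

static int demo_run(void *p, int target)
{
    struct demo_ctx *c = p;
    c->ran_with = target;
    return 0;
}

static void demo_cleanup(void *p, int target)
{
    struct demo_ctx *c = p;
    c->cleaned_with = target;
    c->used -= 2;
}

static const struct scale_test demo_test = {
    demo_get_target, demo_setup, demo_run, demo_cleanup,
};
```

Querying first and again after setup keeps the cheap skip path while still testing against a target that reflects what setup consumed; with a capacity of 10, the demo runs against a target of 8.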
Amit Cohen	e386a527fc	selftests: mirror_gre_bridge_1q_lag: Enslave port to bridge before other configurations Using mlxsw driver, the configurations are offloaded just in case that there is a physical port which is enslaved to the virtual device (e.g., to a bridge). In 'mirror_gre_bridge_1q_lag' test, the bridge gets an address and route before there are ports in the bridge. It means that these configurations are not offloaded. Till now the test passes with mlxsw driver even that the RIF of the bridge is not in the hardware, because the ARP packets are trapped in layer 2 and also mirrored, so there is no real need of the RIF in hardware. The previous patch changed the traps 'ARP_REQUEST' and 'ARP_RESPONSE' to be done at layer 3 instead of layer 2. With this change the ARP packets are not trapped during the test, as the RIF is not in the hardware because of the order of configurations. Reorder the configurations to make them to be offloaded, then the test will pass with the change of the traps. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Petr Machata	4ec2feb26c	mlxsw: Add a resource describing number of RIFs The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be created. The limit depends on the ASIC and FW revision, and mlxsw reads it from the FW. In order to communicate both the number of RIFs that there can be, and how many are taken now (i.e. occupancy), introduce a corresponding devlink resource. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Petr Machata	b9840fe035	mlxsw: Keep track of number of allocated RIFs In order to expose number of RIFs as a resource, it is going to be handy to have the number of currently-allocated RIFs as a single number. Introduce such. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
Amit Cohen	4b1cc357f8	mlxsw: Trap ARP packets at layer 3 instead of layer 2 Currently, the traps 'ARP_REQUEST' and 'ARP_RESPONSE' occur at layer 2. To allow the packets to be flooded, they are configured with the action 'MIRROR_TO_CPU' which means that the CPU receives a replica of the packet. Today, Spectrum ASICs also support trapping ARP packets at layer 3. This behavior is better, then the packets can just be trapped and there is no need to mirror them. An additional motivation is that using the traps at layer 2, the ARP packets are dropped in the router as they do not have an IP header, then they are counted as error packets, which might confuse users. Add the relevant traps for layer 3 and use them instead of the existing traps. There is no visible change to user space. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:31:33 +01:00
David S. Miller	e42134b57e	Merge branch 'tcp-mem-pressure-fixes' Eric Dumazet says: ==================== tcp: final (?) round of mem pressure fixes While working on prior patch series (`e10b02ee5b` "Merge branch 'net-reduce-tcp_memory_allocated-inflation'"), I found that we could still have frozen TCP flows under memory pressure. I thought we had solved this in 2015, but the fix was not complete. v2: deal with zerocopy tx paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:11:04 +01:00
Eric Dumazet	f54755f6a1	tcp: fix possible freeze in tx path under memory pressure Blamed commit only dealt with applications issuing small writes. Issue here is that we allow to force memory schedule for the sk_buff allocation, but we have no guarantee that sendmsg() is able to copy some payload in it. In this patch, I make sure the socket can use up to tcp_wmem[0] bytes. For example, if we consider tcp_wmem[0] = 4096 (default on x86), and initial skb->truesize being 1280, tcp_sendmsg() is able to copy up to 2816 bytes under memory pressure. Before this patch a sendmsg() sending more than 2816 bytes would either block forever (if persistent memory pressure), or return -EAGAIN. For bigger MTU networks, it is advised to increase tcp_wmem[0] to avoid sending too small packets. v2: deal with zero copy paths. Fixes: `8e4d980ac2` ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:10:41 +01:00
Eric Dumazet	849b425cd0	tcp: fix possible freeze in tx path under memory pressure Blamed commit only dealt with applications issuing small writes. Issue here is that we allow to force memory schedule for the sk_buff allocation, but we have no guarantee that sendmsg() is able to copy some payload in it. In this patch, I make sure the socket can use up to tcp_wmem[0] bytes. For example, if we consider tcp_wmem[0] = 4096 (default on x86), and initial skb->truesize being 1280, tcp_sendmsg() is able to copy up to 2816 bytes under memory pressure. Before this patch a sendmsg() sending more than 2816 bytes would either block forever (if persistent memory pressure), or return -EAGAIN. For bigger MTU networks, it is advised to increase tcp_wmem[0] to avoid sending too small packets. v2: deal with zero copy paths. Fixes: `8e4d980ac2` ("tcp: fix behavior for epoll edge trigger") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:03:42 +01:00
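The 2816-byte figure in the two commit messages above is tcp_wmem[0] minus the initial skb truesize. The helper below only restates that arithmetic; `forced_copy_budget()` is a hypothetical name and does not correspond to a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: bytes of payload that still fit once the skb's own
 * truesize has been charged against the minimum send buffer, tcp_wmem[0].
 */
static int forced_copy_budget(int wmem_min, int skb_truesize)
{
    int left;

    assert(wmem_min >= 0 && skb_truesize >= 0);
    left = wmem_min - skb_truesize;
    return left > 0 ? left : 0;
}
```

With the values quoted in the message, `forced_copy_budget(4096, 1280)` yields 2816.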
Eric Dumazet	c4ee118561	tcp: fix over estimation in sk_forced_mem_schedule() sk_forced_mem_schedule() has a bug similar to ones fixed in commit `7c80b038d2` ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors") While this bug has little chance to trigger in old kernels, we need to fix it before the following patch. Fixes: `d83769a580` ("tcp: fix possible deadlock in tcp_send_fin()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-17 10:03:42 +01:00
Andrii Nakryiko	08c79c9cd6	selftests/bpf: Don't force lld on non-x86 architectures LLVM's lld linker doesn't have a universal architecture support (e.g., it definitely doesn't work on s390x), so be safe and force lld for urandom_read and liburandom_read.so only on x86 architectures. This should fix s390x CI runs. Fixes: `3e6fe5ce4d` ("libbpf: Fix internal USDT address translation logic for shared libraries") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220617045512.1339795-1-andrii@kernel.org	2022-06-17 10:16:01 +02:00
Alexei Starovoitov	4429bdc408	Merge branch 'New BPF helpers to accelerate synproxy' Maxim Mikityanskiy says: ==================== The first patch of this series is a documentation fix. The second patch allows BPF helpers to accept memory regions of fixed size without doing runtime size checks. The two next patches add new functionality that allows XDP to accelerate iptables synproxy. v1 of this series [1] used to include a patch that exposed conntrack lookup to BPF using stable helpers. It was superseded by series [2] by Kumar Kartikeya Dwivedi, which implements this functionality using unstable helpers. The third patch adds new helpers to issue and check SYN cookies without binding to a socket, which is useful in the synproxy scenario. The fourth patch adds a selftest, which includes an XDP program and a userspace control application. The XDP program uses socketless SYN cookie helpers and queries conntrack status instead of socket status. The userspace control application allows to tune parameters of the XDP program. This program also serves as a minimal example of usage of the new functionality. The last two patches expose the new helpers to TC BPF and extend the selftest. The draft of the new functionality was presented on Netdev 0x15 [3]. v2 changes: Split into two series, submitted bugfixes to bpf, dropped the conntrack patches, implemented the timestamp cookie in BPF using bpf_loop, dropped the timestamp cookie patch. v3 changes: Moved some patches from bpf to bpf-next, dropped the patch that changed error codes, split the new helpers into IPv4/IPv6, added verifier functionality to accept memory regions of fixed size. v4 changes: Converted the selftest to the test_progs runner. Replaced some deprecated functions in xdp_synproxy userspace helper. v5 changes: Fixed a bug in the selftest. Added questionable functionality to support new helpers in TC BPF, added selftests for it. 
v6 changes: Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced fclose with pclose and fixed the MSS for IPv6 in the selftest. v7 changes: Fixed the off-by-one error in indices, changed the section name to "xdp", added missing kernel config options to vmtest in CI. v8 changes: Properly rebased, dropped the first patch (the same change was applied by someone else), updated the cover letter. v9 changes: Fixed selftests for no_alu32. v10 changes: Selftests for s390x were blacklisted due to lack of support of kfunc, rebased the series, split selftests to separate commits, created ARG_PTR_TO_FIXED_SIZE_MEM and packed arg_size, addressed the rest of comments. [1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/ [2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/ [3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	784d5dc0ef	selftests/bpf: Add selftests for raw syncookie helpers in TC mode This commit extends selftests for the new BPF helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} to also test the TC BPF functionality added in the previous commit. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-7-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	9a4cf07386	bpf: Allow the new syncookie helpers to work with SKBs This commit allows the new BPF helpers to work in SKB context (in TC BPF programs): bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}. Using these helpers in TC BPF programs is not recommended, because it's unlikely that the BPF program will provide any substantial speedup compared to regular SYN cookies or synproxy, after the SKB is already created. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-6-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	fb5cd0ce70	selftests/bpf: Add selftests for raw syncookie helpers This commit adds selftests for the new BPF helpers: bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}. xdp_synproxy_kern.c is a BPF program that generates SYN cookies on allowed TCP ports and sends SYNACKs to clients, accelerating synproxy iptables module. xdp_synproxy.c is a userspace control application that allows to configure the following options in runtime: list of allowed ports, MSS, window scale, TTL. A selftest is added to prog_tests that leverages the above programs to test the functionality of the new helpers. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-5-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	33bf988504	bpf: Add helpers to issue and check SYN cookies in XDP The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP program to generate SYN cookies in response to TCP SYN packets and to check those cookies upon receiving the first ACK packet (the final packet of the TCP handshake). Unlike bpf_tcp_{gen,check}_syncookie these new helpers don't need a listening socket on the local machine, which allows to use them together with synproxy to accelerate SYN cookie generation. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-4-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	508362ac66	bpf: Allow helpers to accept pointers with a fixed size Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments to be followed by ARG_CONST_SIZE holding the size of the memory region. The helpers had to check that size in runtime. There are cases where the size expected by a helper is a compile-time constant. Checking it in runtime is an unnecessary overhead and waste of BPF registers. This commit allows helpers to accept pointers to memory without the corresponding ARG_CONST_SIZE, given that they define the memory region size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type. arg_size is unionized with arg_btf_id to reduce the kernel image size, and it's valid because they are used by different argument types. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-3-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:29 -07:00
Maxim Mikityanskiy	ac80287a6a	bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie bpf_tcp_gen_syncookie expects the full length of the TCP header (with all options), and bpf_tcp_check_syncookie accepts lengths bigger than sizeof(struct tcphdr). Fix the documentation that says these lengths should be exactly sizeof(struct tcphdr). While at it, fix a typo in the name of struct ipv6hdr. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-2-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:29 -07:00
Jakub Kicinski	e8b03391b6	Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements' Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series continues with the addition of supported features for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. ==================== Link: https://lore.kernel.org/r/20220616041226.26996-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:52 -07:00
Raju Lakkaraju	311abcdddc	net: phy: add support to get Master-Slave configuration Add support to Master-Slave configuration and state Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Raju Lakkaraju	46b777ad9a	net: lan743x: Add support to SGMII 1G and 2.5G Add SGMII access read and write functions Add support to SGMII 1G and 2.5G for PCI11010/PCI11414 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Raju Lakkaraju	6b3768ac8e	net: lan743x: Add support to Secure-ON WOL Add support to Magic Packet Detection with Secure-ON for PCI11010/PCI11414 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Raju Lakkaraju	9aeb87d2b5	net: lan743x: Add support to LAN743x register dump Add support to LAN743x common register dump Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Jakub Kicinski	f050272436	Merge branch 'net-dsa-realtek-rtl8365mb-improve-handling-of-phy-modes' Alvin Šipraga says: ==================== net: dsa: realtek: rtl8365mb: improve handling of PHY modes This series introduces some minor cleanup of the driver and improves the handling of PHY interface modes to break the assumption that CPU ports are always over an external interface, and the assumption that user ports are always using an internal PHY. ==================== Link: https://lore.kernel.org/r/20220615225116.432283-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:07 -07:00
Alvin Šipraga	a48b6e44a9	net: dsa: realtek: rtl8365mb: handle PHY interface modes correctly Realtek switches in the rtl8365mb family always have at least one port with a so-called external interface, supporting PHY interface modes such as RGMII or SGMII. The purpose of this patch is to improve the driver's handling of these ports. A new struct rtl8365mb_chip_info is introduced together with a static array of such structs. An instance of this struct is added for each supported switch, distinguished by its chip ID and version. Embedded in each chip_info struct is an array of struct rtl8365mb_extint, describing the external interfaces available. This is more specific than the old rtl8365mb_extint_port_map, which was only valid for switches with up to 6 ports. The struct rtl8365mb_extint also contains a bitmask of supported PHY interface modes, which allows the driver to distinguish which ports support RGMII. This corrects a previous mistake in the driver whereby it was assumed that any port with an external interface supports RGMII. This is not actually the case: for example, the RTL8367S has two external interfaces, only the second of which supports RGMII. The first supports only SGMII and HSGMII. This new design will make it easier to add support for other interface modes. Finally, rtl8365mb_phylink_get_caps() is fixed up to return supported capabilities based on the external interface properties described above. This addresses Vladimir's point in the linked thread that the capabilities are not actually a function of the DSA port type: Although most typical applications will treat the ports with internal PHY as user ports, there is no actual hardware limitation preventing one from using them as a CPU port. Equally, ports with external interface(s) may well be treated as user ports, even though it is typical to use those ports as CPU ports. Link: https://lore.kernel.org/netdev/20220510192301.5djdt3ghoavxulhl@bang-olufsen.dk/ Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:47 -07:00
Alvin Šipraga	b3456030f5	net: dsa: realtek: rtl8365mb: remove learn_limit_max private data member The variable is just assigned the value of a macro, so it can be removed. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:47 -07:00
Alvin Šipraga	ca5ecd4246	net: dsa: realtek: rtl8365mb: correct the max number of ports The maximum number of ports is actually 11, according to two observations: 1. The highest port ID used in the vendor driver is 10. Since port IDs are indexed from 0, and since DSA follows the same numbering system, this means up to 11 ports are to be presumed. 2. The registers with port mask fields always amount to a maximum port mask of 0x7FF, corresponding to a maximum 11 ports. In view of this, I also deleted the comment. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:47 -07:00
Alvin Šipraga	b325159d00	net: dsa: realtek: rtl8365mb: remove port_mask private data member There is no real need for this variable: the line change interrupt mask is sufficiently masked out when getting linkup_ind and linkdown_ind in the interrupt handler. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:46 -07:00
Alvin Šipraga	5eb1a23840	net: dsa: realtek: rtl8365mb: rename macro RTL8367RB -> RTL8367RB_VB The official name of this switch is RTL8367RB-VB, not RTL8367RB. There is also an RTL8367RB-VC which is rather different. Change the name of the CHIP_ID/_VER macros for reasons of consistency. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:46 -07:00
Jakub Kicinski	821c7733d2	Merge branch 'net-ipa-more-multi-channel-event-ring-work' Alex Elder says: ==================== net: ipa: more multi-channel event ring work This series makes a little more progress toward supporting multiple channels with a single event ring. The first removes the assumption that consecutive events are associated with the same RX channel. The second derives the channel associated with an event from the event itself, and the next does a small cleanup enabled by that. The fourth causes updates to occur for every event processed (rather once). And the final patch does a little more rework to make TX completion have more in common with RX completion. ==================== Link: https://lore.kernel.org/r/20220615165929.5924-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:08 -07:00
Alex Elder	81765eeac1	net: ipa: move more code out of gsi_channel_update() Move the processing done for TX channels in gsi_channel_update() into gsi_evt_ring_rx_update(). The called function is called for both RX and TX channels, so rename it to be gsi_evt_ring_update(). As a result, this code no longer assumes events in an event ring are associated with just one channel. Because all events in a ring are handled in that function, we can move the call to gsi_trans_move_complete() there, and can ring the event ring doorbell there as well after all new events in the ring have been processed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:04 -07:00
Alex Elder	9f1c3ad654	net: ipa: call gsi_evt_ring_rx_update() unconditionally When an RX transaction completes, we update the trans->len field to contain the actual number of bytes received. This is done in a loop in gsi_evt_ring_rx_update(). Change that function so it checks the data transfer direction recorded in the transaction, and only updates trans->len for RX transfers. Then call it unconditionally. This means events for TX endpoints will run through the loop without otherwise doing anything, but this will change shortly. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:04 -07:00
Alex Elder	2f48fb0edc	net: ipa: pass GSI pointer to gsi_evt_ring_rx_update() The only reason the event ring's channel pointer is needed in gsi_evt_ring_rx_update() is so we can get at its GSI pointer. We can pass the GSI pointer as an argument, along with the event ring ID, and thereby avoid using the event ring channel pointer. This is another step toward no longer assuming an event ring services a single channel. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:03 -07:00
Alex Elder	8eec783195	net: ipa: don't pass channel when mapping transaction Change gsi_channel_trans_map() so it derives the channel used from the transaction. Pass the index of the first TRE used by the transaction, and have the called function account for the fact that the last one used is what's important. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:03 -07:00

1105448 commits