linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-06 10:57:46 +00:00

Author	SHA1	Message	Date
Danielle Ratson	5fc4053df3	mlxsw: ethtool: Remove max lanes filtering Currently, when a speed can be supported by different number of lanes, the supported link modes bitmask contains only link modes with a single number of lanes. This was done in order to prevent auto negotiation on number of lanes after 50G-1-lane and 100G-2-lanes link modes were introduced. For example, if a port's max width is 4, only link modes with 4 lanes will be presented as supported by that port, so 100G is always achieved by 4 lanes of 25G. After the previous patches that allow selection of the number of lanes, auto negotiation on number of lanes becomes practical. Remove that filtering of the maximum number of lanes supported link modes, so indeed all the supported and advertised link modes will be shown. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	7dc33f0914	ethtool: Expose the number of lanes in use Currently, ethtool does not expose how many lanes are used when the link is up. After adding a possibility to advertise or force a specific number of lanes, the lanes in use value can be either the maximum width of the port or below. Extend ethtool to expose the number of lanes currently in use for drivers that support it. For example: $ ethtool -s swp1 speed 100000 lanes 4 $ ethtool -s swp2 speed 100000 lanes 4 $ ip link set swp1 up $ ip link set swp2 up $ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE Backplane ] Supported link modes: 1000baseT/Full 10000baseT/Full 1000baseKX/Full 10000baseKR/Full 10000baseR_FEC 40000baseKR4/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseKR/Full 25000baseSR/Full 50000baseCR2/Full 50000baseKR2/Full 100000baseKR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full 50000baseSR2/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full 10000baseER/Full 50000baseKR/Full 50000baseSR/Full 50000baseCR/Full 50000baseLR_ER_FR/Full 50000baseDR/Full 100000baseKR2/Full 100000baseSR2/Full 100000baseCR2/Full 100000baseLR2_ER2_FR2/Full 100000baseDR2/Full 200000baseKR4/Full 200000baseSR4/Full 200000baseLR4_ER4_FR4/Full 200000baseDR4/Full 200000baseCR4/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 1000baseT/Full 10000baseT/Full 1000baseKX/Full 1000baseKX/Full 10000baseKR/Full 10000baseR_FEC 40000baseKR4/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseKR/Full 25000baseSR/Full 50000baseCR2/Full 50000baseKR2/Full 100000baseKR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full 50000baseSR2/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full 10000baseER/Full 200000baseKR4/Full 200000baseSR4/Full 200000baseLR4_ER4_FR4/Full 200000baseDR4/Full 200000baseCR4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Advertised link modes: 100000baseKR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: 100000Mb/s Lanes: 4 Duplex: Full Auto-negotiation: on Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Link detected: yes Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	c8907043c6	ethtool: Get link mode in use instead of speed and duplex parameters Currently, when user space queries the link's parameters, as speed and duplex, each parameter is passed from the driver to ethtool. Instead, get the link mode bit in use, and derive each of the parameters from it in ethtool. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:29 -08:00
Danielle Ratson	012ce4dd31	ethtool: Extend link modes settings uAPI with lanes Currently, when auto negotiation is on, the user can advertise all the linkmodes which correspond to a specific speed, but does not have a similar selector for the number of lanes. This is significant when a specific speed can be achieved using different number of lanes. For example, 2x50 or 4x25. Add 'ETHTOOL_A_LINKMODES_LANES' attribute and expand 'struct ethtool_link_settings' with lanes field in order to implement a new lanes-selector that will enable the user to advertise a specific number of lanes as well. When auto negotiation is off, lanes parameter can be forced only if the driver supports it. Add a capability bit in 'struct ethtool_ops' that allows ethtool know if the driver can handle the lanes parameter when auto negotiation is off, so if it does not, an error message will be returned when trying to set lanes. Example: $ ethtool -s swp1 lanes 4 $ ethtool swp1 Settings for swp1: Supported ports: [ FIBRE ] Supported link modes: 1000baseKX/Full 10000baseKR/Full 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 25000baseCR/Full 25000baseSR/Full 50000baseCR2/Full 100000baseSR4/Full 100000baseCR4/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: Yes Supported FEC modes: Not reported Advertised link modes: 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 100000baseSR4/Full 100000baseCR4/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Advertised FEC modes: Not reported Speed: Unknown! Duplex: Unknown! (255) Auto-negotiation: on Port: Direct Attach Copper PHYAD: 0 Transceiver: internal Link detected: no Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:28 -08:00
Danielle Ratson	189e7a8d94	ethtool: Validate master slave configuration before rtnl_lock() Create a new function for input validations to be called before rtnl_lock() and move the master slave validation to that function. This would be a cleanup for next patch that would add another validation to the new function. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:37:28 -08:00
Vladimir Oltean	99b8202b17	net: dsa: fix SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING getting ignored The bridge emits VLAN filtering events and quite a few others via switchdev with orig_dev = br->dev. After the blamed commit, these events started getting ignored. The point of the patch was to not offload switchdev objects for ports that didn't go through dsa_port_bridge_join, because the configuration is unsupported: - ports that offload a bonding/team interface go through dsa_port_bridge_join when that bonding/team interface is later bridged with another switch port or LAG - ports that don't offload LAG don't get notified of the bridge that is on top of that LAG. Sadly, a check is missing, which is that the orig_dev is equal to the bridge device. This check is compatible with the original intention, because ports that don't offload bridging because they use a software LAG don't have dp->bridge_dev set. On a semi-related note, we should not offload switchdev objects or populate dp->bridge_dev if the driver doesn't implement .port_bridge_join either. However there is no regression associated with that, so it can be done separately. Fixes: `5696c8aedf` ("net: dsa: Don't offload port attributes on standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20210202233109.1591466-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 18:25:49 -08:00
Jakub Kicinski	75b8f78fb9	Merge branch 'chelsio-cxgb-use-threaded-interrupts-for-deferred-work' Sebastian Andrzej Siewior says: ==================== chelsio: cxgb: Use threaded interrupts for deferred work Patch #2 fixes an issue in which del_timer_sync() and tasklet_kill() is invoked from the interrupt handler. This is probably a rare error case since it disables interrupts / the card in that case. Patch #1 converts a worker to use a threaded interrupt which is then also used in patch #2 instead adding another worker for this task (and flush_work() to synchronise vs rmmod). ==================== Link: https://lore.kernel.org/r/20210202170104.1909200-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:41:04 -08:00
Sebastian Andrzej Siewior	82154580a7	chelsio: cxgb: Disable the card on error in threaded interrupt t1_fatal_err() is invoked from the interrupt handler. The bad part is that it invokes (via t1_sge_stop()) del_timer_sync() and tasklet_kill(). Both functions must not be called from an interrupt because it is possible that it will wait for the completion of the timer/tasklet it just interrupted. In case of a fatal error, use t1_interrupts_disable() to disable all interrupt sources and then wake the interrupt thread with F_PL_INTR_SGE_ERR as pending flag. The threaded-interrupt will stop the card via t1_sge_stop() and not re-enable the interrupts again. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:41:01 -08:00
Sebastian Andrzej Siewior	fec7fa0a75	chelsio: cxgb: Replace the workqueue with threaded interrupt The external interrupt (F_PL_INTR_EXT) needs to be handled in a process context and this is accomplished by utilizing a workqueue. The process context can also be provided by a threaded interrupt instead of a workqueue. The threaded interrupt can be used later for other interrupt related processing which require non-atomic context without using yet another workqueue. free_irq() also ensures that the thread is done which is currently missing (the worker could continue after the module has been removed). Save pending flags in pending_thread_intr. Use the same mechanism to disable F_PL_INTR_EXT as interrupt source like it is used before the worker is scheduled. Enable the interrupt again once t1_elmer0_ext_intr_handler() is done. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:41:01 -08:00
Jakub Kicinski	462e99a18b	Merge branch 'support-for-octeontx2-98xx-cpt-block' Srujana Challa says: ==================== Support for OcteonTX2 98xx CPT block. OcteonTX2 series of silicons have multiple variants, the 98xx variant has two crypto (CPT) blocks to double the crypto performance. This patchset adds support for new CPT block(CPT1). ==================== Link: https://lore.kernel.org/r/20210202152709.20450-1-schalla@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:36 -08:00
Srujana Challa	c57c58fd5c	octeontx2-af: Handle CPT function level reset When FLR is initiated for a VF (PCI function level reset), the parent PF gets a interrupt. PF then sends a message to admin function (AF), which then cleans up all resources attached to that VF. This patch adds support to handle CPT FLR. Signed-off-by: Narayana Prasad Raju Atherya <pathreya@marvell.com> Signed-off-by: Suheil Chandran <schandran@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:34 -08:00
Srujana Challa	b0f60fab78	octeontx2-af: Add support for CPT1 in debugfs Adds support to display block CPT1 stats at "/sys/kernel/debug/octeontx2/cpt1". Signed-off-by: Mahipal Challa <mchalla@marvell.com> Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:34 -08:00
Srujana Challa	de2854c87c	octeontx2-af: Mailbox changes for 98xx CPT block This patch changes CPT mailbox message format to support new block CPT1 in 98xx silicon. cpt_rd_wr_reg -> Modify cpt_rd_wr_reg mailbox and its handler to accommodate new block CPT1. cpt_lf_alloc -> Modify cpt_lf_alloc mailbox and its handler to configure LFs from a block address out of multiple blocks of same type. If a PF/VF needs to configure LFs from both the blocks then this mbox should be called twice. Signed-off-by: Mahipal Challa <mchalla@marvell.com> Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:31:34 -08:00
Mike Looijmans	e0183b974d	net: mdiobus: Prevent spike on MDIO bus reset signal The mdio_bus reset code first de-asserted the reset by allocating with GPIOD_OUT_LOW, then asserted and de-asserted again. In other words, if the reset signal defaulted to asserted, there'd be a short "spike" before the reset. Here is what happens depending on the pre-existing state of the reset signal: Reset (previously asserted): ~~~\|_\|~~~~\|_______ Reset (previously deasserted): _____\|~~~~\|_______ ^ ^ ^ A B C At point A, the low going transition is because the reset line is requested using GPIOD_OUT_LOW. If the line is successfully requested, the first thing we do is set it high _without_ any delay. This is point B. So, a glitch occurs between A and B. We then fsleep() and finally set the GPIO low at point C. Requesting the line using GPIOD_OUT_HIGH eliminates the A and B transitions. Instead we get: Reset (previously asserted) : ~~~~~~~~~~\|______ Reset (previously deasserted): ____\|~~~~~\|______ ^ ^ A C Where A and C are the points described above in the code. Point B has been eliminated. The issue was found when we pulled down the reset signal for the Marvell 88E1512P PHY (because it requires at least 50ms after POR with an active clock). Looking at the reset signal with a scope revealed a short spike, point B in the artwork above. Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210202143239.10714-1-mike.looijmans@topic.nl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 17:28:05 -08:00
Dan Carpenter	4160d9ec5b	net: mscc: ocelot: fix error code in mscc_ocelot_probe() Probe should return an error code if platform_get_irq_byname() fails but it returns success instead. Fixes: `6c30384eb1` ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXyFIl4V9hgxYM@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 16:18:10 -08:00
Dan Carpenter	e0c1623357	net: mscc: ocelot: fix error handling bugs in mscc_ocelot_init_ports() There are several error handling bugs in mscc_ocelot_init_ports(). I went through the code, and carefully audited it and made fixes and cleanups. 1) The ocelot_probe_port() function didn't have a mirror release function so it was hard to follow. I created the ocelot_release_port() function. 2) In the ocelot_probe_port() function, if the register_netdev() call failed, then it lead to a double free_netdev(dev) bug. Fix this by setting "ocelot->ports[port] = NULL" on the error path. 3) I was concerned that the "port" which comes from of_property_read_u32() might be out of bounds so I added a check for that. 4) In the original code if ocelot_regmap_init() failed then the driver tried to continue but I think that should be a fatal error. 5) If ocelot_probe_port() failed then the most recent devlink was leaked. The fix for mostly came Vladimir Oltean. Get rid of "registered_ports" and just set a bit in "devlink_ports_registered" to say when the devlink port has been registered (and needs to be unregistered on error). There are fewer than 32 ports so a u32 is large enough for this purpose. 6) The error handling if the final ocelot_port_devlink_init() failed had two problems. The "while (port-- >= 0)" loop should have been "--port" pre-op instead of a post-op to avoid a buffer underflow. The "if (!registered_ports[port])" condition was reversed leading to resource leaks and double frees. Fixes: `6c30384eb1` ("net: mscc: ocelot: register devlink ports") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/YBkXhqRxHtRGzSnJ@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 16:18:10 -08:00
Jakub Kicinski	2d912da016	Merge branch 'net-use-indirect_call-in-some-dst_ops' Brian Vazquez says: ==================== net: use INDIRECT_CALL in some dst_ops This patch series uses the INDIRECT_CALL wrappers in some dst_ops functions to mitigate retpoline costs. Benefits depend on the platform as described below. Background: The kernel rewrites the retpoline code at __x86_indirect_thunk_r11 depending on the CPU's requirements. The INDIRECT_CALL wrappers provide hints on possible targets and save the retpoline overhead using a direct call in case the target matches one of the hints. The retpoline overhead for the following three cases has been measured by Luigi Rizzo in microbenchmarks, using CPU performance counters, and cover reasonably well the range of possible retpoline overheads compared to a plain indirect call (in equal conditions, specifically with predicted branch, hot cache): - just "jmp (%r11)" on modern platforms like Intel Cascadelake. In this case the overhead is just 2 clock cycles: - "lfence; jmp (%r11)" on e.g. some recent AMD CPUs. In this case the lfence is blocked until pending reads complete, so the actual overhead depends on previous instructions. The best case we have measured 15 clock cycles of overhead. - worst case, e.g. skylake, the full retpoline is used __x86_indirect_thunk_r11: call set_u_target capture_speculation: pause lfence jmp capture_speculation .align 16 set_up_target: mov %r11, (%rsp) ret In this case the overhead has been measured in 35-40 clock cycles. The actual time saved hence depends on the platform and current clock speed (which varies heavily, especially when C-states are active). Also note that actual benefit might be lower than expected if the longer retpoline overlaps with some pending memory read. MEASUREMENTS: The INDIRECT_CALL wrappers in this patchset involve the processing of incoming SYN and generation of syncookies. Hence, the test has been run by configuring a receiving host with a single NIC rx queue, disabling RPS and RFS so that all processing occurs on the same core. An external source generates SYN fast enough to saturate the receiving CPU. We ran two sets of experiments, with and without the dst_output patch, comparing the number of syncookies generated over a 20s period in multiple runs. Assuming the CPU is saturated, the time per packet is t = number_of_packets/total_time and if the two datasets have statistically meaningful difference, the difference in times between the two cases gives an estimate of the benefits from one INDIRECT_CALL. Here are the experimental results: Skylake Syncookies over 20s (5 tests) --------------------------------------------------- indirect 9166325 9182023 9170093 9134014 9171082 retpoline 9099308 9126350 9154841 9056377 9122376 Computing the stats on the ns_pkt = 20e6/total_packets gives the following: $ ministat -c 95 -w 70 /tmp/sk-indirect /tmp/sk-retp x /tmp/sk-indirect + /tmp/sk-retp +----------------------------------------------------------------------+ \|x xx x + x + + + +\| \|\|______M__A_______\|_\|____________M_____A___________________\| \| +----------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 2.17817e-06 2.18962e-06 2.181e-06 2.182292e-06 4.3252133e-09 + 5 2.18464e-06 2.20839e-06 2.19241e-06 2.194974e-06 8.8695958e-09 Difference at 95.0% confidence 1.2682e-08 +/- 1.01766e-08 0.581132% +/- 0.466326% (Student's t, pooled s = 6.97772e-09) This suggests a difference of 13ns +/- 10ns Our expectation from microbenchmarks was 35-40 cycles per call, but part of the gains may be eaten by stalls from pending memory reads. For Cascadelake: Cascadelake Syncookies over 20s (5 tests) --------------------------------------------------------- indirect 10339797 10297547 `10366826` 10378891 10384854 retpoline 10332674 10366805 10320374 10334272 10374087 Computing the stats on the ns_pkt = 20e6/total_packets gives no meaningful difference even at just 80% (this was expected): $ ministat -c 80 -w 70 /tmp/cl-indirect /tmp/cl-retp x /tmp/cl-indirect + /tmp/cl-retp +----------------------------------------------------------------------+ \| x x + * x + + + x\| \|\|______________\|_M_________A_____A_______M________\|___\| \| +----------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 1.92588e-06 1.94221e-06 1.92923e-06 1.931716e-06 6.6936746e-09 + 5 1.92788e-06 1.93791e-06 1.93531e-06 1.933188e-06 4.3734106e-09 No difference proven at 80.0% confidence ==================== Link: https://lore.kernel.org/r/20210201174132.3534118-1-brianvv@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:51:41 -08:00
Brian Vazquez	bbd807dfbf	net: indirect call helpers for ipv4/ipv6 dst_check functions This patch avoids the indirect call for the common case: ip6_dst_check and ipv4_dst_check Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:51:40 -08:00
Brian Vazquez	f67fbeaebd	net: use indirect call helpers for dst_mtu This patch avoids the indirect call for the common case: ip6_mtu and ipv4_mtu Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:51:39 -08:00
Brian Vazquez	6585d7dc49	net: use indirect call helpers for dst_output This patch avoids the indirect call for the common case: ip6_output and ip_output Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:51:39 -08:00
Brian Vazquez	e43b219064	net: use indirect call helpers for dst_input This patch avoids the indirect call for the common case: ip_local_deliver and ip6_input Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:51:39 -08:00
Emil Renner Berthing	4f4e54366e	net: usb: cdc_ncm: use new API for bh tasklet This converts the driver to use the new tasklet API introduced in commit `12cc923f1c` ("tasklet: Introduce new initialization API") It is unfortunate that we need to add a pointer to the driver context to get back to the usbnet device, but the space will be reclaimed once there are no more users of the old API left and we can remove the data value and flag from the tasklet struct. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Link: https://lore.kernel.org/r/20210130234637.26505-1-kernel@esmil.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-03 14:22:21 -08:00
Geert Uytterhoeven	32d1bbb1d6	net: fec: Silence M5272 build warnings If CONFIG_M5272=y: drivers/net/ethernet/freescale/fec_main.c: In function ‘fec_restart’: drivers/net/ethernet/freescale/fec_main.c:948:6: warning: unused variable ‘val’ [-Wunused-variable] 948 \| u32 val; \| ^~~ drivers/net/ethernet/freescale/fec_main.c: In function ‘fec_get_mac’: drivers/net/ethernet/freescale/fec_main.c:1667:28: warning: unused variable ‘pdata’ [-Wunused-variable] 1667 \| struct fec_platform_data *pdata = dev_get_platdata(&fep->pdev->dev); \| ^~~~~ Fix this by moving the variable declarations inside the existing #ifdef blocks. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210202130650.865023-1-geert@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 19:02:26 -08:00
Eric Dumazet	fca23f37f3	inet: do not export inet_gro_{receive\|complete} inet_gro_receive() and inet_gro_complete() are part of GRO engine which can not be modular. Similarly, inet_gso_segment() does not need to be exported, being part of GSO stack. In other words, net/ipv6/ip6_offload.o is part of vmlinux, regardless of CONFIG_IPV6. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210202154145.1568451-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:41:10 -08:00
Jakub Kicinski	0256317a61	This time, only RTNL locking reduction fallout. - cfg80211_dev_rename() requires RTNL - cfg80211_change_iface() and cfg80211_set_encryption() require wiphy mutex (was missing in wireless extensions) - cfg80211_destroy_ifaces() requires wiphy mutex - netdev registration can fail due to notifiers, and then notifiers are "unrolled", need to handle this properly -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmAZZAoACgkQB8qZga/f l8QW4A/+NSpqm/MMuw9IUwRsZ3wJUYI2SDj7xhbfpFEdNAmm7RjJLxF8NqCzFtqo 2XM9DzGkbvQKlPi4DmahnRVycFqlqTGkDDs5WOvg9NdL8/zygDLsTMWJsvyI6XEp 4Y8qLuwJpoaxDhmtEpjBNbQiZrXDdYptMRsvWpaLiLN8nlzWim+Sm+qiMeIWpxz2 axutgbyfO4pREU3wxRbxe2V0RNLxRqJ7g10siAkchP+NoK2SjM1tQKzyuN7ruImZ cVriA+j1u43rWseedoKZzofCvgd74nZAi87u8dpk673s7V71//8uTHhpIOBYUbfp 6mn1V2QhjiLZ3UZfdQFFQ+WjoowSEPMQ6gPe0EdPlWTYVNRWcQXzjIlznooZnxrE KVWYDYxkQKMgqrTFdUjcjOza9m4DG0aAJqqSQZ3r9KsRftseLM680vZY6AoteOOM WeaEt3p1Qaza18CA3BH1wVHmbNnwIfCiHtmsAefkgTD3cVH28IIUyGk4RsFPGWi0 APNNQOyiPPQCVnDAMZjhrKPMX2NTyWw5UziFhPPc2jo00XjPW0+mPRjiSO2W55kz ixui/foUN5um3kEc9wJgh62eLYjOAtBomKCNZEZQjJpS9VtsvyoaY/ZjK69lP3wQ XERj8D+fVpm8Hidbv7tb2gElJvra0X4ZCky2KixnICjCrwDBBzw= =bgnR -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time, only RTNL locking reduction fallout. - cfg80211_dev_rename() requires RTNL - cfg80211_change_iface() and cfg80211_set_encryption() require wiphy mutex (was missing in wireless extensions) - cfg80211_destroy_ifaces() requires wiphy mutex - netdev registration can fail due to notifiers, and then notifiers are "unrolled", need to handle this properly * tag 'mac80211-next-for-net-next-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: cfg80211: fix netdev registration deadlock cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held wext: call cfg80211_set_encryption() with wiphy lock held wext: call cfg80211_change_iface() with wiphy lock held nl80211: call cfg80211_dev_rename() under RTNL ==================== Link: https://lore.kernel.org/r/20210202144106.38207-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:40:42 -08:00
Jakub Kicinski	390d9b565e	mlx5-updates-2021-02-01 mlx5 netdev updates: 1) Trivial refactoring ahead of the upcoming uplink representor series. 2) Increased RSS table size to 256, for better results 3) Misc. Cleanup and very trivial improvements -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmAY9rQACgkQSD+KveBX +j5PpggAy8h7xd4zUZpFWvoTgmzEUJV04StwdghfR+m7EtlJyU3mqGkbGoWV9d0O Vljh9sRs0V1/CnABThJ/UG5dqkJjU1ZbQDhHK/HLr9U0MggDoJqC1T6OT4+p3TRe Px91P9eYE73chhf1aDUSi9MI+xGvoGI1Dt3K2WX3cHiftl1U11G3w6hiL9/9bNVK xBlHZP6qtqIoFEs0nh7Ze/IsR0v7i+HTdujXy3g7BdJ1Q8hG/mEfmHxZV8YIjX9s huvrIlSXtLKCk8JGknJtgGVPF+5m5K6GlWPg7chZqhkK51G3vJn952LyWhqgwT5r I1S/vccVX4ROYTWaaCVfkTcmRjp1Gg== =iPSk -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-02-01 mlx5 netdev updates: 1) Trivial refactoring ahead of the upcoming uplink representor series. 2) Increased RSS table size to 256, for better results 3) Misc. Cleanup and very trivial improvements * tag 'mlx5-updates-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Avoid unnecessary csum recalculation on supporting devices net/mlx5e: CT: remove useless conversion to PTR_ERR then ERR_PTR net/mlx5e: accel, remove redundant space net/mlx5e: kTLS, Improve TLS RX workqueue scope net/mlx5e: remove h from printk format specifier net/mlx5e: Increase indirection RQ table size to 256 net/mlx5e: Enable napi in channel's activation stage net/mlx5e: Move representor neigh init into profile enable net/mlx5e: Avoid false lock depenency warning on tc_ht net/mlx5e: Move set vxlan nic info to profile init net/mlx5e: Move netif_carrier_off() out of mlx5e_priv_init() net/mlx5e: Refactor mlx5e_netdev_init/cleanup to mlx5e_priv_init/cleanup net/mxl5e: Add change profile method net/mlx5e: Separate between netdev objects and mlx5e profiles initialization ==================== Link: https://lore.kernel.org/r/20210202065457.613312-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:38:54 -08:00
Jakub Kicinski	a1a809c489	Merge branch 'mptcp-add_addr-enhancements' Mat Martineau says: ==================== mptcp: ADD_ADDR enhancements This patch series from the MPTCP tree contains enhancements and associated tests for the ADD_ADDR ("add address") MPTCP option. This option allows already-connected MPTCP peers to share additional IP addresses with each other, which can then be used to create additional subflows within those MPTCP connections. Patches 1 & 2 remove duplicated data in the per-connection path manager structure. Patches 3-6 initiate additional subflows when an address is added using the netlink path manager interface and improve ADD_ADDR signaling reliability, subject to configured limits. Self tests are also updated. Patches 7-15 add new support for optional port numbers in ADD_ADDR. This includes creating an additional in-kernel TCP listening socket for the requested port number, validating the port number when processing incoming subflow connections, including the port number in netlink interfaces, and adding some new MIBs. New self test cases are added for subflows connecting with alternate port numbers. ==================== Link: https://lore.kernel.org/r/20210201230920.66027-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:22 -08:00
Geliang Tang	8a127bf68a	selftests: mptcp: add testcases for ADD_ADDR with port This patch adds testcases for ADD_ADDR with port and the related MIB counters check in chk_add_nr. The output looks like this: 24 signal address with port syn[ ok ] - synack[ ok ] - ack[ ok ] add[ ok ] - echo [ ok ] - pt [ ok ] syn[ ok ] - synack[ ok ] - ack[ ok ] syn[ ok ] - ack [ ok ] 25 subflow and signal with port syn[ ok ] - synack[ ok ] - ack[ ok ] add[ ok ] - echo [ ok ] - pt [ ok ] syn[ ok ] - synack[ ok ] - ack[ ok ] syn[ ok ] - ack [ ok ] 26 remove single address with port syn[ ok ] - synack[ ok ] - ack[ ok ] add[ ok ] - echo [ ok ] - pt [ ok ] syn[ ok ] - synack[ ok ] - ack[ ok ] syn[ ok ] - ack [ ok ] rm [ ok ] - sf [ ok ] Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:20 -08:00
Geliang Tang	2fbdd9eaf1	mptcp: add the mibs for ADD_ADDR with port This patch adds the mibs for ADD_ADDR with port: MPTCP_MIB_PORTADD for received ADD_ADDR suboption with a port number. MPTCP_MIB_PORTSYNRX, MPTCP_MIB_PORTSYNACKRX, MPTCP_MIB_PORTACKRX, for received MP_JOIN's SYN or SYN/ACK or ACK with a port number which is different from the msk's port number. MPTCP_MIB_MISMATCHPORTSYNRX and MPTCP_MIB_MISMATCHPORTACKRX, for received SYN or ACK MP_JOIN with a mismatched port-number. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:20 -08:00
Geliang Tang	d4a7726a79	selftests: mptcp: add port argument for pm_nl_ctl This patch adds a new argument for pm_nl_ctl tool. We can use it like this: # pm_nl_ctl add 10.0.2.1 flags signal port 10100 # pm_nl_ctl dump id 1 flags signal 10.0.2.1 10100 Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	a77e9179c7	mptcp: deal with MPTCP_PM_ADDR_ATTR_PORT in PM netlink This patch adds MPTCP_PM_ADDR_ATTR_PORT filling and parsing in PM netlink. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	60b57bf76c	mptcp: enable use_port when invoke addresses_equal When dealing with the addresses list local_addr_list or anno_list, we should enable the function addresses_equal's parameter use_port. And enable it in address_zero too. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	5bc56388c7	mptcp: add port number check for MP_JOIN This patch adds two new helpers, subflow_use_different_sport and subflow_use_different_dport, to check whether the subflow's source or destination port number is different from the msk's port number. When receiving the MP_JOIN's SYN/SYNACK/ACK, we do these port number checks and print out the different port numbers. And furthermore, when receiving the MP_JOIN's SYN/ACK, we also use a new helper mptcp_pm_sport_in_anno_list to check whether this port number is announced. If it isn't, we need to abort this connection. This patch also populates the local address's port field in local_address. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	ec20e14396	mptcp: add a new helper subflow_req_create_thmac This patch adds a new helper named subflow_req_create_thmac, which is extracted from subflow_token_join_request. It initializes subflow_req's local_nonce and thmac fields, those are the more expensive to populate. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	b5e2e42fe5	mptcp: drop unused skb in subflow_token_join_request This patch drops the unused parameter skb in subflow_token_join_request. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	1729cf186d	mptcp: create the listening socket for new port This patch creates a listening socket when an address with a port-number is added by PM netlink. Then binds the new port to the socket, and listens for new connections. When the address is removed or the addresses are flushed by PM netlink, release the listening socket. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:19 -08:00
Geliang Tang	6208fd822a	selftests: mptcp: add testcases for newly added addresses This patch adds testcases to create subflows or signal addresses for the newly added IPv4 or IPv6 addresses. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:18 -08:00
Geliang Tang	2e8cbf45cf	selftests: mptcp: use minus values for removing address numbers This patch changes the removing addresses numbers to minus values, left the plus values for the adding addresses numbers. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:18 -08:00
Geliang Tang	b5a7acd3bd	mptcp: send ack for every add_addr This patch changes the sending ACK conditions for the ADD_ADDR, send an ACK packet for any ADD_ADDR, not just when ipv6 addresses or port numbers are included. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/139 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:18 -08:00
Geliang Tang	875b76718f	mptcp: create subflow or signal addr for newly added address Currently, when a new MPTCP endpoint is added, the existing MPTCP sockets are not affected. This patch implements a new function mptcp_nl_add_subflow_or_signal_addr, invoked when an address is added from PM netlink. This function traverses the MPTCP sockets list and invokes mptcp_pm_create_subflow_or_signal_addr to try to create a subflow or signal an address for the newly added address, if local constraint allows that. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/19 Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:18 -08:00
Geliang Tang	a914e58668	mptcp: drop _max fields in mptcp_pm_data This patch drops the per-msk values add_addr_signal_max, add_addr_accept_max, local_addr_max and subflows_max fields in struct mptcp_pm_data, uses the pernet _max values instead. And adds four new helpers to get the pernet *_max values separately. Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:18 -08:00
Geliang Tang	72603d207d	mptcp: use WRITE_ONCE for the pernet *_max This patch uses WRITE_ONCE() for all the pernet add_addr_signal_max, add_addr_accept_max, local_addr_max and subflows_max fields in struct pm_nl_pernet to avoid concurrency issues. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:37:18 -08:00
Kai-Heng Feng	e6d6ca6e12	r8169: Add support for another RTL8168FP According to the vendor driver, the new chip with XID 0x54b is essentially the same as the one with XID 0x54a, but it doesn't need the firmware. So add support accordingly. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20210202044813.1304266-1-kai.heng.feng@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 18:02:55 -08:00
Jakub Kicinski	389cb1ecc8	Merge branch 'add-notifications-when-route-hardware-flags-change' Ido Schimmel says: ==================== Add notifications when route hardware flags change Routes installed to the kernel can be programmed to capable devices, in which case they are marked with one of two flags. RTM_F_OFFLOAD for routes that offload traffic from the kernel and RTM_F_TRAP for routes that trap packets to the kernel for processing (e.g., host routes). These flags are of interest to routing daemons since they would like to delay advertisement of routes until they are installed in hardware. This allows them to avoid packet loss or misrouted packets. Currently, routing daemons do not receive any notifications when these flags are changed, requiring them to poll the kernel tables for changes which is inefficient. This series addresses the issue by having the kernel emit RTM_NEWROUTE notifications whenever these flags change. The behavior is controlled by two sysctls (net.ipv4.fib_notify_on_flag_change and net.ipv6.fib_notify_on_flag_change) that default to 0 (no notifications). Note that even if route installation in hardware is improved to be more synchronous, these notifications are still of interest. For example, a multipath route can change from RTM_F_OFFLOAD to RTM_F_TRAP if its neighbours become invalid. A routing daemon can choose to withdraw / replace the route in that case. In addition, the deletion of a route from the kernel can prompt the installation of an identical route (already in kernel, with an higher metric) to hardware. For testing purposes, netdevsim is aligned to simulate a "real" driver that programs routes to hardware. Series overview: Patches #1-#2 align netdevsim to perform route programming in a non-atomic context Patches #3-#5 add sysctl to control IPv4 notifications Patches #6-#8 add sysctl to control IPv6 notifications Patch #9 extends existing fib tests to set sysctls before running tests Patch #10 adds test for fib notifications over netdevsim ==================== Link: https://lore.kernel.org/r/20210201194757.3463461-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:46:02 -08:00
Amit Cohen	19d36d2971	selftests: netdevsim: Add fib_notifications test Add test to check fib notifications behavior. The test checks route addition, route deletion and route replacement for both IPv4 and IPv6. When fib_notify_on_flag_change=0, expect single notification for route addition/deletion/replacement. When fib_notify_on_flag_change=1, expect: - two notification for route addition/replacement, first without RTM_F_TRAP and second with RTM_F_TRAP. - single notification for route deletion. $ ./fib_notifications.sh TEST: IPv4 route addition [ OK ] TEST: IPv4 route deletion [ OK ] TEST: IPv4 route replacement [ OK ] TEST: IPv6 route addition [ OK ] TEST: IPv6 route deletion [ OK ] TEST: IPv6 route replacement [ OK ] Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Amit Cohen	d1a7a48928	selftests: Extend fib tests to run with and without flags notifications Run the test cases with both `fib_notify_on_flag_change` sysctls set to '1', and then with both sysctls set to '0' to verify there are no regressions in the test when notifications are added. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Amit Cohen	907eea4868	net: ipv6: Emit notification when fib hardware flags are changed After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. It is also possible for a route already installed in hardware to change its action and therefore its flags. For example, a host route that is trapping packets can be "promoted" to perform decapsulation following the installation of an IPinIP/VXLAN tunnel. Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. The aim is to provide an indication to user-space (e.g., routing daemons) about the state of the route in hardware. Introduce a sysctl that controls this behavior. Keep the default value at 0 (i.e., do not emit notifications) for several reasons: - Multiple RTM_NEWROUTE notification per-route might confuse existing routing daemons. - Convergence reasons in routing daemons. - The extra notifications will negatively impact the insertion rate. - Not all users are interested in these notifications. Move fib6_info_hw_flags_set() to C file because it is no longer a short function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Amit Cohen	efc42879ec	net: Do not call fib6_info_hw_flags_set() when IPv6 is disabled With the next patch mlxsw and netdevsim will fail in compilation if CONFIG_IPV6 is disabled. Do not call fib6_info_hw_flags_set() when IPv6 is disabled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Amit Cohen	fbaca8f895	net: Pass 'net' struct as first argument to fib6_info_hw_flags_set() The next patch will emit notification when hardware flags are changed, in case that fib_notify_on_flag_change sysctl is set to 1. To know sysctl values, net struct is needed. This change is consistent with the IPv4 version, which gets 'net' struct as its first argument. Currently, the only callers of this function are mlxsw and netdevsim. Patch the callers to pass net. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00
Amit Cohen	680aea08e7	net: ipv4: Emit notification when fib hardware flags are changed After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. It is also possible for a route already installed in hardware to change its action and therefore its flags. For example, a host route that is trapping packets can be "promoted" to perform decapsulation following the installation of an IPinIP/VXLAN tunnel. Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. The aim is to provide an indication to user-space (e.g., routing daemons) about the state of the route in hardware. Introduce a sysctl that controls this behavior. Keep the default value at 0 (i.e., do not emit notifications) for several reasons: - Multiple RTM_NEWROUTE notification per-route might confuse existing routing daemons. - Convergence reasons in routing daemons. - The extra notifications will negatively impact the insertion rate. - Not all users are interested in these notifications. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-02 17:45:59 -08:00

1 2 3 4 5 ...

984655 commits