Commit graph

45905 commits

Author SHA1 Message Date
Tom Rix
51aaa68222 net: alteon: remove unused len variable
clang with W=1 reports
drivers/net/ethernet/alteon/acenic.c:2438:10: error: variable
  'len' set but not used [-Werror,-Wunused-but-set-variable]
                int i, len = 0;
                       ^
This variable is not used so remove it.
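
For reference, the warning pattern in miniature; a self-contained
illustration (not the driver code) that reproduces the diagnostic:

  /* clang -Wunused-but-set-variable (enabled by the kernel's W=1 build) */
  int sum(const int *v, int n)
  {
          int i, len = 0;         /* 'len' is written but never read */
          int total = 0;

          for (i = 0; i < n; i++) {
                  len += 1;       /* dead store */
                  total += v[i];
          }
          return total;           /* deleting 'len' silences the warning */
  }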

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02 13:43:43 +01:00
Ido Schimmel
cc19439f70 mlxsw: core_thermal: Simplify transceiver module get_temp() callback
The get_temp() callback of a thermal zone associated with a transceiver
module no longer needs to read the temperature thresholds of the module.
Therefore, simplify the callback by only reading the temperature.
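
A sketch of the resulting callback shape; the helper and field names are
assumptions for illustration, not the verbatim mlxsw code:

  static int mlxsw_thermal_module_temp_get(struct thermal_zone_device *tzdev,
                                           int *p_temp)
  {
          struct mlxsw_thermal_module *module_tz = tzdev->devdata;

          /* Read only the current temperature; the thresholds are no
           * longer consulted here.
           */
          return mlxsw_env_module_temp_get(module_tz->core,
                                           module_tz->module, p_temp);
  }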

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02 13:42:30 +01:00
Ido Schimmel
c1536d856e mlxsw: core_thermal: Make mlxsw_thermal_module_init() void
The function can no longer fail so make it void and remove the
associated error path.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02 13:42:30 +01:00
Ido Schimmel
5601ef91fb mlxsw: core_thermal: Use static trip points for transceiver modules
The driver registers a thermal zone for each transceiver module and
tries to set the trip point temperatures according to the thresholds
read from the transceiver. If a threshold cannot be read or if a
transceiver is unplugged, the trip point temperature is set to zero,
which means that it is disabled as far as the thermal subsystem is
concerned.

A recent change in the thermal core made it so that such trip points are
no longer marked as disabled, which led the thermal subsystem to
incorrectly set the associated cooling devices to their maximum state
[1]. A fix to restore this behavior was merged in commit f1b80a3878
("thermal: core: Restore behavior regarding invalid trip points").
However, the thermal maintainer suggested not relying on this behavior
and instead always registering a valid array of trip points [2].

Therefore, create a static array of trip points with sane defaults
(suggested by Vadim) and register it with the thermal zone of each
transceiver module. User space can choose to override these defaults
through the thermal zone sysfs interface, since these files are writable.
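
A sketch of what such a static trip array can look like, using the default
temperatures from the output below (struct thermal_trip is the real thermal
core type; the trip types chosen here are illustrative):

  static const struct thermal_trip mlxsw_thermal_module_trips[] = {
          { .type = THERMAL_TRIP_ACTIVE, .temperature = 55000 },
          { .type = THERMAL_TRIP_ACTIVE, .temperature = 65000 },
          { .type = THERMAL_TRIP_HOT,    .temperature = 80000 },
  };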

Before:

 $ cat /sys/class/thermal/thermal_zone11/type
 mlxsw-module11
 $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
 65000
 75000
 80000

After:

 $ cat /sys/class/thermal/thermal_zone11/type
 mlxsw-module11
 $ cat /sys/class/thermal/thermal_zone11/trip_point_*_temp
 55000
 65000
 80000

Also tested by reverting commit f1b80a3878 ("thermal: core: Restore
behavior regarding invalid trip points") and making sure that the
associated cooling devices are not set to their maximum state.

[1] https://lore.kernel.org/linux-pm/ZA3CFNhU4AbtsP4G@shredder/
[2] https://lore.kernel.org/linux-pm/f78e6b70-a963-c0ca-a4b2-0d4c6aeef1fb@linaro.org/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02 13:42:30 +01:00
Sylwester Dziedziuch
ceb29474bb i40e: Add support for VF to specify its primary MAC address
Currently the i40e driver does not handle MAC addresses differently
depending on whether they are legacy or primary. Introduce new checks so
that a VF can specify its primary MAC address based on the
VIRTCHNL_ETHER_ADDR_PRIMARY type.

Primary MAC addresses are treated differently from legacy ones in the
following scenarios (see the sketch below):
1. If a unicast MAC is being added and its type is specified as
VIRTCHNL_ETHER_ADDR_PRIMARY, replace the current
default_lan_addr.addr.
2. If a unicast MAC is being deleted and its type is specified as
VIRTCHNL_ETHER_ADDR_PRIMARY, zero the hw_lan_addr.addr.
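
A sketch of the type check underpinning both cases; the virtchnl constants
are the real API, while the helper itself is a simplified assumption:

  /* Simplified sketch, not the verbatim i40e implementation. */
  static bool i40e_is_vc_addr_primary(struct virtchnl_ether_addr *vc_addr)
  {
          return (vc_addr->type & VIRTCHNL_ETHER_ADDR_TYPE_MASK) ==
                 VIRTCHNL_ETHER_ADDR_PRIMARY;
  }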

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-02 13:20:52 +01:00
Jakub Kicinski
d74aab2ca1 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-03-30 (documentation, ice)

This series contains updates to driver documentation and the ice driver.

Tony removes links and addresses related to the out-of-tree driver from the
Intel ethernet driver documentation.

Jake removes a comment from the ice driver that is no longer valid.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: remove comment about not supporting driver reinit
  Documentation/eth/intel: Remove references to SourceForge
  Documentation/eth/intel: Update address for driver support
====================

Link: https://lore.kernel.org/r/20230330165935.2503604-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-31 21:41:11 -07:00
Simon Horman
709d0b880c octeontx2-af: update type of prof fields in nix_aq_enq_req
Update the type of the prof and prof_mask fields in nix_aq_enq_req
from u64 to struct nix_bandprof_s, which is 128 bits wide.

This addresses string-fortification warnings when compiling with
gcc-12 and W=1.

Although the union of which these fields are members is 128 bits
wide, and thus writing a 128-bit entity is safe, the compiler flags
a problem because the field being written is only 64 bits wide.

  CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.o
scripts/Makefile.build:252: ./drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_dcbnl.o is added to multiple modules: rvu_nicpf rvu_nicvf
  CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/otx2_dcbnl.o
  CC [M]  drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.o
  CC [M]  drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.o
  CC [M]  drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.o
In file included from ./include/linux/string.h:254,
                 from ./include/linux/bitmap.h:11,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/paravirt.h:17,
                 from ./arch/x86/include/asm/cpuid.h:62,
                 from ./arch/x86/include/asm/processor.h:19,
                 from ./arch/x86/include/asm/timex.h:5,
                 from ./include/linux/timex.h:67,
                 from ./include/linux/time32.h:13,
                 from ./include/linux/time.h:60,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:13,
                 from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:8:
In function 'fortify_memcpy_chk',
    inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:969:4:
./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  529 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'fortify_memcpy_chk',
    inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:984:4:
./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  529 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Compile tested only!
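
The underlying issue in miniature; a self-contained illustration (not the
driver code) of why widening the member's type resolves the warning:

  #include <string.h>

  struct wide128 { unsigned long long lo, hi; };  /* 128-bit entity */

  union aq_field {
          unsigned long long prof;        /* before: 64-bit member */
          struct wide128 prof_s;          /* after: 128-bit member */
  };

  void enqueue(struct wide128 *hw_ctx, const union aq_field *req)
  {
          /* Reading 16 bytes through the 64-bit 'prof' member trips
           * __read_overflow2_field under FORTIFY_SOURCE:
           *     memcpy(hw_ctx, &req->prof, sizeof(*hw_ctx));
           * Reading through the 128-bit member is clean:
           */
          memcpy(hw_ctx, &req->prof_s, sizeof(*hw_ctx));
  }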

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230329112356.458072-1-horms@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 23:27:58 -07:00
Nathan Chancellor
3292004c90 net: ethernet: ti: Fix format specifier in netcp_create_interface()
After commit 3948b05950 ("net: introduce a config option to tweak
MAX_SKB_FRAGS"), clang warns:

  drivers/net/ethernet/ti/netcp_core.c:2085:4: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
                          MAX_SKB_FRAGS);
                          ^~~~~~~~~~~~~
  include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
          dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
                                                                 ~~~     ^~~~~~~~~~~
  include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
                  _p_func(dev, fmt, ##__VA_ARGS__);                       \
                               ~~~    ^~~~~~~~~~~
  include/linux/skbuff.h:352:23: note: expanded from macro 'MAX_SKB_FRAGS'
  #define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
                        ^~~~~~~~~~~~~~~~~~~~
  ./include/generated/autoconf.h:11789:30: note: expanded from macro 'CONFIG_MAX_SKB_FRAGS'
  #define CONFIG_MAX_SKB_FRAGS 17
                               ^~
  1 warning generated.

Follow the pattern of the rest of the tree by changing the specifier to
'%u' and casting MAX_SKB_FRAGS explicitly to 'unsigned int', which
eliminates the warning.
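
The resulting call looks roughly like this (dev_err() is the real kernel
API; the format string is abridged, not the driver's exact message):

  dev_err(dev, "only %u fragments are supported\n",
          (unsigned int)MAX_SKB_FRAGS);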

Fixes: 3948b05950 ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230329-net-ethernet-ti-wformat-v1-1-83d0f799b553@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 23:21:04 -07:00
Tom Rix
9a865a98a3 net: ksz884x: remove unused change variable
clang with W=1 reports
drivers/net/ethernet/micrel/ksz884x.c:3216:6: error: variable
  'change' set but not used [-Werror,-Wunused-but-set-variable]
        int change = 0;
            ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230329125929.1808420-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 23:18:48 -07:00
Jakub Kicinski
79548b7984 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/mediatek/mtk_ppe.c
  3fbe4d8c0e ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
  924531326e ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 14:43:03 -07:00
Felix Fietkau
924531326e net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
The cache needs to be flushed to ensure that the hardware stops offloading
the flow immediately.

Fixes: 33fc42de33 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 11:44:59 -07:00
Felix Fietkau
5f36ca1b84 net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
Check for skb metadata in order to detect the case where the DSA header
is not present.

Fixes: 2d7605a729 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 11:44:59 -07:00
Felix Fietkau
8c1cb87c2a net: ethernet: mtk_eth_soc: fix flow block refcounting logic
Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
call flow_block_cb_incref for a newly allocated cb.
Also fix the accidentally inverted refcount check on unbind.
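
A sketch of the corrected pairing; the flow_block_cb_* calls are the real
core helpers, while the surrounding driver logic is simplified:

  struct flow_block_cb *block_cb;

  if (f->command == FLOW_BLOCK_BIND) {
          block_cb = flow_block_cb_lookup(f->block, cb, dev);
          if (block_cb) {
                  flow_block_cb_incref(block_cb);
                  return 0;
          }
          block_cb = flow_block_cb_alloc(cb, dev, dev, NULL);
          if (IS_ERR(block_cb))
                  return PTR_ERR(block_cb);

          flow_block_cb_incref(block_cb); /* pairs with decref on unbind */
          flow_block_cb_add(block_cb, f);
  } else { /* FLOW_BLOCK_UNBIND */
          block_cb = flow_block_cb_lookup(f->block, cb, dev);
          if (block_cb && !flow_block_cb_decref(block_cb))
                  flow_block_cb_remove(block_cb, f);
  }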

Fixes: 502e84e238 ("net: ethernet: mtk_eth_soc: add flow offloading support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 11:44:59 -07:00
Russell King (Oracle)
2960a2d33b net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
As reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.

Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().

Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.
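
The fix pattern in sketch form (field names follow the mvneta txq
structures, but treat the snippet as illustrative rather than verbatim):

  /* In mvneta_txq_sw_deinit(): clear pointers after freeing so that a
   * later cleanup pass sees NULL and skips them.
   */
  kfree(txq->buf);
  txq->buf = NULL;

  if (txq->tso_hdrs) {
          dma_free_coherent(pp->dev->dev.parent,
                            txq->size * TSO_HEADER_SIZE,
                            txq->tso_hdrs, txq->tso_hdrs_phys);
          txq->tso_hdrs = NULL;
  }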

Fixes: 2adb719d74 ("net: mvneta: Implement software TSO")
Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30 11:43:39 -07:00
Jacob Keller
503d473c98 ice: remove comment about not supporting driver reinit
Since commit 31c8db2c4f ("ice: implement devlink reinit action"), the ice
driver does support driver re-initialization via devlink reload. Remove the
stale comment indicating that the driver lacks this support from the
ice_devlink_ops structure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-30 09:35:07 -07:00
Wolfram Sang
da617cd8d9 smsc911x: remove superfluous variable init
phydev is assigned a value right away; there is no need to initialize it.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230329064414.25028-1-wsa+renesas@sang-engineering.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-30 15:35:33 +02:00
Jakub Kicinski
7079d5e61a mlx5-updates-2023-03-28

Merge tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-03-28

Dragos Tatulea says:
====================

net/mlx5e: RX, Drop page_cache and fully use page_pool

For page allocation on the rx path, the mlx5e driver has been using an
internal page cache in tandem with the page pool. The internal page
cache uses a queue for page recycling which has the issue of head of
queue blocking.

This patch series drops the internal page_cache altogether and uses the
page_pool to implement everything that was done by the page_cache
before:
* Let the page_pool handle dma mapping and unmapping.
* Use fragmented pages with fragment counter instead of tracking via
  page ref.
* Enable skb recycling.

The patch series has the following effects on the rx path:

* Improved performance for the cases when there was low page recycling
  due to head of queue blocking in the internal page_cache. The test
  for this was running a single iperf TCP stream to an rx queue
  bound to the same cpu as the application.

  |-------------+--------+--------+------+---------|
  | rq type     | before | after  | unit |   diff  |
  |-------------+--------+--------+------+---------|
  | striding rq |  30.1  |  31.4  | Gbps |  4.14 % |
  | legacy rq   |  30.2  |  33.0  | Gbps |  8.48 % |
  |-------------+--------+--------+------+---------|

* Small XDP performance degradation. The test was an XDP drop
  program running on a single rx queue with small incoming packets:

  |-------------+----------+----------+------+---------|
  | rq type     | before   | after    | unit |   diff  |
  |-------------+----------+----------+------+---------|
  | striding rq | 19725449 | 18544617 | pps  | -6.37 % |
  | legacy rq   | 19879931 | 18631841 | pps  | -6.70 % |
  |-------------+----------+----------+------+---------|

  This will be handled in a different patch series by adding support for
  multi-packet per page.

* For other cases the performance is roughly the same.

The above numbers were obtained on the following system:
  24 core Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
  32 GB RAM
  ConnectX-7 single port

The breakdown of the patch series is the following:
* Preparations for introducing the mlx5e_frag_page struct.
* Delete the mlx5e_page_cache struct.
* Enable dma mapping from page_pool.
* Enable skb recycling and fragment counting.
* Do deferred release of pages (just before alloc) to ensure better
  page_pool cache utilization.

====================

* tag 'mlx5-updates-2023-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
  net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
  net/mlx5e: RX, Increase WQE bulk size for legacy rq
  net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
  net/mlx5e: RX, Defer page release in legacy rq for better recycling
  net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
  net/mlx5e: RX, Defer page release in striding rq for better recycling
  net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
  net/mlx5e: RX, Enable skb page recycling through the page_pool
  net/mlx5e: RX, Enable dma map and sync from page_pool allocator
  net/mlx5e: RX, Remove internal page_cache
  net/mlx5e: RX, Store SHAMPO header pages in array
  net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
  net/mlx5e: RX, Remove alloc unit layout constraint for legacy rq
  net/mlx5e: RX, Remove mlx5e_alloc_unit argument in page allocation
====================

Link: https://lore.kernel.org/r/20230328205623.142075-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 22:15:24 -07:00
Michael Chan
581bce7bcb bnxt_en: Add missing 200G link speed reporting
bnxt_fw_to_ethtool_speed() is missing the case statement for 200G
link speed reported by firmware.  As a result, ethtool will report
unknown speed when the firmware reports 200G link speed.

Fixes: 532262ba3b ("bnxt_en: ethtool: support PAM4 link speeds up to 200G")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:48:16 -07:00
Kalesh AP
62aad36ed3 bnxt_en: Fix typo in PCI id to device description string mapping
Fix 57502 and 57508 NPAR description string entries.  The typos
caused these devices to not match up with lspci output.

Fixes: 49c98421e6 ("bnxt_en: Add PCI IDs for 57500 series NPAR devices.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:48:16 -07:00
Kalesh AP
83714dc3db bnxt_en: Fix reporting of test result in ethtool selftest
When the selftest command fails because bnxt_close_nic() fails, the
driver does not report the failure by updating "test->flags".

Fixes: eb51365846 ("bnxt_en: Add basic ethtool -t selftest support.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:48:16 -07:00
Radoslaw Tyl
c5cff16f46 i40e: fix registers dump after run ethtool adapter self test
Fix an invalid register dump from ethtool -d ethX after an adapter self
test with ethtool -t ethY, which caused invalid data to be displayed.

The problem was caused by overwriting i40e_reg_list[].elements,
which is common to the ethtool self test and dump.

Fixes: 22dd9ae8af ("i40e: Rework register diagnostic")
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230328172659.3906413-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:47:31 -07:00
Jakub Kicinski
165d35159c Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-03-28 (ice)

This series contains updates to ice driver only.

Jesse fixes mismatched header documentation reported when building with
W=1.

Brett restricts setting of VSI context to only applicable fields for the
given ICE_AQ_VSI_PROP_Q_OPT_VALID bit.

Junfeng adds a check when adding Flow Director filters that conflict
with existing filter rules.

Jakob Koschel adds an interim variable for iterating to prevent possible
misuse after looping.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
  ice: add profile conflict check for AVF FDIR
  ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
  ice: fix W=1 headers mismatch
====================

Link: https://lore.kernel.org/r/20230328172035.3904953-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:46:18 -07:00
Simon Horman
c5370374bb net: ena: removed unused tx_bytes variable
clang 16.0.0 with W=1 reports:

drivers/net/ethernet/amazon/ena/ena_netdev.c:1901:6: error: variable 'tx_bytes' set but not used [-Werror,-Wunused-but-set-variable]
        u32 tx_bytes = 0;

The variable is not used so remove it.

Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230328151958.410687-1-horms@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:39:35 -07:00
Dan Carpenter
765f360464 octeon_ep: unlock the correct lock on error path
The h and the f letters are swapped so it unlocks the wrong lock.

Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/251aa2a2-913e-4868-aac9-0a90fc3eeeda@kili.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:39:35 -07:00
Jakub Kicinski
8c49527084 bnx2x: use the right build_skb() helper
build_skb() no longer accepts slab buffers. Since slab use is fairly
uncommon, drivers are expected to call the separate slab_build_skb()
helper where appropriate.

bnx2x uses the old semantics where a size of 0 meant the buffer came
from slab. It sets fp->rx_frag_size to 0 for MTUs which don't fit in a
page, so it needs to call slab_build_skb() in that case.

This fixes the WARN_ONCE() of incorrect API use seen with bnx2x.
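
A sketch of the corrected call site, following the rx_frag_size semantics
described above (not the verbatim bnx2x code):

  /* rx_frag_size == 0 means the buffer came from the slab allocator */
  if (fp->rx_frag_size)
          skb = build_skb(data, fp->rx_frag_size);
  else
          skb = slab_build_skb(data);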

Reported-by: Thomas Voegtle <tv@lio96.de>
Link: https://lore.kernel.org/all/b8f295e4-ba57-8bfb-7d9c-9d62a498a727@lio96.de/
Fixes: ce098da149 ("skbuff: Introduce slab_build_skb()")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230329000013.2734957-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:29:31 -07:00
Hao Lan
3b064f541b net: hns3: support wake on lan configuration and query
The HNS3 driver supports Wake-on-LAN, which can wake the server from
the power-off state via a magic packet or magic security packet.

ChangeLog:
v1->v2:
Deleted the debugfs function that overlaps with the ethtool function,
as suggested by Andrew Lunn.

v2->v3:
Return the wol configuration stored in the driver,
as suggested by Alexander H Duyck.

v3->v4:
Add a helper to go from netdev to the local struct,
as suggested by Simon Horman and Jakub Kicinski.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:07:42 +01:00
Edward Cree
17654d84b4 sfc: add offloading of 'foreign' TC (decap) rules
A 'foreign' rule is one for which the net_dev is not the sfc netdevice
 or any of its representors.  The driver registers indirect flow blocks
 for tunnel netdevs so that it can offload decap rules.  For example:

    tc filter add dev vxlan0 parent ffff: protocol ipv4 flower \
        enc_src_ip 10.1.0.2 enc_dst_ip 10.1.0.1 \
        enc_key_id 1000 enc_dst_port 4789 \
        action tunnel_key unset \
        action mirred egress redirect dev $REPRESENTOR

When notified of a rule like this, register an encap match on the IP
 and dport tuple (creating an Outer Rule table entry) and insert an MAE
 action rule to perform the decapsulation and deliver to the representee.

Moved efx_tc_delete_rule() below efx_tc_flower_release_encap_match() to
 avoid the need for a forward declaration.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:06:08 +01:00
Edward Cree
746224cdef sfc: add code to register and unregister encap matches
Add a hashtable to detect duplicate and conflicting matches.  If a match
 is not a duplicate, call MAE functions to add/remove it from the OR table.
Calling code is not added yet, so mark the new functions as unused.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:06:08 +01:00
Edward Cree
2245eb0086 sfc: add functions to insert encap matches into the MAE
An encap match corresponds to an entry in the exact-match Outer Rule
 table; the lookup response includes the encap type (protocol) allowing
 the hardware to continue parsing into the inner headers.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:06:08 +01:00
Edward Cree
b7f5e17b3b sfc: handle enc keys in efx_tc_flower_parse_match()
Translate the fields from flow dissector into struct efx_tc_match.
In efx_tc_flower_replace(), reject filters that match on them, because
 only 'foreign' filters (i.e. those for which the ingress dev is not
 the sfc netdev or any of its representors, e.g. a tunnel netdev) can
 use them.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:06:08 +01:00
Edward Cree
b9d5c9b7d8 sfc: add notion of match on enc keys to MAE machinery
Extend the MAE caps check to validate that the hardware supports these
 outer-header matches where used by the driver.
Extend efx_mae_populate_match_criteria() to fill in the outer rule ID
 and VNI match fields.
Nothing yet populates these match fields, nor creates outer rules.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:06:08 +01:00
Edward Cree
edd025ca08 sfc: document TC-to-EF100-MAE action translation concepts
Includes an explanation of the lifetime of the 'cursor' action-set `act`.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 09:06:08 +01:00
Kuniyuki Iwashima
8cdc3223e7 ipv6: Remove in6addr_any alternatives.
Some code defines the IPv6 wildcard address as a local variable and
uses it with memcmp() or ipv6_addr_equal().

Let's use in6addr_any and ipv6_addr_any() instead.
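
The cleanup in miniature; ipv6_addr_any() and in6addr_any are the real
kernel definitions, while the wrapper functions are only for illustration:

  /* before: open-coded wildcard comparison */
  static bool is_wildcard_before(const struct in6_addr *a)
  {
          struct in6_addr zero = {};

          return !memcmp(a, &zero, sizeof(zero));
  }

  /* after: use the shared helper */
  static bool is_wildcard_after(const struct in6_addr *a)
  {
          return ipv6_addr_any(a);
  }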

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 08:22:52 +01:00
Jakub Kicinski
de7494524d mlx5-updates-2023-03-20

Merge tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-03-20

mlx5 dynamic msix

This patch series adds support for dynamic msix vectors allocation in mlx5.

Eli Cohen Says:

================

The following series of patches modifies mlx5_core to work with the
dynamic MSIX API. Currently, mlx5_core allocates all the interrupt
vectors it needs and distributes them amongst the consumers. With the
introduction of dynamic MSIX support, which allows for allocation of
interrupts more than once, we now allocate vectors as we need them.
This allows other drivers running on top of mlx5_core to allocate
interrupt vectors for their own use. An example for this is mlx5_vdpa,
which uses these vectors to propagate interrupts directly from the
hardware to the vCPU [1].

As a preparation for using this series, a use after free issue is fixed
in lib/cpu_rmap.c and the allocator for rmap entries has been modified.
A complementary API for irq_cpu_rmap_add() has also been introduced.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e

================

* tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Provide external API for allocating vectors
  net/mlx5: Use one completion vector if eth is disabled
  net/mlx5: Refactor calculation of required completion vectors
  net/mlx5: Move devlink registration before mlx5_load
  net/mlx5: Use dynamic msix vectors allocation
  net/mlx5: Refactor completion irq request/release code
  net/mlx5: Improve naming of pci function vectors
  net/mlx5: Use newer affinity descriptor
  net/mlx5: Modify struct mlx5_irq to use struct msi_map
  net/mlx5: Fix wrong comment
  net/mlx5e: Coding style fix, add empty line
  lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add
  lib: cpu_rmap: Use allocator for rmap entries
  lib: cpu_rmap: Avoid use after free on rmap->obj array entries
====================

Link: https://lore.kernel.org/r/20230324231341.29808-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 23:52:12 -07:00
Tom Rix
e48cefb9c8 net: ethernet: 8390: axnet_cs: remove unused xfer_count variable
clang with W=1 reports
drivers/net/ethernet/8390/axnet_cs.c:653:9: error: variable
  'xfer_count' set but not used [-Werror,-Wunused-but-set-variable]
    int xfer_count = count;
        ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230327235423.1777590-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 23:48:02 -07:00
Felix Fietkau
07b3af42d8 net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
Using the QDMA tx scheduler to throttle tx to line speed works fine for
switch ports, but apparently caused a regression on non-switch ports.

Based on a number of tests, it seems that this throttling can be safely
dropped without re-introducing the issues on switch ports that the
tx scheduling changes resolved.

Link: https://lore.kernel.org/netdev/trinity-92c3826f-c2c8-40af-8339-bc6d0d3ffea4-1678213958520@3c-app-gmx-bs16/
Fixes: f63959c7ee ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Reported-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230324140404.95745-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 23:23:50 -07:00
Wolfram Sang
cdeccd13a0 Revert "sh_eth: remove open coded netif_running()"
This reverts commit ce1fdb0656. It turned
out this actually introduced a race condition; netif_running() is not a
suitable check for get_stats.

Reported-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230327152112.15635-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 19:23:32 -07:00
Saeed Mahameed
163c2c7059 net/mlx5e: Fix build break on 32bit
The cited commit caused the following build break in mlx5 due to a change
in the size of MAX_SKB_FRAGS.

error: format '%lu' expects argument of type 'long unsigned int',
       but argument 7 has type 'unsigned int' [-Werror=format=]

Fix this with an explicit cast.

Fixes: 3948b05950 ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230328200723.125122-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28 16:18:59 -07:00
Dragos Tatulea
3905f8d64c net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
The recycle parameter used during page release is no longer
necessary: the page pool can detect when the page cannot be
recycled to the cache or ring without any outside hint.

The page pool will also take care of cleaning up after itself
once all the inflight pages have been released, so there is no need to
explicitly release pages to the system.

Remove the internal page_cache stats as the mlx5e_page_cache
struct no longer exists.

Delete the documentation entries along with the stats.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
cd640b0503 net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
To avoid overflowing the page pool's cache, don't release the
whole bulk, which is usually larger than the cache refill size.
Instead, group release+alloc into cache refill units that allow
releasing to the cache and then allocating from the cache.

A refill_unit variable is added as an iteration unit over the
wqe_bulk when doing release+alloc.
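
A sketch of the chunked release+alloc loop; the helper names are
illustrative, not the mlx5e functions:

  /* Walk the bulk in refill-unit chunks so that releases land in the
   * page pool cache and the allocs that follow can be served from it.
   */
  for (i = 0; i < wqe_bulk; i += refill_unit) {
          u16 n = min_t(u16, refill_unit, wqe_bulk - i);

          release_wqe_pages(rq, ix + i, n);       /* illustrative */
          alloc_wqe_pages(rq, ix + i, n);         /* illustrative */
  }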

For a single ring, single core, default MTU (1500) TCP stream
test the number of pages allocated from the cache directly
(rx_pp_recycle_cached) increases from 0% to 52%:

+---------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After |
+-------------------------+---------+---------+
|rx_pp_alloc_fast         | 2145422 | 2193802 |
|rx_pp_alloc_slow         |       2 |       0 |
|rx_pp_alloc_empty        |       2 |       0 |
|rx_pp_alloc_refill       |   34059 |   16634 |
|rx_pp_alloc_waive        |       0 |       0 |
|rx_pp_recycle_cached     |       0 | 1145818 |
|rx_pp_recycle_cache_full |       0 |       0 |
|rx_pp_recycle_ring       | 2179361 | 1064616 |
|rx_pp_recycle_ring_full  |     121 |       0 |
+---------------------------------------------+

With this patch, the performance for legacy rq for the above test is
back to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
4ba2b4988c net/mlx5e: RX, Increase WQE bulk size for legacy rq
Deferred page release was added to legacy rq, but its desired effect
(the driver releasing the last fragment to the page pool cache) is not
yet visible because the WQE bulks are too small.

This patch increases the WQE bulk size to span 512 KB and clip it to
one quarter of the rx queue size.
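
The sizing rule as arithmetic; an illustrative computation (assuming one
page per WQE), not the verbatim mlx5e code:

  u32 span = DIV_ROUND_UP(512 * 1024, PAGE_SIZE); /* 128 with 4K pages */

  wqe_bulk = min_t(u32, span, rq_size / 4);       /* clip to 1/4 of rq */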

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
76238d0fbd net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
Don't mix xsk buffer releases with page releases anymore. This is
needed for handling of deferred page release.

Add a new bulk free function for xsk buffers from wqe frags.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
3f93f82988 net/mlx5e: RX, Defer page release in legacy rq for better recycling
Currently, fragmented pages from the page pool can be released
in two ways:

1) In the mlx5e driver when trimming off the unused fragments AND the
associated skb fragments have been released. This path allows
recycling of pages to the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which
will always release pages to the page pool ring
(allow_direct == false).

Whichever path releases the last fragment decides where the page
ends up: the cache or the ring. So we obviously want to maximize
releases via path 1.

This patch does that by deferring the release of page fragments
right before requesting new ones from the page pool. A flag is
added to make sure that there's no release before first alloc
and that XDP_TX fragments are not released prematurely.

This is a preparation patch that doesn't unlock the performance
improvements yet. A followup patch will do that.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:59 -07:00
Dragos Tatulea
625dff29df net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
Change the bool flag to a bitfield as we'll use it in a downstream patch
in the series to add signaling about skipping a fragment release.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
4c2a132368 net/mlx5e: RX, Defer page release in striding rq for better recycling
Currently, for striding RQ, fragmented pages from the page pool can
get released in two ways:

1) In the mlx5e driver when trimming off the unused fragments AND the
   associated skb fragments have been released. This path allows
   recycling of pages to the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which
   will always release pages to the page pool ring
   (allow_direct == false).

Whichever path releases the last fragment decides where the page
ends up: the cache or the ring. So we obviously want to maximize
releases via path 1.

This patch does that by deferring the release of page fragments
right before requesting new ones from the page pool. Extra care
needs to be taken for the corner cases:

* On first call, make sure that release is not called. The
  skip_release_bitmap is used for this purpose.

* On rq shutdown, make sure that all wqes that were not
  in the linked list are released.

For a single ring, single core, default MTU (1500) TCP stream
test the number of pages allocated from the cache directly
(rx_pp_recycle_cached) increases from 31% to 98%:

+----------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After  |
+-------------------------+---------+----------+
|rx_pp_alloc_fast         | 2137754 |  2261033 |
|rx_pp_alloc_slow         |      47 |        9 |
|rx_pp_alloc_empty        |      47 |        9 |
|rx_pp_alloc_refill       |   23230 |      819 |
|rx_pp_alloc_waive        |       0 |        0 |
|rx_pp_recycle_cached     |  672182 |  2209015 |
|rx_pp_recycle_cache_full |    1789 |        0 |
|rx_pp_recycle_ring       | 1485848 |    52259 |
|rx_pp_recycle_ring_full  |    3003 |      584 |
+----------------------------------------------+

With this patch, the performance in striding rq for the above test is
back to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
38a36efccd net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
The xdp_xmit_bitmap currently serves only one purpose: to avoid
releasing pages that are still in use due to XDP TX.

A following patch will use this bitmap in a slightly different context
but for the same purpose. So rename the bitmap to a more generic name
that reflects the purpose not the context.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
6f57428460 net/mlx5e: RX, Enable skb page recycling through the page_pool
Start using the page_pool skb recycling api to recycle all pages back to
the page pool and stop using atomic page reference counting.

The mlx5e driver used to manage in-flight pages using page refcounting:
for each fragment there were two atomic write operations (one when
building the skb and one on skb release).

The page_pool api introduced a method to track page fragments more
optimally:
* The page's pp_fragment_count is set to a large bias on page alloc
  (1 x atomic write operation).
* The driver tracks the actual page fragments in a non atomic variable.
* When the skb is recycled, pp_fragment_count is decremented
  (atomic write operation).
* When page is released in the driver, the unused number of fragments
  (relative to the bias) is deducted from pp_fragment_count (atomic
  write operation).
* Last page defragmentation will only be an atomic read.

So in total there are `number of fragments + 1` atomic write ops, as
opposed to the previous `2 * frags` atomic write ops.
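
A sketch of the bias scheme; page_pool_fragment_page() and
page_pool_defrag_page() are the real page pool primitives, while the bias
value and the bookkeeping around them are illustrative:

  #define FRAG_BIAS (1UL << 15)                   /* illustrative bias */

  /* on page alloc: one atomic write sets the large bias */
  page_pool_fragment_page(page, FRAG_BIAS);
  frag_page->frags = 0;                           /* non-atomic counter */

  /* per fragment handed to an skb: non-atomic */
  frag_page->frags++;

  /* on driver release: one atomic op returns the unused bias */
  page_pool_defrag_page(page, FRAG_BIAS - frag_page->frags);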

Pages are wrapped in a mlx5e_frag_page structure which also contains the
number of fragments. This makes it easy to count the fragments in the
driver.

This change brings performance improvements for the case when the old rx
page_cache had low recycling rates due to head of queue blocking. For an
iperf3 TCP test with a single stream, on a single core (iperf and the
receive queue running on the same core), the following improvements can
be noticed:

* Striding rq:
  - before (net-next baseline): bitrate = 30.1 Gbits/sec
  - after                     : bitrate = 31.4 Gbits/sec (diff: 4.14 %)

* Legacy rq:
  - before (net-next baseline): bitrate = 30.2 Gbits/sec
  - after                     : bitrate = 33.0 Gbits/sec (diff: 8.48 %)

There are 2 temporary performance degradations introduced:

1) TCP streams that had a good recycling rate with the old page_cache
   have a degradation for both striding and linear rq. This is due to
   very low page pool cache recycling: the pages are released during skb
   recycle which will release pages to the page pool ring for safety.
   The following patches in this series will tackle this problem by
   deferring the page release in the driver to increase the
   chance of having pages recycled to the cache.

2) XDP performance is now lower (4-5 %) due to the higher number of
   atomic operations used for fragment management. But this opens the
   door for supporting multiple packets per page in XDP, which will
   bring a big gain.

Otherwise, performance is similar to baseline.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
4a5c5e2500 net/mlx5e: RX, Enable dma map and sync from page_pool allocator
Remove driver dma mapping and unmapping of pages. Let the
page_pool api do it.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:58 -07:00
Dragos Tatulea
08c9b61b07 net/mlx5e: RX, Remove internal page_cache
This patch removes the internal rx page_cache and uses the generic
page_pool api only. It used to be that the page_pool couldn't handle all
the mlx5 driver use cases, but with the introduction of skb recycling
and page fragmentation in the page_pool, a full switch can now be made.
Some benefits of this transition:
* Better page recycling in the cases when the page_cache was suffering
  from head of queue blocking. The page_pool doesn't have this issue.
* DMA mapping/unmapping can be managed by the page_pool.
* mlx5e_rq size reduced by more than 50% due to the page_cache array
  being deleted.

This patch only removes the page_cache. Downstream patches will enable
the required page_pool features and will add further fine-tuning.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:57 -07:00
Dragos Tatulea
ca6ef9f031 net/mlx5e: RX, Store SHAMPO header pages in array
Save allocated SHAMPO header pages to an array to which the
mlx5e_dma_info page will point.

This change is a preparation for introducing mlx5e_frag_page structure
in a downstream patch. There's no new functionality introduced.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-28 13:43:57 -07:00