Currently, the IRQ layer has separate flows for ctrl and comp IRQs, and
the distinction between ctrl and comp IRQs is made in the IRQ layer.
In order to ease the coding and maintenance of the IRQ layer,
introduce a new API for requesting control IRQs -
mlx5_ctrl_irq_request(struct mlx5_core_dev *dev).
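A minimal caller-side sketch, assuming the EQ layer keeps the returned
handle (the eq->irq field and the error-pointer convention here are
illustrative, not taken from the driver):

  struct mlx5_irq *irq;

  /* the ctrl/comp distinction now stays inside the IRQ layer */
  irq = mlx5_ctrl_irq_request(dev);
  if (IS_ERR(irq))
          return PTR_ERR(irq);
  eq->irq = irq;  /* illustrative field: keep the handle for release */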
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/
net/smc/smc_wr.c
commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
bitmap_zero()/memset() is removed by the fix
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case the IRQ layer fails to find or to request an IRQ, the driver
prints the first CPU of the provided affinity as part of the error
message. An empty affinity is a valid input for the IRQ layer, and it is
an error to call cpumask_first() on an empty affinity.
Remove the first-CPU print from the error message, as sketched below.
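For illustration, a hedged before/after of the error print (the exact
message text is an assumption):

  /* before: cpumask_first() is invalid for an empty affinity mask */
  mlx5_core_err(dev, "Failed to request irq. err = %d, cpu = %u\n",
                err, cpumask_first(affinity));

  /* after: no CPU in the message, safe for an empty mask */
  mlx5_core_err(dev, "Failed to request irq. err = %d\n", err);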
Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A hard-coded CPU (0 in our case) might be offline. Hence, use the first
online CPU instead, as sketched below.
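A sketch of the change, assuming the affinity mask was previously seeded
with a literal CPU 0 (the irq->mask field is illustrative):

  /* before: CPU 0 may be offline */
  cpumask_set_cpu(0, irq->mask);

  /* after: pick a CPU that is guaranteed to be online */
  cpumask_set_cpu(cpumask_first(cpu_online_mask), irq->mask);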
Fixes: f891b7cdbd ("net/mlx5: Enable single IRQ for PCI Function")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Treat the string as an argument to avoid this:
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5:
error: format string is not a string literal (potentially insecure)
name);
^~~~
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4:
error: format string is not a string literal (potentially insecure)
ptp_ch_stats_desc[i].format);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
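The usual fix pattern for this warning, shown as a hedged sketch (the
destination buffer name is illustrative):

  /* before: a non-literal string is used as the format */
  sprintf(data, ptp_ch_stats_desc[i].format);

  /* after: the string becomes an argument to a literal "%s" format */
  sprintf(data, "%s", ptp_ch_stats_desc[i].format);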
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Prior to this patch the driver required two IRQs to function properly:
one required IRQ for control and at least one required IRQ for IO.
This requirement can be relaxed to one, as the driver now allows
sharing of IRQs, so control and IO EQs can share the same IRQ.
This is needed for deployments with a large number of VFs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The control IRQ is the first IRQ vector. This complicates the handling
of completion IRQs, as we need to offset them by one.
In the next patch, there are scenarios where completion and control EQs
will share the same IRQ, for example: functions with a single IRQ. To
ease such scenarios, shift the control IRQ to the end of the IRQ array,
as sketched below.
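A hedged sketch of the resulting index math (the nvec field and the
helper name are illustrative):

  /* ctrl takes the last slot; completion vector i maps to index i */
  static int ctrl_irq_idx(struct mlx5_irq_table *table)
  {
          return table->nvec - 1;
  }

With this layout, completion IRQs no longer need the +1 offset.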
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When setting the number of completion EQs of the SF, consider the
number of online CPUs.
Without this consideration, when the number of online CPUs is less than
8, 8 completion EQs are allocated unnecessarily; see the sketch below.
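A hedged sketch of the sizing rule (the macro name MLX5_COMP_EQS_PER_SF
is an assumption):

  /* never create more completion EQs than there are online CPUs */
  max_comp_eqs = min_t(int, MLX5_COMP_EQS_PER_SF, num_online_cpus());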
Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The maximum irq_index can be 2047. This means irq_name should reserve 4
characters for the irq_index. Hence, increase the reservation to 4.
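For illustration, the index 2047 occupies 4 characters, so a 4-character
reserve is the minimum that keeps names like "mlx5_comp2047" intact (the
name prefix shown is illustrative):

  /* irq_index can be up to 2047, i.e. up to 4 digits */
  snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", vecidx);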
Fixes: 3af26495a2 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, all access to mlx5 IRQs is done under a lock. Hence, there is
no reason to keep a kref in struct mlx5_irq.
Switch it to an integer, as sketched below.
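A hedged sketch of the resulting pattern (field and helper names are
illustrative):

  /* caller must hold the pool lock, so no atomics are needed */
  static void irq_get_locked(struct mlx5_irq *irq)
  {
          irq->refcount++;        /* illustrative field, was a kref */
  }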
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When the allocated MSI-X vectors are not enough for every SF to have a
dedicated MSI-X vector, the kernel log buffer gets too many entries.
Hence, only emit such logs at debug level, as sketched below.
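A sketch of the severity change (the message text is an assumption):

  /* before: one warning per SF floods the log at scale */
  mlx5_core_warn(dev, "SF will share an IRQ\n");

  /* after: visible only when debug logging is enabled */
  mlx5_core_dbg(dev, "SF will share an IRQ\n");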
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently irq->pool is set after the irq is inserted into the xarray.
Set irq->pool before the irq is inserted into the xarray, as sketched
below.
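A hedged sketch of the reorder (the container and key names are
illustrative; the xa_* calls are standard kernel API):

  /* set the back-pointer before the irq becomes globally visible */
  irq->pool = pool;
  err = xa_err(xa_store(&pool->irqs, irq->index, irq, GFP_KERNEL));
  if (err)
          goto err_xa;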
Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Change the order of operations in mlx5_irq_detach_nb() so that it
mirrors mlx5_irq_attach_nb(), as sketched below.
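A hedged sketch of the mirrored ordering (the irq_get()/irq_put()
helpers are illustrative; the notifier-chain calls are standard kernel
API):

  /* attach: take a reference, then register the notifier */
  static int irq_attach_nb(struct mlx5_irq *irq, struct notifier_block *nb)
  {
          int err;

          irq_get(irq);
          err = atomic_notifier_chain_register(&irq->nh, nb);
          if (err)
                  irq_put(irq);
          return err;
  }

  /* detach, mirrored: unregister first, then drop the reference */
  static int irq_detach_nb(struct mlx5_irq *irq, struct notifier_block *nb)
  {
          int err;

          err = atomic_notifier_chain_unregister(&irq->nh, nb);
          irq_put(irq);
          return err;
  }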
Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is a spelling mistake in a mlx5_core_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Whenever users provide affinity for an EQ creation request, map the EQ
to a matching IRQ.
A matching IRQ is an IRQ with the same affinity and type
(completion/control) as the EQ being created.
This mapping is done using an aggressive dedicated IRQ allocation
scheme, described below.
First, we check whether there is a matching IRQ whose min threshold is
not exhausted:
- min_eqs_threshold = 3 for a control EQ.
- min_eqs_threshold = 1 for a completion EQ.
In case no matching IRQ is found, try to request a new IRQ.
In case we can't request a new IRQ, reuse the least-used matching IRQ.
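A hedged sketch of the lookup order described above (all helper names
and flags here are illustrative):

  /* 1) prefer a matching IRQ that is still below its min threshold */
  irq = irq_pool_find_least_loaded(pool, affinity, under_min_thresh);
  if (irq)
          return irq;

  /* 2) otherwise try to allocate a brand-new dedicated IRQ */
  irq = irq_pool_request_irq(pool, affinity);
  if (!IS_ERR(irq))
          return irq;

  /* 3) no free vectors left: fall back to the least-used match */
  return irq_pool_find_least_loaded(pool, affinity, any_load);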
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SFs (Sub Functions) currently use IRQs from the global IRQ table of
their parent Physical Function. In order to scale better, we need to
allocate more IRQs and share them between different SFs.
The driver will maintain 3 separate IRQ pools:
1. A pool that serves the PF consumers (the PF's netdev and rdma
stacks), similar to what the driver had before this patch. I.e., this
pool will share IRQs between rdma and netdev, and will keep the IRQ
indexes and allocation order. The latter is important for the PF netdev
rmap (aRFS).
2. A pool of control IRQs for SFs. The size of this pool is the number
of SFs that can be created divided by SFS_PER_IRQ. This pool will serve
the control path EQs of the SFs.
3. A pool of completion data path IRQs for SF transport queues. The
size of this pool is:
num_irqs_allocated - pf_pool_size - sf_ctrl_pool_size.
This pool will serve the netdev and rdma stacks (see the sizing sketch
below).
The sharing methodology of the SF pools is explained in the next patch.
Important note: rmap is not supported on SFs, because rmap mapping
cannot function correctly for IRQs that are shared between different
core/netdev RX rings.
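A hedged sizing sketch for the three pools (variable names and rounding
are illustrative):

  /* 1) the PF pool keeps its pre-patch size and allocation order */
  pf_pool_size = pcif_vectors;

  /* 2) one control IRQ serves SFS_PER_IRQ subfunctions */
  sf_ctrl_pool_size = DIV_ROUND_UP(max_sfs, SFS_PER_IRQ);

  /* 3) everything left over feeds the SF completion pool */
  sf_comp_pool_size = num_irqs_allocated - pf_pool_size -
                      sf_ctrl_pool_size;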
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Store newly created IRQs in the xarray DB instead of a static array, so
that we store only the IRQs that are actually in use, as sketched below.
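A hedged sketch of the xarray lifecycle (the container name is
illustrative; the xa_* calls are standard kernel API):

  xa_init(&table->irqs);                          /* once, at table init */

  err = xa_err(xa_store(&table->irqs, index, irq, /* on irq creation */
                        GFP_KERNEL));

  xa_erase(&table->irqs, index);                  /* on irq release */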
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
IRQs are being simplified in order to ease their sharing, and any
feature-specific object will be moved to an upper layer.
Hence, move the rmap object into eq_table.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Extend mlx5_irq_request so that IRQs will be requested upon EQ creation,
and not on driver boot.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In the next patches, IRQs will be requested according to demand, instead
of statically at driver boot.
Also, currently, the rmap is managed by the IRQ layer; rmap management
will move out of the IRQ layer in future patches.
Therefore, we want to remove an IRQ from the rmap when that IRQ is
destroyed, instead of removing all the IRQs from the rmap when the
irq_table is destroyed.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Introduce a new API that will allow IRQ users to hold a pointer to an
mlx5_irq, as sketched below.
At the end of this series, IRQs will be allocated on demand. Hence, this
will allow us to properly manage and use IRQs.
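A hedged sketch of the API shape (the request signature follows the
commit's direction; the release counterpart's name is an assumption):

  struct mlx5_irq *mlx5_irq_request(struct mlx5_core_dev *dev, u16 vecidx,
                                    struct cpumask *affinity);
  void mlx5_irq_release(struct mlx5_irq *irq);

Callers keep the returned pointer for the IRQ's lifetime instead of
referring to a bare vector index.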
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shared IRQs are consumed by multiple EQ users, and in order to properly
initialize and later release such IRQs, add kref counting of the IRQ
structure, as sketched below.
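A hedged sketch of the standard kref pattern applied here (the release
callback name is illustrative):

  kref_init(&irq->kref);                  /* first EQ user creates the irq */
  kref_get(&irq->kref);                   /* each additional sharing EQ */
  kref_put(&irq->kref, irq_release);      /* last user frees the vector */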
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During driver probe of a device that has the dynamic MSI-X feature
enabled, the following error is printed by some FW flavours (not
released yet):
mlx5_core 0000:06:00.0: firmware version: 4.7.4387
mlx5_core 0000:06:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
mlx5_core 0000:06:00.0: mlx5_cmd_check:777:(pid 70599): SET_HCA_CAP(0x109) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x0)
mlx5_core 0000:06:00.0: set_hca_cap:622:(pid 70599): handle_hca_cap failed
mlx5_core 0000:06:00.0: mlx5_function_setup:1045:(pid 70599): set_hca_cap failed
mlx5_core 0000:06:00.0: probe_one:1465:(pid 70599): mlx5_init_one failed with error code -22
mlx5_core: probe of 0000:06:00.0 failed with error -22
In order to make the MSI-X capability setting future-proof, let's query
the current capabilities first, as sketched below.
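A hedged sketch of the query-then-set pattern (mlx5_core_get_caps() is
the existing query helper; the capability field name is illustrative):

  /* read what FW currently exposes before writing anything back */
  err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL);
  if (err)
          return err;

  /* adjust only the MSI-X related field on top of the queried values */
  MLX5_SET(cmd_hca_cap, set_hca_cap,
           dynamic_msix_table_size,       /* illustrative field name */
           msix_table_size);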
Fixes: 604774add5 ("net/mlx5: Dynamically assign MSI-X vectors count")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
void function return statements are not generally useful.
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The number of MSI-X vectors is a PCI property visible through lspci. The
field is read-only and configured by the device. The mlx5 devices work in
a static or dynamic assignment mode.
Static assignment means that all newly created VFs have a preset number of
MSI-X vectors determined by device configuration parameters. This can
result in some VFs having too many or too few MSI-X vectors. Till now this
has been the only means of fine-tuning the MSI-X vector count and it was
acceptable for small numbers of VFs.
With dynamic assignment the inefficiency of having a fixed number of MSI-X
vectors can be avoided with each VF having exactly the required
vectors. Userspace will provide this information while provisioning the VF
for use, based on the intended use. For instance if being used with a VM,
the MSI-X vector count might be matched to the CPU count of the VM.
For compatibility, mlx5 continues to start up with static MSI-X vector
assignment, but the kernel can now access a larger dynamic vector pool
and assign more vectors to created VFs.
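For instance, assuming the sriov_vf_total_msix / sriov_vf_msix_count
sysfs attributes introduced alongside this change, provisioning could
look as follows (device addresses and values are illustrative):

$ cat /sys/bus/pci/devices/0000:06:00.0/sriov_vf_total_msix
300
$ echo 16 > /sys/bus/pci/devices/0000:06:00.2/sriov_vf_msix_count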
Link: https://lore.kernel.org/linux-pci/20210314124256.70253-4-leon@kernel.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Add an auxiliary device driver for the mlx5 subfunction auxiliary
device.
An mlx5 subfunction is similar to a PCI PF and VF. For a subfunction,
an auxiliary device is created.
As a result, when the mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created; they appear as follows:
$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4
$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4
$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
In the future, the devlink device instance name will adapt to carry the
sfnum annotation, either via an alias or as the devlink instance name,
as described in RFC [1].
[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ, resulting in more available IRQs for
completion EQs.
The changes use the support for IRQ sharing and EQ polling budget that
was introduced in previous patches, so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll, in order to prevent
starvation, as sketched below.
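A hedged sketch of the fan-out (this mirrors the notifier-chain
dispatch the driver uses; the handler body shown is illustrative):

  /* one hardware interrupt serially invokes every sharing EQ handler */
  static irqreturn_t mlx5_irq_int_handler(int irq, void *nh)
  {
          atomic_notifier_call_chain(nh, 0, NULL);
          return IRQ_HANDLED;
  }

Each registered EQ handler then polls at most a fixed budget of EQEs
before returning, so no single EQ can starve the others.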
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
struct mlx5_irq_info is an active object and not just info.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>