Commit graph

10409 commits

Author SHA1 Message Date
Rahul Rameshbabu
1b2bd0c026 net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors
Treat the operation as an error case when the return value is equivalent to
the size of the name buffer. Failed to write null terminator to the name
buffer, making the string malformed and should not be used. Provide a
string with only the firmware version when forming the string with the
board id fails. This logic for representors is identical to normal flow
with ethtool.

Without check, will trigger -Wformat-truncation with W=1.

    drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo':
    drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=]
      78 |                  "%d.%d.%04d (%.16s)",
         |                               ^~~~~
    drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32
      77 |         snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      78 |                  "%d.%d.%04d (%.16s)",
         |                  ~~~~~~~~~~~~~~~~~~~~~
      79 |                  fw_rev_maj(mdev), fw_rev_min(mdev),
         |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      80 |                  fw_rev_sub(mdev), mdev->board_id);
         |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: cf83c8fdcd ("net/mlx5e: Add missing ethtool driver info for representors")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Rahul Rameshbabu
41e63c2baa net/mlx5e: Check return value of snprintf writing to fw_version buffer
Treat the operation as an error case when the return value is equivalent to
the size of the name buffer. Failed to write null terminator to the name
buffer, making the string malformed and should not be used. Provide a
string with only the firmware version when forming the string with the
board id fails.

Without check, will trigger -Wformat-truncation with W=1.

    drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_ethtool_get_drvinfo':
    drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:49:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=]
      49 |                  "%d.%d.%04d (%.16s)",
         |                               ^~~~~
    drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:48:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32
      48 |         snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      49 |                  "%d.%d.%04d (%.16s)",
         |                  ~~~~~~~~~~~~~~~~~~~~~
      50 |                  fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev),
         |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      51 |                  mdev->board_id);
         |                  ~~~~~~~~~~~~~~~

Fixes: 84e11edb71 ("net/mlx5e: Show board id in ethtool driver information")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Saeed Mahameed
dce9414284 net/mlx5e: Reduce the size of icosq_str
icosq_str size is unnecessarily too long, and it causes a build warning
-Wformat-truncation with W=1. Looking closely, It doesn't need to be 255B,
hence this patch reduces the size to 32B which should be more than enough
to host the string: "ICOSQ: 0x%x, ".

While here, add a missing space in the formatted string.

This fixes the following build warning:

$ KCFLAGS='-Wall -Werror'
$ make O=/tmp/kbuild/linux W=1 -s -j12 drivers/net/ethernet/mellanox/mlx5/core/

drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c: In function 'mlx5e_reporter_rx_timeout':
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:718:56:
error: ', CQ: 0x' directive output may be truncated writing 8 bytes into a region of size between 0 and 255 [-Werror=format-truncation=]
  718 |                  "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
      |                                                        ^~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:717:9: note: 'snprintf' output between 43 and 322 bytes into a destination of size 288
  717 |         snprintf(err_str, sizeof(err_str),
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  718 |                  "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  719 |                  rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn);
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 521f31af00 ("net/mlx5e: Allow RQ outside of channel context")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-14-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Rahul Rameshbabu
3338bebfc2 net/mlx5: Increase size of irq name buffer
Without increased buffer size, will trigger -Wformat-truncation with W=1
for the snprintf operation writing to the buffer.

    drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c: In function 'mlx5_irq_alloc':
    drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:296:7: error: '@pci:' directive output may be truncated writing 5 bytes into a region of size between 1 and 32 [-Werror=format-truncation=]
      296 |    "%s@pci:%s", name, pci_name(dev->pdev));
          |       ^~~~~
    drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:295:2: note: 'snprintf' output 6 or more bytes (assuming 37) into a destination of size 32
      295 |  snprintf(irq->name, MLX5_MAX_IRQ_NAME,
          |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      296 |    "%s@pci:%s", name, pci_name(dev->pdev));
          |    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: ada9f5d007 ("IB/mlx5: Fix eq names to display nicely in /proc/interrupts")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-13-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Rahul Rameshbabu
92214be597 net/mlx5e: Update doorbell for port timestamping CQ before the software counter
Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the
incremented consumer index after the relevant software counters in the
kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to
either overrunning the device CQ or exceeding the expected software buffer
size in the device CQ if the device CQ size was greater than the software
buffer size. Update the relevant software counter only after updating the
device CQ consumer index in the port timestamping napi_poll context.

Log:
    mlx5_core 0000:08:00.0: cq_err_event_notifier:517:(pid 0): CQ error on CQN 0x487, syndrome 0x1
    mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000487 event=0x04

Fixes: 1880bc4e4a ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Rahul Rameshbabu
7e3f3ba97e net/mlx5e: Track xmit submission to PTP WQ after populating metadata map
Ensure the skb is available in metadata mapping to skbs before tracking the
metadata index for detecting undelivered CQEs. If the metadata index is put
in the tracking list before putting the skb in the map, the metadata index
might be used for detecting undelivered CQEs before the relevant skb is
available in the map, which can lead to a null-ptr-deref.

Log:
    general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
    CPU: 0 PID: 1243 Comm: kworker/0:2 Not tainted 6.6.0-rc4+ #108
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    Workqueue: events mlx5e_rx_dim_work [mlx5_core]
    RIP: 0010:mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core]
    Code: 8c 24 38 cc ff ff 4c 8d 3c c1 4c 89 f9 48 c1 e9 03 42 80 3c 31 00 0f 85 97 0f 00 00 4d 8b 3f 49 8d 7f 28 48 89 f9 48 c1 e9 03 <42> 80 3c 31 00 0f 85 8b 0f 00 00 49 8b 47 28 48 85 c0 0f 84 05 07
    RSP: 0018:ffff8884d3c09c88 EFLAGS: 00010206
    RAX: 0000000000000069 RBX: ffff8881160349d8 RCX: 0000000000000005
    RDX: ffffed10218f48cf RSI: 0000000000000004 RDI: 0000000000000028
    RBP: ffff888122707700 R08: 0000000000000001 R09: ffffed109a781383
    R10: 0000000000000003 R11: 0000000000000003 R12: ffff88810c7a7a40
    R13: ffff888122707700 R14: dffffc0000000000 R15: 0000000000000000
    FS:  0000000000000000(0000) GS:ffff8884d3c00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f4f878dd6e0 CR3: 000000014d108002 CR4: 0000000000370eb0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    <IRQ>
    ? die_addr+0x3c/0xa0
    ? exc_general_protection+0x144/0x210
    ? asm_exc_general_protection+0x22/0x30
    ? mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core]
    ? mlx5e_ptp_napi_poll+0x8f6/0x2290 [mlx5_core]
    __napi_poll.constprop.0+0xa4/0x580
    net_rx_action+0x460/0xb80
    ? _raw_spin_unlock_irqrestore+0x32/0x60
    ? __napi_poll.constprop.0+0x580/0x580
    ? tasklet_action_common.isra.0+0x2ef/0x760
    __do_softirq+0x26c/0x827
    irq_exit_rcu+0xc2/0x100
    common_interrupt+0x7f/0xa0
    </IRQ>
    <TASK>
    asm_common_interrupt+0x22/0x40
    RIP: 0010:__kmem_cache_alloc_node+0xb/0x330
    Code: 41 5d 41 5e 41 5f c3 8b 44 24 14 8b 4c 24 10 09 c8 eb d5 e8 b7 43 ca 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 <41> 56 41 89 d6 41 55 41 89 f5 41 54 49 89 fc 53 48 83 e4 f0 48 83
    RSP: 0018:ffff88812c4079c0 EFLAGS: 00000246
    RAX: 1ffffffff083c7fe RBX: ffff888100042dc0 RCX: 0000000000000218
    RDX: 00000000ffffffff RSI: 0000000000000dc0 RDI: ffff888100042dc0
    RBP: ffff88812c4079c8 R08: ffffffffa0289f96 R09: ffffed1025880ea9
    R10: ffff888138839f80 R11: 0000000000000002 R12: 0000000000000dc0
    R13: 0000000000000100 R14: 000000000000008c R15: ffff8881271fc450
    ? cmd_exec+0x796/0x2200 [mlx5_core]
    kmalloc_trace+0x26/0xc0
    cmd_exec+0x796/0x2200 [mlx5_core]
    mlx5_cmd_do+0x22/0xc0 [mlx5_core]
    mlx5_cmd_exec+0x17/0x30 [mlx5_core]
    mlx5_core_modify_cq_moderation+0x139/0x1b0 [mlx5_core]
    ? mlx5_add_cq_to_tasklet+0x280/0x280 [mlx5_core]
    ? lockdep_set_lock_cmp_fn+0x190/0x190
    ? process_one_work+0x659/0x1220
    mlx5e_rx_dim_work+0x9d/0x100 [mlx5_core]
    process_one_work+0x730/0x1220
    ? lockdep_hardirqs_on_prepare+0x400/0x400
    ? max_active_store+0xf0/0xf0
    ? assign_work+0x168/0x240
    worker_thread+0x70f/0x12d0
    ? __kthread_parkme+0xd1/0x1d0
    ? process_one_work+0x1220/0x1220
    kthread+0x2d9/0x3b0
    ? kthread_complete_and_exit+0x20/0x20
    ret_from_fork+0x2d/0x70
    ? kthread_complete_and_exit+0x20/0x20
    ret_from_fork_asm+0x11/0x20
    </TASK>
    Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse
    ---[ end trace 0000000000000000 ]---

Fixes: 3178308ad4 ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Rahul Rameshbabu
64f14d16ee net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe
When SQ is a port timestamping SQ for PTP, do not access tx flags of skb
after free-ing the skb. Free the skb only after all references that depend
on it have been handled in the dropped WQE path.

Fixes: 3178308ad4 ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Jianbo Liu
bdf788cf22 net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload
As IPSec packet offload in switchdev mode is not supported with LAG,
it's unnecessary to modify those sent-to-vport rules to the peer eswitch.

Fixes: c6c2bf5db4 ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Vlad Buslov
0c101a23ca net/mlx5e: Fix pedit endianness
Referenced commit addressed endianness issue in mlx5 pedit implementation
in ad hoc manner instead of systematically treating integer values
according to their types which left pedit fields of sizes not equal to 4
and where the bytes being modified are not least significant ones broken on
big endian machines since wrong bits will be consumed during parsing which
leads to following example error when applying pedit to source and
destination MAC addresses:

[Wed Oct 18 12:52:42 2023] mlx5_core 0001:00:00.1 p1v3_r: attempt to offload an unsupported field (cmd 0)
[Wed Oct 18 12:52:42 2023] mask: 00000000330c5b68: 00 00 00 00 ff ff 00 00 00 00 ff ff 00 00 00 00  ................
[Wed Oct 18 12:52:42 2023] mask: 0000000017d22fd9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed Oct 18 12:52:42 2023] mask: 000000008186d717: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed Oct 18 12:52:42 2023] mask: 0000000029eb6149: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed Oct 18 12:52:42 2023] mask: 000000007ed103e4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed Oct 18 12:52:42 2023] mask: 00000000db8101a6: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed Oct 18 12:52:42 2023] mask: 00000000ec3c08a9: 00 00 00 00 00 00 00 00 00 00 00 00              ............

Treat masks and values of pedit and filter match as network byte order,
refactor pointers to them to void pointers instead of confusing u32
pointers and only cast to pointer-to-integer when reading a value from
them. Treat pedit mlx5_fields->field_mask as host byte order according to
its type u32, change the constants in fields array accordingly.

Fixes: 82198d8bcd ("net/mlx5e: Fix endianness when calculating pedit mask first bit")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Gavin Li
3a4aa3cb83 net/mlx5e: fix double free of encap_header in update funcs
Follow up to the previous patch to fix the same issue for
mlx5e_tc_tun_update_header_ipv4{6} when mlx5_packet_reformat_alloc()
fails.

When mlx5_packet_reformat_alloc() fails, the encap_header allocated in
mlx5e_tc_tun_update_header_ipv4{6} will be released within it. However,
e->encap_header is already set to the previously freed encap_header
before mlx5_packet_reformat_alloc(). As a result, the later
mlx5e_encap_put() will free e->encap_header again, causing a double free
issue.

mlx5e_encap_put()
     --> mlx5e_encap_dealloc()
         --> kfree(e->encap_header)

This patch fix it by not setting e->encap_header until
mlx5_packet_reformat_alloc() success.

Fixes: a54e20b4fc ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:31 -08:00
Dust Li
6f9b1a0731 net/mlx5e: fix double free of encap_header
When mlx5_packet_reformat_alloc() fails, the encap_header allocated in
mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However,
e->encap_header is already set to the previously freed encap_header
before mlx5_packet_reformat_alloc(). As a result, the later
mlx5e_encap_put() will free e->encap_header again, causing a double free
issue.

mlx5e_encap_put()
    --> mlx5e_encap_dealloc()
        --> kfree(e->encap_header)

This happens when cmd: MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT fail.

This patch fix it by not setting e->encap_header until
mlx5_packet_reformat_alloc() success.

Fixes: d589e785ba ("net/mlx5e: Allow concurrent creation of encap entries")
Reported-by: Cruz Zhao <cruzzhao@linux.alibaba.com>
Reported-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:25 -08:00
Rahul Rameshbabu
fd64fd13c4 net/mlx5: Decouple PHC .adjtime and .adjphase implementations
When running a phase adjustment operation, the free running clock should
not be modified at all. The phase control keyword is intended to trigger an
internal servo on the device that will converge to the provided delta. A
free running counter cannot implement phase adjustment.

Fixes: 8e11a68e2e ("net/mlx5: Add adjphase function to support hardware-only offset control")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:09 -08:00
Erez Shitrit
ad4d82c3ea net/mlx5: DR, Allow old devices to use multi destination FTE
The current check isn't aware of old devices that don't have the
relevant FW capability. This patch allows multi destination FTE
in old cards, as it was before this check.

Fixes: f6f46e7173 ("net/mlx5: DR, Add check for multi destination FTE")
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:09 -08:00
Maher Sanalla
7d2f74d1d4 net/mlx5: Free used cpus mask when an IRQ is released
Each EQ table maintains a cpumask of the already used CPUs that are mapped
to IRQs to ensure that each IRQ gets mapped to a unique CPU.

However, on IRQ release, the said cpumask is not updated by clearing the
CPU from the mask to allow future IRQ request, causing the following
error when a SF is reloaded after it has utilized all CPUs for its IRQs:

mlx5_irq_affinity_request:135:(pid 306010): Didn't find a matching IRQ.
err = -28

Thus, when releasing an IRQ, clear its mapped CPU from the used CPUs
mask, to prevent the case described above.

While at it, move the used cpumask update to the EQ layer as it is more
fitting and preserves symmetricity of the IRQ request/release API.

Fixes: a1772de78d ("net/mlx5: Refactor completion IRQ request/release API")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:09 -08:00
Itamar Gozlan
df3aafe501 Revert "net/mlx5: DR, Supporting inline WQE when possible"
This reverts commit 95c337cce0.
The revert is required due to the suspicion it cause some tests
fail and will be moved to further investigation.

Fixes: 95c337cce0 ("net/mlx5: DR, Supporting inline WQE when possible")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15 22:34:08 -08:00
Linus Torvalds
43468456c9 RDMA for v6.7
The usual collection of patches:
 
 - Bug fixes for hns, mlx5, and hfi1
 
 - Hardening patches for size_*, counted_by, strscpy
 
 - rts fixes from static analysis
 
 - Dump SRQ objects in rdma netlink, with hns support
 
 - Fix a performance regression in mlx5 MR deregistration
 
 - New XDR (200Gb/lane) link speed
 
 - SRQ record doorbell latency optimization for hns
 
 - IPSEC support for mlx5 multi-port mode
 
 - ibv_rereg_mr() support for irdma
 
 - Affiliated event support for bnxt_re
 
 - Opt out for the spec compliant qkey security enforcement as we
   discovered SW that breaks under enforcement
 
 - Comment and trivial updates
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZUEvXwAKCRCFwuHvBreF
 YQ/iAPkBl7vCH+oFGJKnh9/pUjKrWwRYBkGNgNesH7lX4Q/cIAEA+R3bAI2d4O4i
 tMrRRowTnxohbVRebuy4Pfsoo6m9dA4=
 =TxMJ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Nothing exciting this cycle, most of the diffstat is changing SPDX
  'or' to 'OR'.

  Summary:

   - Bugfixes for hns, mlx5, and hfi1

   - Hardening patches for size_*, counted_by, strscpy

   - rts fixes from static analysis

   - Dump SRQ objects in rdma netlink, with hns support

   - Fix a performance regression in mlx5 MR deregistration

   - New XDR (200Gb/lane) link speed

   - SRQ record doorbell latency optimization for hns

   - IPSEC support for mlx5 multi-port mode

   - ibv_rereg_mr() support for irdma

   - Affiliated event support for bnxt_re

   - Opt out for the spec compliant qkey security enforcement as we
     discovered SW that breaks under enforcement

   - Comment and trivial updates"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (50 commits)
  IB/mlx5: Fix init stage error handling to avoid double free of same QP and UAF
  RDMA/mlx5: Fix mkey cache WQ flush
  RDMA/hfi1: Workaround truncation compilation error
  IB/hfi1: Fix potential deadlock on &irq_src_lock and &dd->uctxt_lock
  RDMA/core: Remove NULL check before dev_{put, hold}
  RDMA/hfi1: Remove redundant assignment to pointer ppd
  RDMA/mlx5: Change the key being sent for MPV device affiliation
  RDMA/bnxt_re: Fix clang -Wimplicit-fallthrough in bnxt_re_handle_cq_async_error()
  RDMA/hns: Fix init failure of RoCE VF and HIP08
  RDMA/hns: Fix unnecessary port_num transition in HW stats allocation
  RDMA/hns: The UD mode can only be configured with DCQCN
  RDMA/hns: Add check for SL
  RDMA/hns: Fix signed-unsigned mixed comparisons
  RDMA/hns: Fix uninitialized ucmd in hns_roce_create_qp_common()
  RDMA/hns: Fix printing level of asynchronous events
  RDMA/core: Add support to set privileged QKEY parameter
  RDMA/bnxt_re: Do not report SRQ error in srq notification
  RDMA/bnxt_re: Report async events and errors
  RDMA/bnxt_re: Update HW interface headers
  IB/mlx5: Fix rdma counter binding for RAW QP
  ...
2023-11-02 15:20:30 -10:00
Jason Gunthorpe
162e348024 Merge tag 'v6.6' into rdma.git for-next
Resolve conflict by taking the spin_lock hunk from for-next:

 https://lore.kernel.org/r/20230928113851.5197a1ec@canb.auug.org.au

Required for the next patch.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-31 10:54:48 -03:00
Przemek Kitszel
5af8d8ce64 net/mlx5: fix uninit value use
Avoid use of uninitialized state variable.

In case of mlx5e_tx_reporter_build_diagnose_output_sq_common() it's better
to still collect other data than bail out entirely.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/netdev/8bd30131-c9f2-4075-a575-7fa2793a1760@moroto.mountain
Fixes: d17f98bf7c ("net/mlx5: devlink health: use retained error fmsg API")
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://lore.kernel.org/r/20231025145050.36114-1-przemyslaw.kitszel@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 15:47:18 -07:00
Yunsheng Lin
09d96ee567 page_pool: remove PP_FLAG_PAGE_FRAG
PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count
handling is unified and page_pool_alloc_frag() is supported
in 32-bit arch with 64-bit DMA, so remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 19:14:48 -07:00
Petr Machata
b46c1f3f5e mlxsw: spectrum: Set SW LAG mode on Spectrum>1
On Spectrum-2, Spectrum-3 and Spectrum-4 machines, request SW
responsibility for placement of the LAG table.

On Spectrum-1, some FW versions claim to support lag_mode field despite
quietly ignoring any settings made to that field. Thus refrain from
attempting to configure lag_mode on those systems at all.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
c678972580 mlxsw: spectrum: Allocate LAG table when in SW LAG mode
In this patch, if the LAG mode is SW, allocate the LAG table and configure
SGCR to indicate where it was allocated.

We use the default "DDD" (for dynamic data duplication) layout of the LAG
table. In the DDD mode, the membership information for each LAG is copied
in 8 PGT entries. This is done for performance reasons. The LAG table then
needs to be allocated on an address aligned to 8. Deal with this by
moving the LAG init ahead so that the LAG table is allocated at address 0.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
8c893abd64 mlxsw: spectrum_pgt: Generalize PGT allocation
PGT blocks are allocated through the function
mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows
which piece of PGT exactly they want to get. That was fine while the FID
code was the only client allocating blocks of PGT. However for SW-allocated
LAG table, there will be an additional client: mlxsw_sp_lag_init(). The
interface should therefore be changed to not require particular
coordinates, but to take just the requested size, allocate the block
wherever, and give back the PGT address.

In this patch, change the interface accordingly. Initialize FID family's
pgt_base from the result of the PGT allocation (note that mlxsw makes a
copy of the family structure, so what gets initialized is not actually the
global structure). Drop the now-unnecessary pgt_base initializations and
the corresponding defines.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
f5e293f993 mlxsw: spectrum_fid: Allocate PGT for the whole FID family in one go
PGT blocks are allocated through the function
mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows
which piece of PGT exactly they want to get. That was fine while the FID
code was the only client allocating blocks of PGT. However for SW-allocated
LAG table, there will be an additional client: mlxsw_sp_lag_init(). The
interface should therefore be changed to not require particular
coordinates, but to take just the requested size, allocate the block
wherever, and give back the PGT address.

The current FID mode has one place where PGT address can be stored: the FID
family's pgt_base. The allocation scheme should therefore be changed from
allocating a block per FID flood table, to allocating a block per FID
family.

Do just that in this patch.

The per-family allocation is going to be useful for another related feature
as well: the CFF mode.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
daee7aaba8 mlxsw: pci: Permit toggling LAG mode
Add to struct mlxsw_config_profile a field lag_mode_prefer_sw for the
driver to indicate that SW LAG mode should be configured if possible. Add
to the PCI module code to set lag_mode as appropriate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
b2e9b1fe8c mlxsw: core, pci: Add plumbing related to LAG mode
lag_mode describes where the responsibility for LAG table placement lies:
SW or FW. The bus module determines whether LAG is supported, can configure
it if it is, and knows what (if any) configuration has been applied.
Therefore add a bus callback to determine the configured LAG mode. Also add
to core an API to query it.

The LAG mode is for now kept at the default value of 0 for FW-managed. The
code to actually toggle it will be added later.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
8eabd10cdc mlxsw: cmd: Add QUERY_FW.lag_mode_support
Add QUERY_FW.lag_mode_support, which determines whether
CONFIG_PROFILE.lag_mode is available.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
eb26a59232 mlxsw: cmd: Add CONFIG_PROFILE.{set_, }lag_mode
Add CONFIG_PROFILE.lag_mode, which serves for moving responsibility for
placement of the LAG table from FW to SW. Whether lag_mode should be
configured is determined by CONFIG_PROFILE.set_lag_mode, which also add.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
be9ed47d3f mlxsw: cmd: Fix omissions in CONFIG_PROFILE field names in comments
A number of CONFIG_PROFILE fields' comments refer to a field named like
cmd_mbox_config_* instead of cmd_mbox_config_profile_*. Correct these
omissions.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
cf0a86e8ce mlxsw: reg: Add SGCR.lag_lookup_pgt_base
Add SGCR.lag_lookup_pgt_base, which is used for configuring the base
address of the LAG table within the PGT table for cases when the driver
is responsible for the table placement.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
66eaaa8541 mlxsw: reg: Drop SGCR.llb
SGCR, Switch General Configuration Register, has not been used since commit
b0d80c013b ("mlxsw: Remove Mellanox SwitchX-2 ASIC support"). We will
need the register again shortly, so instead of dropping it and
reintroducing again, just drop the sole unused field.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:48 +01:00
Przemek Kitszel
d17f98bf7c net/mlx5: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
1d434b4849 mlxsw: core: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Jakub Kicinski
041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2 ("wifi: mac80211: fix error path key leak")
  2a8b665e6b ("wifi: mac80211: remove key_mtx")
  7d6904bf26 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae950 ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Leon Romanovsky
627aa13921 net/mlx5e: Allow IPsec soft/hard limits in bytes
Actually the mlx5 code already has needed support to allow users
to configure soft/hard limits in bytes. It is possible due to the
situation with TX path, where CX7 devices are missing hardware
implementation to send events to the software, see commit b2f7b01d36
("net/mlx5e: Simulate missing IPsec TX limits hardware functionality").

That software workaround is not limited to TX and works for bytes too.
So relax the validation logic to not block soft/hard limits in bytes.

Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:33 -07:00
Adham Faris
6dd6eaf43e net/mlx5e: Increase max supported channels number to 256
Increase max supported channels number to 256 (it is not extended
further due to testing disabilities).

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:33 -07:00
Adham Faris
74a8dadac1 net/mlx5e: Preparations for supporting larger number of channels
Data center server CPUs number keeps getting larger with time.
Currently, our driver limits the number of channels to 128.

Maximum channels number is enforced and bounded by hardcoded
defines (en.h/MLX5E_MAX_NUM_CHANNELS) even though the device and machine
(CPUs num) can allow more.

Refactor current implementation in order to handle further channels.

The maximum supported channels number will be increased in the followup
patch.

Introduce RQT size calculation/allocation scheme below:
1) Preserve current RQT size of 256 for channels number up to 128 (the
   old limit).
2) For greater channels number, RQT size is calculated by multiplying
   the channels number by 2 and rounding up the result to the nearest
   power of 2. If the calculated RQT size exceeds the maximum supported
   size by the NIC, fallback to this maximum RQT size
   (1 << log_max_rqt_size).

Since RQT size is no more static, allocate and free the indirection
table SW shadow dynamically.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:33 -07:00
Adham Faris
0d806cf9c0 net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's
Introduce code refactoring below:
1) Introduce single API for creating and destroying rss object,
   mlx5e_rss_create() and mlx5e_rss_destroy() respectively.
2) mlx5e_rss_create() constructs and initializes RSS object depends
   on a function new param enum mlx5e_rss_create_type. Callers (like
   rx_res.c) will no longer need to allocate RSS object via
   mlx5e_rss_alloc() and initialize it immediately via
   mlx5e_rss_init_no_tirs() or mlx5e_rss_init(), this will be done by
   a single call to mlx5e_rss_create(). Hence, mlx5e_rss_alloc() and
   mlx5e_rss_init_no_tirs() have been removed from rss.h file and became
   static functions.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:32 -07:00
Adham Faris
cae8e6dea2 net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh()
Initialize indirect table array with memcpy rather than for loop.
This change has made for two reasons:
1) To be consistent with the indirect table array init in
   mlx5e_rss_set_rxfh().
2) In general, prefer to use memcpy for array initializing rather than
   for loop.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:32 -07:00
Adham Faris
d90ea84375 net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs
Refactor mlx5e_rx_res_init() and mlx5e_rx_res_free() by wrapping
mlx5e_rx_res_alloc() and mlx5e_rx_res_destroy() API's respectively.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:32 -07:00
Yu Liao
5a37b28824 net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code
Use the standard error pointer macro to shorten the code and simplify.

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:32 -07:00
Jinjie Ruan
68e81110fb net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code
Return PTR_ERR_OR_ZERO() instead of return 0 or PTR_ERR() to
simplify code.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:31 -07:00
Yue Haibing
0d2d6bc7e7 net/mlx5: Remove unused declaration
Commit 2ac9cfe782 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
declared mlx5e_ipsec_inverse_table_init() but never implemented it.
Commit f52f2faee5 ("net/mlx5e: Introduce flow steering API")
declared mlx5e_fs_set_tc() but never implemented it.
Commit f2f3df5501 ("net/mlx5: EQ, Privatize eq_table and friends")
declared mlx5_eq_comp_cpumask() but never implemented it.
Commit cac1eb2cf2 ("net/mlx5: Lag, properly lock eswitch if needed")
removed mlx5_lag_update() but not its declaration.
Commit 35ba005d82 ("net/mlx5: DR, Set flex parser for TNL_MPLS dynamically")
removed mlx5dr_ste_build_tnl_mpls() but not its declaration.

Commit e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
declared but never implemented mlx5_alloc_cmd_mailbox_chain() and mlx5_free_cmd_mailbox_chain().
Commit 0cf53c1247 ("net/mlx5: FWPage, Use async events chain")
removed mlx5_core_req_pages_handler() but not its declaration.
Commit 938fe83c8d ("net/mlx5_core: New device capabilities handling")
removed mlx5_query_odp_caps() but not its declaration.
Commit f6a8a19bb1 ("RDMA/netdev: Hoist alloc_netdev_mqs out of the driver")
removed mlx5_rdma_netdev_alloc() but not its declaration.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:31 -07:00
Shay Drory
b430c1b4f6 net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock
mlx5_intf_lock is used to sync between LAG changes and its slaves
mlx5 core dev aux devices changes, which means every time mlx5 core
dev add/remove aux devices, mlx5 is taking this global lock, even if
LAG functionality isn't supported over the core dev.
This cause a bottleneck when probing VFs/SFs in parallel.

Hence, replace mlx5_intf_lock with HCA devcom component lock, or no
lock if LAG functionality isn't supported.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:31 -07:00
Shay Drory
e534552c92 net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom
LAG peer device lookout bus logic required the usage of global lock,
mlx5_intf_mutex.
As part of the effort to remove this global lock, refactor LAG peer
device lookout to use mlx5 devcom layer.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:30 -07:00
Shay Drory
89d351c224 net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
Downstream patch will add devcom component which will be locked in
many places. This can lead to a false positive "possible circular
locking dependency" warning by lockdep, on flows which lock more than
one mlx5 devcom component, such as probing ETH aux device.
Hence, add a lock_class_key per mlx5 device.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:30 -07:00
Wei Zhang
15fa898aeb net/mlx5: Redesign SF active work to remove table_lock
active_work is a work that iterates over all
possible SF devices which their SF port
representors are located on different function,
and in case SF is in active state, probes it.
Currently, the active_work in active_wq is
synced with mlx5_vhca_events_work via table_lock
and this lock causing a bottleneck in performance.

To remove table_lock, redesign active_wq logic
so that it now pushes active_work per SF to
mlx5_vhca_events_workqueues. Since the latter
workqueues are ordered, active_work and
mlx5_vhca_events_work with same index will be
pushed into same workqueue, thus it completely
eliminates the need for a lock.

Signed-off-by: Wei Zhang <weizhang@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:30 -07:00
Wei Zhang
3f7f31fff2 net/mlx5: Parallelize vhca event handling
At present, mlx5 driver have a general purpose
event handler which not only handles vhca event
but also many other events. This incurs a huge
bottleneck because the event handler is
implemented by single threaded workqueue and all
events are forced to be handled in serial manner
even though application tries to create multiple
SFs simultaneously.

Introduce a dedicated vhca event handler which
manages SFs parallel creation.

Signed-off-by: Wei Zhang <weizhang@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 10:16:29 -07:00
Justin Stitt
88fca39b66 net/mlx4_core: replace deprecated strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We expect `dst` to be NUL-terminated based on its use with format
strings:
|       mlx4_dbg(dev, "Reporting Driver Version to FW: %s\n", dst);

Moreover, NUL-padding is not required.

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231011-strncpy-drivers-net-ethernet-mellanox-mlx4-fw-c-v1-1-4d7b5d34c933@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 17:10:28 -07:00
Ido Schimmel
958a140d7a mlxsw: pci: Allocate skbs using GFP_KERNEL during initialization
The driver allocates skbs during initialization and during Rx
processing. Take advantage of the fact that the former happens in
process context and allocate the skbs using GFP_KERNEL to decrease the
probability of allocation failure.

Tested with CONFIG_DEBUG_ATOMIC_SLEEP=y.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/dfa6ed0926e045fe7c14f0894cc0c37fee81bf9d.1697034729.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 17:08:03 -07:00
Jakub Kicinski
1bc60524ca Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:

====================
This PR is collected from
https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org

This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).

These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.

[1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/

* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Handle IPsec steering upon master unbind/bind
  net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
  net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
  net/mlx5: Add create alias flow table function to ipsec roce
  net/mlx5: Implement alias object allow and create functions
  net/mlx5: Add alias flow table bits
  net/mlx5: Store devcom pointer inside IPsec RoCE
  net/mlx5: Register mlx5e priv to devcom in MPV mode
  RDMA/mlx5: Send events from IB driver about device affiliation state
  net/mlx5: Introduce ifc bits for migration in a chunk mode

====================

Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 09:35:34 -07:00