Commit graph

8165 commits

Author SHA1 Message Date
Yevgeny Kliteynik
1ffd498901 net/mlx5: DR, Increase supported num of actions to 32
Increase max supported number of actions in the same rule.

Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:19:00 -07:00
Yevgeny Kliteynik
11a45def2e net/mlx5: DR, Add support for SF vports
Move all the vport capabilities to a separate struct and store vport caps
in XArray: SFs vport numbers will not come in the same range as VF vports,
so the existing implementation of vport capabilities as a fixed size array
is not suitable here.

XArray is a perfect fit: it is efficient when the indices used are densely
clustered. In addition to being a perfect fit as a dynamic data structure,
XArray also provides locking - it uses RCU and an internal spinlock to
synchronise access, so no additional protection needed.

Now except for the eswitch manager vport, all other vports (including the
uplink vport) are handled in the same way: when a new go-to-vport action
is added, this vport's caps are loaded from the xarray. If it is the first
time for this particular vport number, then its capabilities are queried
from FW and filled in into the appropriate entry.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:18:59 -07:00
Yevgeny Kliteynik
c0e90fc2cc net/mlx5: DR, Support csum recalculation flow table on SFs
Implement csum recalculation flow tables in XAarray instead of a fixed
array, thus adding support for csum recalc table on any valid vport
number, which enables this support for SFs.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:18:59 -07:00
Yevgeny Kliteynik
ee1887fb7c net/mlx5: DR, Align error messages for failure to obtain vport caps
Print similar error messages when an invalid vport number is
provided during action creation and during STEv0/1 creation.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:18:59 -07:00
Yevgeny Kliteynik
dd4acb2a09 net/mlx5: DR, Add missing query for vport 0
Currently, vport 0 capabilities are not set.
To fix this, we now querying both eswitch manager and vport 0.
Eswitch manager has an access to all the vports - for eswitch manager PF, all
vports can be referred as other vports. The exception is embedded CPU mode,
where there is vport 0 of ECPF and the PF vport 0.

Here is how vport are queried:

For Connect-X5/6:
    PF vport (0) and vports 1..n: vport number, other = true
    esw_manager is vport 0 (PF)
For BlueField (in embedded CPU mode):
    ECPF vport: vport = 0, other = false
    PF vport (0) and 1..n: vport number, other = true
    esw_manager = vport 0 (ECPF)

Also, note that there's no need for other_vport function parameter
in dr_domain_query_vport - this value is now deduced locally in the
function.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:18:58 -07:00
Yevgeny Kliteynik
7ae8ac9a58 net/mlx5: DR, Replace local WIRE_PORT macro with the existing MLX5_VPORT_UPLINK
SW steering defines its own macro for uplink vport number.
Replace this macro with an already existing mlx5 macro.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:18:58 -07:00
Yevgeny Kliteynik
f9f93bd55c net/mlx5: DR, Fix vport number data type to u16
According to the HW spec, vport number is a 16-bit value.
Fix vport usage all over the code to u16 data type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 16:18:58 -07:00
Aya Levin
3bf1742f3c net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode
TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
conflicts with MQPRIO in channel mode, which specifies explicitly which
TC accepts the packet. This patch mutually excludes the above
configuration.

Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:57 -07:00
Lama Kayal
dd1979cf3c net/mlx5e: Fix the presented RQ index in PTP stats
PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.

Before applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq15_packets: 0
     ptp_rq8_bytes: 0
     ptp_rq6_csum_complete: 0
     ptp_rq14_csum_complete_tail: 0
     ptp_rq3_csum_complete_tail_slow : 0
     ptp_rq9_csum_unnecessary: 0
     ptp_rq1_csum_unnecessary_inner: 0
     ptp_rq7_csum_none: 0
     ptp_rq10_xdp_drop: 0
     ptp_rq9_xdp_redirect: 0
     ptp_rq13_lro_packets: 0
     ptp_rq12_lro_bytes: 0
     ptp_rq10_ecn_mark: 0
     ptp_rq9_removed_vlan_packets: 0
     ptp_rq5_wqe_err: 0
     ptp_rq8_mpwqe_filler_cqes: 0
     ptp_rq2_mpwqe_filler_strides: 0
     ptp_rq5_oversize_pkts_sw_drop: 0
     ptp_rq6_buff_alloc_err: 0
     ptp_rq15_cqe_compress_blks: 0
     ptp_rq2_cqe_compress_pkts: 0
     ptp_rq2_cache_reuse: 0
     ptp_rq12_cache_full: 0
     ptp_rq11_cache_empty: 256
     ptp_rq12_cache_busy: 0
     ptp_rq11_cache_waive: 0
     ptp_rq12_congst_umr: 0
     ptp_rq11_arfs_err: 0
     ptp_rq9_recover: 0

After applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq0_packets: 0
     ptp_rq0_bytes: 0
     ptp_rq0_csum_complete: 0
     ptp_rq0_csum_complete_tail: 0
     ptp_rq0_csum_complete_tail_slow : 0
     ptp_rq0_csum_unnecessary: 0
     ptp_rq0_csum_unnecessary_inner: 0
     ptp_rq0_csum_none: 0
     ptp_rq0_xdp_drop: 0
     ptp_rq0_xdp_redirect: 0
     ptp_rq0_lro_packets: 0
     ptp_rq0_lro_bytes: 0
     ptp_rq0_ecn_mark: 0
     ptp_rq0_removed_vlan_packets: 0
     ptp_rq0_wqe_err: 0
     ptp_rq0_mpwqe_filler_cqes: 0
     ptp_rq0_mpwqe_filler_strides: 0
     ptp_rq0_oversize_pkts_sw_drop: 0
     ptp_rq0_buff_alloc_err: 0
     ptp_rq0_cqe_compress_blks: 0
     ptp_rq0_cqe_compress_pkts: 0
     ptp_rq0_cache_reuse: 0
     ptp_rq0_cache_full: 0
     ptp_rq0_cache_empty: 256
     ptp_rq0_cache_busy: 0
     ptp_rq0_cache_waive: 0
     ptp_rq0_congst_umr: 0
     ptp_rq0_arfs_err: 0
     ptp_rq0_recover: 0

Fixes: a28359e922 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:57 -07:00
Shay Drory
f88c487634 net/mlx5: Fix setting number of EQs of SFs
When setting number of completion EQs of the SF, consider number of
online CPUs.
Without this consideration, when number of online cpus are less than 8,
unnecessary 8 completion EQs are allocated.

Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Shay Drory
ac8b7d50ae net/mlx5: Fix length of irq_index in chars
The maximum irq_index can be 2047, This means irq_name should have 4
characters reserve for the irq_index. Hence, increase it to 4.

Fixes: 3af26495a2 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Aya Levin
99b9a678b2 net/mlx5: Avoid generating event after PPS out in Real time mode
When in Real-time mode, HW clock is synced with the PTP daemon. Hence
driver should not re-calibrate the next pulse (via MTPPSE repetitive
events mechanism).

This patch arms repetitive events only in free-running mode.

Fixes: 432119de33 ("net/mlx5: Add cyc2time HW translation mode support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Aya Levin
6472829470 net/mlx5: Force round second at 1PPS out start time
Allow configuration of 1PPS start time only with time-stamp representing
a round second. Prior to this patch driver allowed setting of a
non-round-second which is not supported by the device. Avoid unexpected
behavior by restricting start-time configuration to a round-second.

Fixes: 4272f9b88d ("net/mlx5e: Change 1PPS out scheme")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Moshe Shemesh
a586775f83 net/mlx5: E-Switch, Fix double allocation of acl flow counter
Flow counter is allocated in eswitch legacy acl setting functions
without checking if already allocated by previous setting. Add a check
to avoid such double allocation.

Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: ea651a86d4 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Tariq Toukan
7dbc849b2a net/mlx5e: Improve MQPRIO resiliency
* Add netdev->tc_to_txq rollback in case of failure in
  mlx5e_update_netdev_queues().
* Fix broken transition between the two modes:
  MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
* Disable MQPRIO channel mode if re-attaching with a different number
  of channels.
* Improve code sharing.

Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Tariq Toukan
9d758d4a3a net/mlx5e: Keep the value for maximum number of channels in-sync
The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.

To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.

Use mlx5e_get_max_num_channels() for calculating the number of netdev's
rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
coinsiders core device resources, profile, and netdev) to init or
update priv->max_nch.

Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.

Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.

There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.

Fixes: a1f240f180 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:54 -07:00
Raed Salem
f9a10440f0 net/mlx5e: IPSEC RX, enable checksum complete
Currently in Rx data path IPsec crypto offloaded packets uses
csum_none flag, so checksum is handled by the stack, this naturally
have some performance/cpu utilization impact on such flows. As Nvidia
NIC starting from ConnectX6DX provides checksum complete value out of
the box also for such flows there is no sense in taking csum_none path,
furthermore the stack (xfrm) have the method to handle checksum complete
corrections for such flows i.e. IPsec trailer removal and consequently
checksum value adjustment.

Because of the above and in addition the ConnectX6DX is the first HW
which supports IPsec crypto offload then it is safe to report csum
complete for IPsec offloaded traffic.

Fixes: b2ac7541e3 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:54 -07:00
Joshua Roys
dee3b2d0fa net/mlx4_en: Add XDP_REDIRECT statistics
Add counters for XDP REDIRECT success and failure. This brings the
redirect path in line with metrics gathered via the other XDP paths.

Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30 14:14:30 +01:00
Gustavo A. R. Silva
f69bf5dee7 net/mlx4: Use array_size() helper in copy_to_user()
Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29 11:32:14 +01:00
Meir Lichtinger
d2c8a1554c IB/mlx5: Enable UAR to have DevX UID
UID field was added to alloc_uar and dealloc_uar PRM command, to specify
DevX UID for UAR. This change enables firmware validating user access to
its own UAR resources.

For the kernel allocated UARs the UID will stay 0 as of today.

Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-09-28 18:31:21 +03:00
MichelleJin
d7cade5137 net/mlx5e: check return value of rhashtable_init
When rhashtable_init() fails, it returns -EINVAL.
However, since error return value of rhashtable_init is not checked,
it can cause use of uninitialized pointers.
So, fix unhandled errors of rhashtable_init.

Signed-off-by: MichelleJin <shjy180909@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28 12:59:24 +01:00
Leon Romanovsky
b2ab483fcb mlxsw: core: Register devlink instance last
Make sure that devlink is open to receive user input when all
parameters are initialized.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 16:31:59 +01:00
Leon Romanovsky
64ea2d0e72 net/mlx5: Accept devlink user input after driver initialization complete
The change of devlink_alloc() to accept device makes sure that device
is fully initialized and device_register() does nothing except allowing
users to use that devlink instance.

Such change ensures that no user input will be usable till that point and
it eliminates the need to worry about internal locking as long as devlink_register
is called last since all accesses to the devlink are during initialization.

This change fixes the following lockdep warning.

 ======================================================
 WARNING: possible circular locking dependency detected
 5.14.0-rc2+ #27 Not tainted
 ------------------------------------------------------
 devlink/265 is trying to acquire lock:
 ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core]
 but task is already holding lock:
 ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0
 which lock already depends on the new lock.
 the existing dependency chain (in reverse order) is:

 -> #1 (devlink_mutex){+.+.}-{3:3}:
        __mutex_lock+0x149/0x1310
        devlink_register+0xe7/0x280
        mlx5_devlink_register+0x118/0x480 [mlx5_core]
        mlx5_init_one+0x34b/0x440 [mlx5_core]
        probe_one+0x480/0x6e0 [mlx5_core]
        pci_device_probe+0x2a0/0x4a0
        really_probe+0x1cb/0xba0
        __driver_probe_device+0x18f/0x470
        driver_probe_device+0x49/0x120
        __driver_attach+0x1ce/0x400
        bus_for_each_dev+0x11e/0x1a0
        bus_add_driver+0x309/0x570
        driver_register+0x20f/0x390
        0xffffffffa04a0062
        do_one_initcall+0xd5/0x400
        do_init_module+0x1c8/0x760
        load_module+0x7d9d/0xa4b0
        __do_sys_finit_module+0x118/0x1a0
        do_syscall_64+0x3d/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x2999/0x5a40
        lock_acquire+0x1a9/0x4a0
        __mutex_lock+0x149/0x1310
        mlx5_unload_one+0x1e/0xa0 [mlx5_core]
        mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
        devlink_reload+0x1f2/0x640
        devlink_nl_cmd_reload+0x6c3/0x10d0
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        genl_rcv_msg+0x27f/0x4a0
        netlink_rcv_skb+0x11e/0x340
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6fb/0xbe0
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x192/0x240
        __x64_sys_sendto+0xdc/0x1b0
        do_syscall_64+0x3d/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(devlink_mutex);
                                lock(&dev->intf_state_mutex);
                                lock(devlink_mutex);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 3 locks held by devlink/265:
  #0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
  #1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0
  #2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0

 stack backtrace:
 CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack_lvl+0x45/0x59
  check_noncircular+0x268/0x310
  ? print_circular_bug+0x460/0x460
  ? __kernel_text_address+0xe/0x30
  ? alloc_chain_hlocks+0x1e6/0x5a0
  __lock_acquire+0x2999/0x5a40
  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
  ? add_lock_to_list.constprop.0+0x6c/0x530
  lock_acquire+0x1a9/0x4a0
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? lock_release+0x6c0/0x6c0
  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
  ? lock_is_held_type+0x98/0x110
  __mutex_lock+0x149/0x1310
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? lock_is_held_type+0x98/0x110
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  ? find_held_lock+0x2d/0x110
  ? mutex_lock_io_nested+0x1160/0x1160
  ? mlx5_lag_is_active+0x72/0x90 [mlx5_core]
  ? lock_downgrade+0x6d0/0x6d0
  ? do_raw_spin_lock+0x12e/0x270
  ? rwlock_bug.part.0+0x90/0x90
  ? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  mlx5_unload_one+0x1e/0xa0 [mlx5_core]
  mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
  ? netlink_broadcast_filtered+0x308/0xac0
  ? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core]
  ? __build_skb_around+0x110/0x2b0
  ? __alloc_skb+0x113/0x2b0
  devlink_reload+0x1f2/0x640
  ? devlink_unregister+0x1e0/0x1e0
  ? security_capable+0x51/0x90
  devlink_nl_cmd_reload+0x6c3/0x10d0
  ? devlink_nl_cmd_get_doit+0x1e0/0x1e0
  ? devlink_nl_pre_doit+0x72/0x8d0
  genl_family_rcv_msg_doit+0x1e9/0x2f0
  ? __lock_acquire+0x15e2/0x5a40
  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
  ? mutex_lock_io_nested+0x1160/0x1160
  ? security_capable+0x51/0x90
  genl_rcv_msg+0x27f/0x4a0
  ? genl_get_cmd+0x3c0/0x3c0
  ? lock_acquire+0x1a9/0x4a0
  ? devlink_nl_cmd_get_doit+0x1e0/0x1e0
  ? lock_release+0x6c0/0x6c0
  netlink_rcv_skb+0x11e/0x340
  ? genl_get_cmd+0x3c0/0x3c0
  ? netlink_ack+0x930/0x930
  genl_rcv+0x24/0x40
  netlink_unicast+0x433/0x700
  ? netlink_attachskb+0x750/0x750
  ? __alloc_skb+0x113/0x2b0
  netlink_sendmsg+0x6fb/0xbe0
  ? netlink_unicast+0x700/0x700
  ? netlink_unicast+0x700/0x700
  sock_sendmsg+0xb0/0xe0
  __sys_sendto+0x192/0x240
  ? __x64_sys_getpeername+0xb0/0xb0
  ? do_sys_openat2+0x10a/0x370
  ? down_write_nested+0x150/0x150
  ? do_user_addr_fault+0x215/0xd50
  ? __x64_sys_openat+0x11f/0x1d0
  ? __x64_sys_open+0x1a0/0x1a0
  __x64_sys_sendto+0xdc/0x1b0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f50b50b6b3a
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
 RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a
 RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003
 RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440
 mlx5_core 0000:00:09.0: firmware version: 4.8.9999
 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
 mlx5_core 0000:00:09.0 eth1: Link up

Fixes: a6f3b62386 ("net/mlx5: Move devlink registration before interfaces load")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 16:31:59 +01:00
Leon Romanovsky
1e72685916 net/mlx4: Move devlink_register to be the last initialization command
Refactor the code to make sure that devlink_register() is the last
command during initialization stage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 16:31:59 +01:00
Dima Chumak
05000bbba1 net/mlx5e: Enable TC offload for ingress MACVLAN
Support offloading of TC rules that filter ingress traffic from a MACVLAN
device, which is attached to uplink representor.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:57 -07:00
Dima Chumak
fca572f2bc net/mlx5e: Enable TC offload for egress MACVLAN
Support offloading of TC rules that mirror/redirect egress traffic to a
MACVLAN device, which is attached to mlx5 representor net device.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:56 -07:00
Roi Dayan
7990b1b5e8 net/mlx5e: loopback test is not supported in switchdev mode
In switchdev mode we insert steering rules to eswitch that
make sure packets can't be looped back.
Modify the self tests infra and have flags per test.
Add a flag for tests that needs to be skipped in switchdev mode.

Before this commit:

$ ethtool --test enp8s0f0
 The test result is FAIL
 The test extra info:
 Link Test        0
 Speed Test       0
 Health Test      0
 Loopback Test    1

After this commit:

$ ethtool --test enp8s0f0
 The test result is PASS
 The test extra info:
 Link Test        0
 Speed Test       0
 Health Test      0

Example output in dmesg:

enp8s0f0: Self test begin..
enp8s0f0:         [0] Link Test start..
enp8s0f0:         [0] Link Test end: result(0)
enp8s0f0:         [1] Speed Test start..
enp8s0f0:         [1] Speed Test end: result(0)
enp8s0f0:         [2] Health Test start..
enp8s0f0:         [2] Health Test end: result(0)
enp8s0f0: Self test out: status flags(0x1)

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:55 -07:00
Roi Dayan
c50775d0e2 net/mlx5e: Use NL_SET_ERR_MSG_MOD() for errors parsing tunnel attributes
This to be consistent and adds the module name to the error message.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:55 -07:00
Roi Dayan
f3e02e479d net/mlx5e: Use tc sample stubs instead of ifdefs in source file
Instead of having sparse ifdefs in source files use a single
ifdef in the tc sample header file and use stubs.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:54 -07:00
Roi Dayan
1cc35b707c net/mlx5e: Remove redundant priv arg from parse_pedit_to_reformat()
The priv argument is not being used. remove it.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:54 -07:00
Roi Dayan
6b50cf45b6 net/mlx5e: Check action fwd/drop flag exists also for nic flows
The driver should add offloaded rules with either a fwd or drop action.
The check existed in parsing fdb flows but not when parsing nic flows.
Move the test into actions_match_supported() which is called for
checking nic flows and fdb flows.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:53 -07:00
Roi Dayan
7f8770c716 net/mlx5e: Set action fwd flag when parsing tc action goto
Do it when parsing like in other actions instead of when
checking if goto is supported in current scenario.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:53 -07:00
Roi Dayan
475fb86ac9 net/mlx5e: Remove incorrect addition of action fwd flag
A user is expected to explicit request a fwd or drop action.
It is not correct to implicit add a fwd action for the user,
when modify header action flag exists.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:52 -07:00
Roi Dayan
1836d78015 net/mlx5e: Use correct return type
modify_header_match_supported() should return type bool but
it returns the value returned by is_action_keys_supported()
which is type int.

is_action_keys_supported() always returns either -EOPNOTSUPP
or 0 and it shouldn't change as the purpose of the function
is checking for support. so just make the function return
a bool type.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:52 -07:00
Aya Levin
6c2509d446 net/mlx5e: Add error flow for ethtool -X command
Prior to this patch, ethtool -X fail but the user receives a success
status. Try to roll-back when failing and return success status
accordingly.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:51 -07:00
Yevgeny Kliteynik
c228dce262 net/mlx5: DR, Fix code indentation in dr_ste_v1
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-24 11:46:51 -07:00
Joshua Roys
a8551c9b75 net: mlx4: Add support for XDP_REDIRECT
Signed-off-by: Joshua Roys <roysjosh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 14:10:35 +01:00
Amit Cohen
ba1c71324b mlxsw: Add support for IP-in-IP with IPv6 underlay for Spectrum-2 and above
Currently, mlxsw driver supports IP-in-IP only with IPv4 underlay.
Add support for IPv6 underlay for Spectrum-2 and above.

Most of the configurations are same to IPv4, the main difference between
IPv4 and IPv6 is related to saving IP addresses.
IPv6 addresses are saved as part of KVD and the relevant registers hold
pointer to them.
Add API for that as part of ipip_ops, so then only Spectrum-2 and above
will save IPv6 addresses in this way.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:52 +01:00
Amit Cohen
8d4f10463c mlxsw: spectrum_router: Increase parsing depth for IPv6 decapsulation
The Spectrum ASIC has a configurable limit on how deep into the packet
it parses. By default, the limit is 96 bytes.

For IP-in-IP packets, with IPv6 outer and inner headers, the default
parsing depth is not enough and without increasing it such packets cannot
be properly decapsulated.

Use the existing API to set parsing depth, call it once for each
decapsulation entry when it is created/destroyed.
There is no need to protect the code with new mutex because 'router->lock'
is already taken in these code paths.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:52 +01:00
Amit Cohen
53eedd61de mlxsw: Add IPV6_ADDRESS kvdl entry type
Add support for allocating and freeing KVD entries for IPv6 addresses.

These addresses are programmed by the RIPS register and referenced by
the RATR and RTDP registers for IPv6 underlay encapsulation and
decapsulation, respectively.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:52 +01:00
Amit Cohen
713e8502fd mlxsw: spectrum_ipip: Add mlxsw_sp_ipip_gre6_ops
Add operations for IP-in-IP GRE6. For now the function can_offload()
returns false and the other functions warn as they should never be called.

A later patch will add dedicated operations for Spectrum-2 which will
support IPv6 underlay, so use 'mlxsw_sp1_' prefix for the new operations.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:52 +01:00
Amit Cohen
a82feba686 mlxsw: Create separate ipip_ops_arr for different ASICs
Currently, there is support for IP-in-IP only with IPv4 underlay for all
supported Spectrum ASICs.
The next patches will add support for IPv6 underlay only for Spectrum-2
and above.

Add infrastructure for splitting IP-in-IP support between different
ASICs - create separate ipip_ops_arr and add ipips_init function to set the
right ops.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:52 +01:00
Amit Cohen
36c2ab890b mlxsw: reg: Add support for ritr_loopback_ipip6_pack()
The RITR register is used to configure the router interface table.

For IP-in-IP, it stores the underlay source IP address for encapsulation
and also the ingress RIF for the underlay lookup.

Add support for IPv6 IP-in-IP configuration.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:52 +01:00
Amit Cohen
c729ae8d6c mlxsw: reg: Add support for ratr_ipip6_entry_pack()
The RATR register is used to configure the Router Adjacency (next-hop)
Table.

For IP-in-IP entry, underlay destination IPv4 is saved as part of this
register and underlay destination IPv6 is saved by RIPS register and RATR
saves pointer to it.

Add function for setting IPv6 IP-in-IP configuration.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
a917bb271d mlxsw: reg: Add support for rtdp_ipip6_pack()
The RTDP register is used for configuring the tunnel decapsulation
properties of NVE and IP-in-IP.

Linux tunnels verify packets before decapsulation based on the packet's
source IP, which must match tunnel remote IP.
RTDP is used to configure decapsulation so that it filters out packets that
are not IPv6 or have the wrong source IP or wrong GRE key.

For IP-in-IP entry, source IPv4 is saved as part of this register and
source IPv6 is saved by RIPS register and RTDP saves pointer to it.

Create common function for configuring both IPv4 and IPv6 and add
dedicated functions for each protocol.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
dd8a9552d4 mlxsw: reg: Add Router IP version Six Register
The RIPS register is used to store IPv6 addresses for use by the NVE and
IP-in-IP.

For IPv6 underlay support, RATR register needs to hold a pointer to the
remote IPv6 address for encapsulation and RTDP register needs to hold a
pointer to the local IPv6 address for decapsulation check.

Add the required register for saving IPv6 addresses.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
59bf980dd9 mlxsw: Take tunnel's type into account when searching underlay device
The function __mlxsw_sp_ipip_netdev_ul_dev_get() returns the underlay
device that corresponds to the overlay device that it gets.
Currently, this function assumes that the tunnel is IPv4 GRE, because it
is the only one that is supported by mlxsw driver.

This assumption will no longer be correct when IPv6 GRE support is added,
resulting in wrong underlay device being returned.
Instead, check 'ol_dev->type' and return the underlay device accordingly.

Move the function to spectrum_ipip.c because spectrum_router.c should not
be aware to tunnel type.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
80ef2abcdd mlxsw: spectrum_ipip: Create common function for mlxsw_sp_ipip_ol_netdev_change_gre()
The function mlxsw_sp_ipip_ol_netdev_change_gre4() contains code that
can be shared between IPv4 and IPv6.
The only difference is the way that arguments are taken from tunnel
parameters, which are different between IPv4 and IPv6.

For that, add structure 'mlxsw_sp_ipip_parms' to hold all the required
parameters for the function and save it as part of
'struct mlxsw_sp_ipip_entry' instead of the existing structure that is
not shared between IPv4 and IPv6. Add new operation as part of
'mlxsw_sp_ipip_ops' to initialize the new structure.

Then mlxsw_sp_ipip_ol_netdev_change_gre{4,6}() will prepare the new
structure and both will call the same function.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
8aba32cea3 mlxsw: spectrum_router: Fix arguments alignment
Suppress the following checkpatch.pl check [1] by adding a variable to
store the IP-in-IP options. Noticed while adding equivalent IPv6 code in
subsequent patches.

[1]
CHECK: Alignment should match open parenthesis
+               mlxsw_reg_ritr_loopback_ipip4_pack(ritr_pl, lb_cf.lb_ipipt,
+
+ MLXSW_REG_RITR_LOOPBACK_IPIP_OPTIONS_GRE_KEY_PRESET,

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
aa6fd8f177 mlxsw: spectrum_ipip: Pass IP tunnel parameters by reference and as 'const'
Currently, there are several functions that handle 'struct ip_tunnel_parm'
and 'struct __ip6_tnl_parm'.

Change the mentioned functions to get IP tunnel parameters by reference
and as 'const'.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Amit Cohen
45bce5c99d mlxsw: spectrum_router: Create common function for fib_entry_type_unset() code
mlxsw_sp_fib4_entry_type_unset() is not specific for IPv4 FIB entry,
move the code to mlxsw_sp_fib_entry_type_unset(), and call this function
from mlxsw_sp_fib4_entry_type_unset() so then it will be used for IPv6
also.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24 10:26:51 +01:00
Jakub Kicinski
2fcd14d0f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/mptcp/protocol.c
  977d293e23 ("mptcp: ensure tx skbs always have the MPTCP ext")
  efe686ffce ("mptcp: ensure tx skbs always have the MPTCP ext")

same patch merged in both trees, keep net-next.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23 11:19:49 -07:00
Aya Levin
fdbccea419 net/mlx4_en: Don't allow aRFS for encapsulated packets
Driver doesn't support aRFS for encapsulated packets, return early error
in such a case.

Fixes: 1eb8c695bd ("net/mlx4_en: Add accelerated RFS support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23 13:17:39 +01:00
Ido Schimmel
e3a3aae74d mlxsw: spectrum_router: Start using new trap adjacency entry
Start using the trap adjacency entry that was added in the previous
patch and remove the existing one which is no longer needed.

Note that the name of the old entry was inaccurate as the entry did not
discard packets, but trapped them.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22 14:35:02 +01:00
Ido Schimmel
4bdf80bcb7 mlxsw: spectrum_router: Add trap adjacency entry upon first nexthop group
In commit 0c3cbbf96d ("mlxsw: Add specific trap for packets routed via
invalid nexthops"), mlxsw started allocating a new adjacency entry
during driver initialization, to trap packets routed via invalid
nexthops.

This behavior was later altered in commit 983db6198f ("mlxsw:
spectrum_router: Allocate discard adjacency entry when needed") to only
allocate the entry upon the first route that requires it. The motivation
for the change is explained in the commit message.

The problem with the current behavior is that the entry shows up as a
"leak" in a new BPF resource monitoring tool [1]. This is caused by the
asymmetry of the allocation/free scheme. While the entry is allocated
upon the first route that requires it, it is only freed during
de-initialization of the driver.

Instead, track the number of active nexthop groups and allocate the
adjacency entry upon the creation of the first group. Free it when the
number of active groups reaches zero.

The next patch will convert mlxsw to start using the new entry and
remove the old one.

[1] https://github.com/Mellanox/mlxsw/tree/master/Debugging/libbpf-tools/resmon

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22 14:35:01 +01:00
Leon Romanovsky
db4278c55f devlink: Make devlink_register to be void
devlink_register() can't fail and always returns success, but all drivers
are obligated to check returned status anyway. This adds a lot of boilerplate
code to handle impossible flow.

Make devlink_register() void and simplify the drivers that use that
API call.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-22 14:15:12 +01:00
Lama Kayal
72a3c58d18 net/mlx4_en: Resolve bad operstate value
Any link state change that's done prior to net device registration
isn't reflected on the state, thus the operational state is left
obsolete, with 'UNKNOWN' status.

To resolve the issue, query link state from FW upon open operations
to ensure operational state is updated.

Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 13:21:04 +01:00
Jakub Kicinski
561bed688b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts!

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-16 13:58:38 -07:00
David Thompson
ee8a9600b5 mlxbf_gige: clear valid_polarity upon open
The network interface managed by the mlxbf_gige driver can
get into a problem state where traffic does not flow.
In this state, the interface will be up and enabled, but
will stop processing received packets.  This problem state
will happen if three specific conditions occur:
    1) driver has received more than (N * RxRingSize) packets but
       less than (N+1 * RxRingSize) packets, where N is an odd number
       Note: the command "ethtool -g <interface>" will display the
       current receive ring size, which currently defaults to 128
    2) the driver's interface was disabled via "ifconfig oob_net0 down"
       during the window described in #1.
    3) the driver's interface is re-enabled via "ifconfig oob_net0 up"

This patch ensures that the driver's "valid_polarity" field is
cleared during the open() method so that it always matches the
receive polarity used by hardware.  Without this fix, the driver
needs to be unloaded and reloaded to correct this problem state.

Fixes: f92e1869d7 ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-16 14:31:58 +01:00
Eli Cohen
7c3a0a018e net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
Remove the assert from the callback priv lookup function since it does
not require RTNL lock and is already protected by flow_indr_block_lock.

This will avoid warnings from being emitted to dmesg if the driver
registers its callback after an ingress qdisc was created for a
netdevice.

The warnings started after the following patch was merged:
commit 74fc4f8287 ("net: Fix offloading indirect devices dependency on qdisc order creation")

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-16 14:09:30 +01:00
Ido Schimmel
49fd3b645d mlxsw: Add support for transceiver modules reset
Implement support for ethtool_ops::reset in order to reset transceiver
modules. The module backing the netdev is reset when the 'ETH_RESET_PHY'
flag is set. After a successful reset, the flag is cleared by the driver
and other flags are ignored. This is in accordance with the interface
documentation:

"The reset() operation must clear the flags for the components which
were actually reset. On successful return, the flags indicate the
components which were not reset, either because they do not exist in the
hardware or because they cannot be reset independently. The driver must
never reset any components that were not requested."

Reset is useful in order to allow a module to transition out of a fault
state. From section 6.3.2.12 in CMIS 5.0: "Except for a power cycle, the
only exit path from the ModuleFault state is to perform a module reset
by taking an action that causes the ResetS transition signal to become
TRUE (see Table 6-11)".

An error is returned when the netdev is administratively up:

 # ip link set dev swp11 up

 # ethtool --reset swp11 phy
 ETHTOOL_RESET 0x40
 Cannot issue ETHTOOL_RESET: Invalid argument

 # ip link set dev swp11 down

 # ethtool --reset swp11 phy
 ETHTOOL_RESET 0x40
 Components reset:     0x40

An error is returned when the module is shared by multiple ports (split
ports) and the "phy-shared" flag is not set:

 # devlink port split swp11 count 4

 # ethtool --reset swp11s0 phy
 ETHTOOL_RESET 0x40
 Cannot issue ETHTOOL_RESET: Invalid argument

 # ethtool --reset swp11s0 phy-shared
 ETHTOOL_RESET 0x400000
 Components reset:     0x400000

 # devlink port unsplit swp11s0

 # ethtool --reset swp11 phy
 ETHTOOL_RESET 0x40
 Components reset:     0x40

An error is also returned when one of the ports using the module is
administratively up:

 # devlink port split swp11 count 4

 # ip link set dev swp11s1 up

 # ethtool --reset swp11s0 phy-shared
 ETHTOOL_RESET 0x400000
 Cannot issue ETHTOOL_RESET: Invalid argument

 # ip link set dev swp11s1 down

 # ethtool --reset swp11s0 phy-shared
 ETHTOOL_RESET 0x400000
 Components reset:     0x400000

Reset is performed by writing to the "rst" bit of the PMAOS register,
which instructs the firmware to assert the reset signal connected to the
module for a fixed amount of time.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:16 +01:00
Ido Schimmel
8f4ebdb0a2 mlxsw: Make PMAOS pack function more generic
The PMAOS register has enable bits (e.g., PMAOS.ee) that allow changing
only a subset of the fields, which is exactly what subsequent patches
will need to do. Instead of passing multiple arguments to its pack
function, only pass the module index and let the rest be set by the
different callers.

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:16 +01:00
Ido Schimmel
ef23841bb9 mlxsw: reg: Add fields to PMAOS register
The Ports Module Administrative and Operational Status (PMAOS) register
configures and retrieves the per-module status. Extend it with fields
required to support various module settings such as reset and power
mode.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:16 +01:00
Ido Schimmel
896f399be0 mlxsw: Track per-module port status
In the common port module core, track the number of logical ports that
are mapped to the port module and the number of logical ports using it
that are administratively up.

This will be used by later patches to potentially veto and control
certain operations on the module, such as reset and setting its power
mode.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:16 +01:00
Ido Schimmel
196bff2927 mlxsw: spectrum: Do not return an error in mlxsw_sp_port_module_unmap()
The return value is never checked. Allows us to simplify a later patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:15 +01:00
Ido Schimmel
06277ca238 mlxsw: spectrum: Do not return an error in ndo_stop()
The return value is not checked by the networking stack. Allows us to
simplify a later patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:15 +01:00
Ido Schimmel
bd6e43f595 mlxsw: core_env: Convert 'module_info_lock' to a mutex
After the previous patch, the lock is always taken in process context so
it can be converted to a mutex. It is needed for future changes where we
will need to be able to sleep when holding the lock.

Convert the lock to a mutex.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:15 +01:00
Ido Schimmel
163f3d2dd0 mlxsw: core_env: Defer handling of module temperature warning events
Module temperature events are currently handled in softIRQ context,
requiring the 'module_info_lock' to be a spin lock. In future patchsets
we will need to be able to hold the lock while sleeping.

Therefore, defer handling of these events using a work queue so that the
next patch will be able to convert the lock to a mutex.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:15 +01:00
Ido Schimmel
25a91f835a mlxsw: core: Remove mlxsw_core_is_initialized()
After the previous patch, the switch driver is always initialized last,
making this function redundant.

Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:15 +01:00
Ido Schimmel
3d7a6f6779 mlxsw: core: Initialize switch driver last
Commit 961cf99a07 ("mlxsw: core: Re-order initialization sequence")
changed the initialization sequence so that the switch driver (e.g.,
mlxsw_spectrum) is initialized before registration with the hwmon and
thermal subsystems.

This was done in order to avoid situations where hwmon/thermal code uses
features not supported by current firmware version, which is only
validated as part of switch driver initialization.

Later, commit b79cb787ac ("mlxsw: Move fw flashing code into core.c")
moved firmware validation and flashing code from the switch driver to
mlxsw_core so that it is performed before driver initialization.

Therefore, change the initialization sequence back to its original form.

In addition to being more straightforward, it will allow us to simplify
parts of the code in subsequent patches and future patchsets.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:17:15 +01:00
Leon Romanovsky
e9310aed8e net/mlx5: Publish and unpublish all devlink parameters at once
The devlink parameters were published in two steps despite being static
and known in advance.

First step was to use devlink_params_publish() which iterated over all
known up to that point parameters and sent notification messages.
In second step, the call was devlink_param_publish() that looped over
same parameters list and sent notification for new parameters.

In order to simplify the API, move devlink_params_publish() to be called
when all parameters were already added and save the need to iterate over
parameters list again.

As a side effect, this change fixes the error unwind flow in which
parameters were not marked as unpublished.

Fixes: 82e6c96f04 ("net/mlx5: Register to devlink ingress VLAN filter trap")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-15 16:12:55 +01:00
Jiri Pirko
cd92d79d5f mlxsw: reg: Remove PMTM register
It is not used anymore, remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
32ada69bba mlxsw: spectrum: Use PMTDB register to obtain split info
Newly introduced PMTDB register is there to provide all needed info
about particular requested port split configuration. Use it instead of
figuring the info out manually in the driver.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
78f824b335 mlxsw: reg: Add Port Module To local DataBase Register
The PMTDB register allows to query the possible module<->local port
mapping than can be used in PMLP. It does not represent the actual/current
mapping of the local to module. Actual mapping is only defined by PMLP.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
1dbfc9d765 mlxsw: spectrum: Use PLLP to get front panel number and split number
Instead of relying on the values coming from the PMLP register, use PLLP
to get the information about port front panel number and split number.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
ed403777f6 mlxsw: reg: Add Port Local port to Label Port mapping Register
The PLLP register returns the mapping from Local Port into Label Port.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
fec2386162 mlxsw: spectrum: Move port SWID set before core port init
During port creation, mlxsw_core_port_init() is called with the front
panel port number and the split port sub-number. Currently, this
information is determined by the driver without firmware assistance.

Subsequent patches are going to query this information from firmware,
but this requires the port to assigned to SWID.

Therefore, move port SWID assignment before mlxsw_core_port_init().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
13eb056ee5 mlxsw: spectrum: Move port module mapping before core port init
During port creation, mlxsw_core_port_init() is called with the front
panel port number and the split port sub-number. Currently, this
information is determined by the driver without firmware assistance.

Subsequent patches are going to query this information from firmware,
but this requires the port to be mapped to a module.

Therefore, move port mapping before mlxsw_core_port_init().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Jiri Pirko
847371ce04 mlxsw: spectrum: Bump minimum FW version to xx.2008.3326
Add latest verified version of Nvidia Spectrum-family switch firmware,
for Spectrum (13.2008.3326), Spectrum-2 (29.2008.3326) and Spectrum-3
(30.2008.3326).

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14 12:44:16 +01:00
Aya Levin
8db6a54f3c net/mlx5e: Fix condition when retrieving PTP-rqn
When activating the PTP-RQ, redirect the RQT from drop-RQ to PTP-RQ.
Use mlx5e_channels_get_ptp_rqn to retrieve the rqn. This helper returns
a boolean (not status), hence caller should consider return value 0 as a
fail. Change the caller interpretation of the return value.

Fixes: 43ec0f41fa ("net/mlx5e: Hide all implementation details of mlx5e_rx_res")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-07 14:17:02 -07:00
Aya Levin
c91c1da72b net/mlx5e: Fix mutual exclusion between CQE compression and HW TS
Some profiles of the driver don't support a dedicated PTP-RQ, hence can't
support HW TS and CQE compression simultaneously. When HW TS is enabled
the COE compression is disabled, and should be restored when the HW TS
is turned off. Add rx_filter as an input to modifying CQE compression to
enforce this restriction.

Fixes: 256f79d13c ("net/mlx5e: Fix HW TS with CQE compression according to profile")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-07 14:16:58 -07:00
Maor Gottlieb
ee27e330a9 net/mlx5: Fix potential sleeping in atomic context
Fixes the below flow of sleeping in atomic context by releasing
the RCU lock before calling to free_match_list.

build_match_list() <- disables preempt
-> free_match_list()
   -> tree_put_node()
      -> down_write_ref_node() <- take write lock

Fixes: 693c6883bb ("net/mlx5: Add hash table for flow groups in flow table")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-07 14:16:55 -07:00
Saeed Mahameed
dfe6fd72b5 net/mlx5: FWTrace, cancel work on alloc pd error flow
Handle error flow on mlx5_core_alloc_pd() failure,
read_fw_strings_work must be canceled.

Fixes: c71ad41ccb ("net/mlx5: FW tracer, events handling")
Reported-by: Pavel Machek (CIP) <pavel@denx.de>
Suggested-by: Pavel Machek (CIP) <pavel@denx.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
2021-09-07 14:16:52 -07:00
Mark Bloch
da8252d580 net/mlx5: Lag, don't update lag if lag isn't supported
In NICs that don't support LAG, the LAG control structure won't be
allocated. If it wasn't allocated it means LAG doesn't exists and can be
skipped.

Fixes: cac1eb2cf2 ("net/mlx5: Lag, properly lock eswitch if needed")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-07 14:16:49 -07:00
Parav Pandit
897ae4b40e net/mlx5: Fix rdma aux device on devlink reload
RDMA auxdev parameter registration was skipped for eswitch manager PCI PF.
Due to this when devlink parameter is read, it reads as false in below
code flow.

$ devlink dev reload pci/0000:06:00.0
  devlink_reload()
    mlx5_load_one()
      mlx5_attach_device()
        is_ib_enabled()
          devlink_param_driverinit_value_get()

Due to this, is_ib_enabled() returns false for the RDMA auxiliary
device. This results into a skipping RDMA auxiliary device creation on
reload.

There is no need to check for eswitch manager capability to support RDMA
auxiliary device. Hence, fix it by skipping eswitch manager capability.

Fixes: 87158cedf0 ("net/mlx5: Support enable_rdma devlink dev param")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-07 14:16:46 -07:00
Vlad Buslov
8343268ec3 net/mlx5: Bridge, fix uninitialized variable usage
In some conditions variable 'err' is not assigned with value in
mlx5_esw_bridge_port_obj_attr_set() and mlx5_esw_bridge_port_changeupper()
functions after recent changes to support LAG. Initialize the variable with
zero value in both cases.

Reported-by: Colin King <colin.king@canonical.com>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
CC: linux-kernel@vger.kernel.org
Fixes: ff9b752146 ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
2021-09-07 14:16:42 -07:00
Jakub Kicinski
29ce8f9701 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/netdevice.h
net/socket.c

  d0efb16294 ("net: don't unconditionally copy_from_user a struct ifreq for socket ioctls")

  876f0bf9d0 ("net: socket: simplify dev_ifconf handling")
  29c4964822 ("net: socket: rework compat_ifreq_ioctl()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-31 09:06:04 -07:00
Cai Huoqing
464a57281f net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
Use the devm_platform_ioremap_resource_byname() helper instead of
calling platform_get_resource_byname() and devm_ioremap_resource()
separately

Use the devm_platform_ioremap_resource() helper instead of
calling platform_get_resource() and devm_ioremap_resource()
separately

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-31 12:08:05 +01:00
Maxim Mikityanskiy
ca49bfd90a sch_htb: Fix inconsistency when leaf qdisc creation fails
In HTB offload mode, qdiscs of leaf classes are grafted to netdev
queues. sch_htb expects the dev_queue field of these qdiscs to point to
the corresponding queues. However, qdisc creation may fail, and in that
case noop_qdisc is used instead. Its dev_queue doesn't point to the
right queue, so sch_htb can lose track of used netdev queues, which will
cause internal inconsistencies.

This commit fixes this bug by keeping track of the netdev queue inside
struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
new field, the two values are synced on writes, and WARNs are added to
assert equality of the two values.

The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
the driver used to pass the old and new queue IDs to sch_htb. Now that
there is a new field (offload_queue) in struct htb_class that needs to
be updated on this operation, the driver will pass the old class ID to
sch_htb instead (it already knows the new class ID).

Fixes: d03b195b5a ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30 16:33:59 -07:00
Yevgeny Kliteynik
a2ebfbb7b1 net/mlx5: DR, Add support for update FTE
Add the support for update FTE, which is needed for cases where there are
multiple rules with the same match. In such case fs_core will merge the
actions and call update FTE to update current FTE. Since we don't want to
disrupt the traffic, we will add the new duplicate rule, and only then
remove the old duplicate rule.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:04 -07:00
Yevgeny Kliteynik
8a015baef5 net/mlx5: DR, Improve rule tracking memory consumption
To track each STE of the rule a rule member was allocated, each
member would point to one STE. This means that we would allocate
40B (rule member) * number of STEs per rule.

To reduce this per rule allocation we use the STE tree pointers
for next_htbl and pointing STE to navigate the tree, this allows
us to keep only the pointer to the last STE of rule (always unique).
From the last rule STE we are able to traverse and rebuild all of
the STEs that construct the rule.

In our testing with 8M rules, each consisting of 7 STES, we were able
to reduce 1.6GB of memory.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:04 -07:00
Yevgeny Kliteynik
32c8e3b230 net/mlx5: DR, Remove rehash ctrl struct from dr_htbl
The calculations to decide for the maximum allowed collision threshold
are simple and there is no reason to save them on the htbl struct.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:04 -07:00
Yevgeny Kliteynik
46f2a8ae8a net/mlx5: DR, Remove HW specific STE type from nic domain
Instead of using the HW specific STEv0 type, it is better to use
an enum to indicate if this is an RX or TX nic domain.
This means that now we will need to convert the nic domain type
to the corresponding STE type.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:03 -07:00
Yevgeny Kliteynik
ab9d1f9612 net/mlx5: DR, Merge DR_STE_SIZE enums
Merge DR_STE_SIZE enums - no need for a separate enum for reduced STE size.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:03 -07:00
Yevgeny Kliteynik
990467f8af net/mlx5: DR, Skip source port matching on FDB RX domain
The FDB RX pipe is connected to the wire and the source port for all
incoming packets equals to wire, single uplink port per PF, this means
there is no point of matching on the source port in such case.
Once we recognize such case, we will optimize the RX steering rule.
Note that in such case we clean both source_eswitch_owner_vhca_id and
source_port.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:03 -07:00
Yevgeny Kliteynik
63b85f49c0 net/mlx5: DR, Add ignore_flow_level support for multi-dest flow tables
When creating an FTE, we might need to create multi-destination flow table,
which is eventually created by FW. In such case, this FW table should
include all the FTE properties as requested by the upper layer, including
the ability to point to another flow table with level lower or equal to
the current table - indicated by the "ignore_flow_level" property.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:02 -07:00
Yevgeny Kliteynik
a01a43fa16 net/mlx5: DR, Use FW API when updating FW-owned flow table
Need to call the DR API only when it is DR table.
To update FW-owned table the driver should call the FW API.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:02 -07:00
Yevgeny Kliteynik
ae3eddcff7 net/mlx5: DR, replace uintN_t with kernel-style types
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:02 -07:00
Yevgeny Kliteynik
0733535d59 net/mlx5: DR, Support IPv6 matching on flow label for STEv0
Add missing support for matching on IPv6 flow label for STEv0.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:01 -07:00
Bodong Wang
d7d0b2450e net/mlx5: DR, Reduce print level for FT chaining level check
There are usecases with Connection Tracking that have such connection
as default, printing this warning in dmesg confuses the user.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:01 -07:00
Yevgeny Kliteynik
d5a84e968f net/mlx5: DR, Warn and ignore SW steering rule insertion on QP err
In the event of SW steering QP entering error state, SW steering
cannot insert more rules, and will silently ignore the insertion
after issuing a warning.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:01 -07:00
Yevgeny Kliteynik
f35715a657 net/mlx5: DR, Improve error flow in actions_build_ste_arr
Improve error flow and print actions sequence when an
invalid/unsupported sequence provided.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:01 -07:00
Yevgeny Kliteynik
ec449ed823 net/mlx5: DR, Enable QP retransmission
Under high stress, SW steering might get stuck on polling for completion
that never comes.
For such cases QP needs to have protocol retransmission mechanism enabled.
Currently the retransmission timeout is defined as 0 (unlimited). Fix this
by defining a real timeout.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:00 -07:00
Yevgeny Kliteynik
2de40f68cf net/mlx5: DR, Enable VLAN pop on TX and VLAN push on RX
Enable pop VLAN action in TX and push VLAN in RX.
These actions are supported only on STEv1.

On TX: when a host sends a packet, VLAN is popped at the beginning.
On RX: just before passing the packet to the host the VLAN is pushed.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:00 -07:00
Yevgeny Kliteynik
f5e22be534 net/mlx5: DR, Split modify VLAN state to separate pop/push states
Split modify vlan state in the actions state machine to pop vlan
and push vlan states. This enables using of pop/push vlan without
restrictions (e.g. pop vlan on TX in STEv1).

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:38:00 -07:00
Yevgeny Kliteynik
0139145fb8 net/mlx5: DR, Added support for REMOVE_HEADER packet reformat
ConnectX supports offloading of various encapsulations and decapsulations
(e.g. VXLAN), which are performed by 'Packet Reformat' action. Starting
with ConnectX-6 DX, a new reformat type is supported - REMOVE_HEADER, which
allows deleting an arbitrary size chunk at the selected position in the packet.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:37:59 -07:00
Wentao_Liang
6cc64770fb net/mlx5: DR, fix a potential use-after-free bug
In line 849 (#1), "mlx5dr_htbl_put(cur_htbl);" drops the reference to
cur_htbl and may cause cur_htbl to be freed.

However, cur_htbl is subsequently used in the next line, which may result
in an use-after-free bug.

Fix this by calling mlx5dr_err() before the cur_htbl is put.

Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Dmytro Linkin
f9d196bd63 net/mlx5e: Use correct eswitch for stack devices with lag
If link aggregation is used within stack devices driver rejects encap
rules if PF of the VF tunnel device is down. This happens because route
resolved for other PF and its eswitch instance is used to determine
correct vport.
To fix that use devcom feature to retrieve other eswitch instance if
failed to find vport for the 1st eswitch and LAG is active.

Fixes: 10742efc20 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Maor Dickman
ca6891f9b2 net/mlx5: E-Switch, Set vhca id valid flag when creating indir fwd group
When indirect forward group is created, flow is added with vhca id but
without setting vhca id valid flag which violates the PRM.

Fix by setting the missing flag, vhca id valid.

Fixes: 34ca65352d ("net/mlx5: E-Switch, Indirect table infrastructure")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Roi Dayan
9a5f9cc794 net/mlx5e: Fix possible use-after-free deleting fdb rule
After neigh-update-add failure we are still with a slow path rule but
the driver always assume the rule is an fdb rule.
Fix neigh-update-del by checking slow path tc flag on the flow.
Also fix neigh-update-add for when neigh-update-del fails the same.

Fixes: 5dbe906ff1 ("net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:42 -07:00
Leon Romanovsky
8e7e2e8ed0 net/mlx5: Remove all auxiliary devices at the unregister event
The call to mlx5_unregister_device() means that mlx5_core driver is
removed. In such scenario, we need to disregard all other flags like
attach/detach and forcibly remove all auxiliary devices.

Fixes: a5ae8fc905 ("net/mlx5e: Don't create devices during unload flow")
Tested-and-Reported-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:41 -07:00
Dima Chumak
2f8b6161cc net/mlx5: Lag, fix multipath lag activation
When handling FIB_EVENT_ENTRY_REPLACE event for a new multipath route,
lag activation can be missed if a stale (struct lag_mp)->mfi pointer
exists, which was associated with an older multipath route that had been
removed.

Normally, when a route is removed, it triggers mlx5_lag_fib_event(),
which handles FIB_EVENT_ENTRY_DEL and clears mfi pointer. But, if
mlx5_lag_check_prereq() condition isn't met, for example when eswitch is
in legacy mode, the fib event is skipped and mfi pointer becomes stale.

Fix by resetting mfi pointer to NULL in mlx5_deactivate_lag().

Fixes: 8a66e45859 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-26 15:15:41 -07:00
Yufeng Mo
f3ccfda193 ethtool: extend coalesce setting uAPI with CQE mode
In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 07:38:29 -07:00
Christophe JAILLET
eb9c5c0d3a net/mellanox: switch from 'pci_' to 'dma_' API
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been hand modified to use 'dma_set_mask_and_coherent()' instead of
'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
This is less verbose.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-23 12:02:28 +01:00
Amit Cohen
43c1b83305 mlxsw: spectrum_router: Increase parsing depth for multipath hash
Commit 01848e05f8 ("mlxsw: spectrum_router: Add support for inner
layer 3 multipath hash policy") and commit daeabf89eb ("mlxsw:
spectrum_router: Add support for custom multipath hash policy") added
support for multipath hash policies where the hash is calculated based
on inner packet fields.

For IPv6-in-IPv6 packets, the default parsing depth (96 bytes) is not
enough when these policies are used.

Therefore, for such cases, call the new API to increase / decrease the
parsing depth as necessary. Care is taken to ensure the API is not
called multiple times.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22 21:39:55 +01:00
Amit Cohen
c3d2ed93b1 mlxsw: Remove old parsing depth infrastructure
The previous patches added new API to handle parsing depth and converted
the existing code to use it.

Remove the old infrastructure which is not needed anymore.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22 21:39:55 +01:00
Amit Cohen
0071e7cdc3 mlxsw: Convert existing consumers to use new API for parsing configuration
Convert VxLAN and PTP modules to increase parsing depth using new API
that was added in the previous patch.

Separate MPRS register's configuration to VxLAN related configuration
and parsing depth configuration. Handle each one using the appropriate
API.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22 21:39:55 +01:00
Amit Cohen
2d91f0803b mlxsw: spectrum: Add infrastructure for parsing configuration
Spectrum ASICs have a configurable limit on how deep into the packet
they parse. By default, the limit is 96 bytes.

There are several cases where this parsing depth is not enough and there
is a need to increase it. Currently, increasing parsing depth is
maintained as part of VxLAN module, because the MPRS register which
configures parsing depth also configures UDP destination port number
used for VxLAN encapsulation and decapsulation.

Add an API for increasing parsing depth as part of spectrum.c code, so
that it will be possible to use it from other modules. In addition, add
an API for setting UDP destination port and protect it using a dedicated
lock for saving parsing configurations. The lock is needed as not all
the callers hold RTNL lock.

Maintain a counter for increased parsing depth consumers. For first
consumer subscription, increase the parsing depth and for last consumer
unsubscription, set parsing depth to default value.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-22 21:39:55 +01:00
Dmytro Linkin
3202ea65f8 net/mlx5: E-switch, Add QoS tracepoints
Add tracepoints to log QoS enabling/disabling/configuration for vports
and rate groups.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:41 -07:00
Dmytro Linkin
0fe132eac3 net/mlx5: E-switch, Allow to add vports to rate groups
Implement eswitch API that allows updating rate groups. If group
pointer is NULL, then move the vport to internal unlimited group zero.

Implement devlink_ops->rate_parent_node_set() callback in the terms of
the new eswitch group update API.

Enable QoS for all group's elements if a group has allocated BW share.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Dmytro Linkin
f47e04eb96 net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups
Provide eswitch API to allow controlling group rate limits. Use it to
implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set().

The share rate will create relative bandwidth share on the groups level
while within the group the user can set shared rate on the member vports
of that group and this rate will be relative to the group's share rate.
The group with the highest shared rate will get a BW share of 100 and
the rest of the groups will get a value that reflects the ratio between
their share rate and the maximum share rate.

Example:
Created four rate groups with tx_share limits:

$ devlink port function rate add \
    pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_4 tx_share 10gbit

Assuming link speed is 50 Gbit/sec ratio divider will be
50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:

<group_1> 30 * 0.625 = 18.75 Gbit/sec
<group_2> 20 * 0.625 = 12.5 Gbit/sec
<group_3> 20 * 0.625 = 12.5 Gbit/sec
<group_4> 10 * 0.625 = 6.25 Gbit/sec

Rate group with unlimited tx_share rate will receive minimum BW value
(1Mbit/sec) if presented any group with tx_share rate limit. This allow
to not drop all packets in case of heavy traffic.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Dmytro Linkin
1ae258f8b3 net/mlx5: E-switch, Introduce rate limiting groups API
Extend eswitch API with rate limiting groups:

- Define new struct mlx5_esw_rate_group that is used to hold all
  internal group data.

- Implement functions that allow creation, destruction and cleanup of
  groups.

- Assign all vports to internal unlimited zero group by default.

This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Dmytro Linkin
ad34f02fe2 net/mlx5: E-switch, Enable devlink port tx_{share|max} rate control
Register devlink rate leaf object for every eswitch vport.
Implement devlink ops that enable setting shared and max tx rates
through devlink API.
Extract common eswitch code from existing tx rate set function that is
accessed through NDO to be reused for the devlink. Values configured
with NDO API are not visible for the devlink API, therefore shouldn't be
used simultaneously.

When normalizing the BW share value, dividing the desired minimum rate
by the common divider results in losing information since the quotient
is rounded down. This has a significant affect on configurations of low
rate where the round down eliminates a large percentage of the total
rate. To improve the formula, round up the division result to make sure
that the BW share is at least the value it was supposed to be and won't
lost a significant amount of the expected value.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:39 -07:00
Dmytro Linkin
2d116e3e7e net/mlx5: E-switch, Move QoS related code to dedicated file
Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:39 -07:00
Chris Mi
2741f22309 net/mlx5e: TC, Support sample offload action for tunneled traffic
Currently the sample offload actions send the encapsulated packet
to software. This commit decapsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If decapsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:38 -07:00
Chris Mi
ee950e5db1 net/mlx5e: TC, Restore tunnel info for sample offload
Currently the sample offload actions send the encapsulated packet
to software. sFlow expects tunneled packets to be decapsulated while
having the tunnel properties on the skb metadata fields.

Reuse the functions used by connection tracking to map the outer
header properties to a unique id. The next patch  will use that id
to restore the tunnel information of decapsulated packets onto the
skb.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:38 -07:00
Chris Mi
d12e20ac06 net/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel
CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
restoring chain ids. SKB extension is not required for tunnel
restoration.

Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
using the tunnel restore methods for sample offload use cases.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:38 -07:00
Chris Mi
f0da4daa34 net/mlx5e: Refactor ct to use post action infrastructure
Move post action table management to common library providing
add/del/get API. Refactor the ct action offload to use the common
API.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:37 -07:00
Chris Mi
6f0b692a5a net/mlx5e: Introduce post action infrastructure
Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flowtable.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.

Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.

Currently the post-action design is implemented only by the ct
action. Introduce post action infrastructure as a pre-step for
reusing it with the sFlow offload feature. Init and destroy the
common post action table. Refactor the ct offload to use the
common post table infrastructure in the next patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:37 -07:00
Chris Mi
2799797845 net/mlx5e: CT, Use xarray to manage fte ids
IDR is deprecated. Use xarray instead.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:37 -07:00
Chris Mi
bcd6740c6b net/mlx5e: Move sample attribute to flow attribute
Currently it is in eswitch attribute. Move it to flow attribute to
reflect the change in previous patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:36 -07:00
Chris Mi
0027d70c73 net/mlx5e: Move esw/sample to en/tc/sample
Module sample belongs to en/tc instead of esw. Move it and rename
accordingly.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:36 -07:00
Saeed Mahameed
5024fa95a1 net/mlx5e: Remove mlx5e dependency from E-Switch sample
mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
this redundant dependency by passing the eswitch object directly to
the sample object constructor.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-08-19 21:50:35 -07:00
Jason Wang
19b8ece42c net/mlx4: Use ARRAY_SIZE to get an array's size
The ARRAY_SIZE macro is defined to get an array's size which is
more compact and more formal in linux source. Thus, we can replace
the long sizeof(arr)/sizeof(arr[0]) with the compact ARRAY_SIZE.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210817121106.44189-1-wangborong@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18 15:16:54 -07:00
Vlad Buslov
ff9b752146 net/mlx5: Bridge, support LAG
Allow adding bond net devices to mlx5 bridge with following changes:

- Modify bridge representor code to obtain uplink represetor that belongs
to eswitch that is registered for notification. Require representor to be
in shared FDB mode. If representor is the lag master, then consider its
port as local, otherwise treat it as peer.

- Use devcom to match on paired eswitch metadata in peer FDB entries. This
is necessary for shared FDB LAG to function since packets are always
received on active eswitch instance as opposed to parent eswitch of port.

- Support for deleting peer flows when receiving
SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous
patches in series. Now also implement support for handling
SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update
workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp
to current jiffies when receiving such notification on eswitch that manages
the local FDB entry. This allows peer entries to prevent ageing of the FDB.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:32 -07:00
Vlad Buslov
c358ea1741 net/mlx5: Bridge, allow merged eswitch connectivity
Allow connectivity between representors of different eswitch instances that
are attached to same bridge when merged_eswitch capability is enabled. Add
ports of peer eswitch to bridge instance and mark them with
MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with
MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their
local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event.
Listen to the event on mlx5 bridge implementation and delete peer FDBs in
event handler.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:31 -07:00
Vlad Buslov
bf3d56d8f5 net/mlx5: Bridge, extract FDB delete notification to function
SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in
bridge code. Following patch in series changes the condition for the
notification. Extract the notification into dedicated helper function
mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the
future changes.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:31 -07:00
Vlad Buslov
3ee6233e61 net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pair
Following patches in series allow traffic between vports of different
eswitch instances, which requires addressing bridge port by
vport_num+esw_owner_vhca_id pair since vport_num is only unique
per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with
'esw_owner_vhca_id' field and use it as part of key for
mlx5_esw_bridge->vports xarray.

With this change we can't rely on switchdev_handle_port_obj_add() helper to
get mlx5 representor from stacked device because we need specifically
representor from parent eswitch that registered the callback to obtain
correct esw_owner_vhca_id. The helper doesn't allow passing additional
parameters to predicate function and doesn't provide access to the notifier
block to obtain eswitch through br_offloads. Implement custom helpers to
obtain mlx5 representor and use them in
mlx5_esw_bridge_port_obj_{add|del|attr_set}() implementations.

Remove direct pointer to parent bridge from struct mlx5_vport as it is no
longer needed.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:30 -07:00
Vlad Buslov
a514d17350 net/mlx5: Bridge, obtain core device from eswitch instead of priv
Following patches in series will pass bond device to bridge, which means
the code can't assume the device is mlx5 representor. Moreover, the core
device can be easily obtained from eswitch instance, so there is no reason
for more complex code that obtains struct mlx5_priv from net_device in
order to use its mdev. Refactor the code to use esw->dev instead of
priv->mdev.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:30 -07:00
Vlad Buslov
4de20e9a12 net/mlx5: Bridge, release bridge in same function where it is taken
Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if
mlx5_esw_bridge_vport_init() returned an error instead of relying on it to
release the bridge. This improves the design because object instance is
taken and released in same layer and simplifies following patches that add
more logic to mlx5_esw_bridge_vport_link().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:30 -07:00
Tariq Toukan
ec60c4581b net/mlx5e: Support MQPRIO channel mode
Add support for MQPRIO channel mode, in which a partition to TCs
is defined over the channels. We allow partitions with contiguous
queue indices, with no holes within. We do not allow modification
to the num of channels while this MQPRIO mode is active.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:29 -07:00
Tariq Toukan
21ecfcb83a net/mlx5e: Handle errors of netdev_set_num_tc()
Add handling for failures in netdev_set_num_tc().
Let mlx5e_netdev_set_tcs return an int.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:29 -07:00
Tariq Toukan
e2aeac448f net/mlx5e: Maintain MQPRIO mode parameter
This is in preparation for supporting MQPRIO CHANNEL mode in
downstream patch, in addition to DCB mode that's supported today.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:29 -07:00
Tariq Toukan
86d747a3f9 net/mlx5e: Abstract MQPRIO params
Abstract the MQPRIO params into a struct.
Use a getter for DCB mode num_tcs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:28 -07:00
Tariq Toukan
248d3b4c9a net/mlx5e: Support flow classification into RSS contexts
Extend the existing flow classification support, to steer
flows not only directly to a receive ring, but also into
the new RSS contexts.

Create needed TIR objects on demand, and hold reference
on the RSS context.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:28 -07:00
Tariq Toukan
f01cc58c18 net/mlx5e: Support multiple RSS contexts
Add support to multiple RSS contexts. Resources of the non-default
RSS contexts are allocated and created on demand. Each RSS context
can be controlled and configured separately, via the implemented
ethtool ops. Here we limit the num of total contexts to 16.

We do not enforce any kind of new limitation over the indirection table
content. More specifically, two separate contexts can be configured to
fully or partially point to the same set of receive rings.

The default RSS context (index 0) is created with its full set of TIRs.
All other contexts are created with an empty set, then TIRs are added
upon first usage when steering rules are added.
We use a reference counting mechanism to make sure an RSS context is
not removed before the rules pointing to it.

Block ethtool set_channels operations when multiple RSS contexts exist,
as currently the kernel doesn't protect against inconsistent channels
configs that break non-default RSS contexts.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:28 -07:00
Tariq Toukan
49095f641b net/mlx5e: Dynamically allocate TIRs in RSS contexts
Move from static to dynamic memory allocations for TIR.
This is in preparation to supporting on-demand TIR operations in
downstream patches, where every RSS context will be init with an
empty set of TIRs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:27 -07:00
Tariq Toukan
25307a91cb net/mlx5e: Convert RSS to a dedicated object
Code related to RSS is now encapsulated into a dedicated object and put
into new files en/rss.{c,h}. All usages are converted.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:27 -07:00
Tariq Toukan
713ba5e5f6 net/mlx5e: Introduce abstraction of RSS context
Bring all fields that define and maintain RSS behavior together
into a new structure.
Align all usages with this new structure. Keep it hidden within
rx_res.c.
This helps supporting multiple RSS contexts in downstream patch.

Use dynamic allocations for the RSS context.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:27 -07:00
Tariq Toukan
fc651ff910 net/mlx5e: Introduce TIR create/destroy API in rx_res
Take TIR control operations in rx_res into functions.
This is in preparation to supporting on-demand TIR operations in
downstream patches.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:26 -07:00
Tariq Toukan
6e5fea5196 net/mlx5e: Do not try enable RSS when resetting indir table
All calls to mlx5e_rx_res_rss_set_indir_uniform() occur while the RSS
state is inactive, i.e. the RQT is pointing to the drop RQ, not to the
channels' RQs.
It means that the "apply" part of the function is not called.
Remove this part from the function, and document the change. It will be
useful for next patches in the series, allows code simplifications when
multiple RSS contexts are introduced.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:26 -07:00
Arnd Bergmann
e5f3155267 ethernet: fix PTP_1588_CLOCK dependencies
The 'imply' keyword does not do what most people think it does, it only
politely asks Kconfig to turn on another symbol, but does not prevent
it from being disabled manually or built as a loadable module when the
user is built-in. In the ICE driver, the latter now causes a link failure:

aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':

This is a recurring problem in many drivers, and we have discussed
it several times befores, without reaching a consensus. I'm providing
a link to the previous email thread for reference, which discusses
some related problems.

To solve the dependency issue better than the 'imply' keyword, introduce a
separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
can depend on if it is able to use PTP support when available, but works
fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
then prevented from being built-in, the same way as with a 'depends on
PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
but that can be rather confusing when you first see it.

Since this should cover the dependencies correctly, the IS_REACHABLE()
hack in the header is no longer needed now, and can be turned back
into a normal IS_ENABLED() check. Any driver that gets the dependency
wrong will now cause a link time failure rather than being unable to use
PTP support when that is in a loadable module.

However, the two recently added ptp_get_vclocks_index() and
ptp_convert_timestamp() interfaces are only called from builtin code with
ethtool and socket timestamps, so keep the current behavior by stubbing
those out completely when PTP is in a loadable module. This should be
addressed properly in a follow-up.

As Richard suggested, we may want to actually turn PTP support into a
'bool' option later on, preventing it from being a loadable module
altogether, which would be one way to solve the problem with the ethtool
interface.

Fixes: 06c16d89d2 ("ice: register 1588 PTP clock device object for E810 devices")
Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 17:49:05 -07:00
Jakub Kicinski
f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f2580 ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda65 ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557d ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a376 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Cai Huoqing
61b6a6c395 net/mlx5e: Make use of netdev_warn()
to replace printk(KERN_WARNING ...) with netdev_warn() kindly

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:34 -07:00
Eran Ben Elisha
979aa51967 net/mlx5: Fix variable type to match 64bit
Fix the following smatch warning:
wait_func_handle_exec_timeout() warn: should '1 << ent->idx' be a 64 bit type?

Use 1ULL, to have a 64 bit type variable.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:33 -07:00
Parav Pandit
44f66ac981 net/mlx5: Initialize numa node for all core devices
Subsequent patches make use of numa node affinity for memory
allocations. Initialize it for PCI PF, VF and SF devices.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:33 -07:00
Parav Pandit
48f02eef7f net/mlx5: Allocate individual capability
Currently mlx5_core_dev contains array of capabilities. It contains 19
valid capabilities of the device, 2 reserved entries and 12 holes.
Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K
bytes of memory which is never used. Due to this mlx5_core_dev structure
size is 270Kbytes odd. This allocation further aligns to next power of 2
to 512Kbytes.

By skipping non-existent entries,
(a) 112Kbyte is saved,
(b) mlx5_core_dev reduces to 8KB with alignment
(c) 350KB saved in alignment

In future individual capability allocation can be used to skip its
allocation when such capability is disabled at the device level. This
patch prepares mlx5_core_dev to hold capability using a pointer instead
of inline array.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:33 -07:00
Parav Pandit
5958a6fad6 net/mlx5: Reorganize current and maximal capabilities to be per-type
In the current code, the current and maximal capabilities are
maintained in separate arrays which are both per type. In order to
allow the creation of such a basic structure as a dynamically
allocated array, we move curr and max fields to a unified
structure so that specific capabilities can be allocated as one unit.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:32 -07:00
Parav Pandit
4445abbd13 net/mlx5: SF, use recent sysfs api
Use sysfs_emit() which is aware of PAGE_SIZE buffer.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:32 -07:00
Shay Drory
2d0b41a376 net/mlx5: Refcount mlx5_irq with integer
Currently, all access to mlx5 IRQs are done undere a lock. Hance, there
isn't a reason to have kref in struct mlx5_irq.
Switch it to integer.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:31 -07:00
Shay Drory
68fefb7089 net/mlx5: Change SF missing dedicated MSI-X err message to dbg
When MSI-X vectors allocated are not enough for SFs to have dedicated,
MSI-X, kernel log buffer has too many entries.
Hence only enable such log with debug level.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:31 -07:00
Shay Drory
211f4f99ed net/mlx5: Align mlx5_irq structure
mlx5_irq structure have holes due to incorrect position of fields in it.
Make them naturally align.

pahole output after alignment:
struct mlx5_irq {
        struct atomic_notifier_head nh;                  /*     0    72 */
        /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
        cpumask_var_t              mask;                 /*    72     8 */
        char                       name[32];             /*    80    32 */
        struct mlx5_irq_pool *     pool;                 /*   112     8 */
        struct kref                kref;                 /*   120     4 */
        u32                        index;                /*   124     4 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        int                        irqn;                 /*   128     4 */

        /* size: 136, cachelines: 3, members: 7 */
        /* padding: 4 */
        /* last cacheline: 8 bytes */

};

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:31 -07:00
Leon Romanovsky
8e792700b9 net/mlx5: Delete impossible dev->state checks
New mlx5_core device structure is allocated through devlink_alloc
with\ kzalloc and that ensures that all fields are equal to zero
and it includes ->state too.

That means that checks of that field in the mlx5_init_one() is
completely redundant, because that function is called only once
in the begging of mlx5_core_dev lifetime.

PCI:
 .probe()
  -> probe_one()
   -> mlx5_init_one()

The recovery flow can't run at that time or before it, because relevant
work initialized later in mlx5_init_once().

Such initialization flow ensures that dev->state can't be
MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
checks.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:30 -07:00
Maor Gottlieb
90b85d4e31 net/mlx5: Fix inner TTC table creation
Fix typo of the cited commit that calls to mlx5_create_ttc_table, instead
of mlx5_create_inner_ttc_table.

Fixes: f4b45940e9 ("net/mlx5: Embed mlx5_ttc_table")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:30 -07:00
Cai Huoqing
39c538d644 net/mlx5: Fix typo in comments
Fix typo:
*vectores  ==> vectors
*realeased  ==> released
*erros  ==> errors
*namepsace  ==> namespace
*trafic  ==> traffic
*proccessed  ==> processed
*retore  ==> restore
*Currenlty  ==> Currently
*crated  ==> created
*chane  ==> change
*cannnot  ==> cannot
*usuallly  ==> usually
*failes  ==> fails
*importent  ==> important
*reenabled  ==> re-enabled
*alocation  ==> allocation
*recived  ==> received
*tanslation  ==> translation

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:30 -07:00
Parav Pandit
70862a5d60 net/mlx5: Support enable_vnet devlink dev param
Enable user to disable VDPA net auxiliary device so that when it is not
required, user can disable it.

For example,

$ devlink dev param set pci/0000:06:00.0 \
              name enable_vnet value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0

At this point devlink instance do not create auxiliary device
mlx5_core.vnet.2 for the VDPA net functionality.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 14:34:22 +01:00
Parav Pandit
87158cedf0 net/mlx5: Support enable_rdma devlink dev param
Enable user to disable RDMA auxiliary device so that when it is not
required, user can disable it.

For example,

$ devlink dev param set pci/0000:06:00.0 \
              name enable_rdma value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0

At this point devlink instance do not create auxiliary device
mlx5_core.rdma.2 for the RDMA functionality.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 14:34:22 +01:00
Parav Pandit
a17beb28ed net/mlx5: Support enable_eth devlink dev param
Enable user to disable Ethernet auxiliary device so that when it is not
required, user can disable it.

For example,

$ devlink dev param set pci/0000:06:00.0 \
              name enable_eth value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0

At this point devlink instance do not create mlx5_core.eth.2 auxiliary
device for the Ethernet functionality.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 14:34:21 +01:00
Parav Pandit
6f35723864 net/mlx5: Fix unpublish devlink parameters
Cleanup routine missed to unpublish the parameters. Add it.

Fixes: e890acd5ff ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11 14:34:21 +01:00
Vladimir Oltean
c35b57ceff net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.

To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.

Fixes: 2c4eca3ef7 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10 13:22:57 -07:00
Jakub Kicinski
ebd0d30cc5 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
pull-request: mlx5-next 2020-08-9

This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.

1) mlx5 single E-Switch FDB for lag

2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq

3) Add DCS caps & fields support

[1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/

[2] 0e3364dab7.1626609184.git.leonro@nvidia.com/

[3] 55e1d69bef.1624258894.git.leonro@nvidia.com/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Lag, Create shared FDB when in switchdev mode
  net/mlx5: E-Switch, add logic to enable shared FDB
  net/mlx5: Lag, move lag destruction to a workqueue
  net/mlx5: Lag, properly lock eswitch if needed
  net/mlx5: Add send to vport rules on paired device
  net/mlx5: E-Switch, Add event callback for representors
  net/mlx5e: Use shared mappings for restoring from metadata
  net/mlx5e: Add an option to create a shared mapping
  net/mlx5: E-Switch, set flow source for send to uplink rule
  RDMA/mlx5: Add shared FDB support
  {net, RDMA}/mlx5: Extend send to vport rules
  RDMA/mlx5: Fill port info based on the relevant eswitch
  net/mlx5: Lag, add initial logic for shared FDB
  net/mlx5: Return mdev from eswitch
  IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
  net/mlx5: Add DCS caps & fields support
====================

Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10 13:19:17 -07:00
Aya Levin
bd37c2888c net/mlx5: Fix return value from tracer initialization
Check return value of mlx5_fw_tracer_start(), set error path and fix
return value of mlx5_fw_tracer_init() accordingly.

Fixes: c71ad41ccb ("net/mlx5: FW tracer, events handling")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:57:03 -07:00
Shay Drory
563476ae0c net/mlx5: Synchronize correct IRQ when destroying CQ
The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.

This caused to wrong synchronization call of the wrong IRQ with a number
0 instead of the real one.

As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.

Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ade3 ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:57:00 -07:00
Chris Mi
88bbd7b236 net/mlx5e: TC, Fix error handling memory leak
Free the offload sample action on error.

Fixes: f94d6389f6 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:58 -07:00
Shay Drory
ba317e832d net/mlx5: Destroy pool->mutex
Destroy pool->mutex when we destroy the pool.

Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:55 -07:00
Shay Drory
5957cc557d net/mlx5: Set all field of mlx5_irq before inserting it to the xarray
Currently irq->pool is set after the irq is insert to the xarray.
Set irq->pool before the irq is inserted to the xarray.

Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:52 -07:00
Shay Drory
3c8946e0e2 net/mlx5: Fix order of functions in mlx5_irq_detach_nb()
Change order of functions in mlx5_irq_detach_nb() so it will be
a mirror of mlx5_irq_attach_nb.

Fixes: 71e084e264 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:49 -07:00
Aya Levin
c85a6b8feb net/mlx5: Block switchdev mode while devlink traps are active
Since switchdev mode can't support  devlink traps, verify there are
no active devlink traps before moving eswitch to switchdev mode. If
there are active traps, prevent the switchdev mode configuration.

Fixes: eb3862a052 ("net/mlx5e: Enable traps according to link state")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:46 -07:00
Maxim Mikityanskiy
8ba3e4c858 net/mlx5e: Destroy page pool after XDP SQ to fix use-after-free
mlx5e_close_xdpsq does the cleanup: it calls mlx5e_free_xdpsq_descs to
free the outstanding descriptors, which relies on
mlx5e_page_release_dynamic and page_pool_release_page. However,
page_pool_destroy is already called by this point, because
mlx5e_close_rq runs before mlx5e_close_xdpsq.

This commit fixes the use-after-free by swapping mlx5e_close_xdpsq and
mlx5e_close_rq.

The commit cited below started calling page_pool_destroy directly from
the driver. Previously, the page pool was destroyed under a call_rcu
from xdp_rxq_info_unreg_mem_model, which would defer the deallocation
until after the XDPSQ is cleaned up.

Fixes: 1da4bbeffe ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:43 -07:00
Vlad Buslov
6d8680da2e net/mlx5: Bridge, fix ageing time
Ageing time is not converted from clock_t to jiffies which results
incorrect ageing timeout calculation in workqueue update task. Fix it by
applying clock_t_to_jiffies() to provided value.

Fixes: c636a0f0f3 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:40 -07:00
Roi Dayan
c623c95afa net/mlx5e: Avoid creating tunnel headers for local route
It could be local and remote are on the same machine and the route
result will be a local route which will result in creating encap id
with src/dst mac address of 0.

Fixes: a54e20b4fc ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:38 -07:00
Alex Vesker
d3875924da net/mlx5: DR, Add fail on error check on decap
While processing encapsulated packet on RX, one of the fields that is
checked is the inner packet length. If the length as specified in the header
doesn't match the actual inner packet length, the packet is invalid
and should be dropped. However, such packet caused the NIC to hang.

This patch turns on a 'fail_on_error' HW bit which allows HW to drop
such an invalid packet while processing RX packet and trying to decap it.

Fixes: ad17dc8cf9 ("net/mlx5: DR, Move STEv0 action apply logic")
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:34 -07:00
Leon Romanovsky
c633e79964 net/mlx5: Don't skip subfunction cleanup in case of error in module init
Clean SF resources if mlx5 eth failed to initialize.

Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:56:31 -07:00
Leon Romanovsky
919d13a7e4 devlink: Set device as early as possible
All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().

Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.

Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.

[    8.758862]  devlink_nl_param_fill+0x2e8/0xe50
[    8.760305]  devlink_param_notify+0x6d/0x180
[    8.760435]  __devlink_params_register+0x2f1/0x670
[    8.760558]  devlink_params_register+0x1e/0x20

The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-09 10:21:40 +01:00
Leon Romanovsky
82564f6c70 devlink: Simplify devlink port API calls
Devlink port already has pointer to the devlink instance and all API
calls that forward these devlink ports to the drivers perform same
"devlink_port->devlink" assignment before actual call.

This patch removes useless parameter and allows us in the future
to create specific devlink_port_ops to manage user space access with
reliable ops assignment.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08 12:54:12 +01:00
Mark Bloch
598fe77df8 net/mlx5: Lag, Create shared FDB when in switchdev mode
If both eswitches are in switchdev mode and the uplink representors
are enslaved to the same bond device create a shared FDB configuration.

When moving to shared FDB mode not only the hardware needs be configured
but the RDMA driver needs to reconfigure itself.

When such change is done, unload the RDMA devices, configure the hardware
and load the RDMA representors.

When destroying the lag (can happen if a PCI function is unbinded,
driver is unloaded or by just removing a netdev from the bond) make sure
to restore the system to the previous state only if possible.

For example, if a PCI function is unbinded there is no need to load the
representors as the device is going away.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:50:52 -07:00
Mark Bloch
db202995f5 net/mlx5: E-Switch, add logic to enable shared FDB
Shared FDB allows to direct traffic from all the vports in the HCA to a
single eswitch. In order to do that three things are needed.

1) Point the ingress ACL of the slave uplink to that of the master.
   With this, wire traffic from both uplinks will reach the same eswitch
   with the same metadata where a single steering rule can catch traffic
   from both ports.

2) Set the FDB root flow table of the slave's eswitch to that of the
   master. As this flow table can change dynamically make sure to
   sync it on any set root flow table FDB command.
   This will make sure traffic from SFs, VFs, ECPFs and PFs reach the
   master eswitch.

3) Split wire traffic at the eswitch manager egress ACL so that it's
   directed to the native eswitch manager. We only treat wire traffic
   from both ports the same at the eswitch level. If such traffic wasn't
   handled in the eswitch it needs to reach the right representor to be
   processed by software. For example LACP packets should *always*
   reach the right uplink representor for correct operation.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:50:44 -07:00
Mark Bloch
63d4a9afbc net/mlx5: Lag, move lag destruction to a workqueue
If a netdev is removed from the lag the lag should be destroyed.
With downstream patches this might trigger a reconfiguration of
representors on a different eswitch and such we don't have the proper
locking to so from this path. Move the destruction to be done by the
workqueue.

As the destruction won't affect the netdev side it okay to do so.
The RDMA side will be reconfigured and it already coded to handle such
reconfiguration.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:25 -07:00
Mark Bloch
cac1eb2cf2 net/mlx5: Lag, properly lock eswitch if needed
Currently when doing hardware lag we check the eswitch mode
but as this isn't done under a lock the check isn't valid.

As the code needs to sync between two different devices an extra
care is needed.

- When going to change eswitch mode, if hardware lag is active destroy it.
- While changing eswitch modes block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in
  progress.
- When attaching a new mdev to lag, block until there is no mode change
  in progress. In order for the mode change to finish the interface lock
  will have to be taken. Release the lock and sleep for 100ms to
  allow forward progress. As this is a very rare condition (can happen if
  the user unbinds and binds a PCI function while also changing eswitch
  mode of the other PCI function) it has no real world impact.

As taking multiple eswitch mode locks is now required lockdep will
complain about a possible deadlock. Register a key per eswitch to make
lockdep happy.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:25 -07:00
Mark Bloch
898b078615 net/mlx5: Add send to vport rules on paired device
When two mlx5 devices are paired in switchdev mode, always offload the
send-to-vport rule to the peer E-Switch. This allows to abstract
the logic when this is really necessary (single FDB) and combine
the logic of both cases into one.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:25 -07:00
Mark Bloch
c8e6a9e6d6 net/mlx5: E-Switch, Add event callback for representors
This callback will allow to notify representors about relevant events
when in OFFLOADS mode. In downstream patches, this will be used to notify
about PAIR/UNPAIR devcom events.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Roi Dayan
2198b93279 net/mlx5e: Use shared mappings for restoring from metadata
FTEs are added with mapped metadata which is saved per eswitch.
When uplink reps are bonded and we are in a single FDB mode,
we could fail to find metadata which was stored on one eswitch mapping
but not the other or with a different id.
To resolve this issue use shared mapping between eswitch ports.
We do not have any conflict using a single mapping, for a type,
between the ports.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Roi Dayan
5d5defd6b8 net/mlx5e: Add an option to create a shared mapping
The shared mapping is identified by an id and type.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Ariel Levkovich
d044425403 net/mlx5: E-Switch, set flow source for send to uplink rule
Set the flow source param to local vport for the uplink rep
send-to-vport rule.

This will comply with the recent changes in SW steering that
use the flow source as an indication for the rule type - rx or tx.

Since the uplink send-to-vport rule is forwarding traffic to the wire
it has to indicate that it is an sx rule and can't use the any port
value in the flow source.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch
979bf468fc {net, RDMA}/mlx5: Extend send to vport rules
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.

While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch
af8c0e25f2 net/mlx5: Lag, add initial logic for shared FDB
As shared FDB requires changes in two subsystems first expose the needed
core functions so the RDMA side can be changed.

mlx5_lag_is_master(): return true if a given mlx5 device is the lag master.
mlx5_lag_is_shared_fdb(): Returns true if the lag mode is shared FDB.
mlx5_lag_get_peer_mdev(): Return the peer mdev in lag.

The mentioned functions will be used by downstream patches in order
to add support for shared FDB for the RDMA side.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch
97a8a8c1f9 net/mlx5: Return mdev from eswitch
Export a function so users can retrieve the mellanox device that manages
the eswitch from the eswitch device.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:23 -07:00
David S. Miller
fa976624ae mlx5-updates-2021-08-02
This patch-set changes the TTC (Traffic Type Classification) logic
 to be independent from the mlx5 ethernet driver by renaming the traffic
 types enums and making the TTC API generic to the mlx5 core driver.
 
 It allows to decouple TTC logic from mlx5e and reused by other parts
 of mlx5 drivers, namely ADQ and lag TX steering hashing.
 
 Patches overview:
 1 - Rename traffic type enums to be mlx5 generic.
 2 - Rename related TTC arguments and functions.
 3 - Remove dependency in the mlx5e driver from the TTC implementation.
 4 - Move TTC logic to fs_ttc.
 5 - Embed struct mlx5_ttc_table in fs_ttc.
 
 The refactoring series is followed by misc' cleanup patches.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmEIqVYACgkQSD+KveBX
 +j7CzQf/YpHlfRec0HLmVnSksw/TLh3MVl+Swd+cJRTku6JPF/zdhsJ9AbBxZIGh
 U1cD+B2Fapqh4rAKHjlIgXKx0vMTyQxfEW/z83PSKINvlKJKEnT2hUjzT+3I04oR
 1Wtd0FpgzBgfwKOwktnGr8i0zJ4k/kdF7PlQiN8XDbsI4u2jIII1sjwdrC1O0W24
 29HQJJtKvXFEVLDI0ZdiuAHS6e57M9hAqp6SJkTukSpLXj7gNUfTPyoghk4ao0CF
 1zmyuNjiojcltafqK1hZNdRO2ZwlxeRWNY/7ntVbHIpFEDnoDes8EDTzISpIPejH
 aZdwjnVzBEOcztmc/RskJ1N7JktPdA==
 =VFaz
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-08-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
This patch-set changes the TTC (Traffic Type Classification) logic
to be independent from the mlx5 ethernet driver by renaming the traffic
types enums and making the TTC API generic to the mlx5 core driver.

It allows to decouple TTC logic from mlx5e and reused by other parts
of mlx5 drivers, namely ADQ and lag TX steering hashing.

Patches overview:
1 - Rename traffic type enums to be mlx5 generic.
2 - Rename related TTC arguments and functions.
3 - Remove dependency in the mlx5e driver from the TTC implementation.
4 - Move TTC logic to fs_ttc.
5 - Embed struct mlx5_ttc_table in fs_ttc.

The refactoring series is followed by misc' cleanup patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03 11:16:13 +01:00
Jiapeng Chong
bcd68c04c7 net/mlx5: Fix missing return value in mlx5_devlink_eswitch_inline_mode_set()
The return value is missing in this code scenario, add the return value
'0' to the return value 'err'.

Eliminate the follow smatch warning:

drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:3083
mlx5_devlink_eswitch_inline_mode_set() warn: missing error code 'err'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 8e0aa4bc95 ("net/mlx5: E-switch, Protect eswitch mode changes")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:29 -07:00
Roi Dayan
25f150f4bb net/mlx5e: Return -EOPNOTSUPP if more relevant when parsing tc actions
Instead of returning -EINVAL.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:29 -07:00
Roi Dayan
97a8d29ae9 net/mlx5e: Remove redundant assignment of counter to null
counter is being initialized before being used.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:29 -07:00
Roi Dayan
c6cfe1137f net/mlx5e: Remove redundant parse_attr arg
Passing parse_attr is redundant in parse_tc_nic_actions() and
mlx5e_tc_add_nic_flow() as we can get it from flow.
This is the same as with parse_tc_fdb_actions() and mlx5e_tc_add_fdb_flow().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:28 -07:00
Roi Dayan
950b4df9fb net/mlx5e: Remove redundant cap check for flow counter
The cap is very old and today will always exists.
The cap is not being checked anywhere else. Remove the check from
drop action when parsing tc rules in nic mode.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:28 -07:00
Roi Dayan
70f8019e7b net/mlx5e: Remove redundant filter_dev arg from parse_tc_fdb_actions()
filter_dev is saved in parse_attr. and being used in other cases from
there. use it also for the leftover case.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:28 -07:00
Roi Dayan
696ceeb203 net/mlx5e: Remove redundant tc act includes
Since the code changed to use the flow action infra
there is no usage of tcf values from those includes.
Remove those.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:27 -07:00
Maor Gottlieb
f4b45940e9 net/mlx5: Embed mlx5_ttc_table
mlx5_ttc_table struct shouldn't be exposed to the users so
this patch make it internal to ttc.

In addition add a getter function to get the TTC flow table for users
that need to add a rule which points on it.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
2021-08-02 19:26:27 -07:00
Maor Gottlieb
371cf74e78 net/mlx5: Move TTC logic to fs_ttc
Now that TTC logic is not dependent on mlx5e structs, move it to
lib/fs_ttc.c so it could be used other part of the mlx5 driver.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:26 -07:00
Maor Gottlieb
bc29764ed9 net/mlx5e: Decouple TTC logic from mlx5e
Remove dependency in the mlx5e driver from the TTC implementation
by changing the TTC related functions to receive mlx5 generic arguments.
It allows to decouple TTC logic from mlx5e and reused by other parts of
mlx5 driver.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:26 -07:00
Maor Gottlieb
5fba089e96 net/mlx5e: Rename some related TTC args and functions
Since TTC logic is going to be moved to a separate file, make the
relevant functions and arguments that used by TTC to be mlx5 generic.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:26 -07:00
Maor Gottlieb
d443c6f684 net/mlx5e: Rename traffic type enums
Rename traffic type enums as part of the preparation for moving
the traffic type logic to a separate file.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:25 -07:00
Maxim Mikityanskiy
3ac90dec3a net/mlx5e: Allocate the array of channels according to the real max_nch
The channels array in struct mlx5e_rx_res is converted to a dynamic one,
which will use the dynamic value of max_nch instead of
implementation-defined maximum of MLX5E_MAX_NUM_CHANNELS.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:25 -07:00
Maxim Mikityanskiy
43ec0f41fa net/mlx5e: Hide all implementation details of mlx5e_rx_res
This commit moves all implementation details of struct mlx5e_rx_res
under en/rx_res.c. All access to RX resources is now done using methods.
Encapsulating RX resources into an object allows for better
manageability, because all the implementation details are now in a
single place, and external code can use only a limited set of API
methods to init/teardown the whole thing, reconfigure RSS and LRO
parameters, connect TIRs to flow steering and activate/deactivate TIRs.

mlx5e_rx_res is self-contained and doesn't depend on struct mlx5e_priv
or include en.h.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:24 -07:00
Maxim Mikityanskiy
e6e01b5fdc net/mlx5e: Introduce mlx5e_channels API to get RQNs
Currently, struct mlx5e_channels is defined in en.h, along with a lot of
other stuff. In the following commit mlx5e_rx_res will need to get RQNs
(RQ hardware IDs), given a pointer to mlx5e_channels and the channel
index. In order to make it possible without including the whole en.h,
this commit introduces functions that will hide the implementation
details of mlx5e_channels.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:24 -07:00
Maxim Mikityanskiy
43befe99bc net/mlx5e: Use a new initializer to build uniform indir table
Replace mlx5e_build_default_indir_rqt with a new initializer of struct
mlx5e_rss_params_indir that works directly with the struct, rather than
its internals.

The new initializer is called mlx5e_rss_params_indir_init_uniform, which
also reflects the purpose (uniform spreading) better.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:24 -07:00
Colin Ian King
7cdd0a89ec net/mlx4: make the array states static const, makes object smaller
Don't populate the array states on the stack but instead it
static const. Makes the object code smaller by 79 bytes.

Before:
   text   data   bss    dec    hex filename
  21309   8304   192  29805   746d drivers/net/ethernet/mellanox/mlx4/qp.o

After:
   text   data   bss    dec    hex filename
  21166   8368   192  29726   741e drivers/net/ethernet/mellanox/mlx4/qp.o

(gcc version 10.2.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210801153742.147304-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02 15:02:13 -07:00
Jakub Kicinski
d2e11fd2b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicting commits, all resolutions pretty trivial:

drivers/bus/mhi/pci_generic.c
  5c2c853159 ("bus: mhi: pci-generic: configurable network interface MRU")
  56f6f4c4eb ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean")

drivers/nfc/s3fwrn5/firmware.c
  a0302ff590 ("nfc: s3fwrn5: remove unnecessary label")
  46573e3ab0 ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
  801e541c79 ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")

MAINTAINERS
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")
  8a7b46fa79 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-31 09:14:46 -07:00
Leon Romanovsky
35f6986743 net/mlx5: Don't rely on always true registered field
Devlink is an integral part of mlx5 driver and all flows ensure that
devlink_*_register() will success. That makes the ->registered check
an obsolete.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28 10:23:45 +01:00
Chris Mi
740452e09c net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
The offending refactor commit uses u16 chain wrongly. Actually, it
should be u32.

Fixes: c620b77215 ("net/mlx5: Refactor tc flow attributes structure")
CC: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:06 -07:00
Dima Chumak
b1c2f6312c net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
The result of __dev_get_by_index() is not checked for NULL and then gets
dereferenced immediately.

Also, __dev_get_by_index() must be called while holding either RTNL lock
or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or
its callers. This makes the underlying hlist_for_each_entry() loop not
safe, and can have adverse effects in itself.

Fix by using dev_get_by_index() and handling nullptr return value when
ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to
check for possible PTR_ERR() result.

Fixes: 77ab67b7f0 ("net/mlx5e: Basic setup of hairpin object")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:05 -07:00
Aya Levin
7f331bf0f0 net/mlx5: Unload device upon firmware fatal error
When fw_fatal reporter reports an error, the firmware in not responding.
Unload the device to ensure that the driver closes all its resources,
even if recovery is not due (user disabled auto-recovery or reporter is
in grace period). On successful recovery the device is loaded back up.

Fixes: b3bd076f75 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:05 -07:00
Aya Levin
678b1ae1af net/mlx5e: Fix page allocation failure for ptp-RQ over SF
Set the correct pci-device pointer to the ptp-RQ. This allows access to
dma_mask and avoids allocation request with wrong pci-device.

Fixes: a099da8ffc ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:05 -07:00
Aya Levin
497008e783 net/mlx5e: Fix page allocation failure for trap-RQ over SF
Set the correct device pointer to the trap-RQ, to allow access to
dma_mask and avoid allocation request with the wrong pci-dev.

WARNING: CPU: 1 PID: 12005 at kernel/dma/mapping.c:151 dma_map_page_attrs+0x139/0x1c0
...
all Trace:
<IRQ>
? __page_pool_alloc_pages_slow+0x5a/0x210
mlx5e_post_rx_wqes+0x258/0x400 [mlx5_core]
mlx5e_trap_napi_poll+0x44/0xc0 [mlx5_core]
__napi_poll+0x24/0x150
net_rx_action+0x22b/0x280
__do_softirq+0xc7/0x27e
do_softirq+0x61/0x80
</IRQ>
__local_bh_enable_ip+0x4b/0x50
mlx5e_handle_action_trap+0x2dd/0x4d0 [mlx5_core]
blocking_notifier_call_chain+0x5a/0x80
mlx5_devlink_trap_action_set+0x8b/0x100 [mlx5_core]

Fixes: 5543e989fe ("net/mlx5e: Add trap entity to ETH driver")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:04 -07:00
Aya Levin
a759f845d1 net/mlx5e: Consider PTP-RQ when setting RX VLAN stripping
Add PTP-RQ to the loop when setting rx-vlan-offload feature via ethtool.
On PTP-RQ's creation, set rx-vlan-offload into its parameters.

Fixes: a099da8ffc ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:04 -07:00
Maxim Mikityanskiy
9841d58f35 net/mlx5e: Add NETIF_F_HW_TC to hw_features when HTB offload is available
If a feature flag is only present in features, but not in hw_features,
the user can't reset it. Although hw_features may contain NETIF_F_HW_TC
by the point where the driver checks whether HTB offload is supported,
this flag is controlled by another condition that may not hold. Set it
explicitly to make sure the user can disable it.

Fixes: 214baf2287 ("net/mlx5e: Support HTB offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:03 -07:00
Tariq Toukan
e2351e5170 net/mlx5e: RX, Avoid possible data corruption when relaxed ordering and LRO combined
When HW aggregates packets for an LRO session, it writes the payload
of two consecutive packets of a flow contiguously, so that they usually
share a cacheline.

The first byte of a packet's payload is written immediately after
the last byte of the preceding packet.
In this flow, there are two consecutive write requests to the shared
cacheline:
1. Regular write for the earlier packet.
2. Read-modify-write for the following packet.

In case of relaxed-ordering on, these two writes might be re-ordered.
Using the end padding optimization (to avoid partial write for the last
cacheline of a packet) becomes problematic if the two writes occur
out-of-order, as the padding would overwrite payload that belongs to
the following packet, causing data corruption.

Avoid this by disabling the end padding optimization when both
LRO and relaxed-ordering are enabled.

Fixes: 17347d5430 ("net/mlx5e: Add support for PCI relaxed ordering")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:03 -07:00
Roi Dayan
dd3fddb827 net/mlx5: E-Switch, handle devcom events only for ports on the same device
This is the same check as LAG mode checks if to enable lag.
This will fix adding peer miss rules if lag is not supported
and even an incorrect rules in socket direct mode.

Also fix the incorrect comment on mlx5_get_next_phys_dev() as flow #1
doesn't exists.

Fixes: ac004b8321 ("net/mlx5e: E-Switch, Add peer miss rules")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:03 -07:00
Maor Dickman
c671972534 net/mlx5: E-Switch, Set destination vport vhca id only when merged eswitch is supported
Destination vport vhca id is valid flag is set only merged eswitch isn't supported.
Change destination vport vhca id value to be set also only when merged eswitch
is supported.

Fixes: e4ad91f23f ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:02 -07:00
Maor Dickman
90b22b9bcd net/mlx5e: Disable Rx ntuple offload for uplink representor
Rx ntuple offload is not supported in switchdev mode.
Tryng to enable it cause kernel panic.

 BUG: kernel NULL pointer dereference, address: 0000000000000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 80000001065a5067 P4D 80000001065a5067 PUD 106594067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 7 PID: 1089 Comm: ethtool Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_23_16_44 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:mlx5e_arfs_enable+0x70/0xd0 [mlx5_core]
 Code: 44 24 10 00 00 00 00 48 c7 44 24 18 00 00 00 00 49 63 c4 48 89 e2 44 89 e6 48 69 c0 20 08 00 00 48 89 ef 48 03 85 68 ac 00 00 <48> 8b 40 08 48 89 44 24 08 e8 d2 aa fd ff 48 83 05 82 96 18 00 01
 RSP: 0018:ffff8881047679e0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000004000000000 RCX: 0000004000000000
 RDX: ffff8881047679e0 RSI: 0000000000000000 RDI: ffff888115100880
 RBP: ffff888115100880 R08: ffffffffa00f6cb0 R09: ffff888104767a18
 R10: ffff8881151000a0 R11: ffff888109479540 R12: 0000000000000000
 R13: ffff888104767bb8 R14: ffff888115100000 R15: ffff8881151000a0
 FS:  00007f41a64ab740(0000) GS:ffff8882f5dc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 0000000104cbc005 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  set_feature_arfs+0x1e/0x40 [mlx5_core]
  mlx5e_handle_feature+0x43/0xa0 [mlx5_core]
  mlx5e_set_features+0x139/0x1b0 [mlx5_core]
  __netdev_update_features+0x2b3/0xaf0
  ethnl_set_features+0x176/0x3a0
  ? __nla_parse+0x22/0x30
  genl_family_rcv_msg_doit+0xe2/0x140
  genl_rcv_msg+0xde/0x1d0
  ? features_reply_size+0xe0/0xe0
  ? genl_get_cmd+0xd0/0xd0
  netlink_rcv_skb+0x4e/0xf0
  genl_rcv+0x24/0x40
  netlink_unicast+0x1f6/0x2b0
  netlink_sendmsg+0x225/0x450
  sock_sendmsg+0x33/0x40
  __sys_sendto+0xd4/0x120
  ? __sys_recvmsg+0x4e/0x90
  ? exc_page_fault+0x219/0x740
  __x64_sys_sendto+0x25/0x30
  do_syscall_64+0x3f/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f41a65b0cba
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
 RSP: 002b:00007ffd8d688358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 00000000010f42a0 RCX: 00007f41a65b0cba
 RDX: 0000000000000058 RSI: 00000000010f43b0 RDI: 0000000000000003
 RBP: 000000000047ae60 R08: 00007f41a667c000 R09: 000000000000000c
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000010f4340
 R13: 00000000010f4350 R14: 00007ffd8d688400 R15: 00000000010f42a0
 Modules linked in: mlx5_vdpa vhost_iotlb vdpa xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core ptp pps_core fuse
 CR2: 0000000000000008
 ---[ end trace c66523f2aba94b43 ]---

Fixes: 7a9fb35e8c ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:02 -07:00
Maor Gottlieb
8b54874ef1 net/mlx5: Fix flow table chaining
Fix a bug when flow table is created in priority that already
has other flow tables as shown in the below diagram.
If the new flow table (FT-B) has the lowest level in the priority,
we need to connect the flow tables from the previous priority (p0)
to this new table. In addition when this flow table is destroyed
(FT-B), we need to connect the flow tables from the previous
priority (p0) to the next level flow table (FT-C) in the same
priority of the destroyed table (if exists).

                       ---------
                       |root_ns|
                       ---------
                            |
            --------------------------------
            |               |              |
       ----------      ----------      ---------
       |p(prio)-x|     |   p-y  |      |   p-n |
       ----------      ----------      ---------
            |               |
     ----------------  ------------------
     |ns(e.g bypass)|  |ns(e.g. kernel) |
     ----------------  ------------------
            |            |           |
	-------	       ------       ----
        |  p0 |        | p1 |       |p2|
        -------        ------       ----
           |             |    \
        --------       ------- ------
        | FT-A |       |FT-B | |FT-C|
        --------       ------- ------

Fixes: f90edfd279 ("net/mlx5_core: Connect flow tables")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27 16:10:01 -07:00
Arnd Bergmann
a76053707d dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.

Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.

This is a purely cosmetic change intended to help readers find
their way through the implementation.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 20:11:45 +01:00
David S. Miller
9bff668419 mlx5-updates-2021-07-24
This series aims to reduce coupling in mlx5e, particularly between RX
 resources (TIRs, RQTs) and numerous code units that use them.
 
 This refactoring is required for upcoming features, ADQ and TX lag
 hashing.
 
 The issue with the current code is that TIRs and RQTs are unmanaged,
 different places all over the driver create, destroy, track and
 configure them, often in an uncoordinated way. The responsibilities of
 different units become vague, leading to a lot of hidden dependencies
 between numerous units and tight coupling between them, which is prone
 to bugs and hard to maintain.
 
 The result of this refactoring is:
 
 1. Creating a manager for RX resources, that controls their lifecycle
 and provides a clear API, which restricts the set of actions that other
 units can do.
 
 2. Using object-oriented approach for TIRs, RQTs and RX resource
 manager (struct mlx5e_rx_res).
 
 3. Fixing a few bugs and misbehaviors found during the refactoring.
 
 4. Reducing the amount of dependencies, removing hidden dependencies,
 making them one-directional and organizing the code in clear abstraction
 layers.
 
 5. Explicitly exposing the remaining weird dependencies.
 
 6. Simplifying and organizing code that creates and modifies TIRs and
 RQTs.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmD+5+EACgkQSD+KveBX
 +j6l/Qf9FHU8xRCBW0Bn4Ja0CnS/C3MJMIBTRbe+Qq2noj6qr8y6ut8Z7jV3hGaw
 dvcA6bVxpWPsqrY5CX4errfo2T4QiAsYhIeXrE2liCZsMUeYeZLusAYquks0oo57
 ai2FnihMvL5Xf9U2r+aAeT4dxpQBJ7ANON+b+ZRCFjwRl8CZkd7i4pnifLYpptye
 d6de6QSZkvxc818homiALdu0VH8HOBZALzfMI9y2Oh9peW+irc2UhEoThukD5VCD
 Turm6WBQ4xteQq3uNKebbqg0cEevyQjHoUmBg799xD0tHzFbSp+kV3ivHKxsksUF
 sOBdEumQjDc6J22uTnA086dIStN7sA==
 =jxI/
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2021-07-24

This series aims to reduce coupling in mlx5e, particularly between RX
resources (TIRs, RQTs) and numerous code units that use them.

This refactoring is required for upcoming features, ADQ and TX lag
hashing.

The issue with the current code is that TIRs and RQTs are unmanaged,
different places all over the driver create, destroy, track and
configure them, often in an uncoordinated way. The responsibilities of
different units become vague, leading to a lot of hidden dependencies
between numerous units and tight coupling between them, which is prone
to bugs and hard to maintain.

The result of this refactoring is:

1. Creating a manager for RX resources, that controls their lifecycle
and provides a clear API, which restricts the set of actions that other
units can do.

2. Using object-oriented approach for TIRs, RQTs and RX resource
manager (struct mlx5e_rx_res).

3. Fixing a few bugs and misbehaviors found during the refactoring.

4. Reducing the amount of dependencies, removing hidden dependencies,
making them one-directional and organizing the code in clear abstraction
layers.

5. Explicitly exposing the remaining weird dependencies.

6. Simplifying and organizing code that creates and modifies TIRs and
RQTs.

Saeed Mahameed says:

====================
mlx5 updates 2021-07-24

This series provides some refactoring to mlx5e RX resource management,
it is required for upcoming ADQ and TX lag hashing features.

The first two patches in this series :
  net/mlx5e: Prohibit inner indir TIRs in IPoIB
  net/mlx5e: Block LRO if firmware asks for tunneled LRO
Were supposed to go to net, but due to dependency and timing they were
included here.
I would appreciate it if you'd apply them to net and mark for -stable.

For more information please see tag log below.

Please pull and let me know if there is any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26 22:32:24 +01:00
Maxim Mikityanskiy
09f8356918 net/mlx5e: Use the new TIR API for kTLS
One of the previous commits introduced a dedicated object for a TIR.
kTLS code creates a TIR per connection using the low-level mlx5_core
API. This commit converts it to the new mlx5e_tir API.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:41 -07:00
Maxim Mikityanskiy
65d6b6e5a5 net/mlx5e: Move management of indir traffic types to rx_res
This commit moves the responsibility of keeping the RSS configuration
for different traffic types to en/rx_res.{c,h}, hiding the
implementation details behind the new getters, and abandons all usage of
struct mlx5e_tirc_config, which is no longer useful and superseded by
struct mlx5e_rss_params_traffic_type.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:41 -07:00
Maxim Mikityanskiy
a6696735d6 net/mlx5e: Convert TIR to a dedicated object
Code related to TIR is now encapsulated into a dedicated object and put
into new files en/tir.{c,h}. All usages are converted.

The Builder pattern is used to initialize a TIR. It allows to create a
multitude of different configurations, turning on and off some specific
features in different combinations, without having long parameter lists,
initializers per usage and repeating code in initializers.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:40 -07:00
Maxim Mikityanskiy
6fe5ff2c77 net/mlx5e: Create struct mlx5e_rss_params_hash
This commit introduces a new struct to store RSS hash parameters: hash
function and hash key. The existing usages are changed to use the new
struct.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:40 -07:00
Maxim Mikityanskiy
4b3e42eecb net/mlx5e: Remove mdev from mlx5e_build_indir_tir_ctx_common()
In order to drop a dependency to mdev and make the function more
universal, stop passing mdev to mlx5e_build_indir_tir_ctx_common() and
pass transport domain directly instead. It also prepares this function
to be used in other contexts that need a custom transport domain, such
as hairpin.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:40 -07:00
Maxim Mikityanskiy
a402e3a747 net/mlx5e: Remove lro_param from mlx5e_build_indir_tir_ctx_common()
In order to reduce the list of parameters and to define clearer
responsibility for mlx5e_build_indir_tir_ctx_common(), stop passing
lro_param and instead call mlx5e_build_tir_ctx_lro() directly where
needed.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:39 -07:00
Maxim Mikityanskiy
983c9da2b1 net/mlx5e: Remove mlx5e_priv usage from mlx5e_build_*tir_ctx*()
The functions that build TIR context for TIR create and modify commands
used to depend on struct mlx5e_priv and fetch some values directly from
different places. It increased coupling of code and the chance of weird
misbehavior due to hidden complex dependencies.

As the first step, this commit removes the priv parameter from these
functions. Instead, the necessary values are passed directly.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:39 -07:00
Maxim Mikityanskiy
093d4bc173 net/mlx5e: Use mlx5e_rqt_get_rqtn to access RQT hardware id
In order to abstract from implementation details of mlx5e_rqt, use the
mlx5e_rqt_get_rqtn getter instead of accessing the field directly.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:39 -07:00
Maxim Mikityanskiy
0570c1c958 net/mlx5e: Take RQT out of TIR and group RX resources
RQT is not part of TIR, as multiple TIRs may point to the same RQT, as
it happens with indir_tir and inner_indir_tir. These instances of a TIR
don't use the embedded RQT.

This commit takes RQT out of TIR, making them independent. The RQTs are
placed into struct mlx5e_rx_res, and items in that struct are regrouped
by functionality: RSS, channels and PTP.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:39 -07:00
Maxim Mikityanskiy
3f22d6c77b net/mlx5e: Move RX resources to a separate struct
This commit moves RQTs and TIRs to a separate struct that is allocated
dynamically in profiles that support these RX resources (all profiles,
except IPoIB PKey). It also allows to remove rqt_enabled flags, as RQTs
are always enabled in profiles that support RX resources.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:38 -07:00
Maxim Mikityanskiy
4ad3184977 net/mlx5e: Move mlx5e_build_rss_params() call to init_rx
RSS params belong to the RX side initialization. Move them from
profile->init to profile->init_rx stage to allow the next commit to move
rss_params out of priv to a dynamically-allocated struct.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:38 -07:00
Maxim Mikityanskiy
06e9f13ac5 net/mlx5e: Convert RQT to a dedicated object
Code related to RQT is now encapsulated into a dedicated object and put
into new files en/rqt.{c,h}. All usages are converted.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:37 -07:00
Maxim Mikityanskiy
bc5506a166 net/mlx5e: Check if inner FT is supported outside of create/destroy functions
Move the mlx5e_tunnel_inner_ft_supported() check for inner flow tables
support outside of mlx5e_create_inner_ttc_table() and
mlx5e_destroy_inner_ttc_table(). It allows to avoid accessing invalid
TIRNs of inner indirect TIRs. In a later commit these accesses will be
replaced by getters that will WARN if inner indirect TIRs don't exist.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:37 -07:00
Maxim Mikityanskiy
69994ef3da net/mlx5: Take TIR destruction out of the TIR list lock
res->td.list_lock protects the list of TIRs. There is no point to call
mlx5_core_destroy_tir() and invoke a firmware command under this lock.
This commit moves this call outside of the lock and puts it after
deleting the TIR from the list to ensure that TIRs are always alive
while in the list.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:37 -07:00
Maxim Mikityanskiy
26ab7b3845 net/mlx5e: Block LRO if firmware asks for tunneled LRO
This commit does a cleanup in LRO configuration.

LRO is a parameter of an RQ, but its state is changed by modifying a TIR
related to the RQ.

The current status: LRO for tunneled packets is not supported in the
driver, inner TIRs may enable LRO on creation, but LRO status of inner
TIRs isn't changed in mlx5e_modify_tirs_lro(). This is inconsistent, but
as long as the firmware doesn't declare support for tunneled LRO, it
works, because the same RQs are shared between the inner and outer TIRs.

This commit does two fixes:

1. If the firmware has the tunneled LRO capability, LRO is blocked
altogether, because it's not possible to block it for inner TIRs only,
when the same RQs are shared between inner and outer TIRs, and the
driver won't be able to handle tunneled LRO traffic.

2. mlx5e_modify_tirs_lro() is patched to modify LRO state for all TIRs,
including inner ones, because all TIRs related to an RQ should agree on
their LRO state.

Fixes: 7b3722fa9e ("net/mlx5e: Support RSS for GRE tunneled packets")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:37 -07:00
Maxim Mikityanskiy
9c43f3865c net/mlx5e: Prohibit inner indir TIRs in IPoIB
TIR's rx_hash_field_selector_inner can be enabled only when
tunneled_offload_en = 1. tunneled_offload_en is filled according to the
tunneled_offload_en field in struct mlx5e_params, which is false in the
IPoIB profile. On the other hand, the IPoIB profile passes inner_ttc =
true to mlx5e_create_indirect_tirs, which potentially allows the latter
function to attempt to create inner indirect TIRs without having
tunneled_offload_en set.

This commit prohibits this behavior by passing inner_ttc = false to
mlx5e_create_indirect_tirs. The latter function won't attempt to create
inner indirect TIRs.

As inner indirect TIRs are not created in the IPoIB profile (this commit
blocks it explicitly, and even before they would have failed to be
created), the call to mlx5e_create_inner_ttc_table in
mlx5i_create_flow_steering is a no-op and can be removed.

Fixes: 46dc933cee ("net/mlx5e: Provide explicit directive if to create inner indirect tirs")
Fixes: 458821c72b ("net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:36 -07:00
Jiapeng Chong
7e4960b3d6 mlx4: Fix missing error code in mlx4_load_one()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'err'.

Eliminate the follow smatch warning:

drivers/net/ethernet/mellanox/mlx4/main.c:3538 mlx4_load_one() warn:
missing error code 'err'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 7ae0e400cd ("net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-25 10:47:24 +01:00
Tal Gilboa
616d576934 IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
is_apu_thread_cq() used to detect CQs which are attached to APU
threads. This was extended to support other elements as well,
so the function was renamed to is_apu_cq().

c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't
reflected when the APU support was first introduced.

Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-07-25 11:39:04 +03:00
Tobias Waldekranz
472111920f net: bridge: switchdev: allow the TX data plane forwarding to be offloaded
Allow switchdevs to forward frames from the CPU in accordance with the
bridge configuration in the same way as is done between bridge
ports. This means that the bridge will only send a single skb towards
one of the ports under the switchdev's control, and expects the driver
to deliver the packet to all eligible ports in its domain.

Primarily this improves the performance of multicast flows with
multiple subscribers, as it allows the hardware to perform the frame
replication.

The basic flow between the driver and the bridge is as follows:

- When joining a bridge port, the switchdev driver calls
  switchdev_bridge_port_offload() with tx_fwd_offload = true.

- The bridge sends offloadable skbs to one of the ports under the
  switchdev's control using skb->offload_fwd_mark = true.

- The switchdev driver checks the skb->offload_fwd_mark field and lets
  its FDB lookup select the destination port mask for this packet.

v1->v2:
- convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long
- introduce a static key "br_switchdev_fwd_offload_used" to minimize the
  impact of the newly introduced feature on all the setups which don't
  have hardware that can make use of it
- introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache
  line access
- reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in
  __br_forward()
- do not strip VLAN on egress if forwarding offload on VLAN-aware bridge
  is being used
- propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP

v2->v3:
- replace the solution based on .ndo_dfwd_add_station with a solution
  based on switchdev_bridge_port_offload
- rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD
v3->v4: rebase
v4->v5:
- make sure the static key is decremented on bridge port unoffload
- more function and variable renaming and comments for them:
  br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload
  br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload
  nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom
  nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload
  fwd_accel to tx_fwd_offload

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23 16:32:37 +01:00