Commit Graph

4492 Commits

Author SHA1 Message Date
Shay Drory 5256a46bf5 net/mlx5: Introduce control IRQ request API
Currently, IRQ layer have a separate flow for ctrl and comp IRQs, and
the distinction between ctrl and comp IRQs is done in the IRQ layer.

In order to ease the coding and maintenance of the IRQ layer,
introduce a new API for requesting control IRQs -
mlx5_ctrl_irq_request(struct mlx5_core_dev *dev).

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:51 -08:00
Saeed Mahameed 20f80ffced net/mlx5: mlx5e_hv_vhca_stats_create return type to void
Callers of this functions ignore its return value, as reported by
Wang Qing, in one of the return paths, it returns positive values.

Since return value is ignored anyways, void out the return type of the
function.

Reported-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:50 -08:00
Jakub Kicinski 7d714ff14d net: fixup build after bpf header changes
Recent bpf-next merge brought in header changes which uncovered
includes missing in net-next which were not present in bpf-next.
Build problems happen only on less-popular arches like hppa,
sparc, alpha etc.

I could repro the build problem with ice but not the mlx5 problem
Abdul was reporting. mlx5 does look like it should include filter.h,
anyway.

Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Fixes: e63a023489 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
Link: https://lore.kernel.org/all/7c03768d-d948-c935-a7ab-b1f963ac7eed@linux.vnet.ibm.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04 12:34:19 +00:00
Paul Blakey c9c079b4de net/mlx5: CT: Set flow source hint from provided tuple device
Get originating device from tuple offload metadata match ingress_ifindex,
and set flow_source hint to either LOCAL for vf/sf reps, UPLINK for
uplink/wire/tunnel devices/bond, or ANY (as before this patch)
for all others.

This allows lower layer (software steering or firmware) to insert the tuple
rule only in one table (either rx or tx) instead of two (rx and tx).

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-04 12:12:56 +00:00
David S. Miller e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Yevgeny Kliteynik aa36c94853 net/mlx5: Set SMFS as a default steering mode if device supports it
Set SMFS (SW-managed flow steering) as a default steering mode
instead of DMFS (device-managed flow steering)

In SMFS, the driver writes the STEs (Steering Table Entries) directly
to the device's ICM, which allows for a higher rule insertion rate
than through using FW command interface, as it is done in DMFS.

SMFS/DMFS steering modes can be configured through devlink param
'flow_steering_mode'. The possible values are 'smfs' or 'dmfs'.
The desired 'flow_steering_mode' param value should be set before
enabling switchdev mode.

Example:

  # devlink dev param set pci/0000:05:00.0 name flow_steering_mode smfs
  # devlink dev eswitch set pci/0000:05:00.0 mode switchdev

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:44 -08:00
Yevgeny Kliteynik 4ff725e1d4 net/mlx5: DR, Ignore modify TTL if device doesn't support it
When modifying TTL, packet's csum has to be recalculated.
Due to HW issue in ConnectX-5, csum recalculation for modify TTL
is supported through a work-around that is specifically enabled
by configuration.
If the work-around isn't enabled, ignore the modify TTL action
rather than adding an unsupported action.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:41 -08:00
Yevgeny Kliteynik cc2295cd54 net/mlx5: DR, Improve steering for empty or RX/TX-only matchers
Every matcher has RX and TX paths. When a new matcher is created, its RX
and TX start/end anchors are connected to the respective RX and TX anchors
of the previous and next matchers.
This creates a potential performance issue: when a certain rule is added
to a matcher, in many cases it is RX or TX only rule, which may create a
long chain of RX/TX-only paths w/o the actual rules.

This patch aims to handle this issue.

RX and TX matchers are now handled separately: matcher connection in the
matchers chain is split into two separate lists: RX only and TX only.
when a new matcher is created, it is initially created 'detached' - its
RX/TX members are not inserted into the table's matcher list.
When an actual rule is added, only its appropriate RX or TX nic matchers
are then added to the table's nic matchers list and inserted into its
place in the chain of matchers.
I.e., if the rule that is being added is an RX-only rule, only the RX
part of the matcher will be connected to the chain, while TX part of the
matcher remains detached and doesn't prolong the TX chain of the matchers.

Same goes for rule deletion: when the last RX/TX rule of the nic matcher
is destroyed, the nic matcher is removed from its list.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:37 -08:00
Yevgeny Kliteynik f59464e257 net/mlx5: DR, Add support for matching on geneve_tlv_option_0_exist field
Match on geneve_tlv_option_0_exist field on devices that support STEv1.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:34 -08:00
Muhammad Sammar 09753babaf net/mlx5: DR, Support matching on tunnel headers 0 and 1
Tunnel headers are generic encapsulation headers, applies for all
tunneling protocols identified by the device native parser or by the
programmable parser, this support will enable raw matching headers 0 and 1.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:30 -08:00
Muhammad Sammar 8c2b4fee9c net/mlx5: DR, Add misc5 to match_param structs
Add misc5 match params to enable matching tunnel headers.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
2021-12-31 00:17:27 -08:00
Muhammad Sammar 0f2a6c3b92 net/mlx5: Add misc5 flow table match parameters
Add support for misc5 match parameter as per HW spec, this will allow
matching on tunnel_header fields.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:23 -08:00
Yevgeny Kliteynik b54128275e net/mlx5: DR, Warn on failure to destroy objects due to refcount
Add WARN_ON_ONCE on refcount checks in SW steering object destructors

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:20 -08:00
Yevgeny Kliteynik e3a0f40b2f net/mlx5: DR, Add support for UPLINK destination type
Add support for a new destination type - UPLINK.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:17 -08:00
Muhammad Sammar 9222f0b27d net/mlx5: DR, Add support for dumping steering info
Extend mlx5 debugfs support to present Software Steering resources:
dr_domain including it's tables, matchers and rules.
The interface is read-only. While dump is being presented, new steering
rules cannot be inserted/deleted.

The steering information is dumped in the CSV form with the following
format:

    <object_type>,<object_ID>, <object_info>,...,<object_info>

This data can be read at the following path:

    /sys/kernel/debug/mlx5/<BDF>/steering/fdb/<domain_handle>

Example:

    # cat /sys/kernel/debug/mlx5/0000:82:00.0/steering/fdb/dmn_000018644
    3100,0x55caa4621c50,0xee802,4,65533
    3101,0x55caa4621c50,0xe0100008

Changes in V2:
 - Reduce temp hex buffer size and avoid unnecessary memset
 - Use bin2hex() instead of DIY loop
 - Don't check debugfs functions return values

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:13 -08:00
Muhammad Sammar 7766c9b922 net/mlx5: DR, Add missing reserved fields to dr_match_param
Add the reserved fields to dr_match_param and arrange
as mlx5_ifc_dr_match_param_bits.

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
2021-12-31 00:17:10 -08:00
Yevgeny Kliteynik 89cdba3224 net/mlx5: DR, Add check for flex parser ID value
Allow only legal values for flex parser ID - values from 0 to 7.
For other values skip the parser, and as a result the matcher creation
will fail for using invalid flex parser ID.

Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:06 -08:00
Yevgeny Kliteynik 08fac109f7 net/mlx5: DR, Rename list field in matcher struct to list_node
In dr_types structs, some list fields are list heads, and some
are just list nodes that are stored on the other structs' lists.
Rename the appropriate list field to reflect this distinction.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:17:03 -08:00
Yevgeny Kliteynik 32e9bd5853 net/mlx5: DR, Remove unused struct member in matcher
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:16:59 -08:00
Yevgeny Kliteynik c3fb0e280b net/mlx5: DR, Fix lower case macro prefix "mlx5_" to "MLX5_"
Macros prefix should be capital letters - fix the prefix in
mlx5_FLEX_PARSER_MPLS_OVER_UDP_ENABLED.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:16:56 -08:00
Yevgeny Kliteynik 84dfac39c6 net/mlx5: DR, Fix error flow in creating matcher
The error code of nic matcher init functions wasn't checked.
This patch improves the matcher init function and fix error flow bug:
the handling of match parameter is moved into a separate function
and error flow is simplified.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
2021-12-31 00:16:52 -08:00
Jakub Kicinski aec53e60e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
  commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/

net/smc/smc_wr.c
  commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
  commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
  bitmap_zero()/memset() is removed by the fix

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-30 12:12:12 -08:00
Jakub Kicinski b6459415b3 net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There's a lot of missing includes this was masking. Primarily
in networking tho, this time.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-29 08:48:14 -08:00
Gal Pressman 992d8a4e38 net/mlx5e: Fix wrong features assignment in case of error
In case of an error in mlx5e_set_features(), 'netdev->features' must be
updated with the correct state of the device to indicate which features
were updated successfully.
To do that we maintain a copy of 'netdev->features' and update it after
successful feature changes, so we can assign it to back to
'netdev->features' if needed.

However, since not all netdev features are handled by the driver (e.g.
GRO/TSO/etc), some features may not be updated correctly in case of an
error updating another feature.

For example, while requesting to disable TSO (feature which is not
handled by the driver) and enable HW-GRO, if an error occurs during
HW-GRO enable, 'oper_features' will be assigned with 'netdev->features'
and HW-GRO turned off. TSO will remain enabled in such case, which is a
bug.

To solve that, instead of using 'netdev->features' as the baseline of
'oper_features' and changing it on set feature success, use 'features'
instead and update it in case of errors.

Fixes: 75b81ce719 ("net/mlx5e: Don't override netdev features field unless in error flow")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-28 22:42:50 -08:00
Roi Dayan 077cdda764 net/mlx5e: TC, Fix memory leak with rules with internal port
Fix a memory leak with decap rule with internal port as destination
device. The driver allocates a modify hdr action but doesn't set
the flow attr modify hdr action which results in skipping releasing
the modify hdr action when releasing the flow.

backtrace:
    [<000000005f8c651c>] krealloc+0x83/0xd0
    [<000000009f59b143>] alloc_mod_hdr_actions+0x156/0x310 [mlx5_core]
    [<000000002257f342>] mlx5e_tc_match_to_reg_set_and_get_id+0x12a/0x360 [mlx5_core]
    [<00000000b44ea75a>] mlx5e_tc_add_fdb_flow+0x962/0x1470 [mlx5_core]
    [<0000000003e384a0>] __mlx5e_add_fdb_flow+0x54c/0xb90 [mlx5_core]
    [<00000000ed8b22b6>] mlx5e_configure_flower+0xe45/0x4af0 [mlx5_core]
    [<00000000024f4ab5>] mlx5e_rep_indr_offload.isra.0+0xfe/0x1b0 [mlx5_core]
    [<000000006c3bb494>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
    [<00000000d3dac2ea>] tc_setup_cb_add+0x1d2/0x420

Fixes: b16eb3c81f ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-28 22:42:50 -08:00
Christophe JAILLET 4390c6edc0 net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'
All the error handling paths of 'mlx5e_tc_add_fdb_flow()' end to 'err_out'
where 'flow_flag_set(flow, FAILED);' is called.

All but the new error handling paths added by the commits given in the
Fixes tag below.

Fix these error handling paths and branch to 'err_out'.

Fixes: 166f431ec6 ("net/mlx5e: Add indirect tc offload of ovs internal port")
Fixes: b16eb3c81f ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
(cherry picked from commit 31108d142f)
2021-12-22 20:38:49 -08:00
Chris Mi 2820110d94 net/mlx5e: Delete forward rule for ct or sample action
When there is ct or sample action, the ct or sample rule will be deleted
and return. But if there is an extra mirror action, the forward rule can't
be deleted because of the return.

Fix it by removing the return.

Fixes: 69e2916ebc ("net/mlx5: CT: Add support for mirroring")
Fixes: f94d6389f6 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:49 -08:00
Maxim Mikityanskiy 19c4aba2d4 net/mlx5e: Fix ICOSQ recovery flow for XSK
There are two ICOSQs per channel: one is needed for RX, and the other
for async operations (XSK TX, kTLS offload). Currently, the recovery
flow for both is the same, and async ICOSQ is mistakenly treated like
the regular ICOSQ.

This patch prevents running the regular ICOSQ recovery on async ICOSQ.
The purpose of async ICOSQ is to handle XSK wakeup requests and post
kTLS offload RX parameters, it has nothing to do with RQ and XSKRQ UMRs,
so the regular recovery sequence is not applicable here.

Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Maxim Mikityanskiy 17958d7cd7 net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow
Both regular RQ and XSKRQ use the same ICOSQ for UMRs. When doing
recovery for the ICOSQ, don't forget to deactivate XSKRQ.

XSK can be opened and closed while channels are active, so a new mutex
prevents the ICOSQ recovery from running at the same time. The ICOSQ
recovery deactivates and reactivates XSKRQ, so any parallel change in
XSK state would break consistency. As the regular RQ is running, it's
not enough to just flush the recovery work, because it can be
rescheduled.

Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Gal Pressman a0cb909644 net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled
When TC classifier action offloads are disabled (CONFIG_MLX5_CLS_ACT in
Kconfig), the mlx5e_rep_tc_receive() function which is responsible for
passing the skb to the stack (or freeing it) is defined as a nop, and
results in leaking the skb memory. Replace the nop with a call to
napi_gro_receive() to resolve the leak.

Fixes: 28e7606fa8 ("net/mlx5e: Refactor rx handler of represetor device")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Amir Tzin 918fc3855a net/mlx5e: Wrap the tx reporter dump callback to extract the sq
Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
mlx5e_txqsq *, but in TX-timeout-recovery flow the argument is actually
of type struct mlx5e_tx_timeout_ctx *.

 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
 mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
 BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
 kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
 CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 [mlx5_core]
 Call Trace:
 mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
 devlink_health_do_dump.part.91+0x71/0xd0
 devlink_health_report+0x157/0x1b0
 mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
 ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
 [mlx5_core]
 ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
 ? update_load_avg+0x19b/0x550
 ? set_next_entity+0x72/0x80
 ? pick_next_task_fair+0x227/0x340
 ? finish_task_switch+0xa2/0x280
   mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
   process_one_work+0x1de/0x3a0
   worker_thread+0x2d/0x3c0
 ? process_one_work+0x3a0/0x3a0
   kthread+0x115/0x130
 ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30
 --[ end trace 51ccabea504edaff ]---
 RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: disabled
 end Kernel panic - not syncing: Fatal exception

To fix this bug add a wrapper for mlx5e_tx_reporter_dump_sq() which
extracts the sq from struct mlx5e_tx_timeout_ctx and set it as the
TX-timeout-recovery flow dump callback.

Fixes: 5f29458b77 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:48 -08:00
Chris Mi d671e109bd net/mlx5: Fix tc max supported prio for nic mode
Only prio 1 is supported if firmware doesn't support ignore flow
level for nic mode. The offending commit removed the check wrongly.
Add it back.

Fixes: 9a99c8f125 ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:47 -08:00
Moshe Shemesh 33de865f7b net/mlx5: Fix SF health recovery flow
SF do not directly control the PCI device. During recovery flow SF
should not be allowed to do pci disable or pci reset, its PF will do it.

It fixes the following kernel trace:
mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow
mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations
mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532
mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow
mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations
mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532
mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed,
status bad resource state(0x9), syndrome (0x658908)
mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed
mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed

Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:47 -08:00
Shay Drory aa968f9220 net/mlx5: Fix error print in case of IRQ request failed
In case IRQ layer failed to find or to request irq, the driver is
printing the first cpu of the provided affinity as part of the error
print. Empty affinity is a valid input for the IRQ layer, and it is
an error to call cpumask_first() on empty affinity.

Remove the first cpu print from the error message.

Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:47 -08:00
Shay Drory 26a7993c93 net/mlx5: Use first online CPU instead of hard coded CPU
Hard coded CPU (0 in our case) might be offline. Hence, use the first
online CPU instead.

Fixes: f891b7cdbd ("net/mlx5: Enable single IRQ for PCI Function")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:46 -08:00
Yevgeny Kliteynik 624bf42c2e net/mlx5: DR, Fix querying eswitch manager vport for ECPF
On BlueField the E-Switch manager is the ECPF (vport 0xFFFE), but when
querying capabilities of ECPF eswitch manager, need to query vport 0
with other_vport = 0.

Fixes: 9091b821aa ("net/mlx5: DR, Handle eswitch manager and uplink vports separately")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:46 -08:00
Miaoqian Lin 6b8b425858 net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
The mlx5_get_uars_page() function  returns error pointers.
Using IS_ERR() to check the return value to fix this.

Fixes: 4ec9e7b026 ("net/mlx5: DR, Expose steering domain functionality")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22 20:38:46 -08:00
Tariq Toukan 1f08917ab9 net/mlx5e: Take packet_merge params directly from the RX res struct
As packet_merge params structure is saved on the RX resources structure, there
is no need to pass it separately.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:58 -08:00
Lama Kayal fa691d0c9c net/mlx5e: Allocate per-channel stats dynamically at first usage
Make stats allocation per-channel dynamic on demand, at channel open
operation.

Previously the stats array was pre-allocated for the maximum possible
number of channels. Here we defer the per-channel stats instance allocation
upon its first usage, so that it's allocated only if really needed.

Allocating stats on demand helps maintain a more memory-efficient code,
as we're saving memory when the used number of channels is smaller than
the maximum.

The stats memory instances are still freed in mlx5e_priv_arrays_free(),
so that they are persistent to channels' closure.

Memory size allocated for struct mlx5e_channel_stats is 3648 bytes.
If maximum number of channel stands for 64, the total memory space
allocated for stats is 3648x64 = 228K bytes. In scenarios where the
number of channels in use is significantly smaller than maximum number,
the memory saved can be remarkable.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:58 -08:00
Tariq Toukan be98737a4f net/mlx5e: Use dynamic per-channel allocations in stats
Make stats array an array of pointer. This patch comes in to prepare for
the next patch where allocations of the stats are to be performed
dynamically on first usage.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:57 -08:00
Tariq Toukan 473baf2e9e net/mlx5e: Allow profile-specific limitation on max num of channels
Let SF/VF representor's netdev use profile-specific limitation on
max_nch to reduce its memory and HW resources consumption.

This is particularly important for environments with limited memory
and high number of SFs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:57 -08:00
Tariq Toukan 0246a57ab5 net/mlx5e: Save memory by using dynamic allocation in netdev priv
Many arrays in priv are statically allocated with a pre-defined maximum
(for num channels, num TCs, etc...), that is in some cases significantly
larger than the actual maximum. Examples:
- The more VFs are supported, the less MSIX vectors each of them could
  have. This limits the max_nch for each.
- Systems with limited number of cores or MSIX (< 64).
- Netdev profiles that do not support: QoS (DCB / HTB), PTP TX port
  timestamping.

Here we save some amount of memory by moving several structures
and arrays to follow the actual maximum instead.
This patch also prepares the code for even more savings to follow.

For example, on a system where the maximum num of channel is 8,
the channels stats structs alone go down from 3648*64 = 228 KB to
3648*8 = 28.5 KB per interface.

This is important for environments with high number of VFs/SFs or
limited memory.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:57 -08:00
Tariq Toukan 1958c2bddf net/mlx5e: Add profile indications for PTP and QOS HTB features
Let the profile indicate support of the PTP and HTB (QOS) features.
This unifies the logic that calculates the number of netdev queues needed
for the features, and allows simplification of mlx5e_create_netdev(),
which no longer requires number of rx/tx queues as parameters.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:56 -08:00
Tariq Toukan 6c72cb05d4 net/mlx5e: Use bitmap field for profile features
Use a features bitmap field in mlx5e_profile to declare profile support
state of the different features.  Let it replace the existing
rx_ptp_support boolean. It will be extended to cover more features in a
downstream patch.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:56 -08:00
Shaokun Zhang 08ab0ff47b net/mlx5: Remove the repeated declaration
Function 'mlx5_esw_vport_match_metadata_supported' and
'mlx5_esw_offloads_vport_metadata_set' are declared twice, so remove
the repeated declaration and blank line.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:56 -08:00
Shay Drory 8680a60fc1 net/mlx5: Let user configure max_macs generic param
Currently, max_macs is taking 70Kbytes of memory per function. This
size is not needed in all use cases, and is critical with large scale.
Hence, allow user to configure the number of max_macs.

For example, to reduce the number of max_macs to 1, execute::
$ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
              cmode driverinit
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:55 -08:00
Shay Drory 57ca767820 net/mlx5: Let user configure event_eq_size param
Event EQ is an EQ which received the notification of almost all the
events generated by the NIC.
Currently, each event EQ is taking 512KB of memory. This size is not
needed in most use cases, and is critical with large scale. Hence,
allow user to configure the size of the event EQ.

For example to reduce event EQ size to 64, execute::
$ devlink dev param set pci/0000:00:0b.0 name event_eq_size value 64 \
              cmode driverinit
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:55 -08:00
Shay Drory 0844fa5f7b net/mlx5: Let user configure io_eq_size param
Currently, each I/O EQ is taking 128KB of memory. This size
is not needed in all use cases, and is critical with large scale.
Hence, allow user to configure the size of I/O EQs.

For example, to reduce I/O EQ size to 64, execute:
$ devlink dev param set pci/0000:00:0b.0 name io_eq_size value 64 \
              cmode driverinit
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21 19:08:54 -08:00
Baowen Zheng 144d4c9e80 flow_offload: reject to offload tc actions in offload drivers
A follow-up patch will allow users to offload tc actions independent of
classifier in the software datapath.

In preparation for this, teach all drivers that support offload of the flow
tables to reject such configuration as currently none of them support it.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19 14:08:47 +00:00
David S. Miller 823f7a5497 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next branch 2021-12-15

Hi Dave, Jakub, Jason

This pulls mlx5-next branch into net-next and rdma branches.
All patches already reviewed on both rdma and netdev mailing lists.

Please pull and let me know if there's any problem.

1) Add multiple FDB steering priorities [1]
2) Introduce HW bits needed to configure MAC list size of VF/SF.
   Required for ("net/mlx5: Memory optimizations") upcoming series [2].

[1] https://lore.kernel.org/netdev/20211201193621.9129-1-saeed@kernel.org/
[2] https://lore.kernel.org/lkml/20211208141722.13646-1-shayd@nvidia.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:22:20 +00:00