Commit Graph

237 Commits

Author SHA1 Message Date
Anatolii Gerasymenko a632b2a4c9 ice: ethtool: Prohibit improper channel config for DCB
Do not allow setting less channels, than Traffic Classes there are
via ethtool. There must be at least one channel per Traffic Class.

If you set less channels, than Traffic Classes there are, then during
ice_vsi_rebuild there would be allocated only the requested amount
of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map
there would be requested at least one channel per Traffic Class. This
results in setting num_rxq > alloc_rxq and num_txq > alloc_txq.
Later, there would be a NULL pointer dereference in
ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or
tx_rings arrays.

Change ice_set_channels() to return error if you try to allocate less
channels, than Traffic Classes there are.
Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return
status code instead of void.
Add error handling for ice_vsi_setup_q_map() and
ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc().

[53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi.
[53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[53763.992915] PGD 14b45f5067 P4D 0
[53763.996444] Oops: 0002 [#1] SMP NOPTI
[53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE    --------- -  - 4.18.0-240.el8.x86_64 #1
[53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020
[53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice]
[53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4                           c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44
[53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206
[53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018
[53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018
[53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004
[53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000
[53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018
[53764.090976] FS:  000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000
[53764.099362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0
[53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[53764.127747] PKRU: 55555554
[53764.130781] Call Trace:
[53764.133564]  ice_vsi_rebuild+0x611/0x870 [ice]
[53764.138341]  ice_vsi_recfg_qs+0x94/0x100 [ice]
[53764.143116]  ice_set_channels+0x1a8/0x3e0 [ice]
[53764.147975]  ethtool_set_channels+0x14e/0x240
[53764.152667]  dev_ethtool+0xd74/0x2a10
[53764.156665]  ? __mod_lruvec_state+0x44/0x110
[53764.161280]  ? __mod_lruvec_state+0x44/0x110
[53764.165893]  ? page_add_file_rmap+0x15/0x170
[53764.170518]  ? inet_ioctl+0xd1/0x220
[53764.174445]  ? netdev_run_todo+0x5e/0x290
[53764.178808]  dev_ioctl+0xb5/0x550
[53764.182485]  sock_do_ioctl+0xa0/0x140
[53764.186512]  sock_ioctl+0x1a8/0x300
[53764.190367]  ? selinux_file_ioctl+0x161/0x200
[53764.195090]  do_vfs_ioctl+0xa4/0x640
[53764.199035]  ksys_ioctl+0x60/0x90
[53764.202722]  __x64_sys_ioctl+0x16/0x20
[53764.206845]  do_syscall_64+0x5b/0x1a0
[53764.210887]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 87324e747f ("ice: Implement ethtool ops for channels")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21 15:20:24 -07:00
Michal Wilczynski bf13502ed5 ice: Fix interrupt moderation settings getting cleared
Adaptive-rx and Adaptive-tx are interrupt moderation settings
that can be enabled/disabled using ethtool:
ethtool -C ethX adaptive-rx on/off adaptive-tx on/off

Unfortunately those settings are getting cleared after
changing number of queues, or in ethtool world 'channels':
ethtool -L ethX rx 1 tx 1

Clearing was happening due to introduction of bit fields
in ice_ring_container struct. This way only itr_setting
bits were rebuilt during ice_vsi_rebuild_set_coalesce().

Introduce an anonymous struct of bitfields and create a
union to refer to them as a single variable.
This way variable can be easily saved and restored.

Fixes: 61dc79ced7 ("ice: Restore interrupt throttle settings after VSI rebuild")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17 10:37:09 -07:00
Alexander Lobakin d7442f512b ice: arfs: fix use-after-free when freeing @rx_cpu_rmap
The CI testing bots triggered the following splat:

[  718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
[  718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
[  718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S      W IOE     5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
[  718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[  718.223418] Call Trace:
[  718.227139]
[  718.230783]  dump_stack_lvl+0x33/0x42
[  718.234431]  print_address_description.constprop.9+0x21/0x170
[  718.238177]  ? free_irq_cpu_rmap+0x53/0x80
[  718.241885]  ? free_irq_cpu_rmap+0x53/0x80
[  718.245539]  kasan_report.cold.18+0x7f/0x11b
[  718.249197]  ? free_irq_cpu_rmap+0x53/0x80
[  718.252852]  free_irq_cpu_rmap+0x53/0x80
[  718.256471]  ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
[  718.260174]  ice_remove_arfs+0x5f/0x70 [ice]
[  718.263810]  ice_rebuild_arfs+0x3b/0x70 [ice]
[  718.267419]  ice_rebuild+0x39c/0xb60 [ice]
[  718.270974]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  718.274472]  ? ice_init_phy_user_cfg+0x360/0x360 [ice]
[  718.278033]  ? delay_tsc+0x4a/0xb0
[  718.281513]  ? preempt_count_sub+0x14/0xc0
[  718.284984]  ? delay_tsc+0x8f/0xb0
[  718.288463]  ice_do_reset+0x92/0xf0 [ice]
[  718.292014]  ice_pci_err_resume+0x91/0xf0 [ice]
[  718.295561]  pci_reset_function+0x53/0x80
<...>
[  718.393035] Allocated by task 690:
[  718.433497] Freed by task 20834:
[  718.495688] Last potentially related work creation:
[  718.568966] The buggy address belongs to the object at ffff8881bd127e00
                which belongs to the cache kmalloc-96 of size 96
[  718.574085] The buggy address is located 0 bytes inside of
                96-byte region [ffff8881bd127e00, ffff8881bd127e60)
[  718.579265] The buggy address belongs to the page:
[  718.598905] Memory state around the buggy address:
[  718.601809]  ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  718.604796]  ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  718.610811]                    ^
[  718.613819]  ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[  718.617107]  ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc

This is due to that free_irq_cpu_rmap() is always being called
*after* (devm_)free_irq() and thus it tries to work with IRQ descs
already freed. For example, on device reset the driver frees the
rmap right before allocating a new one (the splat above).
Make rmap creation and freeing function symmetrical with
{request,free}_irq() calls i.e. do that on ifup/ifdown instead
of device probe/remove/resume. These operations can be performed
independently from the actual device aRFS configuration.
Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
only when aRFS is disabled -- otherwise, CPU rmap sets and clears
its own and they must not be touched manually.

Fixes: 28bf26724f ("ice: Implement aRFS")
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-08 09:08:36 -07:00
Anatolii Gerasymenko ccfee18220 ice: Set txq_teid to ICE_INVAL_TEID on ring creation
When VF is freshly created, but not brought up, ring->txq_teid
value is by default set to 0.
But 0 is a valid TEID. On some platforms the Root Node of
Tx scheduler has a TEID = 0. This can cause issues as shown below.

The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).

Testing Hints:
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
ip link set dev ens785f0v0 up
ip link set dev ens785f0v0 down

If we have freshly created VF and quickly turn it on and off, so there
would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error:
[  639.531454] disable queue 89 failed 14
[  639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
[  639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5

The reason for the fail is that we are trying to send AQ command to
delete queue 89, which has never been created and receive an "invalid
argument" error from firmware.

As this queue has never been created, it's teid and ring->txq_teid
have default value 0.
ice_dis_vsi_txq has a check against non-existent queues:

node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
if (!node)
	continue;

But on some platforms the Root Node of Tx scheduler has a teid = 0.
Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
pi->root), and we go further to submit an erroneous request to firmware.

Fixes: 37bb839012 ("ice: Move common functions out of ice_main.c part 7/7")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05 12:50:25 +02:00
Ivan Vecera bd8c624c0c ice: Clear default forwarding VSI during VSI release
VSI is set as default forwarding one when promisc mode is set for
PF interface, when PF is switched to switchdev mode or when VF
driver asks to enable allmulticast or promisc mode for the VF
interface (when vf-true-promisc-support priv flag is off).
The third case is buggy because in that case VSI associated with
VF remains as default one after VF removal.

Reproducer:
1. Create VF
   echo 1 > sys/class/net/ens7f0/device/sriov_numvfs
2. Enable allmulticast or promisc mode on VF
   ip link set ens7f0v0 allmulticast on
   ip link set ens7f0v0 promisc on
3. Delete VF
   echo 0 > sys/class/net/ens7f0/device/sriov_numvfs
4. Try to enable promisc mode on PF
   ip link set ens7f0 promisc on

Although it looks that promisc mode on PF is enabled the opposite
is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC
handling first checks if any other VSI is set as default forwarding
one and if so the function does not do anything. At this point
it is not possible to enable promisc mode on PF without re-probe
device.

To resolve the issue this patch clear default forwarding VSI
during ice_vsi_release() when the VSI to be released is the default
one.

Fixes: 01b5e89aab ("ice: Add VF promiscuous support")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01 12:01:37 +01:00
Jonathan Toppins ad24d9ebc4 ice: change "can't set link" message to dbg level
In the case where the link is owned by manageability, the firmware is
not allowed to set the link state, so an error code is returned.
This however is non-fatal and there is nothing the operator can do,
so instead of confusing the operator with messages they can do nothing
about hide this message behind the debug log level.

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09 08:02:58 -08:00
Jacob Keller 3d5985a185 ice: convert VF storage to hash table with krefs and RCU
The ice driver stores VF structures in a simple array which is allocated
once at the time of VF creation. The VF structures are then accessed
from the array by their VF ID. The ID must be between 0 and the number
of allocated VFs.

Multiple threads can access this table:

 * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
 * interrupts, such as due to messages from the VF using the virtchnl
   communication
 * processing such as device reset
 * commands to add or remove VFs

The current implementation does not keep track of when all threads are
done operating on a VF and can potentially result in use-after-free
issues caused by one thread accessing a VF structure after it has been
released when removing VFs. Some of these are prevented with various
state flags and checks.

In addition, this structure is quite static and does not support a
planned future where virtualization can be more dynamic. As we begin to
look at supporting Scalable IOV with the ice driver (as opposed to just
supporting Single Root IOV), this structure is not sufficient.

In the future, VFs will be able to be added and removed individually and
dynamically.

To allow for this, and to better protect against a whole class of
use-after-free bugs, replace the VF storage with a combination of a hash
table and krefs to reference track all of the accesses to VFs through
the hash table.

A hash table still allows efficient look up of the VF given its ID, but
also allows adding and removing VFs. It does not require contiguous VF
IDs.

The use of krefs allows the cleanup of the VF memory to be delayed until
after all threads have released their reference (by calling ice_put_vf).

To prevent corruption of the hash table, a combination of RCU and the
mutex table_lock are used. Addition and removal from the hash table use
the RCU-aware hash macros. This allows simple read-only look ups that
iterate to locate a single VF can be fast using RCU. Accesses which
modify the hash table, or which can't take RCU because they sleep, will
hold the mutex lock.

By using this design, we have a stronger guarantee that the VF structure
can't be released until after all threads are finished operating on it.
We also pave the way for the more dynamic Scalable IOV implementation in
the future.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03 11:57:18 -08:00
Jacob Keller fb916db1f0 ice: introduce VF accessor functions
Before we switch the VF data structure storage mechanism to a hash,
introduce new accessor functions to define the new interface.

* ice_get_vf_by_id is a function used to obtain a reference to a VF from
  the table based on its VF ID
* ice_has_vfs is used to quickly check if any VFs are configured
* ice_get_num_vfs is used to get an exact count of how many VFs are
  configured

We can drop the old ice_validate_vf_id function, since every caller was
just going to immediately access the VF table to get a reference
anyways. This way we simply use the single ice_get_vf_by_id to both
validate the VF ID is within range and that there exists a VF with that
ID.

This change enables us to more easily convert the codebase to the hash
table since most callers now properly use the interface.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03 10:54:33 -08:00
Jacob Keller 000773c00f ice: factor VF variables to separate structure
We maintain a number of values for VFs within the ice_pf structure. This
includes the VF table, the number of allocated VFs, the maximum number
of supported SR-IOV VFs, the number of queue pairs per VF, the number of
MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.

We're about to add a few more variables to this list. Clean this up
first by extracting these members out into a new ice_vfs structure
defined in ice_virtchnl_pf.h

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03 10:54:29 -08:00
Jacob Keller c4c2c7db64 ice: convert ice_for_each_vf to include VF entry iterator
The ice_for_each_vf macro is intended to be used to loop over all VFs.
The current implementation relies on an iterator that is the index into
the VF array in the PF structure. This forces all users to perform a
look up themselves.

This abstraction forces a lot of duplicate work on callers and leaks the
interface implementation to the caller. Replace this with an
implementation that includes the VF pointer the primary iterator. This
version simplifies callers which just want to iterate over every VF, as
they no longer need to perform their own lookup.

The "i" iterator value is replaced with a new unsigned int "bkt"
parameter, as this will match the necessary interface for replacing
the VF array with a hash table. For now, the bkt is the VF ID, but in
the future it will simply be the hash bucket index. Document that it
should not be treated as a VF ID.

This change aims to simplify switching from the array to a hash table. I
considered alternative implementations such as an xarray but decided
that the hash table was the simplest and most suitable implementation. I
also looked at methods to hide the bkt iterator entirely, but I couldn't
come up with a feasible solution that worked for hash table iterators.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03 08:46:48 -08:00
Jacob Keller b03d519d34 ice: store VF pointer instead of VF ID
The VSI structure contains a vf_id field used to associate a VSI with a
VF. This is used mainly for ICE_VSI_VF as well as partially for
ICE_VSI_CTRL associated with the VFs.

This API was designed with the idea that VFs are stored in a simple
array that was expected to be static throughout most of the driver's
life.

We plan on refactoring VF storage in a few key ways:

  1) converting from a simple static array to a hash table
  2) using krefs to track VF references obtained from the hash table
  3) use RCU to delay release of VF memory until after all references
     are dropped

This is motivated by the goal to ensure that the lifetime of VF
structures is accounted for, and prevent various use-after-free bugs.

With the existing vsi->vf_id, the reference tracking for VFs would
become somewhat convoluted, because each VSI maintains a vf_id field
which will then require performing a look up. This means all these flows
will require reference tracking and proper usage of rcu_read_lock, etc.

We know that the VF VSI will always be backed by a valid VF structure,
because the VSI is created during VF initialization and removed before
the VF is destroyed. Rely on this and store a reference to the VF in the
VSI structure instead of storing a VF ID. This will simplify the usage
and avoid the need to perform lookups on the hash table in the future.

For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
debugging code which violates this assumption and avoid more disastrous
panics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03 08:46:47 -08:00
Karol Kolacinski 43113ff734 ice: add TTY for GNSS module for E810T device
Add a new ice_gnss.c file for holding the basic GNSS module functions.
If the device supports GNSS module, call the new ice_gnss_init and
ice_gnss_release functions where appropriate.

Implement basic functionality for reading the data from GNSS module
using TTY device.

Add I2C read AQ command. It is now required for controlling the external
physical connectors via external I2C port expander on E810-T adapters.

Future changes will introduce write functionality.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03 14:11:00 +00:00
Jakub Kicinski 6b5567b1b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 11:44:20 -08:00
Dave Ertman 88f62aea1c ice: Simplify tracking status of RDMA support
The status of support for RDMA is currently being tracked with two
separate status flags. This is unnecessary with the current state of
the driver.

Simplify status tracking down to a single flag.

Rename the helper function to denote the RDMA specific status and
universally use the helper function to test the status bit.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:35:12 +00:00
Jesse Brandeburg 86006f9963 ice: enable parsing IPSEC SPI headers for RSS
The COMMS package can enable the hardware parser to recognize IPSEC
frames with ESP header and SPI identifier.  If this package is available
and configured for loading in /lib/firmware, then the driver will
succeed in enabling this protocol type for RSS.

This in turn allows the hardware to hash over the SPI and use it to pick
a consistent receive queue for the same secure flow. Without this all
traffic is steered to the same queue for multiple traffic threads from
the same IP address. For that reason this is marked as a fix, as the
driver supports the model, but it wasn't enabled.

If the package is not available, adding this type will fail, but the
failure is ignored on purpose as it has no negative affect.

Fixes: c90ed40cef ("ice: Enable writing hardware filtering tables")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 11:22:35 +00:00
Brett Creeley 1babaf77f4 ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev
In order for the driver to support 802.1ad VLAN filtering and offloads,
it needs to advertise those VLAN features and also support modifying
those VLAN features, so make the necessary changes to
ice_set_netdev_features(). By default, enable CTAG insertion/stripping
and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM).
Also, in DVM, enable STAG filtering by default. This is done by
setting the feature bits in netdev->features. Also, in DVM, support
toggling of STAG insertion/stripping, but don't enable them by
default. This is done by setting the feature bits in
netdev->hw_features.

Since 802.1ad VLAN filtering and offloads are only supported in DVM, make
sure they are not enabled by default and that they cannot be enabled
during runtime, when the device is in SVM.

Add an implementation for the ndo_fix_features() callback. This is
needed since the hardware cannot support multiple VLAN ethertypes for
VLAN insertion/stripping simultaneously and all supported VLAN filtering
must either be enabled or disabled together.

Disable inner VLAN stripping by default when DVM is enabled. If a VSI
supports stripping the inner VLAN in DVM, then it will have to configure
that during runtime. For example if a VF is configured in a port VLAN
while DVM is enabled it will be allowed to offload inner VLANs.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley 0d54d8f7a1 ice: Add hot path support for 802.1Q and 802.1ad VLAN offloads
Currently the driver only supports 802.1Q VLAN insertion and stripping.
However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q
and 802.1ad VLAN insertion and stripping will be supported. Unfortunately
the VSI context parameters only allow for one VLAN ethertype at a time
for VLAN offloads so only one or the other VLAN ethertype offload can be
supported at once.

To support this, multiple changes are needed.

Rx path changes:

[1] In DVM, the Rx queue context l2tagsel field needs to be cleared so
the outermost tag shows up in the l2tag2_2nd field of the Rx flex
descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain
1 to support SVM configurations.

[2] Modify the ice_test_staterr() function to take a __le16 instead of
the ice_32b_rx_flex_desc union pointer so this function can be used for
both rx_desc->wb.status_error0 and rx_desc->wb.status_error1.

[3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that
checks if there is a VLAN tag in l2tag1 or l2tag2_2nd.

[4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX
is enabled in netdev->features. If it is, then this is the VLAN
ethertype that needs to be added to the stripping VLAN tag. Since
ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled
simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad.

Tx path changes:

[1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx
context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was
added to the list of tx_flags to handle this case.

[2] When the stack requests the VLAN tag to be offloaded on Tx, the
driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or
ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1
respectively. To determine which location to use, set a bit in the Tx
ring flags field during ring allocation that can be used to determine
which field to use in the Tx descriptor. In DVM, always use l2tag2,
and in SVM, always use l2tag1.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley c31af68a1b ice: Add outer_vlan_ops and VSI specific VLAN ops implementations
Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
ops are only available when the device is in Double VLAN Mode (DVM).
Depending on the VSI type, the requirements for what operations to
use/allow differ.

By default all VSI's have unsupported inner and outer VSI VLAN ops. This
implementation was chosen to prevent unexpected crashes due to null
pointer dereferences. Instead, if a VSI calls an unsupported op, it will
just return -EOPNOTSUPP.

Add implementations to support modifying outer VLAN fields for VSI
context. This includes the ability to modify VLAN stripping, insertion,
and the port VLAN based on the outer VLAN handling fields of the VSI
context.

These functions should only ever be used if DVM is enabled because that
means the firmware supports the outer VLAN fields in the VSI context. If
the device is in DVM, then always use the outer_vlan_ops, else use the
vlan_ops since the device is in Single VLAN Mode (SVM).

Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
ice_vsi_vlan_setup() as the latter function is specific to the PF and
all other VSI types that need an untagged VLAN 0 filter already do this
in their specific flows. Without this change, Flow Director is failing
to initialize because it does not implement any VSI VLAN ops.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley 7bd527aa17 ice: Adjust naming for inner VLAN operations
Current operations act on inner VLAN fields. To support double VLAN, outer
VLAN operations and functions will be implemented. Add the "inner" naming
to existing VLAN operations to distinguish them from the upcoming outer
values and functions. Some spacing adjustments are made to align
values.

Note that the inner is not talking about a tunneled VLAN, but the second
VLAN in the packet. For SVM the driver uses inner or single VLAN
filtering and offloads and in Double VLAN Mode the driver uses the
inner filtering and offloads for SR-IOV VFs in port VLANs in order to
support offloading the guest VLAN while a port VLAN is configured.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley 2bfefa2dab ice: Use the proto argument for VLAN ops
Currently the proto argument is unused. This is because the driver only
supports 802.1Q VLAN filtering. This policy is enforced via netdev
features that the driver sets up when configuring the netdev, so the
proto argument won't ever be anything other than 802.1Q. However, this
will allow for future iterations of the driver to seemlessly support
802.1ad filtering. Begin using the proto argument and extend the related
structures to support its use.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley fb05ba1257 ice: Introduce ice_vlan struct
Add a new struct for VLAN related information. Currently this holds
VLAN ID and priority values, but will be expanded to hold TPID value.
This reduces the changes necessary if any other values are added in
future. Remove the action argument from these calls as it's always
ICE_FWD_VSI.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley bc42afa954 ice: Add new VSI VLAN ops
Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
offloads require more flexibility when configuring VLANs. The VSI VLAN
interface will allow flexibility for configuring VLANs for all VSI
types. Add new files to separate the VSI VLAN ops and move functions to
make the code more organized.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley 3e0b59714b ice: Add helper function for adding VLAN 0
There are multiple places where VLAN 0 is being added. Create a function
to be called in order to minimize changes as the implementation is expanded
to support double VLAN and avoid duplicated code.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Brett Creeley daf4dd1643 ice: Refactor spoofcheck configuration functions
Add functions to configure Tx VLAN antispoof based on iproute
configuration and/or VLAN mode and VF driver support. This is needed
later so the driver can control when it can be configured. Also, add
functions that can be used to enable and disable MAC and VLAN
spoofcheck. Move spoofchk configuration during VSI setup into the
SR-IOV initialization path and into the post VSI rebuild flow for VF
VSIs.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Kiran Patil 40319796b7 ice: Add flow director support for channel mode
Add support to enable flow-director filter when multiple TCs are
configured. Flow director filter can be configured using ethtool
(--config-ntuple option). When multiple TCs are configured, each
TC is mapped to an unique HW VSI. So VSI corresponding to queue
used in filter is identified and flow director context is updated
with correct VSI while configuring ntuple filter in HW.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30 13:16:07 +00:00
Tony Nguyen c14846914e ice: Propagate error codes
As all functions now return standard error codes, propagate the values
being returned instead of converting them to generic values.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14 10:19:14 -08:00
Tony Nguyen 2ccc1c1ccc ice: Remove excess error variables
ice_status previously had a variable to contain these values where other
error codes had a variable as well. With ice_status now being an int,
there is no need for two variables to hold error values. In cases where
this occurs, remove one of the excess variables and use a single one.
Some initialization of variables are no longer needed and have been
removed.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14 10:19:13 -08:00
Tony Nguyen 5518ac2a64 ice: Cleanup after ice_status removal
Clean up code after changing ice_status to int. Rearrange to fix reverse
Christmas tree and pull lines up where applicable.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14 10:19:13 -08:00
Tony Nguyen d54699e27d ice: Remove enum ice_status
Replace uses of ice_status to, as equivalent as possible, error codes.
Remove enum ice_status and its helper conversion function as they are no
longer needed.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14 10:19:13 -08:00
Tony Nguyen 5e24d5984c ice: Use int for ice_status
To prepare for removal of ice_status, change the variables from
ice_status to int. This eases the transition when values are changed to
return standard int error codes over enum ice_status.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14 10:19:13 -08:00
Tony Nguyen 5f87ec4861 ice: Remove string printing for ice_status
Remove the ice_stat_str() function which prints the string
representation of the ice_status error code. With upcoming changes
moving away from ice_status, there will be no need for this function.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14 10:19:13 -08:00
Maciej Fijalkowski 792b208658 ice: fix vsi->txq_map sizing
The approach of having XDP queue per CPU regardless of user's setting
exposed a hidden bug that could occur in case when Rx queue count differ
from Tx queue count. Currently vsi->txq_map's size is equal to the
doubled vsi->alloc_txq, which is not correct due to the fact that XDP
rings were previously based on the Rx queue count. Below splat can be
seen when ethtool -L is used and XDP rings are configured:

[  682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f
[  682.883403] #PF: supervisor read access in kernel mode
[  682.889345] #PF: error_code(0x0000) - not-present page
[  682.895289] PGD 0 P4D 0
[  682.898218] Oops: 0000 [#1] PREEMPT SMP PTI
[  682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G           OE     5.15.0-rc5+ #1
[  682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
[  682.923380] RIP: 0010:devres_remove+0x44/0x130
[  682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8
[  682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002
[  682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370
[  682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000
[  682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000
[  682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60
[  682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c
[  682.997535] FS:  00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
[  683.006910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0
[  683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  683.038336] Call Trace:
[  683.041167]  devm_kfree+0x33/0x50
[  683.045004]  ice_vsi_free_arrays+0x5e/0xc0 [ice]
[  683.050380]  ice_vsi_rebuild+0x4c8/0x750 [ice]
[  683.055543]  ice_vsi_recfg_qs+0x9a/0x110 [ice]
[  683.060697]  ice_set_channels+0x14f/0x290 [ice]
[  683.065962]  ethnl_set_channels+0x333/0x3f0
[  683.070807]  genl_family_rcv_msg_doit+0xea/0x150
[  683.076152]  genl_rcv_msg+0xde/0x1d0
[  683.080289]  ? channels_prepare_data+0x60/0x60
[  683.085432]  ? genl_get_cmd+0xd0/0xd0
[  683.089667]  netlink_rcv_skb+0x50/0xf0
[  683.094006]  genl_rcv+0x24/0x40
[  683.097638]  netlink_unicast+0x239/0x340
[  683.102177]  netlink_sendmsg+0x22e/0x470
[  683.106717]  sock_sendmsg+0x5e/0x60
[  683.110756]  __sys_sendto+0xee/0x150
[  683.114894]  ? handle_mm_fault+0xd0/0x2a0
[  683.119535]  ? do_user_addr_fault+0x1f3/0x690
[  683.134173]  __x64_sys_sendto+0x25/0x30
[  683.148231]  do_syscall_64+0x3b/0xc0
[  683.161992]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix this by taking into account the value that num_possible_cpus()
yields in addition to vsi->alloc_txq instead of doubling the latter.

Fixes: efc2214b60 ("ice: Add support for XDP")
Fixes: 22bf877e52 ("ice: introduce XDP_TX fallback path")
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-11-22 08:33:54 -08:00
Brett Creeley 29e71f41e7 ice: Remove boolean vlan_promisc flag from function
Currently, the vlan_promisc flag is used exclusively by VF VSI to
determine whether or not to toggle VLAN pruning along with
trusted/true-promiscuous mode. This is not needed for a couple of
reasons. First, trusted/true-promiscuous mode is only supposed to allow
all MAC filters within VLANs that a VF has added filters for, so VLAN
pruning should not be disabled. Second, the boolean argument makes the
function confusing and unintuitive. Remove this flag.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29 10:48:16 -07:00
Nathan Chancellor 370764e60b ice: Fix clang -Wimplicit-fallthrough in ice_pull_qvec_from_rc()
Clang warns:

drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
        default:
        ^
drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: note: insert 'break;' to avoid fall-through
        default:
        ^
        break;
1 error generated.

Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.

Link: https://github.com/ClangBuiltLinux/linux/issues/1482
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-28 11:00:20 -07:00
David S. Miller bdfa75ad70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Lots of simnple overlapping additions.

With a build fix from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22 11:41:16 +01:00
Kiran Patil fbc7b27af0 ice: enable ndo_setup_tc support for mqprio_qdisc
Add support in driver for TC_QDISC_SETUP_MQPRIO. This support
enables instantiation of channels in HW using existing MQPRIO
infrastructure which is extended to be offloadable. This
provides a mechanism to configure dedicated set of queues for
each TC.

Configuring channels using "tc mqprio":
--------------------------------------
tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \
	queues 4@0 4@4 4@8  hw 1 mode channel

Above command configures 3 TCs having 4 queues each. "hw 1 mode channel"
implies offload of channel configuration to HW. When driver processes
configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each
TC maps to HW VSI with specified queues.

User can optionally specify bandwidth min and max rate limit per TC
(see example below). If shaper params like min and/or max bandwidth
rate limit are specified, driver configures VSI specific rate limiter
in HW.

Configuring channels and bandwidth shaper parameters using "tc mqprio":
----------------------------------------------------------------
tc qdisc add dev <ethX> root mqprio \
	num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \
	shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \
	max_rate 4Gbit 5Gbit 6Gbit 7Gbit

Command to view configured TCs:
-----------------------------
tc qdisc show dev <ethX>

Deleting TCs:
------------
tc qdisc del dev <ethX> root mqprio

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-20 15:58:11 -07:00
Kiran Patil 0754d65bd4 ice: Add infrastructure for mqprio support via ndo_setup_tc
Add infrastructure required for "ndo_setup_tc:qdisc_mqprio".
ice_vsi_setup is modified to configure traffic classes based
on mqprio data received from the stack. This includes low-level
functions to configure min, max rate-limit parameters in hardware
for traffic classes. Each traffic class gets mapped to a hardware
channel (VSI) which can be individually configured with different
bandwidth parameters.

Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-20 15:57:54 -07:00
Jesse Brandeburg d16a4f45f3 ice: fix rate limit update after coalesce change
If the adaptive settings are changed with
ethtool -C ethx adaptive-rx off adaptive-tx off
then the interrupt rate limit should be maintained as a user set value,
but only if BOTH adaptive settings are off. Fix a bug where the rate
limit that was being used in adaptive mode was staying set in the
register but was not reported correctly by ethtool -c ethx. Due to long
lines include a small refactor of q_vector variable.

Fixes: b8b4772377 ("ice: refactor interrupt moderation writes")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19 10:45:16 -07:00
Jesse Brandeburg d8eb7ad5e4 ice: update dim usage and moderation
The driver was having trouble with unreliable latency when doing single
threaded ping-pong tests. This was root caused to the DIM algorithm
landing on a too slow interrupt value, which caused high latency, and it
was especially present when queues were being switched frequently by the
scheduler as happens on default setups today.

In attempting to improve this, we allow the upper rate limit for
interrupts to move to rate limit of 4 microseconds as a max, which means
that no vector can generate more than 250,000 interrupts per second. The
old config was up to 100,000. The driver previously tried to program the
rate limit too frequently and if the receive and transmit side were both
active on the same vector, the INTRL would be set incorrectly, and this
change fixes that issue as a side effect of the redesign.

This driver will operate from now on with a slightly changed DIM table
with more emphasis towards latency sensitivity by having more table
entries with lower latency than with high latency (high being >= 64
microseconds).

The driver also resets the DIM algorithm state with a new stats set when
there is no work done and the data becomes stale (older than 1 second),
for the respective receive or transmit portion of the interrupt.

Add a new helper for setting rate limit, which will be used more
in a followup patch.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19 10:45:16 -07:00
Brett Creeley 4ecc863305 ice: Add support for VF rate limiting
Implement ndo_set_vf_rate to support setting of min_tx_rate and
max_tx_rate; set the appropriate bandwidth in the scheduler for the
node representing the specified VF VSI.

Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19 09:22:31 -07:00
Maciej Fijalkowski 2faf63b650 ice: make use of ice_for_each_* macros
Go through the code base and use ice_for_each_* macros.  While at it,
introduce ice_for_each_xdp_txq() macro that can be used for looping over
xdp_rings array.

Commit is not introducing any new functionality.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:03 -07:00
Maciej Fijalkowski 22bf877e52 ice: introduce XDP_TX fallback path
Under rare circumstances there might be a situation where a requirement
of having XDP Tx queue per CPU could not be fulfilled and some of the Tx
resources have to be shared between CPUs. This yields a need for placing
accesses to xdp_ring inside a critical section protected by spinlock.
These accesses happen to be in the hot path, so let's introduce the
static branch that will be triggered from the control plane when driver
could not provide Tx queue dedicated for XDP on each CPU.

Currently, the design that has been picked is to allow any number of XDP
Tx queues that is at least half of a count of CPUs that platform has.
For lower number driver will bail out with a response to user that there
were not enough Tx resources that would allow configuring XDP. The
sharing of rings is signalled via static branch enablement which in turn
indicates that lock for xdp_ring accesses needs to be taken in hot path.

Approach based on static branch has no impact on performance of a
non-fallback path. One thing that is needed to be mentioned is a fact
that the static branch will act as a global driver switch, meaning that
if one PF got out of Tx resources, then other PFs that ice driver is
servicing will suffer. However, given the fact that HW that ice driver
is handling has 1024 Tx queues per each PF, this is currently an
unlikely scenario.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:03 -07:00
Maciej Fijalkowski 0bb4f9ecad ice: unify xdp_rings accesses
There has been a long lasting issue of improper xdp_rings indexing for
XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index
is mixed with smp_processor_id(), there could be a situation where Tx
descriptors are produced onto XDP Tx ring, but tail is never bumped -
for example pin a particular queue id to non-matching IRQ line.

Address this problem by ignoring the user ring count setting and always
initialize the xdp_rings array to be of num_possible_cpus() size. Then,
always use the smp_processor_id() as an index to xdp_rings array. This
provides serialization as at given time only a single softirq can run on
a particular CPU.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:02 -07:00
Maciej Fijalkowski e72bba2135 ice: split ice_ring onto Tx/Rx separate structs
While it was convenient to have a generic ring structure that served
both Tx and Rx sides, next commits are going to introduce several
Tx-specific fields, so in order to avoid hurting the Rx side, let's
pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.

Rx ring could be handled by the old ice_ring which would reduce the code
churn within this patch, but this would make things asymmetric.

Make the union out of the ring container within ice_q_vector so that it
is possible to iterate over newly introduced ice_tx_ring.

Remove the @size as it's only accessed from control path and it can be
calculated pretty easily.

Change definitions of ice_update_ring_stats and
ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
used for both Rx and Tx rings.

Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
difference now.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:02 -07:00
Maciej Fijalkowski e93d1c37a8 ice: remove ring_active from ice_ring
This field is dead and driver is not making any use of it. Simply remove
it.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:02 -07:00
Brett Creeley ff7e932194 ice: Fix failure to re-add LAN/RDMA Tx queues
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active
the RDMA Tx queue scheduler node configuration will not be cleaned up.
This will cause the rebuild/re-add of the VSI to fail due to the
software structures not being correctly cleaned up for the VSI index.
Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSI. If there
are no RDMA scheduler nodes created, then there is no harm in calling
ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if
RDMA support is added for other VSI types they will also get this
change.

Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14 10:14:00 -07:00
Maciej Machnikowski 325b2064d0 ice: Implement support for SMA and U.FL on E810-T
Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and
allow controlling them.

E810-T adapters are equipped with:
- 2 external bidirectional SMA connectors
- 1 internal TX U.FL
- 1 internal RX U.FL

U.FL connectors share signal lines with the SMA connectors. The TX U.FL1
share the line with the SMA1 and the RX U.FL2 share line with the SMA2.
This dependence is controlled by the ice_verify_pin_e810t.

Additionally add support for the E810-T-based  devices which don't use the
SMA/U.FL controller. If the IO expander is not detected don't expose pins
and use 2 predefined 1PPS input and output pins.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14 07:37:30 -07:00
Grzegorz Nitka f66756e0ea ice: introduce new type of VSI for switchdev
New type of VSI has to be defined for switchdev control plane
VSI. Number of allocated Tx and Rx queue has to be equal to
amount of VFs, because each port representor should have one
Tx and Rx queue.

Also to not increase number of used irqs too much, control plane
VSI uses only one q_vector and handle all queues in one irq.
To allow handling all queues in one irq , new function to clean
msix for eswitch was introduced. This function will schedule napi
for each representor instead of scheduling it only for one like in
normal clean irq function.

Only one additional msix has to be requested. Always try to request
it in ice_ena_msix_range function.

Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07 10:41:42 -07:00
Michal Swiatkowski ff5411ef88 ice: manage VSI antispoof and destination override
Implement functions to make setting VSI security config easier.
Main function ice_update_security fills security section field and
checks against error in updating VSI. Reset functions are responsible
for correct filling config according to user expectations.

This helper is needed because destination override is located in
this section. Driver has to set this bit to allow strering Tx packet
on VSI based on value in Tx descriptors.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07 10:41:42 -07:00
Wojciech Drewek 2ae0aa4758 ice: Move devlink port to PF/VF struct
Keeping devlink port inside VSI data structure causes some issues.
Since VF VSI is released during reset that means that we have to
unregister devlink port and register it again every time reset is
triggered. With the new changes in devlink API it
might cause deadlock issues. After calling
devlink_port_register/devlink_port_unregister devlink API is going to
lock rtnl_mutex. It's an issue when VF reset is triggered in netlink
operation context (like setting VF MAC address or VLAN),
because rtnl_lock is already taken by netlink. Another call of
rtnl_lock from devlink API results in dead-lock.

By moving devlink port to PF/VF we avoid creating/destroying it
during reset. Since this patch, devlink ports are created during
ice_probe, destroyed during ice_remove for PF and created during
ice_repr_add, destroyed during ice_repr_rem for VF.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07 10:41:41 -07:00