The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
The fact that NETIF_F_HW_TC is not set should be a sufficient
indication to the user that TC offloads are not supported.
No need to bother users of older firmware versions with
pointless warnings on every boot.
Also, since the support is optional, bnxt_init_tc() should not
return an error in case FW is old, similarly to how error
is not returned when CONFIG_BNXT_FLOWER_OFFLOAD is not set.
With that we can add an error message to the caller, to warn
about actual unexpected failures.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current completion ring sizing formula is wrong with TPA enabled.
The formula assumes that the number of TPA completions are bound by the
RX ring size, but that's not true. TPA_START completions are immediately
recycled so they are not bound by the RX ring size. We must add
bp->max_tpa to the worst case maximum RX and TPA completions.
The completion ring can overflow because of this mistake. This will
cause hardware to disable the completion ring when this happens,
leading to RX and TX traffic to stall on that ring. This issue is
generally exposed only when the RX ring size is set very small.
Fix the formula by adding bp->max_tpa to the number of RX completions
if TPA is enabled.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.");
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a shared port PHY configuration, async event is received when any of the
port modifies the configuration. Ethtool link settings should be
initialised after updated PHY configuration from firmware.
Fixes: b1613e78e9 ("bnxt_en: Add async. event logic for PHY configuration changes.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to new infra, taking advantage of sleeping in callbacks.
v2:
- use bp->*_fw_dst_port_id != INVALID_HW_RING_ID as indication
that the offload is active.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bare-metal use cases require giving firmware and the embedded
application processor control over VLAN offloads. The driver should
not attempt to override or utilize this feature in such scenarios
since it will not work as expected.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware VLAN offload feature on our NIC does not have separate
knobs for handling customer and service tags on RX. Either offloading
of both must be enabled or both must be disabled. Introduce definitions
for the combined feature set in order to clean up the code and make
this constraint more clear. Technically these features can be separately
enabled on TX, however, since the default is to turn both on, the
combined TX feature set is also introduced for code consistency.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new infrastructure in place, we can now support the setting of
the indirection table from ethtool.
When changing channels, in a rare case that firmware cannot reserve the
rings that were promised, we will still try to keep the RSS map and only
revert to default when absolutely necessary.
v4: Revert RSS map to default during ring change only when absolutely
necessary.
v3: Add warning messages when firmware cannot reserve the requested RX
rings, and when the RSS table entries have to change to default.
v2: When changing channels, if the RSS table size changes and RSS map
is non-default, return error.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have the logical table, we can fill the HW RSS table
using the logical table's entries and converting them to the HW
specific format. Re-initialize the logical table to standard
distribution if the number of RX rings changes during ring reservation.
v4: Use bnxt_get_rxfh_indir_size() to get the RSS table size.
v2: Use ALIGN() to roundup the RSS table size.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some chips, this varies based on the number of RX rings. Add this
helper function and refactor the existing code to use it.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently does not keep track of the logical RSS indirection
table. The hardware RSS table is set up with standard default ring
distribution when initializing the chip. This makes it difficult to
support user sepcified indirection table entries. As a first step, add
the logical table in the main bnxt structure and allocate it according
to chip specific table size. Add a function that sets up default
RSS distribution based on the number of RX rings.
v4: Use bnxt_get_rxfh_indir_size() for the current RSS table size.
v2: Use kmalloc_array() since we init. all entries afterwards.
Use ALIGN() to roundup the RSS table size.
Use ethtool_rxfh_indir_default() to init. the default entries.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we allocate one page for the hardware DMA RSS indirection
table. While the size is currently big enough for all chips, future
chip variations may support bigger sizes, so it is better to calculate
and store the chip specific size and allocate accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtual functions does not have VPD information. This patch modifies
calling bnxt_read_vpd_info() only for PFs and avoids an unnecessary
error log.
Fixes: a0d0fd70fe ("bnxt_en: Read partno and serialno of the board from VPD")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On older firmware, the hardware statistics are not cleared when the
driver frees the hardware stats contexts during ifdown. The driver
expects these stats to be cleared and saves a copy before freeing
the stats contexts. During the next ifup, the driver will likely
allocate the same hardware stats contexts and this will cause a big
increase in the counters as the old counters are added back to the
saved counters.
We fix it by making an additional firmware call to clear the counters
before freeing the hw stats contexts when the firmware is the older
20.x firmware.
Fixes: b8875ca356 ("bnxt_en: Save ring statistics before reset.")
Reported-by: Jakub Kicinski <kicinski@fb.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Jakub Kicinski <kicinski@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware may not support legacy TX push properly and may not
be disabling it. So we check certain firmware versions that may
have this problem and disable legacy TX push unconditionally.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently only store the firmware version as a string for ethtool
and devlink info. Store it also as a version code. The next 2
patches will need to check the firmware major version to determine
some workarounds.
We also use the 16-bit firmware version fields if the firmware is newer
and provides the 16-bit fields.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will avoid many uneccessary error logs when driver or firmware is
in reset.
Fixes: 230d1f0de7 ("bnxt_en: Handle firmware reset.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AER reset should follow the same steps as suspend/resume. We need to
free context memory during AER reset and allocate new context memory
during recovery by calling bnxt_hwrm_func_qcaps(). We also need
to call bnxt_reenable_sriov() to restore the VFs.
Fixes: bae361c54f ("bnxt_en: Improve AER slot reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If VFs are enabled, we need to re-configure them during resume because
firmware has been reset while resuming. Otherwise, the VFs won't
work after resume.
Fixes: c16d4ee0e3 ("bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The separate steps we do in bnxt_resume() can be done more simply by
calling bnxt_hwrm_func_qcaps(). This change will add an extra
__bnxt_hwrm_func_qcaps() call which is needed anyway on older
firmware.
Fixes: f9b69d7f62 ("bnxt_en: Fix suspend/resume path on 57500 chips")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
The explicit mask and shift is not the appropriate way to parse fields
out of a little endian struct. The length field is internally __le16
and the strategy employed only happens to work on little endian machines
because the offset used is actually incorrect (length is at offset 6).
Also remove the related and no longer used definitions from bnxt.h.
Fixes: 845adfe40c ("bnxt_en: Improve valid bit checking in firmware response message.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have logic to maintain network counters across resets by storing
the counters in bp->net_stats_prev before reset. But not all resets
will clear the counters. Certain resets that don't need to change
the number of rings do not clear the counters. The current logic
accumulates the counters before all resets, causing big jumps in
the counters after some resets, such as ethtool -G.
Fix it by only accumulating the counters during reset if the irq_re_init
parameter is set. The parameter signifies that all rings and interrupts
will be reset and that means that the counters will also be reset.
Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Fixes: b8875ca356 ("bnxt_en: Save ring statistics before reset.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently have 3 software ring counters, rx_l4_csum_errors,
rx_buf_errors, and missed_irqs. The 1st two are RX counters and the
last one is a common counter. Organize them into 2 structures
bnxt_rx_sw_stats and bnxt_cmn_sw_stats.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the L2 doorbell size from the firmware and only map the portion
of the doorbell BAR for L2 use. This will leave the remaining doorbell
BAR available for the RoCE driver to use. The RoCE driver can map
the remaining portion as write-combining to support the push feature.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the 57500 chip doorbell offsets instead of using the magic
values in the C file.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware does not expect the CRC to be included in the length
passed from the driver. The firmware always configures the chip
to strip out the CRC.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current formulas to calculate the TQM slow path and fast path ring
context memory sizes are not quite correct. TQM slow path entry is
array index 0 of ctx->tqm_mem[]. The other array entries are for fast
path. Fix these sizes according to latest firmware spec. for 57500 and
newer chips.
Fixes: 3be8136ce1 ("bnxt_en: Initialize context memory to the value specified by firmware.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware spec. will specify the number of TQM rings to allocate
context memory for. Use the firmware specified value and fall back
to the old value derived from bp->max_q if it is not available.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic in bnxt_fix_features() will inadvertently turn on both
CTAG and STAG VLAN offload if the user tries to disable both. Fix it
by checking that the user is trying to enable CTAG or STAG before
enabling both. The logic is supposed to enable or disable both CTAG and
STAG together.
Fixes: 5a9f6b238e ("bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_alloc_ctx_pg_tbls() should return error when the memory size of the
context memory to set up is zero. By returning success (0), the caller
may proceed normally and may crash later when it tries to set up the
memory.
Fixes: 08fe9d1816 ("bnxt_en: Add Level 2 context memory paging support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve the slot reset sequence by disabling the device to prevent bad
DMAs if slot reset fails. Return the proper result instead of always
PCI_ERS_RESULT_RECOVERED to the caller.
Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store the part number and serial number information from VPD in
the bnxt structure. Follow up patch will add the support to display
the information via devlink command.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Display the minimum version of firmware interface spec supported
between driver and firmware. Also update bnxt.rst documentation file.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
If ring counts are not reset when ring reservation fails,
bnxt_init_dflt_ring_mode() will not be called again to reinitialise
IRQs when open() is called and results in system crash as napi will
also be not initialised. This patch fixes it by resetting the ring
counts.
Fixes: 47558acd56 ("bnxt_en: Reserve rings at driver open if none was reserved at probe time.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Other shutdown code paths will always disable PCI first to shutdown DMA
before freeing context memory. Do the same sequence in the error path
of probe to be safe and consistent.
Fixes: c20dc142dd ("bnxt_en: Disable bus master during PCI shutdown and driver unload.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code ignores the return value from
bnxt_hwrm_func_backing_store_cfg(), causing the driver to proceed in
the init path even when this vital firmware call has failed. Fix it
by propagating the error code to the caller.
Fixes: 1b9394e5a2 ("bnxt_en: Configure context memory on new devices.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an indexing bug in determining these ethtool priority
counters. Instead of using the queue ID to index, we need to
normalize by modulo 10 to get the index. This index is then used
to obtain the proper CoS queue counter. Rename bp->pri2cos to
bp->pri2cos_idx to make this more clear.
Fixes: e37fed7903 ("bnxt_en: Add ethtool -S priority counters.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to other drivers, properly clear the devlink port type when
removing the device before unregistration.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If firmware command returns error code as HWRM_ERR_CODE_BUSY, which
means it cannot handle the command due to a conflicting command
from another function, convert it to -EAGAIN. If it is an ethtool
operation, this error code will be returned to userspace.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return code is not needed in some of these functions, as the return
code from firmware message is ignored. Remove the unused rc variable
and also convert functions to void.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of converting error code in firmware message to standard
code, checking for firmware return code is removed in most of the
places. Remove the assignment of return code where the function
can directly return.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver stores a copy of the DCB settings that have been applied to
the firmware. After firmware reset, the firmware settings are gone and
will revert back to default. Clear the driver's copy so that if there
is a DCBNL request to get the settings, the driver will retrieve the
current settings from the firmware. lldpad keeps the DCB settings in
userspace and will re-apply the settings if it is running.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are in continuous NAPI polling mode, the current code in
bnxt_poll_p5() will only process the completion rings and will not
process the NQ until interrupt is re-enabled. Tis logic works and
will not cause RX or TX starvation, but async events in the NQ may
be delayed for the duration of continuous NAPI polling. These
async events may be firmware or VF events.
Continue to handle the NQ after we are done polling the completion
rings. This actually simplies the code in bnxt_poll_p5().
Acknowledge the NQ so these async events will not overflow.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the function by removing tha 'all' parameter. In the current
code, the caller has to specify whether to update/arm both completion
rings with the 'all' parameter.
Instead of this, we can just update/arm all the completion rings
that have been polled. By setting cpr->had_work_done earlier in
__bnxt_poll_work(), we know which completion ring has been polled
and can just update/arm all the completion rings with
cpr->had_work_done set.
This simplifies the function with one less parameter and works just
as well.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_poll_p5(), the logic polls for up to 2 completion rings (RX and
TX) for work. In the current code, if we reach budget polling the
first completion ring, we will stop. If the other completion ring
has work to do, we will handle it when NAPI calls us back.
This is not optimal. We potentially leave an unproceesed entry in
the NQ. When we are finally done with NAPI polling and re-enable
interrupt, the remaining entry in the NQ will cause interrupt to
be triggered immediately for no reason.
Modify the code in bnxt_poll_p5() to keep looping until all NQ
entries are handled even if the first completion ring has reached
budget.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the open-coded implementation for reading the PCIe DSN with
pci_get_dsn().
Use of put_unaligned_le64 should be correct. pci_get_dsn() will perform
two pci_read_config_dword calls. The first dword will be placed in the
first 32 bits of the u64, while the second dword will be placed in the
upper 32 bits of the u64.
On Little Endian systems, the least significant byte comes first, which
will be the least significant byte of the first dword, followed by the
least significant byte of the second dword. Since the _le32 variations
do not perform byte swapping, we will correctly copy the dwords into the
dsn[] array in the same order as before.
On Big Endian systems, the most significant byte of the second dword
will come first. put_unaligned_le64 will perform a CPU_TO_LE64, which
will swap things correctly before copying. This should also end up with
the correct bytes in the dsn[] array.
While at it, fix a small typo in the netdev_info error message when the
DSN cannot be read.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>