Commit graph

999755 commits

Author SHA1 Message Date
Aya Levin
5b232ea94c net/mlx5e: Fix RQ creation flow for queues which doesn't support XDP
Allow to create an RQ which is not registered as an XDP RQ. For example:
the trap-RQ doesn't register as an XDP RQ.

Fixes: 869c5f9262 ("net/mlx5e: Generalize open RQ")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:10 -07:00
Wenpeng Liang
31450b435f net/mlx5: Replace spaces with tab at the start of a line
There should be no spaces at the start of the line.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:07 -07:00
Wenpeng Liang
9dee115bc1 net/mlx5: Remove return statement exist at the end of void function
void function return statements are not generally useful.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:04 -07:00
Wenpeng Liang
02f47c04c3 net/mlx5: Add a blank line after declarations
There should be a blank lines after declarations.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:01 -07:00
Colin Ian King
82c3ba31c3 net/mlx5: Fix bit-wise and with zero
The bit-wise and of the action field with MLX5_ACCEL_ESP_ACTION_DECRYPT
is incorrect as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero and not intended
to be a bit-flag. Fix this by using the == operator as was originally
intended.

Addresses-Coverity: ("Logically dead code")
Fixes: 7dfee4b1d7 ("net/mlx5: IPsec, Refactor SA handle creation and destruction")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:58 -07:00
Roi Dayan
b7f86258a2 net/mlx5: DR, Alloc cmd buffer with kvzalloc() instead of kzalloc()
The cmd size is 8K so use kvzalloc().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:55 -07:00
Jianbo Liu
9dac2966c5 net/mlx5: DR, Use variably sized data structures for different actions
mlx5dr_action is a generally used data structure, and there is an
union for different types of actions in it. The size of mlx5dr_action
is about 72 bytes, but for those actions with fewer fields, most of
the allocated memory is wasted.
Remove this union, and mlx5dr_action becomes a generic action header.
Then actions are dynamically allocated with needed memory, the data
for each action is stored right after the header.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:52 -07:00
Parav Pandit
a74ed24c43 net/mlx5: SF, Reuse stored hardware function id
SF's hardware function id is already stored in mlx5_sf. Reuse it,
instead of querying the hw table.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:49 -07:00
Parav Pandit
6e74e6ea1b net/mlx5: SF, Use device pointer directly
At many places in the code, device pointer is directly available. Make
use of it, instead of accessing it from the table.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:46 -07:00
Parav Pandit
57b92bdd9e net/mlx5: E-Switch, Initialize eswitch acls ns when eswitch is enabled
Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.

Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.

With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:43 -07:00
Parav Pandit
b55b35382e net/mlx5: E-Switch, Move legacy code to a individual file
Currently eswitch offers two modes. Legacy and offloads.
Offloads code is already in its own file eswitch_offloads.c

However eswitch.c contains the eswitch legacy code and common
infrastructure  code.

To enable future extensions and to better manage generic common eswitch
infrastructure code, move the legacy code to its own legacy.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:40 -07:00
Parav Pandit
b16f2bb6b6 net/mlx5: E-Switch, Convert a macro to a helper routine
Convert ESW_ALLOWED macro to a helper routine so that it can be used in
other eswitch files.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:38 -07:00
Parav Pandit
13795553a8 net/mlx5: E-Switch Make cleanup sequence mirror of init
Make cleanup sequence mirror of init sequence for cleaning up reps
and freeing vports.

Also when reps initialization fails, there is no need to perform reps
cleanup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:35 -07:00
Parav Pandit
6308a5f06b net/mlx5: E-Switch, Make vport number u16
Vport number is 16-bit field in hardware. Make it u16.

Move location of vport in the structure so that it reduces a hole
in the structure.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:32 -07:00
Parav Pandit
7d5ae47891 net/mlx5: E-Switch, Skip querying SF enabled bits
With vhca events, SF state is queried through the VHCA events. Device no
longer expects SF bitmap in the query eswitch functions command.

Hence, remove it to simplify the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:29 -07:00
Parav Pandit
7bf481d7e7 net/mlx5: E-Switch, let user to enable disable metadata
Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.

Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.

Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.

----------------------------------------------
| steering | metadata | pkt size | rx pps    |
| mode     |          |          | (million) |
----------------------------------------------
| smfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| smfs     | enabled  | 128Bytes | 36        |
----------------------------------------------
| dmfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| dmfs     | enabled  | 128Bytes | 36        |
----------------------------------------------

Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.

Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
  name esw_port_metadata type driver-specific
    values:
      cmode runtime value true

$ devlink dev param set pci/0000:06:00.0 \
	  name esw_port_metadata value false cmode runtime

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
changelog:
v1->v2:
 - added performance numbers in commit log
 - updated commit log and documentation for switchdev mode
 - added explicit note on when user can disable metadata in
   documentation
2021-04-14 11:02:26 -07:00
Adam Ford
8ef7adc6be net: ethernet: ravb: Enable optional refclk
For devices that use a programmable clock for the AVB reference clock,
the driver may need to enable them.  Add code to find the optional clock
and enable it when available.

Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 14:09:59 -07:00
Adam Ford
6f43735b6d dt-bindings: net: renesas,etheravb: Add additional clocks
The AVB driver assumes there is an external crystal, but it could
be clocked by other means.  In order to enable a programmable
clock, it needs to be added to the clocks list and enabled in the
driver.  Since there currently only one clock, there is no
clock-names list either.

Update bindings to add the additional optional clock, and explicitly
name both of them.

Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 14:09:59 -07:00
David S. Miller
d27139c544 Merge branch 'enetc-ptp'
Yangbo Lu says:

====================
enetc: support PTP Sync packet one-step timestamping

This patch-set is to add support for PTP Sync packet one-step timestamping.
Since ENETC single-step register has to be configured dynamically per
packet for correctionField offeset and UDP checksum update, current
one-step timestamping packet has to be sent only when the last one
completes transmitting on hardware. So, on the TX, this patch handles
one-step timestamping packet as below:

- Trasmit packet immediately if no other one in transfer, or queue to
  skb queue if there is already one in transfer.
  The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
  lock and to send one skb in skb queue if has.

Changes for v2:
	- Rebased.
	- Fixed issues from patchwork checks.
	- netif_tx_lock for one-step timestamping packet sending.
Changes for v3:
	- Used system workqueue.
	- Set bit lock when transmitted one-step packet, and scheduled
	  work when completed. The worker cleared the bit lock, and
	  transmitted one skb in skb queue if has, instead of a loop.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:34:21 -07:00
Yangbo Lu
7294380c52 enetc: support PTP Sync packet one-step timestamping
This patch is to add support for PTP Sync packet one-step timestamping.
Since ENETC single-step register has to be configured dynamically per
packet for correctionField offeset and UDP checksum update, current
one-step timestamping packet has to be sent only when the last one
completes transmitting on hardware. So, on the TX, this patch handles
one-step timestamping packet as below:

- Trasmit packet immediately if no other one in transfer, or queue to
  skb queue if there is already one in transfer.
  The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
  lock and to send one skb in skb queue if has.

And the configuration for one-step timestamping on ENETC before
transmitting is,

- Set one-step timestamping flag in extension BD.
- Write 30 bits current timestamp in tstamp field of extension BD.
- Update PTP Sync packet originTimestamp field with current timestamp.
- Configure single-step register for correctionField offeset and UDP
  checksum update.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:34:21 -07:00
Yangbo Lu
f768e75130 enetc: mark TX timestamp type per skb
Mark TX timestamp type per skb on skb->cb[0], instead of
global variable for all skbs. This is a preparation for
one step timestamp support.

For one-step timestamping enablement, there will be both
one-step and two-step PTP messages to transfer. And a skb
queue is needed for one-step PTP messages making sure
start to send current message only after the last one
completed on hardware. (ENETC single-step register has to
be dynamically configured per message.) So, marking TX
timestamp type per skb is required.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:34:21 -07:00
David S. Miller
8043edee9a Merge branch 'ibmvnic-errors'
Lijun Pan says:

====================
ibmvnic: improve error printing

Patch 1 prints reset reason as a string.
Patch 2 prints adapter state as a string.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:31:27 -07:00
Lijun Pan
0666ef7f61 ibmvnic: print adapter state as a string
The adapter state can be added or deleted over different versions
of the source code. Print a string instead of a number.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:31:26 -07:00
Lijun Pan
caee7bf5b0 ibmvnic: print reset reason as a string
The reset reason can be added or deleted over different versions
of the source code. Print a string instead of a number.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:31:26 -07:00
Lijun Pan
c82eaa4064 ibmvnic: clean up the remaining debugfs data structures
Commit e704f0434e ("ibmvnic: Remove debugfs support") did not
clean up everything. Remove the remaining code.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:29:10 -07:00
David S. Miller
645b34a7b5 Merge branch 'netns-sysctl-isolation'
Jonathon Reinhart says:

====================
Ensuring net sysctl isolation

This patchset is the result of an audit of /proc/sys/net to prove that
it is safe to be mouted read-write in a container when a net namespace
is in use. See [1].

The first commit adds code to detect sysctls which are not netns-safe,
and can "leak" changes to other net namespaces.

My manual audit found, and the above feature confirmed, that there are
two nf_conntrack sysctls which are in fact not netns-safe.

I considered sending the latter to netfilter-devel, but I think it's
better to have both together on net-next: Adding only the former causes
undesirable warnings in the kernel log.

[1]: https://github.com/opencontainers/runc/issues/2826
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:27:11 -07:00
Jonathon Reinhart
2671fa4dc0 netfilter: conntrack: Make global sysctls readonly in non-init netns
These sysctls point to global variables:
- NF_SYSCTL_CT_MAX (&nf_conntrack_max)
- NF_SYSCTL_CT_EXPECT_MAX (&nf_ct_expect_max)
- NF_SYSCTL_CT_BUCKETS (&nf_conntrack_htable_size_user)

Because their data pointers are not updated to point to per-netns
structures, they must be marked read-only in a non-init_net ns.
Otherwise, changes in any net namespace are reflected in (leaked into)
all other net namespaces. This problem has existed since the
introduction of net namespaces.

The current logic marks them read-only only if the net namespace is
owned by an unprivileged user (other than init_user_ns).

Commit d0febd81ae ("netfilter: conntrack: re-visit sysctls in
unprivileged namespaces") "exposes all sysctls even if the namespace is
unpriviliged." Since we need to mark them readonly in any case, we can
forego the unprivileged user check altogether.

Fixes: d0febd81ae ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces")
Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:27:11 -07:00
Jonathon Reinhart
31c4d2f160 net: Ensure net namespace isolation of sysctls
This adds an ensure_safe_net_sysctl() check during register_net_sysctl()
to validate that sysctl table entries for a non-init_net netns are
sufficiently isolated. To be netns-safe, an entry must adhere to at
least (and usually exactly) one of these rules:

1. It is marked read-only inside the netns.
2. Its data pointer does not point to kernel/module global data.

An entry which fails both of these checks is indicative of a bug,
whereby a child netns can affect global net sysctl values.

If such an entry is found, this code will issue a warning to the kernel
log, and force the entry to be read-only to prevent a leak.

To test, simply create a new netns:

    $ sudo ip netns add dummy

As it sits now, this patch will WARN for two sysctls which will be
addressed in a subsequent patch:
- /proc/sys/net/netfilter/nf_conntrack_max
- /proc/sys/net/netfilter/nf_conntrack_expect_max

Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:27:11 -07:00
wengjianfeng
a115d24a63 nfc: pn533: remove redundant assignment
In many places,first assign a value to a variable and then return
the variable. which is redundant, we should directly return the value.
in pn533_rf_field funciton,return rc also in the if statement, so we
use return 0 to replace the last return rc.

Signed-off-by: wengjianfeng <wengjianfeng@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:23:03 -07:00
David S. Miller
5711ffd313 Merge branch 'bnxt_en-error-recovery'
Michael Chan says:

====================
bnxt_en: Error recovery fixes.

This series adds some fixes and enhancements to the error recovery
logic.  The health register logic is improved and we also add missing
code to free and re-create VF representors in the firmware after
error recovery.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Sriharsha Basavapatna
ac797ced1f bnxt_en: Free and allocate VF-Reps during error recovery.
During firmware recovery, VF-Rep configuration in the firmware is lost.
Fix it by freeing and (re)allocating VF-Reps in FW at relevant points
during the error recovery process.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Michael Chan
90f4fd0296 bnxt_en: Refactor __bnxt_vf_reps_destroy().
Add a new helper function __bnxt_free_one_vf_rep() to free one VF rep.
We also reintialize the VF rep fields to proper initial values so that
the function can be used without freeing the VF rep data structure.  This
will be used in subsequent patches to free and recreate VF reps after
error recovery.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Sriharsha Basavapatna
ea2d37b2b3 bnxt_en: Refactor bnxt_vf_reps_create().
Add a new function bnxt_alloc_vf_rep() to allocate a VF representor.
This function will be needed in subsequent patches to recreate the
VF reps after error recovery.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Vasundhara Volam
190eda1a9d bnxt_en: Invalidate health register mapping at the end of probe.
After probe is successful, interface may not be bought up in all
the cases and health register mapping could be invalid if firmware
undergoes reset. Fix it by invalidating the health register at the
end of probe. It will be remapped during ifup.

Fixes: 43a440c400 ("bnxt_en: Improve the status_reliable flag in bp->fw_health.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Michael Chan
17e1be342d bnxt_en: Treat health register value 0 as valid in bnxt_try_reover_fw().
The retry loop in bnxt_try_recover_fw() should not abort when the
health register value is 0.  It is a valid value that indicates the
firmware is booting up.

Fixes: 861aae786f ("bnxt_en: Enhance retry of the first message to the firmware.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Andrea Mayer
0d77036057 net: seg6: trivial fix of a spelling mistake in comment
There is a comment spelling mistake "interfarence" -> "interference" in
function parse_nla_action(). Fix it.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:17:09 -07:00
Colin Ian King
d0494135f9 net: hns3: Fix potential null pointer defererence of null ae_dev
The reset_prepare and reset_done calls have a null pointer check
on ae_dev however ae_dev is being dereferenced via the call to
ns3_is_phys_func with the ae->pdev argument. Fix this by performing
a null pointer check on ae_dev and hence short-circuiting the
dereference to ae_dev on the call to ns3_is_phys_func.

Addresses-Coverity: ("Dereference before null check")
Fixes: 715c58e94f ("net: hns3: add suspend and resume pm_ops")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:15:21 -07:00
Colin Ian King
e701a25840 net: thunderx: Fix unintentional sign extension issue
The shifting of the u8 integers rq->caching by 26 bits to
the left will be promoted to a 32 bit signed int and then
sign-extended to a u64. In the event that rq->caching is
greater than 0x1f then all then all the upper 32 bits of
the u64 end up as also being set because of the int
sign-extension. Fix this by casting the u8 values to a
u64 before the 26 bit left shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: 4863dea3fa ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:13:57 -07:00
Colin Ian King
dd2c796773 cxgb4: Fix unintentional sign extension issues
The shifting of the u8 integers f->fs.nat_lip[] by 24 bits to
the left will be promoted to a 32 bit signed int and then
sign-extended to a u64. In the event that the top bit of the u8
is set then all then all the upper 32 bits of the u64 end up as
also being set because of the sign-extension. Fix this by
casting the u8 values to a u64 before the 24 bit left shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: 12b276fbf6 ("cxgb4: add support to create hash filters")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:13:17 -07:00
David S. Miller
5b489fea97 Merge branch 'ipa-next'
Alex Elder says:

====================
net: ipa: support two more platforms

This series adds IPA support for two more Qualcomm SoCs.

The first patch updates the DT binding to add compatible strings.

The second temporarily disables checksum offload support for IPA
version 4.5 and above.  Changes are required to the RMNet driver
to support the "inline" checksum offload used for IPA v4.5+, and
once those are present this capability will be enabled for IPA.

The third and fourth patches add configuration data for IPA versions
4.5 (used for the SDX55 SoC) and 4.11 (used for the SD7280 SoC).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:49:08 -07:00
Alex Elder
927c504345 net: ipa: add IPA v4.11 configuration data
Add support for the SC7280 SoC, which includes IPA version 4.11.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:49:08 -07:00
Alex Elder
fbb763e7e7 net: ipa: add IPA v4.5 configuration data
Add support for the SDX55 SoC, which includes IPA version 4.5.

Starting with IPA v4.5, a few of the memory regions have a different
number of "canary" values; update comments in the where the region
identifers are defined to accurately reflect that.

I'll note three differences in SDX55 versus the other two existing
platforms (SDM845 and SC7180):
  - SDX55 uses a 32-bit Linux kernel
  - SDX55 has four interconnects rather than three
  - SDX55 uses IPA v4.5, which uses inline checksum offload

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:49:08 -07:00
Alex Elder
c88c34fcf8 net: ipa: disable checksum offload for IPA v4.5+
Checksum offload for IPA v4.5+ is implemented differently, using
"inline" offload (which uses a common header format for both upload
and download offload).

The IPA hardware must be programmed to enable MAP checksum offload,
but the RMNet driver is responsible for interpreting checksum
metadata supplied with messages.

Currently, the RMNet driver does not support inline checksum offload.
This support is imminent, but until it is available, do not allow
newer versions of IPA to specify checksum offload for endpoints.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:49:08 -07:00
Alex Elder
c3264fee72 dt-bindings: net: qcom,ipa: add some compatible strings
Add existing supported platform "qcom,sc7180-ipa" to the set of IPA
compatible strings.  Also add newly-supported "qcom,sdx55-ipa",
"qcom,sc7280-ipa".

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:49:08 -07:00
Qiheng Lin
95291ced81 ehea: add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:42:38 -07:00
David S. Miller
23cfa4d4aa Merge branch 'veth-gro'
Paolo Abeni  says:

====================
veth: allow GRO even without XDP

This series allows the user-space to enable GRO/NAPI on a veth
device even without attaching an XDP program.

It does not change the default veth behavior (no NAPI, no GRO),
except that the GRO feature bit on top of this series will be
effectively off by default on veth devices. Note that currently
the GRO bit is on by default, but GRO never takes place in
absence of XDP.

On top of this series, setting the GRO feature bit enables NAPI
and allows the GRO to take place. The TSO features on the peer
device are preserved.

The main goal is improving UDP forwarding performances for
containers in a typical virtual network setup:

(container) veth -> veth peer -> bridge/ovs -> vxlan -> NIC

Enabling the NAPI threaded mode, GRO the NETIF_F_GRO_UDP_FWD
feature on the veth peer improves the UDP stream performance
with not void netfilter configuration by 2x factor with no
measurable overhead for TCP traffic: some heuristic ensures
that TCP will not go through the additional NAPI/GRO layer.

Some self-tests are added to check the expected behavior in
the default configuration, with XDP and with plain GRO enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:39:28 -07:00
Paolo Abeni
1c3cadbe02 self-tests: add veth tests
Add some basic veth tests, that verify the expected flags and
aggregation with different setups (default, xdp, etc...)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:39:28 -07:00
Paolo Abeni
47e550e010 veth: refine napi usage
After the previous patch, when enabling GRO, locally generated
TCP traffic experiences some measurable overhead, as it traverses
the GRO engine without any chance of aggregation.

This change refine the NAPI receive path admission test, to avoid
unnecessary GRO overhead in most scenarios, when GRO is enabled
on a veth peer.

Only skbs that are eligible for aggregation enter the GRO layer,
the others will go through the traditional receive path.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:39:28 -07:00
Paolo Abeni
d3256efd8e veth: allow enabling NAPI even without XDP
Currently the veth device has the GRO feature bit set, even if
no GRO aggregation is possible with the default configuration,
as the veth device does not hook into the GRO engine.

Flipping the GRO feature bit from user-space is a no-op, unless
XDP is enabled. In such scenario GRO could actually take place, but
TSO is forced to off on the peer device.

This change allow user-space to really control the GRO feature, with
no need for an XDP program.

The GRO feature bit is now cleared by default - so that there are no
user-visible behavior changes with the default configuration.

When the GRO bit is set, the per-queue NAPI instances are initialized
and registered. On xmit, when napi instances are available, we try
to use them.

Some additional checks are in place to ensure we initialize/delete NAPIs
only when needed in case of overlapping XDP and GRO configuration
changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:39:28 -07:00
Paolo Abeni
c75fb320d4 veth: use skb_orphan_partial instead of skb_orphan
As described by commit 9c4c325252 ("skbuff: preserve sock
reference when scrubbing the skb."), orphaning a skb
in the TX path will cause OoO.

Let's use skb_orphan_partial() instead of skb_orphan(), so
that we keep the sk around for queue's selection sake and we
still avoid the problem fixed with commit 4bf9ffa0fb ("veth:
Orphan skb before GRO")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:39:28 -07:00