Commit graph

1176 commits

Author SHA1 Message Date
Linus Torvalds
6e5f6a8691 vhost,virtio,vdpa: features, fixes, cleanups
mac vlan filter and stats support in mlx5 vdpa
 irq hardening in virtio
 performance improvements in virtio crypto
 polling i/o support in virtio blk
 ASID support in vhost
 fixes, cleanups all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmKXBEwPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRp9pgIAJKdLbtrqDdDrPJRg5Xs2wsWMbGnXbKxm71Q
 kGPfnkn15yjwiMJcdNO8O16qP5VGUzBWPhGc3X7U9KRU4EI2yEGLFv9hwRvdlY9h
 4C7O4fBWTbuqTT6pwwAH9vT3/b6q8bMh4D3uD2P1zSwh8POqHm1kyRivXPPr4PUA
 5TAc04EGcrFQpC9co3v8UmJ7EjLa/tgdUb7igTv5VHi7K2GNGMycSvHP/v7CFgZ6
 paVYThMCrWOJkIIk8k6qhMKhAxkQ7lBrqhLBtfX9jyadLmq5hAQQu2D43PoZb2ed
 IbNN/g5Sv9yaL7ERYzTRAEfuXVGPiDLWli2C5sHdz2IJDBeqQeU=
 =Vlh6
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "vhost,virtio and vdpa features, fixes, and cleanups:

   - mac vlan filter and stats support in mlx5 vdpa

   - irq hardening in virtio

   - performance improvements in virtio crypto

   - polling i/o support in virtio blk

   - ASID support in vhost

   - fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits)
  vdpa: ifcvf: set pci driver data in probe
  vdpa/mlx5: Add RX MAC VLAN filter support
  vdpa/mlx5: Remove flow counter from steering
  vhost: rename vhost_work_dev_flush
  vhost-test: drop flush after vhost_dev_cleanup
  vhost-scsi: drop flush after vhost_dev_cleanup
  vhost_vsock: simplify vhost_vsock_flush()
  vhost_test: remove vhost_test_flush_vq()
  vhost_net: get rid of vhost_net_flush_vq() and extra flush calls
  vhost: flush dev once during vhost_dev_stop
  vhost: get rid of vhost_poll_flush() wrapper
  vhost-vdpa: return -EFAULT on copy_to_user() failure
  vdpasim: Off by one in vdpasim_set_group_asid()
  virtio: Directly use ida_alloc()/free()
  virtio: use WARN_ON() to warning illegal status value
  virtio: harden vring IRQ
  virtio: allow to unbreak virtqueue
  virtio-ccw: implement synchronize_cbs()
  virtio-mmio: implement synchronize_cbs()
  virtio-pci: implement synchronize_cbs()
  ...
2022-06-03 10:02:52 -07:00
Linus Torvalds
58f9d52ff6 Networking fixes for 5.19-rc1, including fixes from bpf, and netfilter.
Current release - new code bugs:
 
  - af_packet: make sure to pull the MAC header, avoid skb panic in GSO
 
  - ptp_clockmatrix: fix inverted logic in is_single_shot()
 
  - netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag
 
  - dt-bindings: net: adin: fix adi,phy-output-clock description syntax
 
  - wifi: iwlwifi: pcie: rename CAUSE macro, avoid MIPS build warning
 
 Previous releases - regressions:
 
  - Revert "net: af_key: add check for pfkey_broadcast in function
    pfkey_process"
 
  - tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd
 
  - nf_tables: disallow non-stateful expression in sets earlier
 
  - nft_limit: clone packet limits' cost value
 
  - nf_tables: double hook unregistration in netns path
 
  - ping6: fix ping -6 with interface name
 
 Previous releases - always broken:
 
  - sched: fix memory barriers to prevent skbs from getting stuck
    in lockless qdiscs
 
  - neigh: set lower cap for neigh_managed_work rearming, avoid
    constantly scheduling the probe work
 
  - bpf: fix probe read error on big endian in ___bpf_prog_run()
 
  - amt: memory leak and error handling fixes
 
 Misc:
 
  - ipv6: expand & rename accept_unsolicited_na to accept_untracked_na
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKY9lMACgkQMUZtbf5S
 IrtNvA//WCpG53NwSy8aV2X/0vkprVEuO8EQeIYhaw1R4KlVcqrITQcaLqkq/xL/
 RUq6F/plftMSiuGRhTp/Sgbl0o0XgJkf769m4zQxz9NqWqgcw5kwJPu4Xq1nSM9t
 /2qAFNDnXShxRiSYrI0qxQrmd0OUjtsibxsKRTSrrlvcd6zYfrx/7+QK5qpLMF9E
 zJpBSYQm2R0RLGRith99G8w3WauhlprPaxyQ71ogQtBhTF+Eg7K+xEm2D5DKtyvj
 7CLyrQtR0jyDBAt2ZPCh5D/yVPkNI1rigQ8m4uiW9DE6mk1DsxxY+DIOt5vQPBdR
 x9Pq0qG54KS5sP18ABeNRQn4NWdkhVf/CcPkaRxHJdRs13mpQUATJRpZ3Ytd9Nt0
 HW6Kby+zY6bdpUX8+UYdhcG6wbt0Lw8B+bSCjiqfE/CBbfUFA3L9/q/5Hk8Xbnxn
 lCIk4asxQgpNhcZ+PAkZfFgE0GNDKnXDu1thO+q7/N9srZrrh9WQW5qoq5lexo8V
 c01jRbPTKa64Gbvm+xDDGEwSl2uIRITtea284bL3q6lnI50n50dlLOAW0z5tmbEg
 X9OHae5bMAdtvS5A1ForJaWA/Mj35ZqtGG5oj0WcGcLupVyec3rgaYaJtNvwgoDx
 ptCQVIMLTAHXtZMohm0YrBizg0qbqmCd2c0/LB+3odX328YStJU=
 =bWkn
 -----END PGP SIGNATURE-----

Merge tag 'net-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Current release - new code bugs:

   - af_packet: make sure to pull the MAC header, avoid skb panic in GSO

   - ptp_clockmatrix: fix inverted logic in is_single_shot()

   - netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag

   - dt-bindings: net: adin: fix adi,phy-output-clock description syntax

   - wifi: iwlwifi: pcie: rename CAUSE macro, avoid MIPS build warning

  Previous releases - regressions:

   - Revert "net: af_key: add check for pfkey_broadcast in function
     pfkey_process"

   - tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd

   - nf_tables: disallow non-stateful expression in sets earlier

   - nft_limit: clone packet limits' cost value

   - nf_tables: double hook unregistration in netns path

   - ping6: fix ping -6 with interface name

  Previous releases - always broken:

   - sched: fix memory barriers to prevent skbs from getting stuck in
     lockless qdiscs

   - neigh: set lower cap for neigh_managed_work rearming, avoid
     constantly scheduling the probe work

   - bpf: fix probe read error on big endian in ___bpf_prog_run()

   - amt: memory leak and error handling fixes

  Misc:

   - ipv6: expand & rename accept_unsolicited_na to accept_untracked_na"

* tag 'net-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits)
  net/af_packet: make sure to pull mac header
  net: add debug info to __skb_pull()
  net: CONFIG_DEBUG_NET depends on CONFIG_NET
  stmmac: intel: Add RPL-P PCI ID
  net: stmmac: use dev_err_probe() for reporting mdio bus registration failure
  tipc: check attribute length for bearer name
  ice: fix access-beyond-end in the switch code
  nfp: remove padding in nfp_nfdk_tx_desc
  ax25: Fix ax25 session cleanup problems
  net: usb: qmi_wwan: Add support for Cinterion MV31 with new baseline
  sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels
  sfc/siena: fix considering that all channels have TX queues
  socket: Don't use u8 type in uapi socket.h
  net/sched: act_api: fix error code in tcf_ct_flow_table_fill_tuple_ipv6()
  net: ping6: Fix ping -6 with interface name
  macsec: fix UAF bug for real_dev
  octeontx2-af: fix error code in is_valid_offset()
  wifi: mac80211: fix use-after-free in chanctx code
  bonding: guard ns_targets by CONFIG_IPV6
  tcp: tcp_rtx_synack() can be called from process context
  ...
2022-06-02 12:50:16 -07:00
Linus Torvalds
176882156a VFIO updates for v5.19-rc1
- Improvements to mlx5 vfio-pci variant driver, including support
    for parallel migration per PF (Yishai Hadas)
 
  - Remove redundant iommu_present() check (Robin Murphy)
 
  - Ongoing refactoring to consolidate the VFIO driver facing API
    to use vfio_device (Jason Gunthorpe)
 
  - Use drvdata to store vfio_device among all vfio-pci and variant
    drivers (Jason Gunthorpe)
 
  - Remove redundant code now that IOMMU core manages group DMA
    ownership (Jason Gunthorpe)
 
  - Remove vfio_group from external API handling struct file ownership
    (Jason Gunthorpe)
 
  - Correct typo in uapi comments (Thomas Huth)
 
  - Fix coccicheck detected deadlock (Wan Jiabing)
 
  - Use rwsem to remove races and simplify code around container and
    kvm association to groups (Jason Gunthorpe)
 
  - Harden access to devices in low power states and use runtime PM to
    enable d3cold support for unused devices (Abhishek Sahu)
 
  - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe)
 
  - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe)
 
  - Pass KVM pointer directly rather than via notifier (Matthew Rosato)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmKPvyMbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsihegP/3XamiYsS0GuA7awAq/X
 h9Jahb6kJ+sh0RXL1Gqzc9nxH5X9H/hBcL88VOV3GLwyOhNVNpVjQXGguL3aLaCE
 zUrs0+AFEJb990y9H+VgwIDom5BIpgdZ2naG42bz9wUeVGg4daJnkMwOgXwIBzfx
 IOddktN6UwuE+DyA57yqL93f+0cTrhYZx9R14sDoLR5lE4uGnbQwIknawEKVtoeR
 rEPaCFptxPxCUbqoOSR0Y3bu6rUYSH4iiMZpMviqm2ak3aNn76gru3q4QAnI4gTd
 l/w+2OJNFC0U7H5Cz7cdIn2StdJvfSkX0e753+qsFccFsViRCGdnW0Lht/xrYrFC
 i8AJxkrq2/bs00LXs7kzcruaD8pJ2UPe2x2+nupHSEsj99K4NraeHRB2CC1uwj0d
 gYliOSW5T3//wOpztK48s475VppgXeKWkXGoNY3JJlGjAPyd0vFrH8hRLhVZJ9uI
 /eLh6hQnOJuCDz1rQrVNRk6cZi9R1Wpl5dvCBRLqjK519nm569aTlVBra+iNyUCQ
 lU5/kN0ym8+X8CweE5ILPGiX2iEXBYMqv+Dm5yOimRUHRJZHYv900FX0GVEnCUCq
 23sMDaeHS1hyDCQk//bd2Ig7xjh7mbh7CrKcdJ7pL5Gc/A1zkCXd54hvxViiGwQq
 U5KIPTyJy+erpcpxjUApaoP2
 =etEI
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio

Pull vfio updates from Alex Williamson:

 - Improvements to mlx5 vfio-pci variant driver, including support for
   parallel migration per PF (Yishai Hadas)

 - Remove redundant iommu_present() check (Robin Murphy)

 - Ongoing refactoring to consolidate the VFIO driver facing API to use
   vfio_device (Jason Gunthorpe)

 - Use drvdata to store vfio_device among all vfio-pci and variant
   drivers (Jason Gunthorpe)

 - Remove redundant code now that IOMMU core manages group DMA ownership
   (Jason Gunthorpe)

 - Remove vfio_group from external API handling struct file ownership
   (Jason Gunthorpe)

 - Correct typo in uapi comments (Thomas Huth)

 - Fix coccicheck detected deadlock (Wan Jiabing)

 - Use rwsem to remove races and simplify code around container and kvm
   association to groups (Jason Gunthorpe)

 - Harden access to devices in low power states and use runtime PM to
   enable d3cold support for unused devices (Abhishek Sahu)

 - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe)

 - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe)

 - Pass KVM pointer directly rather than via notifier (Matthew Rosato)

* tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio: (38 commits)
  vfio: remove VFIO_GROUP_NOTIFY_SET_KVM
  vfio/pci: Add driver_managed_dma to the new vfio_pci drivers
  vfio: Do not manipulate iommu dma_owner for fake iommu groups
  vfio/pci: Move the unused device into low power state with runtime PM
  vfio/pci: Virtualize PME related registers bits and initialize to zero
  vfio/pci: Change the PF power state to D0 before enabling VFs
  vfio/pci: Invalidate mmaps and block the access in D3hot power state
  vfio: Change struct vfio_group::container_users to a non-atomic int
  vfio: Simplify the life cycle of the group FD
  vfio: Fully lock struct vfio_group::container
  vfio: Split up vfio_group_get_device_fd()
  vfio: Change struct vfio_group::opened from an atomic to bool
  vfio: Add missing locking for struct vfio_group::kvm
  kvm/vfio: Fix potential deadlock problem in vfio
  include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR instead
  vfio/pci: Use the struct file as the handle not the vfio_group
  kvm/vfio: Remove vfio_group from kvm
  vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm()
  vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent()
  vfio: Remove vfio_external_group_match_file()
  ...
2022-06-01 13:49:15 -07:00
Changcheng Liu
3fc2a9e89b net/mlx5: correct ECE offset in query qp output
ECE field should be after opt_param_mask in query qp output.

Fixes: 6b646a7e4a ("net/mlx5: Add ability to read and write ECE options")
Signed-off-by: Changcheng Liu <jerrliu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-31 13:40:54 -07:00
Eli Cohen
1892a3d425 vdpa/mlx5: Add support for reading descriptor statistics
Implement the get_vq_stats calback of vdpa_config_ops to return the
statistics for a virtqueue.

The statistics are provided as vendor specific statistics where the
driver provides a pair of attribute name and attribute value.

Currently supported are received descriptors and completed descriptors.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220518133804.1075129-6-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:44:22 -04:00
Eli Cohen
94db331778 net/mlx5: Support multiport eswitch mode
Multiport eswitch mode is a LAG mode that allows to add rules that
forward traffic to a specific physical port without being affected by LAG
affinity configuration.

This mode of operation is mutual exclusive with the other LAG modes used
by multipath and bonding.

To make the transition between the modes, we maintain a counter on the
number of rules specifying one of the uplink representors as the target
of mirred egress redirect action.

An example of such rule would be:

$ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
  00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0

If the reference count just grows to one and LAG is not in use, we
create the LAG in multiport eswitch mode. Other mode changes are not
allowed while in this mode. When the reference count reaches zero, we
destroy the LAG and let other modes be used if needed.

logic also changed such that if forwarding to some uplink destination
cannot be guaranteed, we fail the operation so the rule will eventually
be in software and not in hardware.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17 23:41:51 -07:00
Tariq Toukan
9b45bde82c net/mlx5: Inline db alloc API function
Take the wrapper version which picks default node into a header file.
This reduces the number of exported functions.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17 23:41:45 -07:00
Moshe Shemesh
1d2c717bc7 net/mlx5: Add last command failure syndrome to debugfs
Add syndrome of last command failure per command type to debugfs to ease
debugging of such failure.
last_failed_syndrome - last command failed syndrome returned by FW.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17 23:41:44 -07:00
Yishai Hadas
846e437387 net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs
Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a
VF register to be notified for its enablement / disablement by the PF.

Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with
its notifier block and upon VF remove it will call
mlx5_sriov_blocking_notifier_unregister() to drop its registration.

This can give a VF the ability to clean some resources upon disable
before that the command interface goes down and on the other hand sets
some stuff before that it's enabled.

This may be used by a VF which is migration capable in few cases.(e.g.
PF load/unload upon an health recovery).

Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-10 15:45:28 +03:00
Mark Bloch
7f46a0b732 net/mlx5: Lag, add debugfs to query hardware lag state
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.

Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag

For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag

The following files are exposed:

- state: Returns "active" or "disabled". If "active" it means hardware
         lag is active.

- members: Returns the BDFs of all the members of lag object.

- type: Returns the type of the lag currently configured. Valid only
	if hardware lag is active.
	* "roce" - Members are bare metal PFs.
	* "switchdev" - Members are in switchdev mode.
	* "multipath" - ECMP offloads.

- port_sel_mode: Returns the egress port selection method, valid
		 only if hardware lag is active.
		 * "queue_affinity" - Egress port is selected by
		   the QP/SQ affinity.
		 * "hash" - Egress port is selected by hash done on
		   each packet. Controlled by: xmit_hash_policy of the
		   bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
	 hardware lag is active.
	 * "shared_fdb" - "on" or "off", if "on" single FDB is used.

- mapping: Returns the mapping which is used to select egress port.
	   Valid only if hardware lag is active.
	   If @port_sel_mode is "hash" returns the active egress ports.
	   The hash result will select only active ports.
	   if @port_sel_mode is "queue_affinity" returns the mapping
	   between the configured port affinity of the QP/SQ and actual
	   egress port. For example:
	   * 1:1 - Mapping means if the configured affinity is port 1
	           traffic will egress via port 1.
	   * 1:2 - Mapping means if the configured affinity is port 1
		   traffic will egress via port 2. This can happen
		   if port 1 is down or in active/backup mode and port 1
		   is backup.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:04 -07:00
Mark Bloch
4cd14d44b1 net/mlx5: Support devices with more than 2 ports
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:03 -07:00
Mark Bloch
34a30d7635 net/mlx5: Lag, expose number of lag ports
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Gavin Li
8324a02c34 net/mlx5: Add exit route when waiting for FW
Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.

To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09 22:54:00 -07:00
Leon Romanovsky
1c4a59b9fa net/mlx5: Remove not-supported ICV length
mlx5 doesn't allow to configure any AEAD ICV length other than 128,
so remove the logic that configures other unsupported values.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:17 -07:00
Leon Romanovsky
c6e3b421c7 net/mlx5: Merge various control path IPsec headers into one file
The mlx5 IPsec code has logical separation between code that operates
with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering
logic (ipsec_fs.c) and data path (ipsec_rxtx.c).

Such separation makes sense for C-files, but isn't needed at all for
H-files as they are included in batch anyway.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:15 -07:00
Leon Romanovsky
2ea36e2e4a net/mlx5: Remove useless validity check
All callers build xfrm attributes with help of mlx5e_ipsec_build_accel_xfrm_attrs()
function that ensure validity of attributes. There is no need to recheck
them again.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:15 -07:00
Leon Romanovsky
c674df973a net/mlx5: Store IPsec ESN update work in XFRM state
mlx5 IPsec code updated ESN through workqueue with allocation calls
in the data path, which can be saved easily if the work is created
during XFRM state initialization routine.

The locking used later in the work didn't protect from anything because
change of HW context is possible during XFRM state add or delete only,
which can cancel work and make sure that it is not running.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:15 -07:00
Mark Bloch
6510bc0d7c net/mlx5: fs, add unused destination type
When the caller doesn't pass a destination fs_core will create a unused
rule just so a context can be returned. This unused rule
is zeroed out and its type is 0 which can be mixed up with
MLX5_FLOW_DESTINATION_TYPE_VPORT.

Create a dedicated type to differentiate between the two
named MLX5_FLOW_DESTINATION_TYPE_NONE.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-02 21:21:14 -07:00
Mark Bloch
d639af6216 net/mlx5: fs, split software and IFC flow destination definitions
Separate flow destinations between software and IFC.
Flow destination type passed by callers was used as the input in
firmware commands and over the years software only types were added
which resulted in mixing between the two.

Create an IFC enum that contains only the flow destinations defined
when talking to the firmware.

Now that there is a proper software only enum for flow destinations
the hardcoded values can be removed as the values are no longer used
in firmware commands.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-02 21:21:13 -07:00
Leon Romanovsky
2984287c4c net/mlx5: Remove not-implemented IPsec capabilities
Clean a capabilities enum to remove not-implemented bits.

Link: https://lore.kernel.org/r/1044bb7b779107ff38e48e3f6553421104f3f819.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:07 +03:00
Leon Romanovsky
f2b41b32cd net/mlx5: Remove ipsec_ops function table
There is only one IPsec implementation and ipsec_ops is not needed
at all in this situation. Together with removal of ipsec_ops, we can
drop the entry checks as these functions are called for IPsec devices
only.

Link: https://lore.kernel.org/r/bc8dd1c8a77b65dbf5e2cf92c813ffaca2505c5f.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:07 +03:00
Leon Romanovsky
54deb0e775 net/mlx5: Remove not-needed IPsec config
In current code, the CONFIG_MLX5_IPSEC and CONFIG_MLX5_EN_IPSEC are
the same. So remove useless indirection.

Link: https://lore.kernel.org/r/fd14492cbc01a0d51a5bfedde02bcd2154123fde.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:07 +03:00
Leon Romanovsky
2451da081a net/mlx5: Unify device IPsec capabilities check
Merge two different function to one in order to provide coherent
picture if the device is IPsec capable or not.

Link: https://lore.kernel.org/r/8f10ea06ad19c6f651e9fb33921009658f01e1d5.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:07 +03:00
Leon Romanovsky
de8bdb4769 RDMA/mlx5: Drop crypto flow steering API
The mlx5 flow steering crypto API was intended to be used in FPGA
devices, which is not supported for years already. The removal of
mlx5 crypto FPGA code together with inability to configure encryption
keys makes the low steering API completely unusable.

So delete the code, so any ESP flow steering requests will fail with
not supported error, as it is happening now anyway as no device support
this type of API.

Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:25:06 +03:00
Leon Romanovsky
2fa33b3518 net/mlx5_fpga: Drop INNOVA IPsec support
Mellanox INNOVA IPsec cards are EOL in Nov, 2019 [1]. As such, the code
is unmaintained, untested and not in-use by any upstream/distro oriented
customers. In order to reduce code complexity, drop the kernel code.

[1] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf

Link: https://lore.kernel.org/r/2afe88ec5020a491079eacf6fe3c89b64d65195c.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 08:23:47 +03:00
Borislav Petkov
0276bd3a94 IB/mlx5: Fix undefined behavior due to shift overflowing the constant
Fix:

  drivers/infiniband/hw/mlx5/main.c: In function ‘translate_eth_legacy_proto_oper’:
  drivers/infiniband/hw/mlx5/main.c:370:2: error: case label does not reduce to an integer constant
    case MLX5E_PROT_MASK(MLX5E_50GBASE_KR2):
    ^~~~

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Link: https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06 10:15:03 +03:00
Leon Romanovsky
40379a0084 net/mlx5_fpga: Drop INNOVA TLS support
Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code
is unmaintained, untested and not in-use by any upstream/distro oriented
customers. In order to reduce code complexity, drop the kernel code.

[1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf

Link: https://lore.kernel.org/r/b88add368def721ea9d054cb69def72d9e3f67aa.1649073691.git.leonro@nvidia.com
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06 10:00:36 +03:00
Linus Torvalds
169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds
7403e6d826 VFIO updates for v5.18-rc1
- Introduce new device migration uAPI and implement device specific
    mlx5 vfio-pci variant driver supporting new protocol (Jason Gunthorpe,
    Yishai Hadas, Leon Romanovsky)
 
  - New HiSilicon acc vfio-pci variant driver, also supporting migration
    interface (Shameer Kolothum, Longfang Liu)
 
  - D3hot fixes for vfio-pci-core (Abhishek Sahu)
 
  - Document new vfio-pci variant driver acceptance criteria
    (Alex Williamson)
 
  - Fix UML build unresolved ioport_{un}map() functions
    (Alex Williamson)
 
  - Fix MAINTAINERS due to header movement (Lukas Bulwahn)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmI6HGwbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiyxcP/18Mh4eYJudvqU7ARH/H
 8E2V+5YhkdVG088KZcB/sTEfVKAbROZrJ3zxkZMXU/OU2cYELHG2pgaI8yCMzHJK
 krz+kZ2p+nA/AMKp8V0xB0MCspTpX/3/6zHV2wDals+gTTLH34N0r6swh0wCjoSa
 wN+3ahE+c6KkX41H8X2Dup5YVM4ohg8MbCd3jSIFBrRDj6SMRGr7zytezCdLhnVs
 TwadlReOYSqKsuvcVnHObWbsOj5WCmuld2u9j0kTPknRm6VtxkfNFQTpKk3sbAcO
 SaPwDP0485plwCVZkNJELZVaF+qYIFW5WZLD5wlJNoH/mZE68a5BKbYFKSLt1gs3
 ntYdktcmsBLVQxTNxcZ6/gwEV2/wuY6v7C3cm0jT0AqXgPIdOqrwlzafTwP+Z/KU
 TC9x4EzPPvdsnBCut0XJZg4QUNlJ7Cp+62vxXqhLGPA2cd4tjGO/8B1KOm05B7VQ
 2XiDtlsW7pwx4v6jRPPdvoqUMd5qqjKF9RepTktirUSXv8z6NIjSyzGn3HZLrk6f
 7AHnlltUg56y/c6hmLxe25PrXKpGqO1fFIcuPYpC+IbBHrE4NVqOhi3ieoonO5GZ
 nwe6IT/fLxsLOudUG/dJ3swuoE8o2Glf17rV9e53K8zF9J9LoFJQsqSFbUzR17pD
 NGN+nA8dWFmmLDS4uYiY9WBg
 =Sv96
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Introduce new device migration uAPI and implement device specific
   mlx5 vfio-pci variant driver supporting new protocol (Jason
   Gunthorpe, Yishai Hadas, Leon Romanovsky)

 - New HiSilicon acc vfio-pci variant driver, also supporting migration
   interface (Shameer Kolothum, Longfang Liu)

 - D3hot fixes for vfio-pci-core (Abhishek Sahu)

 - Document new vfio-pci variant driver acceptance criteria
   (Alex Williamson)

 - Fix UML build unresolved ioport_{un}map() functions
   (Alex Williamson)

 - Fix MAINTAINERS due to header movement (Lukas Bulwahn)

* tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio: (31 commits)
  vfio-pci: Provide reviewers and acceptance criteria for variant drivers
  MAINTAINERS: adjust entry for header movement in hisilicon qm driver
  hisi_acc_vfio_pci: Use its own PCI reset_done error handler
  hisi_acc_vfio_pci: Add support for VFIO live migration
  crypto: hisilicon/qm: Set the VF QM state register
  hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver
  hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
  hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  hisi_acc_qm: Move VF PCI device IDs to common header
  crypto: hisilicon/qm: Move few definitions to common header
  crypto: hisilicon/qm: Move the QM header to include/linux
  vfio/mlx5: Fix to not use 0 as NULL pointer
  PCI/IOV: Fix wrong kernel-doc identifier
  vfio/mlx5: Use its own PCI reset_done error handler
  vfio/pci: Expose vfio_pci_core_aer_err_detected()
  vfio/mlx5: Implement vfio_pci driver for mlx5 devices
  vfio/mlx5: Expose migration commands over mlx5 device
  vfio: Remove migration protocol v1 documentation
  vfio: Extend the device migration protocol with RUNNING_P2P
  vfio: Define device migration protocol v2
  ...
2022-03-24 12:35:59 -07:00
Saeed Mahameed
2af7e566a8 net/mlx5e: Fix build warning, detected write beyond size of field
When merged with Linus tree, the cited patch below will cause the
following build warning:

In function 'fortify_memset_chk',
    inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3:
include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
  242 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix that by grouping the fields to memeset in struct_group() to avoid
the false alarm.

Fixes: 9ded70fa1d ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20220322172224.31849-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22 16:48:56 -07:00
Tariq Toukan
770c9a3a01 net/mlx5: Remove unused fill page array API function
mlx5_fill_page_array API function is not used.
Remove it, reduce the number of exported functions.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17 11:51:58 -07:00
Tariq Toukan
4206fe40b2 net/mlx5: Remove unused exported contiguous coherent buffer allocation API
All WQ types moved to using the fragmented allocation API
for coherent memory. Contiguous API is not used anymore.
Remove it, reduce the number of exported functions.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17 11:51:58 -07:00
Gal Pressman
fcb610a86c net/mlx5: Parse module mapping using mlx5_ifc
The assumption that the first byte in the module mapping dword is the
module number shouldn't be hard-coded in the driver, but come from
mlx5_ifc structs.

While at it, fix the incorrect width for the 'rx_lane' and 'tx_lane'
fields.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-10 23:38:25 -08:00
Gal Pressman
271907ee2f net/mlx5: Query the maximum MCIA register read size from firmware
The MCIA register supports either 12 or 32 dwords, use the correct value
by querying the capability from the MCAM register.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-10 23:38:24 -08:00
Jakub Kicinski
1e8a3f0d2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/dsa/dsa2.c
  commit afb3cc1a39 ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
  commit e83d565378 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/

drivers/net/ethernet/intel/ice/ice.h
  commit 97b0129146 ("ice: Fix error with handling of bonding MTU")
  commit 43113ff734 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/

drivers/staging/gdm724x/gdm_lte.c
  commit fc7f750dc9 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
  commit 4bcc4249b4 ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 17:16:56 -08:00
Yevgeny Kliteynik
6862c787c7 net/mlx5: DR, Add support for ConnectX-7 steering
Add support for a new SW format version that is implemented by
ConnectX-7.

Except for several differences, the STEv2 is identical to STEv1, so for
most callbacks the STEv2 context struct will call STEv1 functions.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:04 -08:00
Yevgeny Kliteynik
5c422bfad2 net/mlx5: DR, Add support for matching on Internet Header Length (IHL)
Add support for matching on new field - Internet Header Length (IHL).

Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:03 -08:00
Moshe Shemesh
32071187e9 net/mlx5: Add debugfs counters for page commands failures
Add the following new debugfs counters for debug and verbosity:
fw_pages_alloc_failed - number of pages FW requested but driver failed
to allocate.
give_pages_dropped - number of pages given to FW, but command give pages
failed by FW.
reclaim_pages_discard - number of pages which were about to reclaim back
and FW failed the command.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:02 -08:00
Moshe Shemesh
4e05cbf05c net/mlx5: Add pages debugfs
Add pages debugfs to expose the following counters for debuggability:
fw_pages_total - How many pages were given to FW and not returned yet.
vfs_pages - For SRIOV, how many pages were given to FW for virtual
functions usage.
host_pf_pages - For ECPF, how many pages were given to FW for external
hosts physical functions usage.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:02 -08:00
Moshe Shemesh
66771a1c72 net/mlx5: Move debugfs entries to separate struct
Move the debugfs entry pointers under priv to their own struct.
Add get function for device debugfs root.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:02 -08:00
Moshe Shemesh
d2cb8dda21 net/mlx5: Change release_all_pages cap bit location
mlx5 FW has changed release_all_pages cap bit by one bit offset to
reflect a fix in the FW flow for release_all_pages. The driver should
use the new bit to ensure it calls release_all_pages only if the FW fix
is there.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:01 -08:00
Moshe Shemesh
34f46ae0d4 net/mlx5: Add command failures data to debugfs
Add new counters to command interface debugfs to count command failures.
The following counters added:
total_failed - number of times command failed (any kind of failure).
failed_mbox_status - number of times command failed on bad status
returned by FW.

In addition, add data about last command failure to command interface
debugfs:
last_failed_errno - last command failed returned errno.
last_failed_mbox_status - last bad status returned by FW.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 13:33:00 -08:00
Ben Ben-Ishay
99a2b9be07 net/mlx5e: SHAMPO, reduce TIR indication
SHAMPO is an RQ / WQ feature, an indication was added to the TIR in the
first place to enforce suitability between connected TIR and RQ, this
enforcement does not exist in current the Firmware implementation and was
redundant in the first place.

Fixes: 83439f3c37 ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 11:39:35 -08:00
Mohammad Kabat
ac77998b7a net/mlx5: Fix size field in bufferx_reg struct
According to HW spec the field "size" should be 16 bits
in bufferx register.

Fixes: e281682bf2 ("net/mlx5_core: HW data structs/types definitions cleanup")
Signed-off-by: Mohammad Kabat <mohammadkab@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-09 11:39:34 -08:00
Jakub Kicinski
f2b77012dd Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next 2022-22-02

The following PR includes updates to mlx5-next branch:

Headlines:
==========

1) Jakub cleans up unused static inline functions

2) I did some low level firmware command interface return status changes to
provide the caller with full visibility on the error/status returned by
the Firmware.

3) Use the new command interface in RDMA DEVX usecases to avoid flooding
dmesg with some "expected" user error prone use cases.

4) Moshe also uses the new command interface to grab the specific error
code from MFRL register command to provide the exact error reason for
why SW reset couldn't perform internally in FW.

5) From Mark Bloch: Lag, drop packets in hardware when possible

In active-backup mode the inactive interface's packets are dropped by the
bond device. In switchdev where TC rules are offloaded to the FDB
this can lead to packets being hit in the FDB where without offload
they would have been dropped before reaching TC rules in the kernel.

Create a drop rule to make sure packets on inactive ports are dropped
before reaching the FDB.

Listen on NETDEV_CHANGEUPPER / NETDEV_CHANGEINFODATA events and record
the inactive state and offload accordingly.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add clarification on sync reset failure
  net/mlx5: Add reset_state field to MFRL register
  RDMA/mlx5: Use new command interface API
  net/mlx5: cmdif, Refactor error handling and reporting of async commands
  net/mlx5: Use mlx5_cmd_do() in core create_{cq,dct}
  net/mlx5: cmdif, Add new api for command execution
  net/mlx5: cmdif, cmd_check refactoring
  net/mlx5: cmdif, Return value improvements
  net/mlx5: Lag, offload active-backup drops to hardware
  net/mlx5: Lag, record inactive state of bond device
  net/mlx5: Lag, don't use magic numbers for ports
  net/mlx5: Lag, use local variable already defined to access E-Switch
  net/mlx5: E-switch, add drop rule support to ingress ACL
  net/mlx5: E-switch, remove special uplink ingress ACL handling
  net/mlx5: E-Switch, reserve and use same uplink metadata across ports
  net/mlx5: Add ability to insert to specific flow group
  mlx5: remove unused static inlines
====================

Link: https://lore.kernel.org/r/20220223233930.319301-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-28 16:23:58 -08:00
Yishai Hadas
adfdaff3d1 net/mlx5: Introduce migration bits and structures
Introduce migration IFC related stuff to enable migration commands.

Link: https://lore.kernel.org/all/20220224142024.147653-7-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-02-27 11:41:38 +02:00
Yishai Hadas
1695b97b29 net/mlx5: Expose APIs to get/put the mlx5 core device
Expose an API to get the mlx5 core device from a given VF PCI device if
mlx5_core is its driver.

Upon the get API we stay with the intf_state_mutex locked to make sure
that the device can't be gone/unloaded till the caller will complete
its job over the device, this expects to be for a short period of time
for any flow that the lock is taken.

Upon the put API we unlock the intf_state_mutex.

The use case for those APIs is the migration flow of a VF over VFIO PCI.
In that case the VF doesn't ride on mlx5_core, because the device is
driving *two* different PCI devices, the PF owned by mlx5_core and the
VF owned by the vfio driver.

The mlx5_core of the PF is accessed only during the narrow window of the
VF's ioctl that requires its services.

This allows the PF driver to be more independent of the VF driver, so
long as it doesn't reset the FW.

Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-02-27 11:41:21 +02:00
Moshe Shemesh
45fee8edb4 net/mlx5: Add clarification on sync reset failure
In case devlink reload action fw_activate failed in sync reset stage,
use the new MFRL field reset_state to find why it failed and share this
clarification with the user.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-23 15:21:59 -08:00
Moshe Shemesh
72fb3b60a3 net/mlx5: Add reset_state field to MFRL register
Add new field reset_state to MFRL register. This field expose current
state of sync reset for fw update. This field enables sharing with the
user more details on why fw activate failed in case it failed the sync
reset stage.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-23 15:21:59 -08:00
Saeed Mahameed
0a41527608 net/mlx5: cmdif, Refactor error handling and reporting of async commands
Same as the new mlx5_cmd_do API, report all information to callers and
let them handle the error values and outbox parsing.

The user callback status "work->user_callback(status)" is now similar to
the error rc code returned from the blocking mlx5_cmd_do() version,
and now is defined as follows:

 -EREMOTEIO : Command executed by FW, outbox.status != MLX5_CMD_STAT_OK.
              Caller must check FW outbox status.
 0 : Command execution successful,  outbox.status == MLX5_CMD_STAT_OK.
 < 0 : Command couldn't execute, FW or driver induced error.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-23 15:21:59 -08:00