Commit graph

1017 commits

Author SHA1 Message Date
Dennis Dalessandro
b90c7e97c4 RDMA/hfi1: Remove all traces of diagpkt support
One of the concessions we made to get our driver upstream was to remove
the diagnostic packet support. There is however still some cruft that was
left over.  Remove it.

Link: https://lore.kernel.org/r/20220520183727.48973.93587.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 15:08:32 -03:00
Dennis Dalessandro
1994c31340 RDMA/hfi1: Consolidate software versions
There is no need to have separate user and kernel software versions. There
is a single software that the kernel is compatible with.

Also remove the notion of a "kernel type" that is long since deprecated.

Link: https://lore.kernel.org/r/20220520183722.48973.60262.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 15:08:32 -03:00
Dennis Dalessandro
676bffa02e RDMA/hfi1: Remove pointless driver version
Driver versions have long been forbidden in the RDMA subsystem. We removed
most of the code relating to them and have been very strict about not
allowing.  However there is some leftover versioning that we do not
need. Get rid of that.

Link: https://lore.kernel.org/r/20220520183717.48973.17418.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 15:08:31 -03:00
Dennis Dalessandro
f93e91a037 RDMA/hfi1: Fix potential integer multiplication overflow errors
When multiplying of different types, an overflow is possible even when
storing the result in a larger type. This is because the conversion is
done after the multiplication. So arithmetic overflow and thus in
incorrect value is possible.

Correct an instance of this in the inter packet delay calculation.  Fix by
ensuring one of the operands is u64 which will promote the other to u64 as
well ensuring no overflow.

Cc: stable@vger.kernel.org
Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20220520183712.48973.29855.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 15:08:31 -03:00
Douglas Miller
629e052d0c RDMA/hfi1: Prevent panic when SDMA is disabled
If the hfi1 module is loaded with HFI1_CAP_SDMA off, a call to
hfi1_write_iter() will dereference a NULL pointer and panic. A typical
stack frame is:

  sdma_select_user_engine [hfi1]
  hfi1_user_sdma_process_request [hfi1]
  hfi1_write_iter [hfi1]
  do_iter_readv_writev
  do_iter_write
  vfs_writev
  do_writev
  do_syscall_64

The fix is to test for SDMA in hfi1_write_iter() and fail the I/O with
EINVAL.

Link: https://lore.kernel.org/r/20220520183706.48973.79803.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 15:08:31 -03:00
Douglas Miller
05c03dfd09 RDMA/hfi1: Prevent use of lock before it is initialized
If there is a failure during probe of hfi1 before the sdma_map_lock is
initialized, the call to hfi1_free_devdata() will attempt to use a lock
that has not been initialized. If the locking correctness validator is on
then an INFO message and stack trace resembling the following may be seen:

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  Call Trace:
  register_lock_class+0x11b/0x880
  __lock_acquire+0xf3/0x7930
  lock_acquire+0xff/0x2d0
  _raw_spin_lock_irq+0x46/0x60
  sdma_clean+0x42a/0x660 [hfi1]
  hfi1_free_devdata+0x3a7/0x420 [hfi1]
  init_one+0x867/0x11a0 [hfi1]
  pci_device_probe+0x40e/0x8d0

The use of sdma_map_lock in sdma_clean() is for freeing the sdma_map
memory, and sdma_map is not allocated/initialized until after
sdma_map_lock has been initialized. This code only needs to be run if
sdma_map is not NULL, and so checking for that condition will avoid trying
to use the lock before it is initialized.

Fixes: 473291b3ea ("IB/hfi1: Fix for early release of sdma context")
Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20220520183701.48973.72434.stgit@awfm-01.cornelisnetworks.com
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 14:52:47 -03:00
Jason Gunthorpe
a6f844da39 Linux 5.18
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmKKlIAeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC3oH/iPm/fLG2sJut8My
 sU0RC9K+6ESV5h2Qy6k00/lqKstlu4EvBjw4V8vYpx3Q2+hbSFMn2SeWqqqT3Lkk
 Zb8KINCFuuyMtdCBb42PV0zhUf5pCQF7ocm/Ae4jllDHtPmqk3WJ6IGtZBK5JBlw
 z6RR/wKt0y0MRj9eZyPyYjOee2L2vuVh4tgnexK/4L8g2ZtMMRThhvUzSMWG4zxR
 STYYNp0uFcfT1Vt85+ODevFH4TvdECAj+SqAegN+seHLM17YY7M0/WiIYpxGRv8P
 lIpDQl4PBU8EBkpI5hkpJ/3qPincbuVOMLsYfxFtpcjjG12vGjFp2krGpS3TedZQ
 3mvaJ7c=
 =vLke
 -----END PGP SIGNATURE-----

Merge tag 'v5.18' into rdma.git for-next

Following patches have dependencies.

Resolve the merge conflict in
drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names
for the fs functions following linux-next:

https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 12:40:28 -03:00
Julia Lawall
684b916b30 IB/hf1: Fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Link: https://lore.kernel.org/r/20220521111145.81697-83-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24 11:24:57 -03:00
Douglas Miller
2bbac98d09 RDMA/hfi1: Fix use-after-free bug for mm struct
Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may
represent the last reference held on the task mm.
hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed
before the final use in hfi1_release_user_pages().  A new task may
allocate the mm structure while it is still being used, resulting in
problems. One manifestation is corruption of the mmap_sem counter leading
to a hang in down_write().  Another is corruption of an mm struct that is
in use by another task.

Fixes: 3d2a9d6425 ("IB/hfi1: Ensure correct mm is used at all times")
Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelisnetworks.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08 15:40:06 -03:00
Jason Gunthorpe
e945c653c8 RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.

This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.

Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.

Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-06 15:02:13 -03:00
Linus Torvalds
2dacc1e57b v5.18 merge window pull request
Patchces for the merge window:
 
 - Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns
 
 - Minor cleanups: coding style, useless includes and documentation
 
 - Reorganize how multicast processing works in rxe
 
 - Replace a red/black tree with xarray in rxe which improves performance
 
 - DSCP support and HW address handle re-use in irdma
 
 - Simplify the mailbox command handling in hns
 
 - Simplify iser now that FMR is eliminated
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmI7d8gACgkQOG33FX4g
 mxq9TRAAm/3rEBtDyVan4Dm3mzq8xeJOTdYtQ29QNWZcwvhStNTo4c29jQJITZ9X
 Xd+m8VEPWzjkkZ4+hDFM9HhWjOzqDcCCFcbF88WDO0+P1M3HYYUE6AbIcSilAaT2
 GIby3+kAnsTOAryrSzHLehYJKUYJuw5NRGsVAPY3r6hz3UECjNb/X1KTWhXaIfNy
 2OTqFx5wMQ6hZO8e4a2Wz4bBYAYm95UfK+TgfRqUwhDLCDnkDELvHMPh9pXxJecG
 xzVg7W7Nv+6GWpUrqtM6W6YpriyjUOfbrtCJmBV3T/EdpiD6lZhKpkX23aLgu2Bi
 XoMM67aZ0i1Ft2trCIF8GbLaejG6epBkeKkUMMSozKqFP5DjhNbD2f3WAlMI15iW
 CcdyPS5re1ymPp66UlycTDSkN19wD8LfgQPLJCNM3EV8g8qs9LaKzEPWunBoZmw1
 a+QX2zK8J07S21F8iq2sf/Qe1EDdmgOEOf7pmB/A5/PKqgXZPG7lYNYuhIKyFsdL
 mO+BqWWp0a953JJjsiutxyUCyUd8jtwwxRa0B66oqSQ1jO+xj6XDxjEL8ACuT8Iq
 9wFJluTCN6lfxyV8mrSfweT0fQh/W/zVF7EJZHWdKXEzuADxMp9X06wtMQ82ea+k
 /wgN7DUpBZczRtlqaxt1VSkER75QwAMy/oSpA974+O120QCwLMQ=
 =aux1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:

 - Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns

 - Minor cleanups: coding style, useless includes and documentation

 - Reorganize how multicast processing works in rxe

 - Replace a red/black tree with xarray in rxe which improves performance

 - DSCP support and HW address handle re-use in irdma

 - Simplify the mailbox command handling in hns

 - Simplify iser now that FMR is eliminated

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits)
  RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
  IB/iser: Fix error flow in case of registration failure
  IB/iser: Generalize map/unmap dma tasks
  IB/iser: Use iser_fr_desc as registration context
  IB/iser: Remove iser_reg_data_sg helper function
  RDMA/rxe: Use standard names for ref counting
  RDMA/rxe: Replace red-black trees by xarrays
  RDMA/rxe: Shorten pool names in rxe_pool.c
  RDMA/rxe: Move max_elem into rxe_type_info
  RDMA/rxe: Replace obj by elem in declaration
  RDMA/rxe: Delete _locked() APIs for pool objects
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Replace mr by rkey in responder resources
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT
  RDMA/irdma: Add support for address handle re-use
  RDMA/qib: Fix typos in comments
  RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
  Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
  RDMA/rxe: Remove useless argument for update_state()
  ...
2022-03-24 19:17:39 -07:00
Mike Marciniszyn
b135e324d7 IB/hfi1: Allow larger MTU without AIP
The AIP code signals the phys_mtu in the following query_port()
fragment:

	props->phys_mtu = HFI1_CAP_IS_KSET(AIP) ? hfi1_max_mtu :
				ib_mtu_enum_to_int(props->max_mtu);

Using the largest MTU possible should not depend on AIP.

Fix by unconditionally using the hfi1_max_mtu value.

Fixes: 6d72344cf6 ("IB/ipoib: Increase ipoib Datagram mode MTU's upper limit")
Link: https://lore.kernel.org/r/1644348309-174874-1-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-04 17:22:02 -04:00
Yury Norov
3c8bc3954d RDMA/hfi: Replace cpumask_weight with cpumask_empty where appropriate
drivers/infiniband/hw/hfi1/affinity.c code calls cpumask_weight() to check
if any bit of a given cpumask is set. We can do it more efficiently with
cpumask_empty() because cpumask_empty() stops traversing the cpumask as
soon as it finds first set bit, while cpumask_weight() counts all bits
unconditionally.

Link: https://lore.kernel.org/r/20220210224933.379149-20-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-11 15:07:30 -04:00
Leon Romanovsky
75eeaed448 RDMA/hfi1: Delete useless module.h include
There is no need in include of module.h in the following files.

Link: https://lore.kernel.org/r/c53a257079bc93dac036155b793073939d17ddba.1642960861.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 13:03:12 -04:00
Mike Marciniszyn
e5cce44aff IB/hfi1: Fix tstats alloc and dealloc
The tstats allocation is done in the accelerated ndo_init function but the
allocation is not tested to succeed.

The deallocation is not done in the accelerated ndo_uninit function.

Resolve issues by testing for an allocation failure and adding the
free_percpu in the uninit function.

Fixes: aa0616a9bd ("IB/hfi1: switch to core handling of rx/tx byte/packet counters")
Link: https://lore.kernel.org/r/1642287756-182313-5-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 11:12:15 -04:00
Mike Marciniszyn
5f8f55b92e IB/hfi1: Fix AIP early init panic
An early failure in hfi1_ipoib_setup_rn() can lead to the following panic:

  BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP NOPTI
  Workqueue: events work_for_cpu_fn
  RIP: 0010:try_to_grab_pending+0x2b/0x140
  Code: 1f 44 00 00 41 55 41 54 55 48 89 d5 53 48 89 fb 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 48 89 55 00 40 84 f6 75 77 <f0> 48 0f ba 2b 00 72 09 31 c0 5b 5d 41 5c 41 5d c3 48 89 df e8 6c
  RSP: 0018:ffffb6b3cf7cfa48 EFLAGS: 00010046
  RAX: 0000000000000246 RBX: 00000000000001b0 RCX: 0000000000000000
  RDX: 0000000000000246 RSI: 0000000000000000 RDI: 00000000000001b0
  RBP: ffffb6b3cf7cfa70 R08: 0000000000000f09 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  R13: ffffb6b3cf7cfa90 R14: ffffffff9b2fbfc0 R15: ffff8a4fdf244690
  FS:  0000000000000000(0000) GS:ffff8a527f400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000001b0 CR3: 00000017e2410003 CR4: 00000000007706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   __cancel_work_timer+0x42/0x190
   ? dev_printk_emit+0x4e/0x70
   iowait_cancel_work+0x15/0x30 [hfi1]
   hfi1_ipoib_txreq_deinit+0x5a/0x220 [hfi1]
   ? dev_err+0x6c/0x90
   hfi1_ipoib_netdev_dtor+0x15/0x30 [hfi1]
   hfi1_ipoib_setup_rn+0x10e/0x150 [hfi1]
   rdma_init_netdev+0x5a/0x80 [ib_core]
   ? hfi1_ipoib_free_rdma_netdev+0x20/0x20 [hfi1]
   ipoib_intf_init+0x6c/0x350 [ib_ipoib]
   ipoib_intf_alloc+0x5c/0xc0 [ib_ipoib]
   ipoib_add_one+0xbe/0x300 [ib_ipoib]
   add_client_context+0x12c/0x1a0 [ib_core]
   enable_device_and_get+0xdc/0x1d0 [ib_core]
   ib_register_device+0x572/0x6b0 [ib_core]
   rvt_register_device+0x11b/0x220 [rdmavt]
   hfi1_register_ib_device+0x6b4/0x770 [hfi1]
   do_init_one.isra.20+0x3e3/0x680 [hfi1]
   local_pci_probe+0x41/0x90
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1a7/0x360
   ? create_worker+0x1a0/0x1a0
   worker_thread+0x1cf/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x116/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x1f/0x40

The panic happens in hfi1_ipoib_txreq_deinit() because there is a NULL
deref when hfi1_ipoib_netdev_dtor() is called in this error case.

hfi1_ipoib_txreq_init() and hfi1_ipoib_rxq_init() are self unwinding so
fix by adjusting the error paths accordingly.

Other changes:
- hfi1_ipoib_free_rdma_netdev() is deleted including the free_netdev()
  since the netdev core code deletes calls free_netdev()
- The switch to the accelerated entrances is moved to the success path.

Cc: stable@vger.kernel.org
Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/1642287756-182313-4-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 11:12:15 -04:00
Mike Marciniszyn
b1151b74ff IB/hfi1: Fix alloc failure with larger txqueuelen
The following allocation with large txqueuelen will result in the
following warning:

  Call Trace:
   __alloc_pages_nodemask+0x283/0x2c0
   kmalloc_large_node+0x3c/0xa0
   __kmalloc_node+0x22a/0x2f0
   hfi1_ipoib_txreq_init+0x19f/0x330 [hfi1]
   hfi1_ipoib_setup_rn+0xd3/0x1a0 [hfi1]
   rdma_init_netdev+0x5a/0x80 [ib_core]
   ipoib_intf_init+0x6c/0x350 [ib_ipoib]
   ipoib_intf_alloc+0x5c/0xc0 [ib_ipoib]
   ipoib_add_one+0xbe/0x300 [ib_ipoib]
   add_client_context+0x12c/0x1a0 [ib_core]
   ib_register_client+0x147/0x190 [ib_core]
   ipoib_init_module+0xdd/0x132 [ib_ipoib]
   do_one_initcall+0x46/0x1c3
   do_init_module+0x5a/0x220
   load_module+0x14c5/0x17f0
   __do_sys_init_module+0x13b/0x180
   do_syscall_64+0x5b/0x1a0
   entry_SYSCALL_64_after_hwframe+0x65/0xca

For ipoib, the txqueuelen is modified with the module parameter
send_queue_size.

Fix by changing to use kv versions of the same allocator to handle the
large allocations.  The allocation embeds a hdr struct that is dma mapped.
Change that struct to a pointer to a kzalloced struct.

Cc: stable@vger.kernel.org
Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/1642287756-182313-3-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 11:12:15 -04:00
Mike Marciniszyn
8c83d39cc7 IB/hfi1: Fix panic with larger ipoib send_queue_size
When the ipoib send_queue_size is increased from the default the following
panic happens:

  RIP: 0010:hfi1_ipoib_drain_tx_ring+0x45/0xf0 [hfi1]
  Code: 31 e4 eb 0f 8b 85 c8 02 00 00 41 83 c4 01 44 39 e0 76 60 8b 8d cc 02 00 00 44 89 e3 be 01 00 00 00 d3 e3 48 03 9d c0 02 00 00 <c7> 83 18 01 00 00 00 00 00 00 48 8b bb 30 01 00 00 e8 25 af a7 e0
  RSP: 0018:ffffc9000798f4a0 EFLAGS: 00010286
  RAX: 0000000000008000 RBX: ffffc9000aa0f000 RCX: 000000000000000f
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
  RBP: ffff88810ff08000 R08: ffff88889476d900 R09: 0000000000000101
  R10: 0000000000000000 R11: ffffc90006590ff8 R12: 0000000000000200
  R13: ffffc9000798fba8 R14: 0000000000000000 R15: 0000000000000001
  FS:  00007fd0f79cc3c0(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffc9000aa0f118 CR3: 0000000889c84001 CR4: 00000000001706e0
  Call Trace:
   <TASK>
   hfi1_ipoib_napi_tx_disable+0x45/0x60 [hfi1]
   hfi1_ipoib_dev_stop+0x18/0x80 [hfi1]
   ipoib_ib_dev_stop+0x1d/0x40 [ib_ipoib]
   ipoib_stop+0x48/0xc0 [ib_ipoib]
   __dev_close_many+0x9e/0x110
   __dev_change_flags+0xd9/0x210
   dev_change_flags+0x21/0x60
   do_setlink+0x31c/0x10f0
   ? __nla_validate_parse+0x12d/0x1a0
   ? __nla_parse+0x21/0x30
   ? inet6_validate_link_af+0x5e/0xf0
   ? cpumask_next+0x1f/0x20
   ? __snmp6_fill_stats64.isra.53+0xbb/0x140
   ? __nla_validate_parse+0x47/0x1a0
   __rtnl_newlink+0x530/0x910
   ? pskb_expand_head+0x73/0x300
   ? __kmalloc_node_track_caller+0x109/0x280
   ? __nla_put+0xc/0x20
   ? cpumask_next_and+0x20/0x30
   ? update_sd_lb_stats.constprop.144+0xd3/0x820
   ? _raw_spin_unlock_irqrestore+0x25/0x37
   ? __wake_up_common_lock+0x87/0xc0
   ? kmem_cache_alloc_trace+0x3d/0x3d0
   rtnl_newlink+0x43/0x60

The issue happens when the shift that should have been a function of the
txq item size mistakenly used the ring size.

Fix by using the item size.

Cc: stable@vger.kernel.org
Fixes: d47dfc2b00 ("IB/hfi1: Remove cache and embed txreq in ring")
Link: https://lore.kernel.org/r/1642287756-182313-2-git-send-email-mike.marciniszyn@cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 11:03:50 -04:00
Jason Gunthorpe
4922f09209 Linux 5.16-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3
 w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0
 HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem
 St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz
 iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+
 VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz
 1m2tVNY=
 =7wRd
 -----END PGP SIGNATURE-----

Merge tag 'v5.16-rc5' into rdma.git for-next

Required due to dependencies in following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-14 20:18:48 -04:00
Mike Marciniszyn
60a8b5a161 IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr
This buffer is currently allocated in hfi1_init():

	if (reinit)
		ret = init_after_reset(dd);
	else
		ret = loadtime_init(dd);
	if (ret)
		goto done;

	/* allocate dummy tail memory for all receive contexts */
	dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev,
							 sizeof(u64),
							 &dd->rcvhdrtail_dummy_dma,
							 GFP_KERNEL);

	if (!dd->rcvhdrtail_dummy_kvaddr) {
		dd_dev_err(dd, "cannot allocate dummy tail memory\n");
		ret = -ENOMEM;
		goto done;
	}

The reinit triggered path will overwrite the old allocation and leak it.

Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation
to hfi1_free_devdata().

Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 46b010d3ee ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
f6a3cfec3c IB/hfi1: Fix early init panic
The following trace can be observed with an init failure such as firmware
load failures:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0010 [#1] SMP PTI
  CPU: 0 PID: 537 Comm: kworker/0:3 Tainted: G           OE    --------- -  - 4.18.0-240.el8.x86_64 #1
  Workqueue: events work_for_cpu_fn
  RIP: 0010:0x0
  Code: Bad RIP value.
  RSP: 0000:ffffae5f878a3c98 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff95e48e025c00 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff95e48e025c00
  RBP: ffff95e4bf3660a4 R08: 0000000000000000 R09: ffffffff86d5e100
  R10: ffff95e49e1de600 R11: 0000000000000001 R12: ffff95e4bf366180
  R13: ffff95e48e025c00 R14: ffff95e4bf366028 R15: ffff95e4bf366000
  FS:  0000000000000000(0000) GS:ffff95e4df200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 0000000f86a0a003 CR4: 00000000001606f0
  Call Trace:
   receive_context_interrupt+0x1f/0x40 [hfi1]
   __free_irq+0x201/0x300
   free_irq+0x2e/0x60
   pci_free_irq+0x18/0x30
   msix_free_irq.part.2+0x46/0x80 [hfi1]
   msix_clean_up_interrupts+0x2b/0x70 [hfi1]
   hfi1_init_dd+0x640/0x1a90 [hfi1]
   do_init_one.isra.19+0x34d/0x680 [hfi1]
   local_pci_probe+0x41/0x90
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1a7/0x360
   worker_thread+0x1cf/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x112/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x35/0x40

The free_irq() results in a callback to the registered interrupt handler,
and rcd->do_interrupt is NULL because the receive context data structures
are not fully initialized.

Fix by ensuring that the do_interrupt is always assigned and adding a
guards in the slow path handler to detect and handle a partially
initialized receive context and noop the receive.

Link: https://lore.kernel.org/r/20211129192003.101968.33612.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: b0ba3c18d6 ("IB/hfi1: Move normal functions from hfi1_devdata to const array")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
b6d57e24ce IB/hfi1: Insure use of smp_processor_id() is preempt disabled
The following BUG has just surfaced with our 5.16 testing:

  BUG: using smp_processor_id() in preemptible [00000000] code: mpicheck/1581081
  caller is sdma_select_user_engine+0x72/0x210 [hfi1]
  CPU: 0 PID: 1581081 Comm: mpicheck Tainted: G S                5.16.0-rc1+ #1
  Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
  Call Trace:
   <TASK>
   dump_stack_lvl+0x33/0x42
   check_preemption_disabled+0xbf/0xe0
   sdma_select_user_engine+0x72/0x210 [hfi1]
   ? _raw_spin_unlock_irqrestore+0x1f/0x31
   ? hfi1_mmu_rb_insert+0x6b/0x200 [hfi1]
   hfi1_user_sdma_process_request+0xa02/0x1120 [hfi1]
   ? hfi1_write_iter+0xb8/0x200 [hfi1]
   hfi1_write_iter+0xb8/0x200 [hfi1]
   do_iter_readv_writev+0x163/0x1c0
   do_iter_write+0x80/0x1c0
   vfs_writev+0x88/0x1a0
   ? recalibrate_cpu_khz+0x10/0x10
   ? ktime_get+0x3e/0xa0
   ? __fget_files+0x66/0xa0
   do_writev+0x65/0x100
   do_syscall_64+0x3a/0x80

Fix this long standing bug by moving the smp_processor_id() to after the
rcu_read_lock().

The rcu_read_lock() implicitly disables preemption.

Link: https://lore.kernel.org/r/20211129191958.101968.87329.stgit@awfm-01.cornelisnetworks.com
Cc: stable@vger.kernel.org
Fixes: 0cb2aa690c ("IB/hfi1: Add sysfs interface for affinity setup")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:54 -04:00
Mike Marciniszyn
9292f8f9a2 IB/hfi1: Correct guard on eager buffer deallocation
The code tests the dma address which legitimately can be 0.

The code should test the kernel logical address to avoid leaking eager
buffer allocations that happen to map to a dma address of 0.

Fixes: 60368186fd ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled")
Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07 13:22:53 -04:00
Christophe JAILLET
f86dbc9fc5 IB/hfi1: Use bitmap_zalloc() when applicable
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.

Link: https://lore.kernel.org/r/d46c6bc1869b8869244fa71943d2cad4104b3668.1637869925.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-29 14:33:55 -04:00
Dennis Dalessandro
da86dc175b IB/hfi1: Properly allocate rdma counter desc memory
When optional counter support was added the allocation of the memory
holding the counter descriptors was not cleared properly. This caused
WARN_ON()s in the IB/sysfs code to be hit.

This is because the uninitialized memory made some of the counters wrongly
look like optional counters. Use kzalloc.

While here change the sizeof() calls to use the pointer rather than the
name of the type.

  WARNING: CPU: 0 PID: 32644 at drivers/infiniband/core/sysfs.c:1064 ib_setup_port_attrs+0x7e1/0x890 [ib_core]
  CPU: 0 PID: 32644 Comm: kworker/0:2 Tainted: G S      W 5.15.0+ #36
  Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
  Workqueue: events work_for_cpu_fn
  RIP: 0010:ib_setup_port_attrs+0x7e1/0x890 [ib_core]
  RSP: 0018:ffffc90006ea3c40 EFLAGS: 00010202
  RAX: 0000000000000068 RBX: ffff888106ad8000 RCX: 0000000000000138
  RDX: ffff888126c84c00 RSI: ffff888103c41000 RDI: 0000000000000124
  RBP: ffff88810f63a801 R08: ffff888126c8a000 R09: 0000000000000001
  R10: ffffffffa09acf20 R11: 0000000000000065 R12: ffff88810f63a800
  R13: ffff88810f63a800 R14: ffff88810f63a8e0 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff888667a00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00005590102cb078 CR3: 000000000240a003 CR4: 00000000001706f0
  Call Trace:
   ib_register_device.cold.44+0x23e/0x2d0 [ib_core]
   rvt_register_device+0xfa/0x230 [rdmavt]
   hfi1_register_ib_device+0x623/0x690 [hfi1]
   init_one.cold.36+0x2d1/0x49b [hfi1]
   local_pci_probe+0x45/0x80
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1b1/0x360
   worker_thread+0x1d4/0x3a0
   kthread+0x11a/0x140
   ret_from_fork+0x22/0x30

Fixes: 5e2ddd1e59 ("RDMA/counter: Add optional counter support")
Link: https://lore.kernel.org/r/20211115200913.124104.47770.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16 13:18:24 -04:00
Jason Gunthorpe
a2a2a69d14 Linux 5.15
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmF/AjYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1hkIAJ6sFDbvb4M4LMwf
 Slh2NVL9o5sLMBDzVwnVlyMSKDbMn1WBKreGssaLgZjGDc74lxsdSmw5l9MZm0JN
 xlq95Q6XFiuu+0qDHPWwfDz3JFO4TqW2ZLLPWk9NnkNbRXqccSrlVRi1RpgE1t3/
 NUtS8CQLu6A2BYMc6mkk3aV6IwSNKOkWbM5eBHSvU4j8B6lLbNQop0AfO/wyY1xB
 U6LiVE1RpN/b7Yv+75ITtNzuHzVIBx6305FvSnOlKbMKKvIClt96Vd2OeuoEkK+6
 wGU8JraB1+fc0GckAhynNrjWQWdvi0MAhFWWEJxjS20OGcV1rXDduNfkVNauO1Zn
 +dNyJ3s=
 =g9fz
 -----END PGP SIGNATURE-----

Merge tag 'v5.15' into rdma.git for-next

Pull in the accepted for-rc patches as the next merge needs a newer base.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-01 14:49:20 -03:00
Scott Breyer
ddf65f28dd IB/hfi1: Rebranding of hfi1 driver to Cornelis Networks
Changes instances of Intel to Cornelis in identifying strings

Link: https://lore.kernel.org/r/20211028124601.26694.35662.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Scott Breyer <scott.breyer@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-29 13:30:43 -03:00
Jakub Kicinski
fd92213e9a RDMA: Constify netdev->dev_addr accesses
netdev->dev_addr will become const soon, make sure drivers propagate the
qualifier.

Link: https://lore.kernel.org/r/20211019182604.1441387-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25 14:33:09 -03:00
Mike Marciniszyn
13bac86195 IB/hfi1: Fix abba locking issue with sc_disable()
sc_disable() after having disabled the send context wakes up any waiters
by calling hfi1_qp_wakeup() while holding the waitlock for the sc.

This is contrary to the model for all other calls to hfi1_qp_wakeup()
where the waitlock is dropped and a local is used to drive calls to
hfi1_qp_wakeup().

Fix by moving the sc->piowait into a local list and driving the wakeup
calls from the list.

Fixes: 099a884ba4 ("IB/hfi1: Handle wakeup of orphaned QPs for pio")
Link: https://lore.kernel.org/r/20211013141852.128104.2682.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-13 13:33:22 -03:00
Aharon Landau
13f30b0fa0 RDMA/counter: Add a descriptor in struct rdma_hw_stats
Add a counter statistic descriptor structure in rdma_hw_stats. In addition
to the counter name, more meta-information will be added.  This code
extension is needed for optional-counter support in the following patches.

Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 12:48:04 -03:00
Andy Shevchenko
286dba65a4 IB/hf1: Use string_upper() instead of an open coded variant
Use string_upper() from the string helper module instead of an	open coded
variant.

Link: https://lore.kernel.org/r/20211001123153.67379-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-05 15:22:37 -03:00
Jason Gunthorpe
c78d218fc5 Linux 5.15-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmFaG98eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGosAH/jqy5B2BIEE39O+8
 QTr3vO54SyRRuY/d98wZ+O4SPjfqfpCHuyjKt9YJpEdmzH754NC9gSPOOBegnvHI
 DfrWaivmJ5mdjN2h7+JVqjs58krUv98wWNa5xfvqUp5H7wF3WQg3AxsaMKS1PePD
 kFHfeFbxsg2gYhyhPK6gHtwLn6dEsx9bGny2bKvCh6KuJQEiUXoEcgnFzjFgLNxp
 T5zI1cNSCNUzwRIe+vqQRlfVR2JlSI4tiy0zNJWy9dQ5Z4HOSbFcEz5Df2N7qNYn
 /MqruaASmyREgo9yLHpR1BSyzrea8MCckY04ycYqKZb7gDwcrpAe4QVw2I/Fuzu9
 q//PV4I=
 =+mYg
 -----END PGP SIGNATURE-----

Merge tag 'v5.15-rc4' into rdma.get for-next

Merged due to dependencies in following patches.

Conflict in drivers/infiniband/hw/hfi1/ipoib_tx.c resolved by hand to take
the %p change and txq stats rename together.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04 16:01:26 -03:00
Gustavo A. R. Silva
11333be19c RDMA/hfi1: Use struct_size() and flex_array_size() helpers
Make use of the struct_size() and flex_array_size() helpers instead of
open-coded versions, in order to avoid any potential type mistakes or
integer overflows that, in the worse scenario, could lead to heap
overflows.

Link: https://lore.kernel.org/r/20210927225333.GA192634@embeddedor
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:15:54 -03:00
Mike Marciniszyn
6d1ebccbd6 IB/hfi1: Add ring consumer and producers traces
These traces are used to debugging ring issues.

The ipoib_txreq needed to be moved to a header file to allow access from
the trace header file.

The trace changes include:
- new producer/consumer traces
- new allocation deallocation traces
- additional fidelity for SDMA engine prints

Fixes: 4bd00b55c9 ("IB/hfi1: Add AIP tx traces")
Link: https://lore.kernel.org/r/20210913132852.131370.9664.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:06:42 -03:00
Mike Marciniszyn
b4b90a50cb IB/hfi1: Remove atomic completion count
The atomic is not needed.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132847.131370.54250.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:06:42 -03:00
Mike Marciniszyn
f5dc70a0e1 IB/hfi1: Tune netdev xmit cachelines
This patch moves fields in the ring and creates a line for the producer
and the consumer.

The adds a consumer side variable that tracks the ring avail so that the
code doesn't have the read the other cacheline to get a count for every
packet. A read now only occurs when the avail is at 0.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132842.131370.15636.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:06:42 -03:00
Mike Marciniszyn
a7125869b2 IB/hfi1: Get rid of tx priv backpointer
The txq has the backpointer, so this is a micro optimization for the tx
path.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132836.131370.89704.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:06:41 -03:00
Mike Marciniszyn
4bf0ca0c9f IB/hfi1: Get rid of hot path divide
The pointer math in this statemet does a divide;
	struct hfi1_ipoib_txq *txq = &priv->txqs[napi - priv->tx_napis];

Elminate the divide by embedding the struct napi_strut in the txq and
getting the txq with a container_of() using the newly embedded napi.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132831.131370.3993.stgit@awfm-01.cornelisnetworks.com
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:06:41 -03:00
Mike Marciniszyn
d47dfc2b00 IB/hfi1: Remove cache and embed txreq in ring
This patch removes kmem cache allocation and deallocation in favor of
having the ipoib_txreq in the ring.

The consumer is now the packet sending side allocating tx descriptors from
ring and the producer is the napi interrupt handling freeing tx
descriptors.

The locks are now eliminated because the napi tx lock insures a single
consumer and the napi handling insures a single producer.

The napi poll is converted to memory poll looking for items that have been
marked completed.

Fixes: d99dc602e2 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210913132826.131370.4397.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 20:06:41 -03:00
Guo Zhi
7d5cfafe8b RDMA/hfi1: Fix kernel pointer leak
Pointers should be printed with %p or %px rather than cast to 'unsigned
long long' and printed with %llx.  Change %llx to %p to print the secured
pointer.

Fixes: 042a00f93a ("IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev")
Link: https://lore.kernel.org/r/20210922134857.619602-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 14:32:14 -03:00
Linus Torvalds
4b105f4a25 RDMA v5.15 merge window 2nd Pull Request
An important error case regression fixes in mlx5:
 
 - Wrong size used when computing the error path smaller allocation request
   leads to corruption
 
 - Confusing but ultimately harmless alignment mis-calculation
 
 - Static checker warnings:
     Null pointer subtraction in qib
     kcalloc in bnxt_re
     Missing static on global variable in hfi1
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmE4qagACgkQOG33FX4g
 mxoueg//SfLuxirMifDHOpH2Db5vC2a/BwYW3AuT78zAXzlUmmEWYIUw1fbJA2vK
 vt/fsjaxW2FBsluiIgV8J961Mlrj+/Y1KPxbV+76CKFoBkaYcjEJcIajGXkm71r5
 bXviUuFEHV2810Nxs6QEitMPyEP3sODmcK5EoKIuifVfjhdcJ3XrLbJb0NA5PKk1
 voD2zpAfhumiAzjZ2q3rJgpCv08WWqts30DzAKdii1asdG9tDF8WBS9BJuPC7MHF
 B2AmYVTgwhsepCAUaYGathKxHblqakRXF3gjOIAB4TKuZEv0OwGVzH3flFbukg5L
 icEw/Ai0H2GM7e3yT3BI3QDoXMg52DfCeChgwttkNsJJidjlc7cUCwaaBCJ1oVo/
 860foW9YFA14ftAlBt2BtvfNfOaqYnmz4tmavVNxqW7NiZCqT/+uyV5BSDPD5b5M
 Xtsa8NK/6u65rz92GtNpVYd8W4ZfjOCAc8cWj6+ELeSDsU92sFWBWalWsl59Hq+t
 AdCUi25uwUAlLPiPKRlYxSXoCCOG80iDgrj8sntIGoVYT0nr4ITVv5VmmVWNQLPw
 V5xfDhCmALrNlp1nCBRsmtBHA/Legqu82/lWpkl12WZGOytU/jTEb2gLWPDglwr+
 BWmaq3Us/g36NX9h/GOvqD5tZCogzkjHvWLW0biMRebDyO1DW1Y=
 =Sg6A
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "I don't usually send a second PR in the merge window, but the fix to
  mlx5 is significant enough that it should start going through the
  process ASAP. Along with it comes some of the usual -rc stuff that
  would normally wait for a -rc2 or so.

  Summary:

  Important error case regression fixes in mlx5:

   - Wrong size used when computing the error path smaller allocation
     request leads to corruption

   - Confusing but ultimately harmless alignment mis-calculation

  Static checker warning fixes:

   - NULL pointer subtraction in qib

   - kcalloc in bnxt_re

   - Missing static on global variable in hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/hfi1: make hist static
  RDMA/bnxt_re: Prefer kcalloc over open coded arithmetic
  IB/qib: Fix null pointer subtraction compiler warning
  RDMA/mlx5: Fix xlt_chunk_align calculation
  RDMA/mlx5: Fix number of allocated XLT entries
2021-09-09 11:14:14 -07:00
chongjiapeng
2169b90889 IB/hfi1: make hist static
This symbol is not used outside of trace.c, so marks it static.

Fix the following sparse warning:

 drivers/infiniband/hw/hfi1/trace.c:491:23: warning: symbol 'hist' was not declared. Should it be static?

Link: https://lore.kernel.org/r/1630921723-21545-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08 08:33:04 -03:00
Linus Torvalds
23852bec53 RDMA v5.15 merge window Pull Request
- Various cleanup and small features for rtrs
 
 - kmap_local_page() conversions
 
 - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
 
 - Cache the IB subnet prefix
 
 - Rework how CRC is calcuated in rxe
 
 - Clean reference counting in iwpm's netlink
 
 - Pull object allocation and lifecycle for user QPs to the uverbs core
   code
 
 - Several small hns features and continued general code cleanups
 
 - Fix the scatterlist confusion of orig_nents/nents introduced in an
   earlier patch creating the append operation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmEudRgACgkQOG33FX4g
 mxraJA//c6bMxrrTVrzmrtrkyYD4tYWE8RDfgvoyZtleZnnEOJeunCQWakQrpJSv
 ukSnOGCA3PtnmRMdV54f/11YJ/7otxOJodSO7jWsIoBrqG/lISAdX8mn2iHhrvJ0
 dIaFEFPLy0WqoMLCJVIYIupR0IStVHb/mWx0uYL4XnnoYKyt7f7K5JMZpNWMhDN2
 ieJw0jfrvEYm8pipWuxUvB16XARlzAWQrjqLpMRI+jFRpbDVBY21dz2/LJvOJPrA
 LcQ+XXsV/F659ibOAGm6bU4BMda8fE6Lw90B/gmhSswJ205NrdziF5cNYHP0QxcN
 oMjrjSWWHc9GEE7MTipC2AH8e36qob16Q7CK+zHEJ+ds7R6/O/8XmED1L8/KFpNA
 FGqnjxnxsl1y27mUegfj1Hh8PfoDp2oVq0lmpEw0CYo4cfVzHSMRrbTR//XmW628
 Ie/mJddpFK4oLk+QkSNjSLrnxOvdTkdA58PU0i84S5eUVMNm41jJDkxg2J7vp0Zn
 sclZsclhUQ9oJ5Q2so81JMWxu4JDn7IByXL0ULBaa6xwQTiVEnyvSxSuPlflhLRW
 0vI2ylATYKyWkQqyX7VyWecZJzwhwZj5gMMWmoGsij8bkZhQ/VaQMaesByzSth+h
 NV5UAYax4GqyOQ/tg/tqT6e5nrI1zof87H64XdTCBpJ7kFyQ/oA=
 =ZwOe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is quite a small cycle, no major series stands out. The HNS and
  rxe drivers saw the most activity this cycle, with rxe being broken
  for a good chunk of time. The significant deleted line count is due to
  a SPDX cleanup series.

  Summary:

   - Various cleanup and small features for rtrs

   - kmap_local_page() conversions

   - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns

   - Cache the IB subnet prefix

   - Rework how CRC is calcuated in rxe

   - Clean reference counting in iwpm's netlink

   - Pull object allocation and lifecycle for user QPs to the uverbs
     core code

   - Several small hns features and continued general code cleanups

   - Fix the scatterlist confusion of orig_nents/nents introduced in an
     earlier patch creating the append operation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
  RDMA/mlx5: Relax DCS QP creation checks
  RDMA/hns: Delete unnecessary blank lines.
  RDMA/hns: Encapsulate the qp db as a function
  RDMA/hns: Adjust the order in which irq are requested and enabled
  RDMA/hns: Remove RST2RST error prints for hw v1
  RDMA/hns: Remove dqpn filling when modify qp from Init to Init
  RDMA/hns: Fix QP's resp incomplete assignment
  RDMA/hns: Fix query destination qpn
  RDMA/hfi1: Convert to SPDX identifier
  IB/rdmavt: Convert to SPDX identifier
  RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
  RDMA/hns: Bugfix for the missing assignment for dip_idx
  RDMA/hns: Bugfix for data type of dip_idx
  RDMA/hns: Fix incorrect lsn field
  RDMA/irdma: Remove the repeated declaration
  RDMA/core/sa_query: Retry SA queries
  RDMA: Use the sg_table directly and remove the opencoded version from umem
  lib/scatterlist: Fix wrong update of orig_nents
  lib/scatterlist: Provide a dedicated function to support table append
  RDMA/hns: Delete unused hns bitmap interface
  ...
2021-09-02 14:47:21 -07:00
Cai Huoqing
145eba1aae RDMA/hfi1: Convert to SPDX identifier
use SPDX-License-Identifier instead of a verbose license text

Link: https://lore.kernel.org/r/20210823042622.109-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 14:56:48 -03:00
Christophe JAILLET
3f69f4e0d6 RDMA: switch from 'pci_' to 'dma_' API
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below.

It has been hand modified to use 'dma_set_mask_and_coherent()' instead of
'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
This is less verbose.

It has been compile tested.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)

Link: https://lore.kernel.org/r/259e53b7a00f64bf081d41da8761b171b2ad8f5c.1629634798.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23 13:43:54 -03:00
Christoph Hellwig
4b89451d2c RDMA/hfi1: Stop using seq_get_buf in _driver_stats_seq_show
Just use seq_write to copy the stats into the seq_file buffer instead of
poking holes into the seq_file abstraction.

Link: https://lore.kernel.org/r/20210810151711.1795374-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19 11:39:44 -03:00
Tuo Li
cbe71c6199 IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
kmalloc_array() is called to allocate memory for tx->descp. If it fails,
the function __sdma_txclean() is called:
  __sdma_txclean(dd, tx);

However, in the function __sdma_txclean(), tx-descp is dereferenced if
tx->num_desc is not zero:
  sdma_unmap_desc(dd, &tx->descp[0]);

To fix this possible null-pointer dereference, assign the return value of
kmalloc_array() to a local variable descp, and then assign it to tx->descp
if it is not NULL. Otherwise, go to enomem.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20210806133029.194964-1-islituo@gmail.com
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19 11:33:35 -03:00
Cai Huoqing
991c4274dc RDMA/hfi1: Fix typo in comments
Remove the repeated word 'the' from comments

Link: https://lore.kernel.org/r/20210729082346.1882-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 10:06:08 -03:00
Xiyu Yang
a0293eb249 RDMA/hfi1: Convert from atomic_t to refcount_t on hfi1_devdata->user_refcount
refcount_t type and corresponding API can protect refcounters from
accidental underflow and overflow and further use-after-free situations.

Link: https://lore.kernel.org/r/1626674454-56075-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Tested-by: Josh Fisher <josh.fisher@cornelisnetworks.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 09:34:26 -03:00
Mike Marciniszyn
62004871e1 IB/hfi1: Adjust pkey entry in index 0
It is possible for the primary IPoIB network device associated with any
RDMA device to fail to join certain multicast groups preventing IPv6
neighbor discovery and possibly other network ULPs from working
correctly. The IPv4 broadcast group is not affected as the IPoIB network
device handles joining that multicast group directly.

This is because the primary IPoIB network device uses the pkey at ndex 0
in the associated RDMA device's pkey table. Anytime the pkey value of
index 0 changes, the primary IPoIB network device automatically modifies
it's broadcast address (i.e. /sys/class/net/[ib0]/broadcast), since the
broadcast address includes the pkey value, and then bounces carrier. This
includes initial pkey assignment, such as when the pkey at index 0
transitions from the opa default of invalid (0x0000) to some value such as
the OPA default pkey for Virtual Fabric 0: 0x8001 or when the fabric
manager is restarted with a configuration change causing the pkey at index
0 to change. Many network ULPs are not sensitive to the carrier bounce and
are not expecting the broadcast address to change including the linux IPv6
stack.  This problem does not affect IPoIB child network devices as their
pkey value is constant for all time.

To mitigate this issue, change the default pkey in at index 0 to 0x8001 to
cover the predominant case and avoid issues as ipoib comes up and the FM
sweeps.

At some point, ipoib multicast support should automatically fix
non-broadcast addresses as it does with the primary broadcast address.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20210715160445.142451.47651.stgit@awfm-01.cornelisnetworks.com
Suggested-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-30 09:27:55 -03:00