Commit graph

1137018 commits

Author SHA1 Message Date
Luoyouming
0c5e259b06 RDMA/hns: Fix incorrect sge nums calculation
The user usually configures the number of sge through the max_send_sge
parameter when creating qp, and configures the maximum size of inline data
that can be sent through max_inline_data. Inline uses sge to fill data to
send. Expect the following:

1) When the sge space cannot hold inline data, the sge space needs to be
   expanded to accommodate all inline data

2) When the sge space is enough to accommodate inline data, the upper
   limit of inline data can be increased so that users can send larger
   inline data

Currently case one is not implemented. When the inline data is larger than
the sge space, an error of insufficient sge space occurs.  This part of
the code needs to be reimplemented according to the expected rules. The
calculation method of sge num is modified to take the maximum value of
max_send_sge and the sge for max_inline_data to solve this problem.

Fixes: 05201e01be ("RDMA/hns: Refactor process of setting extended sge")
Fixes: 30b707886a ("RDMA/hns: Support inline data in extented sge space for RC")
Link: https://lore.kernel.org/r/20221108133847.2304539-3-xuhaoyue1@hisilicon.com
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 20:19:49 -04:00
Luoyouming
8eaa6f7d56 RDMA/hns: Fix ext_sge num error when post send
In the HNS ROCE driver, The sge is divided into standard sge and extended
sge.  There are 2 standard sge in RC/XRC, and the UD standard sge is 0.
In the scenario of RC SQ inline, if the data does not exceed 32bytes, the
standard sge will be used. If it exceeds, only the extended sge will be
used to fill the data.

Currently, when filling the extended sge, max_gs is directly used as the
number of the extended sge, which did not subtract the number of standard
sge.  There is a logical error. The new algorithm subtracts the number of
standard sge from max_gs to get the actual number of extended sge.

Fixes: 30b707886a ("RDMA/hns: Support inline data in extented sge space for RC")
Link: https://lore.kernel.org/r/20221108133847.2304539-2-xuhaoyue1@hisilicon.com
Signed-off-by: Luoyouming <luoyouming@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 20:19:48 -04:00
Li Zhijian
7d984dac8f RDMA/rxe: Fix mr->map double free
rxe_mr_cleanup() which tries to free mr->map again will be called when
rxe_mr_init_user() fails:

   CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x45/0x5d
    panic+0x19e/0x349
    end_report.part.0+0x54/0x7c
    kasan_report.cold+0xa/0xf
    rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe]
    __rxe_cleanup+0x10a/0x1e0 [rdma_rxe]
    rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe]
    ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs]
    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs]
    ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs]

This issue was firstly exposed since commit b18c7da63f ("RDMA/rxe: Fix
memory leak in error path code") and then we fixed it in commit
8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this
fix was reverted together at last by commit 1e75550648 (Revert
"RDMA/rxe: Create duplicate mapping tables for FMRs")

Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is
successfully allocated.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 20:15:51 -04:00
Zhu Yanjun
8e1a76493b RDMA/rxe: Remove reliable datagram support
The rdma_rxe driver does not actually support the reliable datagram
transport but contains a variable with RD opcodes in driver code.  And
this variable is never used. So remove it.

Link: https://lore.kernel.org/r/20221112023537.432912-1-yanjun.zhu@intel.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 19:57:46 -04:00
Kees Cook
8f649b5785 IB/hfi1: Replace 1-element array with singleton
Zero-length arrays are deprecated[1] and are being replaced with flexible
array members in support of the ongoing efforts to tighten the
FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing
with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3.

Replace zero-length array with flexible-array member "lvs" in struct
opa_port_data_counters_msg and struct opa_port_error_counters64_msg.

Additionally, the "port" member of several structs is defined as a
single-element, but is only ever accessed at index 0. Replace it with a
singleton so that flexible array usage is sane.

This results in no differences in binary output.

[1] https://github.com/KSPP/linux/issues/78

Link: https://lore.kernel.org/r/20221118215847.never.416-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 19:54:11 -04:00
Guoqing Jiang
34a046f08b RDMA/rtrs-srv: Remove kobject_del from rtrs_srv_destroy_once_sysfs_root_folders
The kobj_paths which is created dynamically by kobject_create_and_add,
and per the comment above kobject_create_and_add, we only need to call
kobject_put which is not same as other kobjs such as stats->kobj_stats
and srv_path->kobj.

Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-9-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
6af4609c18 RDMA/rtrs-srv: Fix several issues in rtrs_srv_destroy_path_files
There are several issues in the function which is supposed to be paired
with rtrs_srv_create_path_files.

1. rtrs_srv_stats_attr_group is not removed though it is created in
   rtrs_srv_create_stats_files.

2. it makes more sense to check kobj_stats.state_in_sysfs before destroy
   kobj_stats instead of rely on kobj.state_in_sysfs.

3. kobject_init_and_add is used for both kobjs (srv_path->kobj and
   srv_path->stats->kobj_stats), however we missed to call kobject_del
   for srv_path->kobj which was called in free_path.

4. rtrs_srv_destroy_once_sysfs_root_folders is independent of either
   kobj or kobj_stats.

Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-8-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
7526198f27 RDMA/rtrs: Clean up rtrs_rdma_dev_pd_ops
Let's remove them since the three members are not used.

Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-7-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
a439956335 RDMA/rtrs-srv: Remove outdated comments from create_con
Remove the orphan comments.

Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-6-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
f5708e6699 RDMA/rtrs-clt: Correct the checking of ib_map_mr_sg
We should check with count, also the only successful case is that
all sg elements are mapped, so make it explicitly.

Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-5-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
102d2f70ec RDMA/rtrs-srv: Correct the checking of ib_map_mr_sg
We should check with nr_sgt, also the only successful case is that
all sg elements are mapped, so make it explicitly.

Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-4-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
0f597ac618 RDMA/rtrs-srv: Refactor the handling of failure case in map_cont_bufs
Let's call unmap_cont_bufs when failure happens, and also only update
mrs_num after everything is settled which means we can remove 'mri'.

Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-3-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Guoqing Jiang
d7115727e3 RDMA/rtrs-srv: Refactor rtrs_srv_rdma_cm_handler
The RDMA_CM_EVENT_CONNECT_REQUEST is quite different to other types,
let's check it separately at the beginning of routine, then we can
avoid the indentation accordingly.

Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Link: https://lore.kernel.org/r/20221117101945.6317-2-guoqing.jiang@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 13:47:28 +02:00
Mustafa Ismail
8f7e2daa63 RDMA/irdma: Do not request 2-level PBLEs for CQ alloc
When allocating PBLE's for a large CQ, it is possible
that a 2-level PBLE is returned which would cause the
CQ allocation to fail since 1-level is assumed and checked for.
Fix this by requesting a level one PBLE only.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20221115011701.1379-4-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 10:41:28 +02:00
Mustafa Ismail
24419777e9 RDMA/irdma: Fix RQ completion opcode
The opcode written by HW, in the RQ CQE, is the
RoCEv2/iWARP protocol opcode from the received
packet and not the SW opcode as currently assumed.
Fix this by returning the raw operation type and
queue type in the CQE to irdma_process_cqe and add
2 helpers set_ib_wc_op_sq set_ib_wc_op_rq to map
IRDMA HW op types to IB op types.

Note that for iWARP, only Write with Immediate is
supported so the opcode can only be IB_WC_RECV_RDMA_WITH_IMM
when there is immediate data present.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20221115011701.1379-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 10:41:28 +02:00
Mustafa Ismail
4f44e519b6 RDMA/irdma: Fix inline for multiple SGE's
Currently, inline send and inline write assume a single
SGE and only copy data from the first one. Add support
for multiple SGE's.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20221115011701.1379-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-17 10:41:28 +02:00
Bernard Metzler
60da2d11fc RDMA/siw: Set defined status for work completion with undefined status
A malicious user may write undefined values into memory mapped completion
queue elements status or opcode. Undefined status or opcode values will
result in out-of-bounds access to an array mapping siw internal
representation of opcode and status to RDMA core representation when
reaping CQ elements. While siw detects those undefined values, it did not
correctly set completion status to a defined value, thus defeating the
whole purpose of the check.

This bug leads to the following Smatch static checker warning:

	drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe()
	error: buffer overflow 'map_cqe_status' 10 <= 21

Fixes: bdf1da5df9 ("RDMA/siw: Fix immediate work request flush to completion queue")
Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-15 16:47:00 -04:00
Mark Zhang
ecacb3751f RDMA/nldev: Return "-EAGAIN" if the cm_id isn't from expected port
When filling a cm_id entry, return "-EAGAIN" instead of 0 if the cm_id
doesn'the have the same port as requested, otherwise an incomplete entry
may be returned, which causes "rdam res show cm_id" to return an error.

For example on a machine with two rdma devices with "rping -C 1 -v -s"
running background, the "rdma" command fails:
  $ rdma -V
  rdma utility, iproute2-5.19.0
  $ rdma res show cm_id
  link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 28056 comm rping src-addr 0.0.0.0:7174
  error: Protocol not available

While with this fix it succeeds:
  $ rdma res show cm_id
  link mlx5_0/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174
  link mlx5_1/- cm-idn 0 state LISTEN ps TCP pid 26395 comm rping src-addr 0.0.0.0:7174

Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/a08e898cdac5e28428eb749a99d9d981571b8ea7.1667810736.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15 09:56:45 +02:00
Mark Zhang
5e15ff29b1 RDMA/core: Make sure "ib_port" is valid when access sysfs node
The "ib_port" structure must be set before adding the sysfs kobject,
and reset after removing it, otherwise it may crash when accessing
the sysfs node:
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
  Mem abort info:
    ESR = 0x96000006
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000006
    CM = 0, WnR = 0
  user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e85f5ba5
  [0000000000000050] pgd=0000000848fd9003, pud=000000085b387003, pmd=0000000000000000
  Internal error: Oops: 96000006 [#2] PREEMPT SMP
  Modules linked in: ib_umad(O) mlx5_ib(O) nfnetlink_cttimeout(E) nfnetlink(E) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_nat_ipv6(E) nf_nat_ipv4(E) nf_conncount(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) mst_pciconf(O) ipmi_devintf(E) ipmi_msghandler(E) ipmb_dev_int(OE) mlx5_core(O) mlxfw(O) mlxdevm(O) auxiliary(O) ib_uverbs(O) ib_core(O) mlx_compat(O) psample(E) sbsa_gwdt(E) uio_pdrv_genirq(E) uio(E) mlxbf_pmc(OE) mlxbf_gige(OE) mlxbf_tmfifo(OE) gpio_mlxbf2(OE) pwr_mlxbf(OE) mlx_trio(OE) i2c_mlxbf(OE) mlx_bootctl(OE) bluefield_edac(OE) knem(O) ip_tables(E) ipv6(E) crc_ccitt(E) [last unloaded: mst_pci]
  Process grep (pid: 3372, stack limit = 0x0000000022055c92)
  CPU: 5 PID: 3372 Comm: grep Tainted: G      D    OE     4.19.161-mlnx.47.gadcd9e3 #1
  Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.9.2-15-ga2403ab Sep  8 2022
  pstate: 40000005 (nZcv daif -PAN -UAO)
  pc : hw_stat_port_show+0x4c/0x80 [ib_core]
  lr : port_attr_show+0x40/0x58 [ib_core]
  sp : ffff000029f43b50
  x29: ffff000029f43b50 x28: 0000000019375000
  x27: ffff8007b821a540 x26: ffff000029f43e30
  x25: 0000000000008000 x24: ffff000000eaa958
  x23: 0000000000001000 x22: ffff8007a4ce3000
  x21: ffff8007baff8000 x20: ffff8007b9066ac0
  x19: ffff8007bae97578 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000
  x15: 0000000000000000 x14: 0000000000000000
  x13: 0000000000000000 x12: 0000000000000000
  x11: 0000000000000000 x10: 0000000000000000
  x9 : 0000000000000000 x8 : ffff8007a4ce4000
  x7 : 0000000000000000 x6 : 000000000000003f
  x5 : ffff000000e6a280 x4 : ffff8007a4ce3000
  x3 : 0000000000000000 x2 : aaaaaaaaaaaaaaab
  x1 : ffff8007b9066a10 x0 : ffff8007baff8000
  Call trace:
   hw_stat_port_show+0x4c/0x80 [ib_core]
   port_attr_show+0x40/0x58 [ib_core]
   sysfs_kf_seq_show+0x8c/0x150
   kernfs_seq_show+0x44/0x50
   seq_read+0x1b4/0x45c
   kernfs_fop_read+0x148/0x1d8
   __vfs_read+0x58/0x180
   vfs_read+0x94/0x154
   ksys_read+0x68/0xd8
   __arm64_sys_read+0x28/0x34
   el0_svc_common+0x88/0x18c
   el0_svc_handler+0x78/0x94
   el0_svc+0x8/0xe8
  Code: f2955562 aa1603e4 aa1503e0 f9405683 (f9402861)

Fixes: d8a5883814 ("RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/88867e705c42c1cd2011e45201c25eecdb9fef94.1667810736.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15 09:56:39 +02:00
Mark Zhang
dac153f280 RDMA/restrack: Release MR restrack when delete
The MR restrack also needs to be released when delete it, otherwise it
cause memory leak as the task struct won't be released.

Fixes: 13ef5539de ("RDMA/restrack: Count references to the verbs objects")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/703db18e8d4ef628691fb93980a709be673e62e3.1667810736.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-15 09:56:32 +02:00
Leon Romanovsky
3574cfdca2 RDMA/mana: Remove redefinition of basic u64 type
gdma_obj_handle_t is no more than redefinition of basic
u64 type. Remove such obfuscation.

Link: https://lore.kernel.org/r/3c1e821279e6a165d058655d2343722d6650e776.1668160486.git.leonro@nvidia.com
Acked-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-14 10:16:46 +02:00
Long Li
0266a17763 RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
Add a RDMA VF driver for Microsoft Azure Network Adapter (MANA).

Co-developed-by: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-13-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-11 11:40:09 +02:00
Leon Romanovsky
1ec5617432 Merge branch 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Long Li says:

====================
Introduce Microsoft Azure Network Adapter (MANA) RDMA driver [netdev prep]

The first 11 patches which modify the MANA Ethernet driver to support
RDMA driver.

* 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net: mana: Define data structures for protection domain and memory registration
  net: mana: Define data structures for allocating doorbell page from GDMA
  net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
  net: mana: Define max values for SGL entries
  net: mana: Move header files to a common location
  net: mana: Record port number in netdev
  net: mana: Export Work Queue functions for use by RDMA driver
  net: mana: Set the DMA device max segment size
  net: mana: Handle vport sharing between devices
  net: mana: Record the physical address for doorbell page region
  net: mana: Add support for auxiliary device
====================

Link: https://lore.kernel.org/all/1667502990-2559-1-git-send-email-longli@linuxonhyperv.com/
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-11 11:35:12 +02:00
Bob Pearson
5de087250f RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mmap.c
Replace calls to pr_xxx() in rxe_mmap.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-17-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:07 -04:00
Bob Pearson
813728043b RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_icrc.c
Replace calls to pr_xxx() in rxe_icrc.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-16-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
c6aba5ea00 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c
Replace calls to pr_xxx() in rxe.c with rxe_dbg_xxx().
Calls with a rxe device not yet in scope are left as is.

Link: https://lore.kernel.org/r/20221103171013.20659-15-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
fc50597934 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_task.c
Replace calls to pr_xxx() in rxe_task.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-14-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
25fd735a4c RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_av.c
Replace calls to pr_xxx() in rxe_av.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-13-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
14e501fdb0 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_verbs.c
Replace calls to pr_xxx() in rxe_verbs.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-12-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
0e6090024b RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_srq.c
Replace calls to pr_xxx() in rxe_srq.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-11-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
74ddf7233c RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_resp.c
Replace calls to pr_xxx() in rxe_resp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
0edfb15e30 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_req.c
Replace calls to pr_xxx() in rxe_req.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
6af70060d2 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_qp.c
Replace calls to pr_xxx() in rxe_qp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
34549e88e0 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_net.c
Replace (some) calls to pr_xxx() in rxe_net.c with rxe_dbg_xxx().
Calls with a rxe device not yet in scope are left as is.

Link: https://lore.kernel.org/r/20221103171013.20659-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
e8a87efdf8 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c
Replace calls to pr_xxx() int rxe_mw.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
2778b72b1d RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c
Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr().

Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
52920f537a RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_cq.c
Replace calls to pr_xxx() in rxe_cq.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
27c4c520bd RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_comp.c
Replace calls to pr_xxx() in rxe_comp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:02 -04:00
Bob Pearson
4554bac48a RDMA/rxe: Add ibdev_dbg macros for rxe
Add macros borrowed from siw to call dynamic debug macro ibdev_dbg.

Link: https://lore.kernel.org/r/20221103171013.20659-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:02 -04:00
Leonid Ravich
5c20311d76 IB/mad: Don't call to function that might sleep while in atomic context
Tracepoints are not allowed to sleep, as such the following splat is
generated due to call to ib_query_pkey() in atomic context.

WARNING: CPU: 0 PID: 1888000 at kernel/trace/ring_buffer.c:2492 rb_commit+0xc1/0x220
CPU: 0 PID: 1888000 Comm: kworker/u9:0 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.3.1.el8.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module_el8.3.0+555+a55c8938 04/01/2014
 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
 RIP: 0010:rb_commit+0xc1/0x220
 RSP: 0000:ffffa8ac80f9bca0 EFLAGS: 00010202
 RAX: ffff8951c7c01300 RBX: ffff8951c7c14a00 RCX: 0000000000000246
 RDX: ffff8951c707c000 RSI: ffff8951c707c57c RDI: ffff8951c7c14a00
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff8951c7c01300 R11: 0000000000000001 R12: 0000000000000246
 R13: 0000000000000000 R14: ffffffff964c70c0 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff8951fbc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f20e8f39010 CR3: 000000002ca10005 CR4: 0000000000170ef0
 Call Trace:
  ring_buffer_unlock_commit+0x1d/0xa0
  trace_buffer_unlock_commit_regs+0x3b/0x1b0
  trace_event_buffer_commit+0x67/0x1d0
  trace_event_raw_event_ib_mad_recv_done_handler+0x11c/0x160 [ib_core]
  ib_mad_recv_done+0x48b/0xc10 [ib_core]
  ? trace_event_raw_event_cq_poll+0x6f/0xb0 [ib_core]
  __ib_process_cq+0x91/0x1c0 [ib_core]
  ib_cq_poll_work+0x26/0x80 [ib_core]
  process_one_work+0x1a7/0x360
  ? create_worker+0x1a0/0x1a0
  worker_thread+0x30/0x390
  ? create_worker+0x1a0/0x1a0
  kthread+0x116/0x130
  ? kthread_flush_work_fn+0x10/0x10
  ret_from_fork+0x35/0x40
 ---[ end trace 78ba8509d3830a16 ]---

Fixes: 821bf1de45 ("IB/MAD: Add recv path trace point")
Signed-off-by: Leonid Ravich <lravich@gmail.com>
Link: https://lore.kernel.org/r/Y2t5feomyznrVj7V@leonid-Inspiron-3421
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-10 10:57:15 +02:00
Ajay Sharma
28c66cfa45 net: mana: Define data structures for protection domain and memory registration
The MANA hardware support protection domain and memory registration for use
in RDMA environment. Add those definitions and expose them for use by the
RDMA driver.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-12-git-send-email-longli@linuxonhyperv.com
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Long Li
f72ececfc1 net: mana: Define data structures for allocating doorbell page from GDMA
The RDMA device needs to allocate doorbell pages for each user context.
Define the GDMA data structures for use by the RDMA driver.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-11-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Ajay Sharma
de372f2a9c net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
When doing memory registration, the PF may respond with
GDMA_STATUS_MORE_ENTRIES to indicate a follow request is needed. This is
not an error and should be processed as expected.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-10-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Long Li
aa56549792 net: mana: Define max values for SGL entries
The number of maximum SGl entries should be computed from the maximum
WQE size for the intended queue type and the corresponding OOB data
size. This guarantees the hardware queue can successfully queue requests
up to the queue depth exposed to the upper layer.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-9-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:27 +02:00
Long Li
fd325cd648 net: mana: Move header files to a common location
In preparation to add MANA RDMA driver, move all the required header files
to a common location for use by both Ethernet and RDMA drivers.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-8-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
d44089e555 net: mana: Record port number in netdev
The port number is useful for user-mode application to identify this
net device based on port index. Set to the correct value in ndev.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-7-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
4c0ff7a106 net: mana: Export Work Queue functions for use by RDMA driver
RDMA device may need to create Ethernet device queues for use by Queue
Pair type RAW. This allows a user-mode context accesses Ethernet hardware
queues. Export the supporting functions for use by the RDMA driver.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-6-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Ajay Sharma
6fe254160b net: mana: Set the DMA device max segment size
MANA hardware doesn't have any restrictions on the DMA segment size, set it
to the max allowed value.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-5-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
b5c1c9855b net: mana: Handle vport sharing between devices
For outgoing packets, the PF requires the VF to configure the vport with
corresponding protection domain and doorbell ID for the kernel or user
context. The vport can't be shared between different contexts.

Implement the logic to exclusively take over the vport by either the
Ethernet device or RDMA device.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-4-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00
Long Li
f3dc096246 net: mana: Record the physical address for doorbell page region
For supporting RDMA device with multiple user contexts with their
individual doorbell pages, record the start address of doorbell page
region for use by the RDMA driver to allocate user context doorbell IDs.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-3-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 07:57:26 +02:00