Commit graph

3948 commits

Author SHA1 Message Date
Bart Van Assche
e7ffde0164 IB/srp: Add more logging
Log sgid and dgid when reporting that a login has been rejected or when
a host has been added. This makes it easy to figure out which initiator
and target ports these messages apply to.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Sagi Grimberg
2088ca66f5 IB/srp: Check ib_query_gid return value
Detected by Coverity.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Matan Barak
449fc48866 net/mlx4: Adapt code for N-Port VF
Adds support for N-Port VFs, this includes:
1. Adding support in the wrapped FW command
	In wrapped commands, we need to verify and convert
	the slave's port into the real physical port.
	Furthermore, when sending the response back to the slave,
	a reverse conversion should be made.
2. Adjusting sqpn for QP1 para-virtualization
	The slave assumes that sqpn is used for QP1 communication.
	If the slave is assigned to a port != (first port), we need
	to adjust the sqpn that will direct its QP1 packets into the
	correct endpoint.
3. Adjusting gid[5] to modify the port for raw ethernet
	In B0 steering, gid[5] contains the port. It needs
	to be adjusted into the physical port.
4. Adjusting number of ports in the query / ports caps in the FW commands
	When a slave queries the hardware, it needs to view only
	the physical ports it's assigned to.
5. Adjusting the sched_qp according to the port number
	The QP port is encoded in the sched_qp, thus in modify_qp we need
	to encode the correct port in sched_qp.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:30 -04:00
Matan Barak
82373701be IB/mlx4_ib: Adapt code to use caps.num_ports instead of a constant
Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports.

Instead, we should only loop until the number of actual ports in i
the device, which is stored in dev->caps.num_ports.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:29 -04:00
Dan Carpenter
186f8ba062 IB/qib: Cleanup qib_register_observer()
Returning directly is easier to read than do-nothing gotos.  Remove the
duplicative check on "olp" and pull the code in one indent level.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:19:18 -07:00
CQ Tang
49c0e2414b IB/qib: Change SDMA progression mode depending on single- or multi-rail
Improve performance by changing the behavour of the driver when all
SDMA descriptors are in use, and the processes adding new descriptors
are single- or multi-rail.

For single-rail processes, the driver will block the call and finish
posting all SDMA descriptors onto the hardware queue before returning
back to PSM.  Repeated kernel calls are slower than blocking.

For multi-rail processes, the driver will return to PSM as quick as
possible so PSM can feed packets to other rail.  If all hardware
queues are full, PSM will buffer the remaining SDMA descriptors until
notified by interrupt that space is available.

This patch builds a red-black tree to track the number rails opened by
a particular PID. If the number is more than one, it is a multi-rail
PSM process, otherwise, it is a single-rail process.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: John A Gregor <john.a.gregor@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:19:12 -07:00
Steve Wise
eda6d1d1b7 RDMA/cxgb4: Save the correct map length for fast_reg_page_lists
We cannot save the mapped length using the rdma max_page_list_len field
of the ib_fast_reg_page_list struct because the core code uses it.  This
results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl().

I found this with dma mapping debugging enabled in the kernel.  The
fix is to save the length in the c4iw_fr_page_list struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise
df2d5130ec RDMA/cxgb4: Default peer2peer mode to 1
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise
ba32de9d8d RDMA/cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise
8a9c399eee RDMA/cxgb4: Fix incorrect BUG_ON conditions
Based on original work from Jay Hernandez <jay@chelsio.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise
ebf00060c3 RDMA/cxgb4: Always release neigh entry
Always release the neigh entry in rx_pkt().

Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Steve Wise
f8e819081f RDMA/cxgb4: Allow loopback connections
find_route() must treat loopback as a valid egress interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Steve Wise
ffd435924c RDMA/cxgb4: Cap CQ size at T4_MAX_IQ_SIZE
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Dan Carpenter
e24a72a330 RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()
There is a four byte hole at the end of the "uresp" struct after the
->qid_mask member.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Dan Carpenter
ff1706f4fe RDMA/cxgb4: Fix underflows in c4iw_create_qp()
These sizes should be unsigned so we don't allow negative values and
have underflow bugs.  These can come from the user so there may be
security implications, but I have not tested this.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Sagi Grimberg
6192d4e6bb IB/iser: Publish T10-PI support to SCSI midlayer
After allocating a scsi_host we set protection types and guard type
supported.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
0a7a08ad6f IB/iser: Implement check_protection
Once the iSCSI transaction is completed we must implement
check_protection in order to notify on DIF errors that may have
occured.

The routine boils down to calling ib_check_mr_status to get the
signature status of the transaction.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
177e31bd5a IB/iser: Support T10-PI operations
Add logic to initialize protection information entities.  Upon each
iSCSI task, we keep the scsi_cmnd in order to query the scsi
protection operations and reference to protection buffers.

Modify iser_fast_reg_mr to receive indication whether it is
registering the data or protection buffers.

In addition introduce iser_reg_sig_mr which performs fast registration
work-request for a signature enabled memory region
(IB_WR_REG_SIG_MR).  In this routine we set all the protection
relevants for the device to offload protection data-transfer and
verification.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik
6b5a8fb0d2 IB/iser: Initialize T10-PI resources
During connection establishment we also initialize T10-PI resources
(QP, PI contexts) in order to support SCSI's protection operations.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik
7f73384752 IB/iser: Introduce pi_enable, pi_guard module parameters
Use modparams to activate protection information support.

pi_enable bool: Based on this parameter iSER will know if it should
support T10-PI.  We don't want to do this by default as it requires to
allocate and initialize extra resources.  In case pi_enable=N, iSER
won't publish to SCSI midlayer any DIF capabilities.

pi_guard int: Based on this parameter iSER will publish DIX guard type
support to SCSI midlayer.  0 means CRC is allowed to be passed in DIX
buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX
buffers.  Note that over the wire, only CRC is allowed.

In the next phase, it is worth considering passing these parameters
from iscsid via nlmsg.  This will allow these parameters to be
connection based rather than global.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
5f588e3d0c IB/iser: Generalize fall_to_bounce_buf routine
Unaligned SG-lists may also happen for protection information.
Generalize bounce buffer routine to handle any iser_data_buf which may
be data and/or protection.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
9a8b08fad2 IB/iser: Generalize iser_unmap_task_data and finalize_rdma_unaligned_sg
This routines operates on data buffers and may also work with
protection infomation buffers.  So we generalize them to handle an
iser_data_buf which can be the command data or command protection
information.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
73bc06b7ed IB/iser: Replace fastreg descriptor valid bool with indicators container
In T10-PI support we will have memory keys for protection buffers and
signature transactions.  We prefer to compact indicators rather than
keeping multiple bools.

This commit does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
65198d6b84 IB/iser: Keep IB device attributes under iser_device
For T10-PI offload support, we will need to know the device signature
offload capability upon every connection establishment.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
310b347c60 IB/iser: Move fast_reg_descriptor initialization to a function
fastreg descriptor will include protection information context.  In
order to place the logic in one place we introduce iser_create_fr_desc
function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
d11ec4ecf0 IB/iser: Push the decision what memory key to use into fast_reg_mr routine
This is a preparation step for T10-PI offload support.  We prefer to
push the desicion of which mkey to use (global or fastreg) to
iser_fast_reg_mr.  We choose to do this since in T10-PI we may need to
register for protection buffers and in this case we wish to simplify
iser_fast_reg_mr instead of repeating the logic of which key to use.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
7306b8fad4 IB/iser: Avoid FRWR notation, use fastreg instead
FRWR stands for "fast registration work request". We want to avoid
calling the fastreg pool with that name, instead we name it fastreg
which stands for "fast registration".

This pool will include more elements in the future, so it is a good
idea to generalize the name.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
db523b8de1 IB/iser: Suppress completions for fast registration work requests
In case iSER uses fast registration method, it should not request for
successful completions on fast registration nor local invalidate
requests.  We color wr_id with ISER_FRWR_LI_WRID in order to correctly
consume error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:54 -07:00
Bart Van Assche
0e9855dbf4 IB/mlx4: Fix a sparse endianness warning
Fix the following warning for the mlx4 driver:

    $ make M=drivers/infiniband C=2 CF=-D__CHECK_ENDIAN__
    drivers/infiniband/hw/mlx4/qp.c:1885:31: warning: restricted __be16 degrades to integer

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:23:52 -07:00
Prarit Bhargava
bc1b04ab34 RDMA/ocrdma: Fix compiler warning
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function ‘_ocrdma_modify_qp’:
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1299:31: error: ‘old_qps’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  status = ocrdma_mbx_modify_qp(dev, qp, attr, attr_mask, old_qps);

ocrdma_mbx_modify_qp() (and subsequent calls) doesn't appear to use old_qps
so it doesn't need to be passed on.  Removing the variable results in the
warning going away.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Devesh Sharma (Devesh.sharma@emulex.com)
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:34:13 -07:00
Dan Carpenter
349850f0a9 RDMA/nes: Clean up a condition
We don't need to test "ret" twice and also the white space is messed up.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:29:37 -07:00
Dan Carpenter
db498827ff IB/qib: Remove duplicate check in get_a_ctxt()
We already know "pusable" is non-zero, no need to check again.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:28:09 -07:00
Fabio Estevam
970918b32b IB/usnic: Remove '0x' when using %pa format
%pa format already prints in hexadecimal format, so remove the '0x' annotation
to avoid a double '0x0x' pattern.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:26:38 -07:00
Yann Droneaud
9d194d1025 IB/nes: Return an error on ib_copy_from_udata() failure instead of NULL
In case of error while accessing to userspace memory, function
nes_create_qp() returns NULL instead of an error code wrapped through
ERR_PTR().  But NULL is not expected by ib_uverbs_create_qp(), as it
check for error with IS_ERR().

As page 0 is likely not mapped, it is going to trigger an Oops when
the kernel will try to dereference NULL pointer to access to struct
ib_qp's fields.

In some rare cases, page 0 could be mapped by userspace, which could
turn this bug to a vulnerability that could be exploited: the function
pointers in struct ib_device will be under userspace total control.

This was caught when using spatch (aka. coccinelle)
to rewrite calls to ib_copy_{from,to}_udata().

Link: https://www.gitorious.org/opteya/ib-hw-nes-create-qp-null
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:20:28 -07:00
Dennis Dalessandro
06064a103f IB/qib: Fix memory leak of recv context when driver fails to initialize.
In qib_create_ctxts() we allocate an array to hold recv contexts. Then attempt
to create data for those recv contexts. If that call to qib_create_ctxtdata()
fails then an error is returned but the previously allocated memory is not
freed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Yann Droneaud
8572de9732 IB/qib: fixup indentation in qib_ib_rcv()
Commit af061a644a add some code in qib_ib_rcv() which
trigger a warning from coccicheck (coccinelle/spatch):

$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/

  CHECK   drivers/infiniband/hw/qib/qib_verbs.c
drivers/infiniband/hw/qib/qib_verbs.c:679:5-32: code aligned with following code on line 681
  CC [M]  drivers/infiniband/hw/qib/qib_verbs.o

In fact, according to similar code in qib_kreceive(),
qib_ib_rcv() code is correct but improperly indented.

This patch fix indentation for the misaligned portion.

Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Yann Droneaud
37a967651c IB/qib: add missing braces in do_qib_user_sdma_queue_create()
Commit c804f07248 moved qib_assign_ctxt() to
do_qib_user_sdma_queue_create() but dropped the braces
around the statements.

This was spotted by coccicheck (coccinelle/spatch):

$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/

  CHECK   drivers/infiniband/hw/qib/qib_file_ops.c
drivers/infiniband/hw/qib/qib_file_ops.c:1583:2-23: code aligned with following code on line 1587

This patch adds braces back.

Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Cc: stable@vger.kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn
7d7632add8 IB/qib: Modify software pma counters to use percpu variables
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.

The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase with appropriate adjustments in the reset,
read logic to maintain the z_ latch.

This patch also corrects the fact the unitcast_xmit wasn't handled
at all for UC and RC QPs.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn
1ed88dd7d0 IB/qib: Add percpu counter replacing qib_devdata int_counter
This patch replaces the dd->int_counter with a percpu counter.

The maintanance of qib_stats.sps_ints and int_counter are
combined into the new counter.

There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)

A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occured since z_int_counter
was "reset".

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn
f8b6c47a44 IB/qib: Fix debugfs ordering issue with multiple HCAs
The debugfs init code was incorrectly called before the idr mechanism
is used to get the unit number, so the dd->unit hasn't been
initialized.  This caused the unit relative directory creation to fail
after the first.

This patch moves the init for the debugfs stuff until after all of the
failures and after the unit number has been determined.

A bug in unwind code in qib_alloc_devdata() is also fixed.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Dennis Dalessandro
a2cb0eb8a6 IB/ipath: Fix potential buffer overrun in sending diag packet routine
Guard against a potential buffer overrun.  The size to read from the
user is passed in, and due to the padding that needs to be taken into
account, as well as the place holder for the ICRC it is possible to
overflow the 32bit value which would cause more data to be copied from
user space than is allocated in the buffer.

Cc: <stable@vger.kernel.org>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Dennis Dalessandro
1c20c81909 IB/qib: Fix potential buffer overrun in sending diag packet routine
Guard against a potential buffer overrun.  Right now the qib driver is
protected by the fact that the data structure in question is only 16
bits.  Should that ever change the problem will be exposed. There is a
similar defect in the ipath driver and this brings the two code paths
into sync.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Tatyana Nikolova
43adff3979 RDMA/nes: Fix for passing a valid QP pointer to the user space library
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 10:04:16 -07:00
Tatyana Nikolova
4ac79a7003 RDMA/nes: Fixes for IRD/ORD negotiation with MPA v2
Fixes to enable the negotiation of the supported IRD/ORD sizes with
the peer when exchanging MPA v2 messages in connection establishment.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 10:03:17 -07:00
Steve Wise
05eb23893c cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes
The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4:

Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues.  Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count.  This reduces the
load put on the DB fifo when reenabling.

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps.  This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP.  Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED.  As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events.  Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list.  This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example).  QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
quickly and overflowing the FIFO.  Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops.  As the load lightens though, we
resume to normal DB writes directly by user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00
Steve Wise
7a2cea2aaa cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice
Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00
David S. Miller
85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Nicholas Bellinger
5aad2145ac Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband into for-next 2014-03-13 12:01:58 -07:00
Jack Morgenstein
aa9a2d51a3 mlx4: Activate RoCE/SRIOV
To activate RoCE/SRIOV, need to remove the following:
1. In mlx4_ib_add, need to remove the error return preventing
   initialization of a RoCE port under SRIOV.
2. In update_vport_qp_params (in resource_tracker.c) need to remove
   the error return when a RoCE RC or UD qp is detected.
   This error return causes the INIT-to-RTR qp transition to fail
   in the wrapper function under RoCE/SRIOV.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Shani Michaelli
ceb5433b3a mlx4_ib: Fix SIDR support of for UD QPs under SRIOV/RoCE
* Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID
  in multiplex_cm_handler and demux_cm_handler.

* Handle Service ID Resolution messages and REQ messages
  separately, for their formats are different.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Jack Morgenstein
5ea8bbfc49 mlx4: Implement IP based gids support for RoCE/SRIOV
Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave,

This is achieved as follows:
Outgoing MADs:
    1. Source MAC: obtained from the CQ completion structure
       (struct ib_wc, smac field).
    2. Destination MAC: obtained from the tunnel header
    3. vlan_id: obtained from the tunnel header.
Incoming MADs
    1. The source (i.e., remote) MAC and vlan_id are passed in
       the tunnel header to the proxy QP1.

VST mode support:
     For outgoing MADs,  the vlan_id obtained from the header is
        discarded, and the vlan_id specified by the Hypervisor is used
        instead.
     For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
        "invalid" vlan (0xffff)  is substituted when forwarding to the slave.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:16 -04:00
Jack Morgenstein
2f5bb47368 mlx4: Add ref counting to port MAC table for RoCE
The IB side of RoCE requires the MAC table index of the
MAC address used by its QPs.

To obtain the real MAC index, the IB side registers the
MAC (increasing its ref count, and also returning the
real MAC index) during the modify-qp sequence.

This protects against the ETH side deleting or modifying
that MAC table entry while the QP is active.

Note that until the modify-qp command returns success,
the MAC and VLAN information only has "candidate" status.
If the modify-qp succeeds, the "candidate" info is promoted
to the operational MAC/VLAN info for the qp. If the modify fails,
the candidate MAC/VLAN is unregistered, and the old qp info
is preserved.

The patch is a bit complex, because there are multiple qp
transitions where the primary-path information may be
modified:  INIT-to-RTR, and SQD-to-SQD.

Similarly for the alternate path information.

Therefore the code must handle cases where path information
has already been entered into the QP context by previous
qp transitions.

For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate
   MAC information to the operational information, and reset
   the candidate MAC info.
2. If there was a previous MAC, unregister it.  Then move
   the MAC information from candidate to operational, and
   reset the candidate info (as in 1. above).

The MAC address failure logic is the same for all cases:
 - Unregister the candidate MAC, and reset the candidate MAC info.

For Vlan registration, the logic is similar.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:15 -04:00
Jack Morgenstein
b6ffaeffae mlx4: In RoCE allow guests to have multiple GIDS
The GIDs are statically distributed, as follows:
PF: gets 16 GIDs
VFs:  Remaining GIDS are divided evenly between VFs activated by the driver.
      If the division is not even, lower-numbered VFs get an extra GID.

For an IB interface, the number of gids per guest remains as before: one gid per guest.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:14 -04:00
Jack Morgenstein
6ee51a4e86 mlx4: Adjust QP1 multiplexing for RoCE/SRIOV
This requires the following modifications:
1. Fix build_mlx4_header to properly fill in the ETH fields
2. Adjust mux and demux QP1 flow to support RoCE.

This commit still assumes only one GID per slave for RoCE.
The commit enabling multiple GIDs is a subsequent commit, and
is done separately because of its complexity.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:12 -04:00
Linus Torvalds
66a523db70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This series addresses a number of outstanding issues wrt to active I/O
  shutdown using iser-target.  This includes:

   - Fix a long standing tpg_state bug where a tpg could be referenced
     during explicit shutdown (v3.1+ stable)
   - Use list_del_init for iscsi_cmd->i_conn_node so list_empty checks
     work as expected (v3.10+ stable)
   - Fix a isert_conn->state related hung task bug + ensure outstanding
     I/O completes during session shutdown.  (v3.10+ stable)
   - Fix isert_conn->post_send_buf_count accounting for RDMA READ/WRITEs
     (v3.10+ stable)
   - Ignore FRWR completions during active I/O shutdown (v3.12+ stable)
   - Fix command leakage for interrupt coalescing during active I/O
     shutdown (v3.13+ stable)

  Also included is another DIF emulation fix from Sagi specific to
  v3.14-rc code"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix sbc_copy_prot for offset scatters
  iser-target: Fix command leak for tx_desc->comp_llnode_batch
  iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  iscsi/iser-target: Use list_del_init for ->i_conn_node
  iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
2014-03-09 13:50:14 -07:00
Sagi Grimberg
2dea909444 IB/mlx5: Expose support for signature MR feature
Currently support only T10-DIF types of signature handover operations
(types 1|2|3).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:40:04 -08:00
Sagi Grimberg
d5436ba010 IB/mlx5: Collect signature error completion
This commit takes care of the generated signature error CQE generated
by the HW (if happened).  The underlying mlx5 driver will handle
signature error completions and will mark the relevant memory region
as dirty.

Once the consumer gets the completion for the transaction, it must
check for signature errors on signature memory region using a new
lightweight verb ib_check_mr_status().

In case the user doesn't check for signature error (i.e. doesn't call
ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the
memory region cannot be used for another signature operation
(REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:40:04 -08:00
Sagi Grimberg
e6631814fb IB/mlx5: Support IB_WR_REG_SIG_MR
This patch implements IB_WR_REG_SIG_MR posted by the user.

Baisically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
    * In case the user registered a single MR for data so the UMR data segment
      consists of:
      - single klm (data MR) passed by the user
      - BSF with signature attributes requested by the user.
    * In case the user registered 2 MRs, one for data and one for protection,
      the UMR consists of:
      - strided block format which includes data and protection MRs and
        their repetitive block format.
      - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the memory domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

3. post SET_PSV in order to set the wire domain initial
   signature parameters passed by the user.
   SET_PSV is not signaled and solicited CQE.

* After this compound WR we place a small fence for next WR to come.

This patch also introduces some helper functions to set the BSF correctly
and determining the signature format selectors.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:39:51 -08:00
Sagi Grimberg
2ac45934f8 IB/mlx5: Remove MTT access mode from umr flags helper function
get_umr_flags helper function might be used for types of access modes
other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM.  So remove it from
helper, and callers will add their own access mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
6e5eadace1 IB/mlx5: Break up wqe handling into begin & finish routines
As a preliminary step for signature feature which will require posting
multiple (3) WQEs for a single WR, we break post_send routine WQE
indexing into begin and finish routines.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
e1e66cc264 IB/mlx5: Initialize mlx5_ib_qp signature-related members
If user requested signature enable we initialize relevant mlx5_ib_qp
members.  We mark the qp as sig_enable and we increase the effective
SQ size, but still limit the user max_send_wr to original size
computed.  We also allow the create_qp routine to accept sig_enable
create flag.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
3121e3c441 mlx5: Implement create_mr and destroy_mr
Support create_mr and destroy_mr verbs.  Creating ib_mr may be done
for either ib_mr that will register regular page lists like
alloc_fast_reg_mr routine, or indirect ib_mrs that can register other
(pre-registered) ib_mrs in an indirect manner.

In addition user may request signature enable, that will mean that the
created ib_mr may be attached with signature attributes (BSF, PSVs).

Currently we only allow direct/indirect registration modes.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
1b01d33560 IB/core: Introduce signature verbs API
Introduce a verbs interface for signature-related operations.  A
signature handover operation configures the layouts of data and
protection attributes both in memory and wire domains.

Signature operations are:

- INSERT:
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover opration is done, the HCA will offload
data integrity generation/validation while performing the actual data
transfer.

Additions:

1. HCA signature capabilities in device attributes
    Verbs provider supporting signature handover operations fills
    relevant fields in device attributes structure returned by
    ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
    Creating a QP that will carry signature handover operations may
    require some special preparations from the verbs provider.  So we
    add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the
    created QP may carry out signature handover operations.  Expose
    signature support to verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
    Signature handover work request. This WR will define the signature
    handover properties of the memory/wire domains as well as the
    domains layout. The purpose of this work request is to bind all
    the needed information for the signature operation:

    - data to be transferred:  wr->sg_list (ib_sge).
      * The raw data, pre-registered to a single MR (normally, before
        signature, this MR would have been used directly for the data
        transfer)
    - data protection guards: sig_handover.prot (ib_sge).
      * The data protection buffer, pre-registered to a single MR, which
        contains the data integrity guards of the raw data blocks.
        Note that it may not always exist, only in cases where the user is
        interested in storing protection guards in memory.
    - signature operation attributes: sig_handover.sig_attrs.
      * Tells the HCA how to validate/generate the protection information.

    Once the work request is executed, the memory region that will
    describe the signature transaction will be the sig_mr.  The
    application can now go ahead and send the sig_mr.rkey or use the
    sig_mr.lkey for data transfer.

4. New Verb ib_check_mr_status
    check_mr_status verb checks the status of the memory region post
    transaction.  The first check that may be used is
    IB_MR_CHECK_SIG_STATUS, which will indicate if any signature
    errors are pending for a specific signature-enabled ib_mr.  This
    verb is a lightwight check and is allowed to be taken from
    interrupt context.  An application must call this verb after it is
    known that the actual data transfer has finished.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Sagi Grimberg
17cd3a2db8 IB/core: Introduce protected memory regions
This commit introduces verbs for creating/destoying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
of more) pre-registered memory regions in a specific layout.
The Indirect region may potentialy describe several regions
and some repitition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-07 11:26:49 -08:00
Nicholas Bellinger
ebbe442183 iser-target: Fix command leak for tx_desc->comp_llnode_batch
This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger
9bb4ca68fc iser-target: Ignore completions for FRWRs in isert_cq_tx_work
This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

v2 changes:
 - Rename ISER_FRWR_LI_WRID -> ISER_FASTREG_LI_WRID (Sagi)

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger
b6b87a1df6 iser-target: Fix post_send_buf_count for RDMA READ/WRITE
This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger
defd884845 iscsi/iser-target: Fix isert_conn->state hung shutdown issues
This patch addresses a couple of different hug shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into seperate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger
5159d763f6 iscsi/iser-target: Use list_del_init for ->i_conn_node
There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect to return false once a cmd
has been removed from the per connect list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns false as expected.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Yishai Hadas
eeb8461e36 IB: Refactor umem to use linear SG table
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list.  With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-04 10:34:28 -08:00
Amir Vadai
169a1d85d0 net,IB/mlx: Bump all Mellanox driver versions
Bump all Mellanox driver versions.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25 17:34:44 -05:00
Linus Torvalds
946dd683af Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
  included is one fix for a free after use regression in persistent
  reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix protection copy routine
  IB/srpt: replace strict_strtoul() with kstrtoul()
  target: Simplify command completion by removing CMD_T_FAILED flag
  iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
  iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
  target: Fix missing length check in spc_emulate_evpd_83()
  qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
  target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
  target: Fix free-after-use regression in PR unregister
2014-02-15 16:18:47 -08:00
Roland Dreier
c9459388d8 Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-next 2014-02-14 09:49:12 -08:00
Devesh Sharma
09de3f1313 RDMA/ocrdma: Fix load time panic during GID table init
We should use rdma_vlan_dev_real_dev() instead of using vlan_dev_real_dev()
when building the GID table for a vlan interface.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:49:04 -08:00
Devesh Sharma
a61d93d92f RDMA/ocrdma: Fix traffic class shift
Use correct value for obtaining traffic class from device
response for Query QP request.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:49:00 -08:00
Dan Carpenter
fd8b48b22a IB/iser: Fix use after free in iser_snd_completion()
We use "tx_desc" again after we free it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:09 -08:00
Roi Dayan
7d9eacf945 IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection
Fix a possible NULL pointer dereference in disconnection flow. This
can happen if the target disconnected/rejected the connection request,
e.g before the binding stage between iscsi connection to the transport
connection.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:03 -08:00
Upinder Malhi
f809309a25 IB/usnic: Fix smatch endianness error
Error reported at http://marc.info/?l=linux-rdma&m=138995755801039&w=2

Fix short to int cast for big endian systems.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:47:29 -08:00
Eli Cohen
0861565f50 IB/mlx5: Remove dependency on X86
Remove Kconfig dependency of mlx5_ib/mlx5_core on X86, since there is
no such dependency in reality.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 20:48:02 -08:00
Mike Marciniszyn
2f75e12c44 IB/qib: Add missing serdes init sequence
Research has shown that commit a77fcf8950 ("IB/qib: Use a single
txselect module parameter for serdes tuning") missed a key serdes init
sequence.

This patch add that sequence.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:52:44 -08:00
Kumar Sanghvi
0f0132001f RDMA/cxgb4: Add missing neigh_release in LE-Workaround path
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:40 -08:00
Moni Shoua
b4a26a2728 IB: Report using RoCE IP based gids in port caps
For userspace RoCE UD QPs we need to know the GID format that the
kernel uses, e.g when working over older kernels. For that end, add a
new port capability IB_PORT_IP_BASED_GIDS and report it when query
port is issued.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:03 -08:00
Moni Shoua
ad4885d279 IB/mlx4: Build the port IBoE GID table properly under bonding
When scanning netdevices we need to check a few more conditions and
cases to build the IBoE GID table properly.  For example, under
bonding we must make sure that when a port is down, the bond IP
address isn't programmed as a GID, since doing so will cause failure
with IB core flows that selects ports by GID.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:09 -08:00
Moni Shoua
5071456fe2 IB/mlx4: Do IBoE GID table resets per-port
The IBoE code used to reset the GID table did it for all Ethernet
ports of the device.  Since the whole architecture of generating GIDs
and responding to events is port-based, this is inefficient and can
lead to wrong content in the GID table.  Change the reset flow to be
per-port.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua
ddf8bd3491 IB/mlx4: Do IBoE locking earlier when initializing the GID table
Updating the GID table under IBoE requires read/write from/to shared
data structures.  These data structures are protected with the device
iboe lock.  The flows that modify the GID table start from

    1. Initializing the GID table
    2. NETDEV events
    3. INET or INET6 events

This patch makes sure that the flow of initializing the GID table is
consistent with the other two flows w.r.t on what step the lock is taken.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua
4ce5a5744a IB/mlx4: Move rtnl locking to the right place
On the one hand, the invocation of netdev_master_upper_dev_get()
within mlx4_ib_scan_netdevs() must be done with rtnl lock held.  On
the other hand, it's wrong to call rtnl_lock() from within this
function since it's also called by our netdev notifier callback.
Therefore move the locking to mlx4_ib_add() so that both cases are
covered.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Moni Shoua
acc4fccf4e IB/mlx4: Make sure GID index 0 is always occupied
Make sure that for Ethernet ports, the port GID table index 0 is always
occupied with a default GID of the relevant IPv6 link-local adderss.

This provides better user experience for legacy applications that don't use
the RDMA CM and were working on index 0 prior to the IP addressing change.

Also, as GIDs are generated from IP addresses of the network devices that
are associated with the port, it's basically possible that the GID table
will be empty if no IP address was assigned.  This doesn't comply with the
IB spec section 4.1.1 "GID usage and properties".

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:31:08 -08:00
Matan Barak
4196670be7 IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device
When the device has only Ethernet ports, don't try to allocate range
of steerable UD QPs since they aren't needed.  This fixes an issue
where mlx4 VFs tried to allocate a range of UD steerable QPs, but
failed to do so.

Fixes: c1c9850112 ("IB/mlx4: Add support for steerable IB UD QPs")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 09:00:18 -08:00
Jingoo Han
9d8abf4594 IB/srpt: replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:40 -08:00
Nicholas Bellinger
a80e21b3b2 iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
This patch fixes a memory leak for fr_desc upon failure of
isert_create_fr_desc() in isert_conn_create_fastreg_pool()
code.

As reported by Coverity 1166659:

*** CID 1166659:  Resource leak  (RESOURCE_LEAK)
/drivers/infiniband/ulp/isert/ib_isert.c: 470 in isert_conn_create_fastreg_pool()
464                      isert_conn, isert_conn->conn_fr_pool_size);
465
466             return 0;
467
468     err:
469             isert_conn_free_fastreg_pool(isert_conn);
>>>     CID 1166659:  Resource leak  (RESOURCE_LEAK)
>>>     Variable "fr_desc" going out of scope leaks the storage it points to.
470             return ret;
471     }
472
473     static int
474     isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
475     {

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:11 -08:00
Julia Lawall
ab576627c8 RDMA/amso1100: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-12 11:11:46 -08:00
Julia Lawall
d07875bd0d RDMA/nes: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-12 11:11:09 -08:00
Eli Cohen
1a4c3a3dc5 IB/mlx5: Don't set "block multicast loopback" capability
Currently Connect-IB does not support blocking multicast loopback, so
don't set IB_DEVICE_BLOCK_MULTICAST_LOOPBACK in the device caps.

Reported by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:09:48 -08:00
Eli Cohen
78c0f98cc9 IB/mlx5: Fix binary compatibility with libmlx5
Commit c1be5232d2 ("Fix micro UAR allocator") broke binary compatibility
between libmlx5 and mlx5_ib since it defines a different value to the number
of micro UARs per page, leading to wrong calculation in libmlx5. This patch
defines struct mlx5_ib_alloc_ucontext_req_v2 as an extension to struct
mlx5_ib_alloc_ucontext_req.  The extended size is determined in mlx5_ib_alloc_ucontext()
and in case of old library we use uuarn 0 which works fine -- this is
acheived due to create_user_qp() falling back from high to medium then to
low class where low class will return 0.  For new libraries we use the
more sophisticated allocation algorithm.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Eli Cohen
9e65dc371b IB/mlx5: Fix RC transport send queue overhead computation
Fix the RC QPs send queue overhead computation to take into account
two additional segments in the WQE which are needed for registration
operations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-06 23:00:48 -08:00
Linus Torvalds
4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Linus Torvalds
4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Nicholas Bellinger
676687c696 iscsi-target: Convert gfp_t parameter to task state bitmask
This patch propigates the use of task state bitmask now used by
percpu_ida_alloc() up the iscsi-target callchain, replacing the
use of GFP_ATOMIC for TASK_RUNNING, and GFP_KERNEL for
TASK_INTERRUPTIBLE.

Also, drop the unnecessary gfp_t parameter to isert_allocate_cmd(),
and just pass TASK_INTERRUPTIBLE into iscsit_allocate_cmd().

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-25 06:58:52 +00:00
Roland Dreier
fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Roland Dreier
8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Eli Cohen
57761d8df8 IB/mlx5: Verify reserved fields are cleared
Verify that reserved fields in struct mlx5_ib_resize_cq are cleared
before continuing execution of the verb. This is required to allow
making use of this area in future revisions.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:54 -08:00
Eli Cohen
9e9c47d07d IB/mlx5: Allow creation of QPs with zero-length work queues
The current code attmepts to call ib_umem_get() even if the length is
zero, which causes a failure. Since the spec allows zero length work
queues, change the code so we don't call ib_umem_get() in those cases.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:53 -08:00
Eli Cohen
bde51583f4 IB/mlx5: Add support for resize CQ
Implement resize CQ which is a mandatory verb in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:50 -08:00
Eli Cohen
3bdb31f688 IB/mlx5: Implement modify CQ
Modify CQ is used by ULPs like IPoIB to change moderation parameters.  This
patch adds support in mlx5.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Eli Cohen
ada388f7af IB/mlx5: Make sure doorbell record is visible before doorbell
Put a wmb() to make sure the doorbell record is visible to the HCA before we
hit doorbell.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:23:49 -08:00
Ding Tianhong
79adc5321e RDMA/nes: Slight optimization of Ethernet address compare
Use the possibly more efficient ether_addr_equal() instead of memcmp().

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:22:26 -08:00
Ira Weiny
6e0ea9e6cb IB/qib: Fix QP check when looping back to/from QP1
The GSI QP type is compatible with and should be allowed to send data
to/from any UD QP.  This was found when testing ibacm on the same node
as an SA.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:16:47 -08:00
Paul Bolle
298589b1cb RDMA/cxgb4: Fix gcc warning on 32-bit arch
Building mem.o for 32 bits x86 triggers a GCC warning:

    drivers/infiniband/hw/cxgb4/mem.c: In function '_c4iw_write_mem_dma_aligned':
    drivers/infiniband/hw/cxgb4/mem.c:79:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Silence that warning by casting "&wr_wait" to unsigned long before
casting it to __be64.  That's what _c4iw_write_mem_inline() already does.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:07:09 -08:00
Wei Yongjun
a384b20e41 IB/usnic: Remove unused includes of <linux/version.h>
Remove including <linux/version.h> that don't need it.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:05:51 -08:00
Svetlana Mavrina
d9d5713ca6 RDMA/amso1100: Add check if cache memory was allocated before freeing it
There is a path in handle_vq() where kmem_cache_free() can be called
with pointer to a local variable.  It can happen if vq_repbuf_alloc()
failed to allocate memory from cache and req is NULL.

The patch adds check if cache memory was allocated before freeing it.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Svetlana Mavrina <another.karnil@gmail.com>
Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:03:59 -08:00
Michal Schmidt
437708c443 IPoIB: Report operstate consistently when brought up without a link
After booting without a working link, "ip link" shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state DOWN qlen 256
    ...

Then after connecting and disconnecting the link, which should result
in exactly the same state as before, it shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state LOWERLAYERDOWN qlen 256
    ...

Notice the (now correct) LOWERLAYERDOWN operstate shown for the
mlx4_ib1.8003 interface. Ideally the identical state would be shown
right after boot.

The problem is related to the calling of netif_carrier_off() in
network drivers.  For a long time it was known that doing
netif_carrier_off() before registering the netdevice would result in
the interface's operstate being shown as UNKNOWN if the device was
brought up without a working link. This problem was fixed in commit
8f4cccbbd9 ('net: Set device operstate at registration time'), but
still there remains the minor inconsistency demonstrated above.

This patch fixes it by moving ipoib's call to netif_carrier_off() into
the .ndo_open method, which is where network drivers ordinarily do it.
With the patch when doing the same test as above, the operstate of
mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:01:05 -08:00
Or Gerlitz
05633102d8 IB/core: Fix unused variable warning
Fix the below "make W=1" build warning:

    drivers/infiniband/core/iwcm.c: In function ‘destroy_cm_id’:
    drivers/infiniband/core/iwcm.c:330: warning: variable ‘ret’ set but not used

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 22:55:04 -08:00
Somnath Kotur
5462eddd7a RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal()
If addr is not a linklocal address, the code incorrectly fails to
return and ends up assigning the scope ID to the scope id of the
address, which is wrong.  Fix by checking if it's a link local address
first, and immediately return 0 if not.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 22:49:17 -08:00
Dan Carpenter
8ce96afa82 IB/usnic: Use GFP_ATOMIC under spinlock
This is called from qp_grp_and_vf_bind() and we are holding the
vf->lock so the allocation can't sleep.

Fixes: e3cf00d0a8 ('IB/usnic: Add Cisco VIC low-level hardware driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-21 10:47:56 -08:00
Bart Van Assche
93079162bf scsi_transport_srp: Fix a race condition
The rport timers must be stopped before the SRP initiator destroys the
resources associated with the SCSI host. This is necessary because
otherwise the callback functions invoked from the SRP transport layer
could trigger a use-after-free. Stopping the rport timers before
invoking scsi_remove_host() can trigger long delays in the SCSI error
handler if a transport layer failure occurs while scsi_remove_host()
is in progress. Hence move the code for stopping the rport timers from
srp_rport_release() into a new function and invoke that function after
scsi_remove_host() has finished. This patch fixes the following
sporadic kernel crash:

     kernel BUG at include/asm-generic/dma-mapping-common.h:64!
     invalid opcode: 0000 [#1] SMP
     RIP: 0010:[<ffffffffa03b20b1>]  [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp]
     Call Trace:
     [<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp]
     [<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp]
     [<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp]
     [<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp]
     [<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp]
     [<ffffffff8108b370>] worker_thread+0x170/0x2a0
     [<ffffffff81090876>] kthread+0x96/0xa0
     [<ffffffff8100c0ca>] child_rip+0xa/0x20

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-21 10:46:17 -08:00
Roland Dreier
27cdef637c IB/mlx4: Use IS_ENABLED(CONFIG_IPV6)
...instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:18:49 -08:00
Roland Dreier
9392fa0641 RDMA/ocrdma: Add dependency on INET
Now that ocrdma supports IP-based addressing, we need to depend on
INET, since ocrdma registers itself for net device events.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:16:23 -08:00
Roland Dreier
31ab8acbf6 RDMA/ocrdma: Move ocrdma_inetaddr_event outside of "#if CONFIG_IPV6"
This fixes the build if IPV6 isn't enabled.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:05 -08:00
Matan Barak
f282651de6 IB/mlx4: Add dependency INET
Since mlx4_ib supports IP based addressing, a dependency on INET needs
to be added, since mlx4_ib registers itself for net device events.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:05 -08:00
Wei Yongjun
990acea616 IB/cm: Fix missing unlock on error in cm_init_qp_rtr_attr()
Add the missing unlock before return from function cm_init_qp_rtr_attr()
in the error handling case.

Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:04 -08:00
Matan Barak
2f85d24e60 IB/core: Make ib_addr a core IB module
IP based addressing introduces the usage of rdma_addr_find_dmac_by_grh()
within ib_core.  Since this function is declared in ib_addr, ib_addr
should be a part of the core INFINIBAND modules, rather than
INFINIBAND_ADDR_TRANS.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:04 -08:00
Or Gerlitz
ed4c54e5b4 IB/core: Resolve Ethernet L2 addresses when modifying QP
Existing user space applications provide only IBoE L3 address
attributes to the kernel when they issue a modify QP modify.  To work
with them and let such apps (plus kernel consumers which don't use the
RDMA-CM) keep working transparently under the IBoE GID IP addressing
changes, add an Eth L2 address resolution helper.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:04 -08:00
Moni Shoua
37721d8501 RDMA/ocrdma: Populate GID table with IP based gids
This patch is similar in spirit to the "IB/mlx4: Use IBoE (RoCE) IP
based GIDs in the port GID table" patch.

Changes to inet4 and inet6 addresses for the host are monitored and if
the address is associated with an ocrdma device then a gid is added or
deleted from the device's gid table. The gid format will be a IPv4 to
IPv6 mapped or the IPv6 address.

Cc: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:14:01 -08:00
Moni Shoua
40aca6ffca RDMA/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing
This patch is similar in spirit to the "IB/mlx4: Handle Ethernet L2
parameters for IP based GID addressing".  It handles the fact that IP
based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.

When building an address handle, instead of parsing the dgid to
get the MAC and VLAN, take them from the address handle attributes.

Cc: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19 15:13:58 -08:00
Sagi Grimberg
9bd626e79d IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
This routine may help for protection registration as well.
This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:09 +00:00
Sagi Grimberg
dc87a90f5d IB/isert: Move fastreg descriptor creation to a function
This routine may be called both by fast registration
descriptors for data and for integrity buffers.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:08 +00:00
Sagi Grimberg
a3a5a82627 IB/isert: Avoid frwr notation, user fastreg
Use fast registration lingo. fast registration will
also incorporate signature/DIF registration.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:07 +00:00
Sagi Grimberg
eb6ab13267 IB/isert: seperate connection protection domains and dma MRs
It is more correct to seperate connections protection domains
and dma_mr handles. protection information support requires to
do so.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:07 +00:00
Moni Shoua
297e0dad72 IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.

Therefore, we need to extract them from the CQE and place them in
struct ib_wc (to be used for cases were they were taken from the gid).

Also, when modifying a QP or building address handle, instead of
parsing the dgid to get the MAC and VLAN, take them from the address
handle attributes.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 14:12:53 -08:00
Moni Shoua
d487ee7774 IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table
Currently, the mlx4 driver set IBoE (RoCE) gids to encode related
Ethernet netdevice interface MAC address and possibly VLAN id.

Change this scheme such that gids encode interface IP addresses (both
IP4 and IPv6).

This requires learning the IP addresses which are of use by a
netdevice associated with the HCA port, formatting them to gids and
adding them to the port gid table.  Furthermore, events of add and
delete address are caught to maintain the gid table accordingly.

Associated IP addresses may belong to a master of an Ethernet
netdevice on top of that port so this should be considered when
building and maintaining the gid table.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 14:12:52 -08:00
Moni Shoua
7b85627b9f IB/cma: IBoE (RoCE) IP-based GID addressing
Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.

Change GIDs to be treated as they encode interface IP address.

Since Ethernet layer 2 address parameters are not longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 14:12:35 -08:00
Julia Lawall
af2e2e35a2 IB/mlx4: Fix error return code
Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:51:33 -08:00
Wei Yongjun
d1db47c5ee IB/usnic: Remove unused variable in usnic_debugfs_exit()
The variable qp_grp is initialized but never used otherwise, so remove
the unused variable.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:50:14 -08:00
Upinder Malhi
6dcebe614c IB/usnic: Set userspace/kernel ABI ver to 4
usNIC userspace/kernel ABI should be set to 4 instead of 3.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:48:54 -08:00
Upinder Malhi
61f7826893 IB/usnic: Advertise usNIC devices as RDMA_NODE_USNIC_UDP
usNIC default transport is UDP.  Hence, advertise RDMA_NODE_USNIC_UDP
by default for usNIC devices.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:48:54 -08:00
Upinder Malhi
5db5765e25 IB/core: Add support for RDMA_NODE_USNIC_UDP
Add the complementary RDMA_NODE_USNIC_UDP for RDMA_TRANSPORT_USNIC_UDP.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:48:54 -08:00
Upinder Malhi
2d97436f5b IB/usnic: Add dependency on CONFIG_INET
usNIC needs inet notifiers to function correctly, so add a Kconfig
dependency on CONFIG_INET.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:48:54 -08:00
Upinder Malhi
4942c0b4b6 IB/usnic: Fix endianness-related warnings
Fix sparse endianness related warnings.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18 13:48:54 -08:00
Aruna-Hewapathirane
63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Matan Barak
dd5f03beb4 IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:20:54 -08:00
Matan Barak
c1c9850112 IB/mlx4: Add support for steerable IB UD QPs
This patch adds support for steerable (NETIF) QP creation.  When we
create the device, we allocate a range of steerable QPs.

Afterward when a QP is created with the NETIF flag, it's allocated
from this range.  Allocation is managed by bitmap allocator.

Internal steering rules for those QPs is automatically generated on
their creation.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Matan Barak
a37a1a4284 IB/mlx4: Add mechanism to support flow steering over IB links
The mlx4 device requires adding IB flow spec to rules that apply over
infiniband link layer.  This patch adds a mechanism to add such a rule.

If higher levels e.g. IP/UDP/TCP flow specs are provided, the device
requires us to add an empty wild-carded IB rule. Furthermore, the device
requires the QPN to be put in the rule.

Add here specific parsing support for IB empty rules and the ability
to self-generate missing specs based on existing ones.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Matan Barak
0a9b7d59d5 IB/mlx4: Enable device-managed steering support for IB ports too
Up until now, flow steering wasn't supported when using IB ports.

This patch enables support for flow steering if all hardware ports
support that, for example the new MLX4_DEV_CAP_FLAG2_DMFS_IPOIB mlx4
device capability.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Matan Barak
90f1d1b41b IB/core: Add flow steering support for IPoIB UD traffic
When creating an IPoIB UD QP, provide a hint to the low level driver
that the QP should support flow-steering.  This means that privileged
user space applications can steer TCP/IP IPoIB traffic from the
network stack, in a similar manner done with Ethernet RAW_PACKET QPs.

The hint is provided through new QP creation flag called NETIF_QP.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Eli Cohen
c1be5232d2 IB/mlx5: Fix micro UAR allocator
The micro UAR (uuar) allocator had a bug which resulted from the fact
that in each UAR we only have two micro UARs avaialable, those at
index 0 and 1.  This patch defines iterators to aid in traversing the
list of available micro UARs when allocating a uuar.

In addition, change the logic in create_user_qp() so that if high
class allocation fails (high class means lower latency), we revert to
medium class and not to the low class.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 13:54:23 -08:00
Eli Cohen
d9fe409163 IB/mlx5: Remove unused code in mr.c
The variable start in struct mlx5_ib_mr is never used. Remove it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 13:54:23 -08:00
Upinder Malhi
3108bccb3d IB/usnic: Append documentation to usnic_transport.h and cleanup
Add comment describing usnic_transport_rsrv port and remove
extraneous space from usnic_transport.c.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:51:00 -08:00
Roland Dreier
c30392ab5b IB/usnic: Fix typo "Ignorning" -> "Ignoring"
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:46 -08:00
Upinder Malhi
9f637f7936 IB/usnic: Expose flows via debugfs
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:46 -08:00
Upinder Malhi
c5f855e08a IB/usnic: Use for_each_sg instead of a for-loop
Use for_each_sg() instead of an explicit for-loop to iterate over
scatter-gather list.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:46 -08:00
Upinder Malhi
6a54d9f9a0 IB/usnic: Remove superflous parentheses
Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:45 -08:00
Upinder Malhi
248567f793 IB/core: Add RDMA_TRANSPORT_USNIC_UDP
Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:45 -08:00
Upinder Malhi
e45e614e40 IB/usnic: Add UDP support in usnic_ib_qp_grp.[hc]
UDP support for qp_grps/qps.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:44 -08:00
Upinder Malhi
c7845bcafe IB/usnic: Add UDP support in u*verbs.c, u*main.c and u*util.h
Add supports for:
	1) Parsing the socket file descriptor pass down from userspace.
	2) IP notifiers
	3) Encoding the IP in the GID
	4) Other aux. changes to support UDP

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:44 -08:00
Upinder Malhi
6214105460 IB:usnic: Add UDP support to usnic_transport.[hc]
This patch provides API for rest of usNIC code to increment or decrement
socket's reference count. Auxiliary socket APIs are also provided.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:44 -08:00
Upinder Malhi
3f92bed3d6 IB/usnic: Add UDP support to usnic_fwd.[hc]
Add *ip field* to *struct usnic_fwd_dev* as well as new *functions* to
manipulate the *ip field.*  Furthermore, add new functions for
programming UDP flows in the forwarding device.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:43 -08:00
Upinder Malhi
b85caf479b IB/usnic: Update ABI and Version file for UDP support
Expand the kernel/userspace interface so userspace may push down
a socket file descriptor to usNIC.  Also, bump up the abi and version
numbers.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:43 -08:00
Upinder Malhi
60b215e8b2 IB/usnic: Port over sysfs to new usnic_fwd.h
This patch ports usnic_ib_sysfs.c to the new interface of
usnic_fwd.h.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:42 -08:00
Upinder Malhi
256d6a6ac5 IB/usnic: Port over usnic_ib_qp_grp.[hc] to new usnic_fwd.h
This patch ports usnic_ib_qp_grp.[hc] to the new interface
of usnic_fwd.h.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:42 -08:00
Upinder Malhi
8af94ac66a IB/usnic: Port over main.c and verbs.c to the usnic_fwd.h
This patch ports usnic_ib_main.c, usnic_ib_verbs.c and usnic_ib.h
to the new interface of usnic_fwd.h.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:42 -08:00
Upinder Malhi
2183b990b6 IB/usnic: Push all forwarding state to usnic_fwd.[hc]
Push all of the usnic device forwarding state - such as mtu, mac - to
usnic_fwd_dev.  Furthermore, usnic_fwd.h exposes a improved interface
for rest of the usnic code.  The primary improvement is that
usnic_fwd.h's flow management interface takes in high-level *filter*
and *action* structures now, instead of low-level paramaters such as
vnic_idx, rq_idx.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:41 -08:00
Upinder Malhi
301a0dd68e IB/usnic: Add struct usnic_transport_spec
Add *struct usnic_transport_spec* for passing around transport
specifications.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:41 -08:00
Upinder Malhi
8192d4acb5 IB/usnic: Change WARN_ON to lockdep_assert_held
usNIC calls WARN_ON(spin_is_locked..) at few places.  In some of these
instances, the call is made while holding a spinlock.  Change
all WARN_ON(spin_is_locked...) calls in usNIC to
lockdep_assert_held to make it fool-proof bc the latter can be
called while holding a spinlock and unlike spin_is_locked,
lockdep_assert_held also works correctly on UP.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:40 -08:00
Upinder Malhi
e3cf00d0a8 IB/usnic: Add Cisco VIC low-level hardware driver
This adds a driver that allows userspace to use UD-like QPs over a
proprietary Cisco transport with Cisco's Virtual Interface Cards (VICs),
including VIC 1240 and 1280 cards.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 00:44:28 -08:00
Devesh Sharma
be8348df6e RDMA/ocrdma: Fix OCRDMA_GEN2_FAMILY macro definition
OCRDMA_GEN2_FAMILY is wrongly defined as 0x02 -- it should be 0x0F.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-13 13:14:33 -08:00
Devesh Sharma
fe5e8a1acc RDMA/ocrdma: Fix AV_VALID bit position
Fix ah->av->valid bit position and big endian portability.

Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-13 13:14:33 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Hangbin Liu
0d68fc4f12 infiniband: make sure the src net is infiniband when create new link
When we create a new infiniband link with uninfiniband device, e.g. `ip link
add link em1 type ipoib pkey 0x8001`. We will get a NULL pointer dereference
cause other dev like Ethernet don't have struct ib_device.

The code path is:
rtnl_newlink
  |-- ipoib_new_child_link
        |-- __ipoib_vlan_add
              |-- ipoib_set_dev_features
                    |-- ib_query_device

Fix this bug by make sure the src net is infiniband when create new link.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:38:56 -05:00
Linus Torvalds
67e0c1b037 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Some holiday bug fixes for 3.13...  There is still one bug I'd like to
  get fixed before 3.13-final.

  The vlan code erroneously assignes the header ops of the underlying
  real device to the VLAN device above it when the real device can
  hardware offload VLAN handling.  That's completely bogus because
  header ops are tied to the device type, so they only expect to see a
  'dev' argument compatible with their ops.

  The fix is the have the VLAN code use a special set of header ops that
  does the pass-thru correctly, by calling the underlying real device's
  header ops but _also_ passing in the real device instead of the VLAN
  device.

  That fix is currently waiting some testing.

  Anyways, of note here:

   1) Fix bitmap edge case in radiotap, from Johannes Berg.

   2) Fix oops on driver unload in rtlwifi, from Larry Finger.

   3) Bonding doesn't do locking correctly during speed/duplex/link
      changes, from Ding Tianhong.

   4) Fix header parsing in GRE code, this bug has been around for a few
      releases.  From Timo Teräs.

   5) SIT tunnel driver MTU check needs to take GSO into account, from
      Eric Dumazet.

   6) Minor info leak in inet_diag, from Daniel Borkmann.

   7) Info leak in YAM hamradio driver, from Salva Peiró.

   8) Fix route expiration state handling in ipv6 routing code, from Li
      RongQing.

   9) DCCP probe module does not check request_module()'s return value,
      from Wang Weidong.

  10) cpsw driver passes NULL device names to request_irq(), from
      Mugunthan V N.

  11) Prevent a NULL splat in RDS binding code, from Sasha Levin.

  12) Fix 4G overflow test in tg3 driver, from Nithin Sujir.

  13) Cure use after free in arc_emac and fec driver's software
      timestamp handling, from Eric Dumazet.

  14) SIT driver can fail to release the route when
      iptunnel_handle_offloads() throws an error.  From Li RongQing.

  15) Several batman-adv fixes from Simon Wunderlich and Antonio
      Quartulli.

  16) Fix deadlock during TIPC socket release, from Ying Xue.

  17) Fix regression in ROSE protocol recvmsg() msg_name handling, from
      Florian Westphal.

  18) stmmac PTP support releases wrong spinlock, from Vince Bridgers"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  stmmac: Fix incorrect spinlock release and PTP cap detection.
  phy: IRQ cannot be shared
  net: rose: restore old recvmsg behavior
  xen-netback: fix guest-receive-side array sizes
  fec: Do not assume that PHY reset is active low
  tipc: fix deadlock during socket release
  netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()
  batman-adv: fix vlan header access
  batman-adv: clean nf state when removing protocol header
  batman-adv: fix alignment for batadv_tvlv_tt_change
  batman-adv: fix size of batadv_bla_claim_dst
  batman-adv: fix size of batadv_icmp_header
  batman-adv: fix header alignment by unrolling batadv_header
  batman-adv: fix alignment for batadv_coded_packet
  netfilter: nf_tables: fix oops when updating table with user chains
  netfilter: nf_tables: fix dumping with large number of sets
  ipv6: release dst properly in ipip6_tunnel_xmit
  netxen: Correct off-by-one errors in bounds checks
  net: Add some clarification to skb_tx_timestamp() comment.
  arc_emac: fix potential use after free
  ...
2013-12-30 09:33:30 -08:00
dingtianhong
c5266d40b0 infiniband: slight optimization of addr compare
Use the possibly more efficient ether_addr_equal
to instead of memcmp.

Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-26 13:31:34 -05:00
Linus Torvalds
6961bc6c70 Last batch of InfiniBand/RDMA changes for 3.13 / 2014:
- Additional checks for uverbs to ensure forward compatibility, handle
    malformed input better.
  - Fix potential use-after-free in iWARP connection manager.
  - Make a function static.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSuHFcAAoJEENa44ZhAt0hetAP/2lMjThZ/dK/Q4eyGXr86qQ6
 ygjFc26H7SPEqZ2cjmcm0mgObHhVj4+94YiJDcemHpjwfzLhhZywZxjySTTJdKna
 ZSzmSb8si0KNdPAoEdldPmtmO04oTI0+mz7kK2sz9mbn033xDVcJ24M8jSUy6/dl
 KBmYWCaiE6rabSyvRNwXOIFilj5RwomQFZzJMjg9JGvWY5qpmwKbcvVdZMCSzAwS
 VsZ2Z5UEDaghsGtnYjKQzvR1TDGeq6XyWtR4rfjCH19U6rUcInbA1oZFDWcOqE3C
 r/o9jOCUszAVefG3iPX5PJ+A3kKCHB2lcDK9owfoIM/cLfV8KWBFkp0WouD5wiGr
 8sfENcKI7aMq+Rptsa24FDNkJVCcdG1HhribRqWY/mn4CwI4c3Qe+5s4x+kgHnxU
 VngX/Hp54lw+pnrgWWrKCT5tFVuDgdVhHqyYLabtnKrCHIPsb2Eqp8Kaz2JPNLkI
 OYWOcS17wzg7nAPoiqbMVfWInGmziqOpWqVZkFwZ+EBTfJIjE7td+Mxn46PzKsrO
 gbVYifn+q5cToDGwmK4pn14G1Xh45tqYW/P3KfZNwyGB6+Ui7u18Q5/epp8BHIWy
 8bn3jTcJzBIMlOmFxqirrJ2OixaeAWJhZjgAMJIA3So9MHwmhbkJfgo0J+S6jLnI
 p1WRy0kTmPC5NNX21YN8
 =bol6
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband fixes from Roland Dreier:
 "Last batch of InfiniBand/RDMA changes for 3.13 / 2014:
   - Additional checks for uverbs to ensure forward compatibility,
     handle malformed input better.
   - Fix potential use-after-free in iWARP connection manager.
   - Make a function static"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/uverbs: Check access to userspace response buffer in extended command
  IB/uverbs: Check input length in flow steering uverbs
  IB/uverbs: Set error code when fail to consume all flow_spec items
  IB/uverbs: Check reserved fields in create_flow
  IB/uverbs: Check comp_mask in destroy_flow
  IB/uverbs: Check reserved field in extended command header
  IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()
  IB/core: const'ify inbuf in struct ib_udata
  RDMA/iwcm: Don't touch cm_id after deref in rem_ref
  RDMA/cxgb4: Make _c4iw_write_mem_dma() static
2013-12-23 17:23:42 -08:00
Roland Dreier
22f12c60e1 Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus 2013-12-23 09:19:02 -08:00
Kumar Sanghvi
41b4f86c13 RDMA/cxgb4: Use cxgb4_select_ntuple to correctly calculate ntuple fields
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-22 18:09:08 -05:00
Kumar Sanghvi
8c04469057 RDMA/cxgb4: Server filters are supported only for IPv4
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-22 18:09:08 -05:00
Kumar Sanghvi
a4ea025fc2 RDMA/cxgb4: Calculate the filter server TID properly
Based on original work by Santosh Rastapur <santosh@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-22 18:09:08 -05:00
Yann Droneaud
6cc3df840a IB/uverbs: Check access to userspace response buffer in extended command
This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...)
to ensure the whole buffer is in userspace memory before using the
pointer in uverbs functions.  If the buffer or a subset of it is not
valid, returns -EFAULT to the caller.

This will also catch invalid buffer before the final call to
copy_to_user() which happen late in most uverb functions.

Just like the check in read(2) syscall, it's a sanity check to detect
invalid parameters provided by userspace. This particular check was added
in vfs_read() by Linus Torvalds for v2.6.12 with following commit message:

https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502

  Make read/write always do the full "access_ok()" tests.

  The actual user copy will do them too, but only for the
  range that ends up being actually copied. That hides
  bugs when the range has been clamped by file size or other
  issues.

Note: there's no need to check input buffer since vfs_write() already does
access_ok(VERIFY_READ, ...) as part of write() syscall.

Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:34 -08:00
Yann Droneaud
6bcca3d4a3 IB/uverbs: Check input length in flow steering uverbs
Since ib_copy_from_udata() doesn't check yet the available input data
length before accessing userspace memory, an explicit check of this
length is required to prevent:

- reading past the user provided buffer,
- underflow when subtracting the expected command size from the input
  length.

This will ensure the newly added flow steering uverbs don't try to
process truncated commands.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:33 -08:00
Yann Droneaud
98a37510ec IB/uverbs: Set error code when fail to consume all flow_spec items
If the flow_spec items parsed count does not match the number of items
declared in the flow_attr command, or if not all bytes are used for
flow_spec items (eg. trailing garbage), a log message is reported and
the function leave through the error path. Unfortunately the error
code is currently not set.

This patch set error code to -EINVAL in such cases, so that the error
is reported to userspace instead of silently fail.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:33 -08:00
Yann Droneaud
c780d82a74 IB/uverbs: Check reserved fields in create_flow
As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc91 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:32 -08:00
Yann Droneaud
2782c2d302 IB/uverbs: Check comp_mask in destroy_flow
Just like the check added to create_flow in 22878dbc91 ("IB/core:
Better checking of userspace values for receive flow steering"),
comp_mask must be checked in destroy_flow too.

Since only empty comp_mask is currently supported, any other value
must be rejected.

This check was silently added in a previous patch[1] to move comp_mask
in extended command header, part of previous patchset[2] against
create/destroy_flow uverbs. The idea of moving comp_mask to the header
was discarded for the final patchset[3].

Unfortunately the check added in destroy_flow uverb was not integrated
in the final patchset.

[1] http://marc.info/?i=40175eda10d670d098204da6aa4c327a0171ae5f.1381510045.git.ydroneaud@opteya.com
[2] http://marc.info/?i=cover.1381510045.git.ydroneaud@opteya.com
[3] http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

Cc: Matan Barak <matanb@mellanox.com>
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:31 -08:00
Yann Droneaud
7efb1b19b3 IB/uverbs: Check reserved field in extended command header
As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc91 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:54:30 -08:00
Roland Dreier
a96e4e2ffe IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()
Trying to have a ternary operator to choose between NULL (or 0) and the
real pointer value in invocations leads to an impossible choice between
a sparse error about a literal 0 used as a NULL pointer, and a gcc
warning about "pointer/integer type mismatch in conditional expression."

Rather than clutter the source with more casts, move the ternary
operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it
easier to use and simplifies its callers.

Reported-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-20 10:53:44 -08:00
Nicholas Bellinger
2853c2b667 iser-target: Move INIT_WORK setup into isert_create_device_ib_res
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().

This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19 00:18:43 -08:00
Yann Droneaud
309243ec14 IB/core: const'ify inbuf in struct ib_udata
Userspace input buffer is not modified by kernel, so it can be 'const'.

This is also a prerequisite to remove the implicit cast
from INIT_UDATA().

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-16 10:38:28 -08:00
Steve Wise
6b59ba609b RDMA/iwcm: Don't touch cm_id after deref in rem_ref
rem_ref() calls iwcm_deref_id(), which will wake up any blockers on
cm_id_priv->destroy_comp if the refcnt hits 0.  That will unblock
someone in iw_destroy_cm_id() which will free the cmid.  If that
happens before rem_ref() calls test_bit(IWCM_F_CALLBACK_DESTROY,
&cm_id_priv->flags), then the test_bit() will touch freed memory.

The fix is to read the bit first, then deref.  We should never be in
iw_destroy_cm_id() with IWCM_F_CALLBACK_DESTROY set, and there is a
BUG_ON() to make sure of that.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-15 16:47:47 -08:00
Rashika
c00850dd6c RDMA/cxgb4: Make _c4iw_write_mem_dma() static
This patch marks the function _c4iw_write_mem_dma() as static
because it is not used outside this file, which fixes the warning:

    drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-15 08:04:15 -08:00
Wei Yongjun
94a7111043 iser-target: fix error return code in isert_create_device_ib_res()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-11 11:54:47 -08:00
Linus Torvalds
b0e3636f65 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...
2013-11-22 10:52:03 -08:00
Linus Torvalds
1ea406c0e0 Main batch of InfiniBand/RDMA changes for 3.13:
- Re-enable flow steering verbs with new improved userspace ABI
  - Fixes for slow connection due to GID lookup scalability
  - IPoIB fixes
  - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
  - Further improvements to SRP error handling
  - Add new transport type for Cisco usNIC
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
 y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
 o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
 fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
 wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
 GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
 VDvT9DEy6zjSMCLpMcdo
 =xxC+
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
2013-11-18 15:36:04 -08:00
Roland Dreier
b4fdf52b3f Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next 2013-11-17 08:22:19 -08:00
Matan Barak
69ad5da41b IB/core: Re-enable create_flow/destroy_flow uverbs
This commit reverts commit 7afbddfae9 ("IB/core: Temporarily disable
create_flow/destroy_flow uverbs").  Since the uverbs extensions
functionality was experimental for v3.12, this patch re-enables the
support for them and flow-steering for v3.13.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:09 -08:00
Yann Droneaud
f21519b23c IB/core: extended command: an improved infrastructure for uverbs commands
Commit 400dbc9658 ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.

According to the commit 400dbc9658, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.

But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].

So the following patch is an attempt to a revised extensible command
infrastructure.

This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.

Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.

Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).

So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).

The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.

Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.

The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.

Additionally the extended header ensure 64bits alignment, while making
all sizes multiple of 8 bytes, extending the maximum buffer size:

                             legacy      extended

   Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
  Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)

For the purpose of doing proper buffer size accounting, the headers
size are no more taken in account in "in_words".

One of the odds of the current extensible infrastructure, reading
twice the "legacy" command header, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command: memory is read once and
information are not duplicated: it's making clear that's an extended
command scheme and not a different command scheme.

The proposed scheme will format input (command) and output (response)
buffers this way:

- command:

  legacy header +
  extended header +
  command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .              <uverbs input>            .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .             <provider input>           .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .          <uverbs output space>         .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .         <provider output space>        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+

The overall design is to ensure that the extensible infrastructure is
itself extensible while begin more reliable with more input and bound
checking.

Note:

The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility).  This was suggested by Roland Dreier in a previous
review[2].  But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.

[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

[3]:
http://marc.info/?i=525C1149.6000701@mellanox.com

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

[ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:09 -08:00
Yann Droneaud
2490f20be4 IB/core: Remove ib_uverbs_flow_spec structure from userspace
The structure holding any types of flow_spec is of no use to
userspace.  It would be wrong for userspace to do:

  struct ib_uverbs_flow_spec flow_spec;

  flow_spec.type = IB_FLOW_SPEC_TCP;
  flow_spec.size = sizeof(flow_spec);

Instead, userspace should use the dedicated flow_spec structure for
  - Ethernet : struct ib_uverbs_flow_spec_eth,
  - IPv4     : struct ib_uverbs_flow_spec_ipv4,
  - TCP/UDP  : struct ib_uverbs_flow_spec_tcp_udp.

In other words, struct ib_uverbs_flow_spec is a "virtual" data
structure that can only be use by the kernel as an alias to the other.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:08 -08:00
Yann Droneaud
b68c956021 IB/core: Make uverbs flow structure use names like verbs ones
This patch adds "flow" prefix to most of data structure added as part
of commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow through
uverbs") to keep those names in sync with the data structures added in
commit 319a441d13 ("IB/core: Add receive flow steering support").

It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:08 -08:00
Yann Droneaud
d82693dad0 IB/core: Rename 'flow' structs to match other uverbs structs
Commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow through
uverbs") added public data structures to support receive flow
steering.  The new structs are not following the 'uverbs' pattern:
they're lacking the common prefix 'ib_uverbs'.

This patch replaces ib_kern prefix by ib_uverbs.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:08 -08:00
Matan Barak
f884827438 IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
This patch fixes the following issues:

1. Unneeded checks were removed

2. Removed the fixed size out of flow_attr.size, thus simplifying the checks.

3. Remove a 32bit hole on 64bit systems with strict alignment in
   struct ib_kern_flow_att by adding a reserved field.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:07 -08:00
Joe Perches
f3a5e3e37e IB/ucma: Convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-16 10:32:24 -08:00
Zhao Hongjiang
ab626d1a68 IB/cm: Convert to using idr_alloc_cyclic()
Commit 3e6628c4b3 ("idr: introduce idr_alloc_cyclic()") adds a new
idr_alloc_cyclic() routine and converts several of these users to it.
This is just a missed one - add it.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-16 10:30:42 -08:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Eli Cohen
cf1c5e1f1c IB/mlx5: Fix page shift in create CQ for userspace
When creating a CQ, we must use mlx5 adapter page shift.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 14:36:36 -08:00