Commit graph

489 commits

Author SHA1 Message Date
Parag Warudkar
01e8ef11bc x86: sysfs: kill owner field from attribute
Tejun's commit 7b595756ec made sysfs
attribute->owner unnecessary.  But the field was left in the structure to
ease the merge.  It's been over a year since that change and it is now
time to start killing attribute->owner along with its users - one arch at
a time!

This patch is attempt #1 to get rid of attribute->owner only for
CONFIG_X86_64 or CONFIG_X86_32 .  We will deal with other arches later on
as and when possible - avr32 will be the next since that is something I
can test.  Compile (make allyesconfig / make allmodconfig / custom config)
and boot tested.

akpm: the idea is that we put the declaration of sttribute.owner inside
`#ifndef CONFIG_X86'.  But that proved to be too ambitious for now because
new usages kept on turning up in subsystem trees.

[akpm: remove the ifdef for now]
Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:42 -07:00
Greg Kroah-Hartman
91bd418fdc device create: infiniband: convert device_create_drvdata to device_create
Now that device_create() has been audited, rename things back to the
original call to be sane.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16 09:24:42 -07:00
Roland Dreier
528051746b IB/mad: Use krealloc() to resize snoop table
Use krealloc() instead of kmalloc() followed by memcpy() when resizing
the MAD module's snoop table.

Also put parentheses around the new table size to avoid calculating
the wrong size to allocate, which fixes a bug pointed out by Haven
Hash <haven.hash@isilon.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-14 14:05:36 -07:00
Julien Brunel
6aea938f54 RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
In case of error, the function ucma_alloc_multicast() returns a NULL
pointer, but never returns an ERR pointer.  So after a call to this
function, an IS_ERR test should be replaced by a NULL test.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@match bad_is_err_test@
expression x, E;
@@

x = ucma_alloc_multicast(...)
... when != x = E
IS_ERR(x)
// </smpl>

Signed-off-by:  Julien Brunel <brunel@diku.dk>
Signed-off-by:  Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-10 12:00:19 -07:00
Roland Dreier
eedd5d0a70 Merge branches 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'mad', 'misc', 'mlx4', 'mthca' and 'nes' into for-next 2008-10-09 17:41:15 -07:00
Hefty, Sean
a7e80ce26c IB/cm: Correctly free cm_device structure
commit 110cf374 ("infiniband: make cm_device use a struct device and
not a kobject.") introduced a memory leak, since it deleted
cm_release_dev_obj(), which was where cm_dev was freed.  Fix this by
freeing the leaked structure after calling device_unregister().

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-30 10:36:54 -07:00
Michael Brooks
7097228c54 IB/mad: Don't discard BMA responses in kernel
This fixes the problem of incoming BMA responses being dropped due to
a bad "is response" check.  Fix the test to use the ib_response_mad()
predicate, which correctly handles BMA MADs.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=988>.

Signed-off-by: Michael Brooks <michael.brooks@qlogic.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-20 20:06:16 -07:00
Roland Dreier
06a91a02e9 Merge branches 'cma', 'cxgb3', 'ipath', 'ipoib', 'mad' and 'mlx4' into for-linus 2008-08-07 14:12:03 -07:00
Julien Brunel
cd55ef5a10 IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL
In case of error, the function ib_create_send_mad() returns an ERR
pointer, but never returns a NULL pointer.  So testing the return
value for error should be done with IS_ERR, not by comparing with
NULL.

A simplified version of the semantic patch that makes this change is
as follows:

(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = ib_create_send_mad(...)
<... when != x = E
if (
(
- x@p2 != NULL
+ ! IS_ERR ( x )
|
- x@p2 == NULL
+ IS_ERR( x )
)
 )
S1
else S2
...>
? x = E;
// </smpl>

Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-07 14:11:56 -07:00
Roland Dreier
3f44675439 RDMA/cma: Remove padding arrays by using struct sockaddr_storage
There are a few places where the RDMA CM code handles IPv6 by doing

	struct sockaddr		addr;
	u8			pad[sizeof(struct sockaddr_in6) -
				    sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

	struct sockaddr_storage	addr;

[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
  switch to struct sockaddr_storage and get rid of padding arrays in
  struct rdma_addr. ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:02:14 -07:00
Roland Dreier
5ba18b186c RDMA/ucm: BKL is not needed for ib_ucm_open()
Remove explicit cycle_kernel_lock() call and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:36:59 -07:00
Roland Dreier
f7a6117ee5 RDMA/ucma: BKL is not needed for ucma_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:36:59 -07:00
Linus Torvalds
5c402355ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  MAINTAINERS: Remove Glenn Streiff from NetEffect entry
  mlx4_core: Improve error message when not enough UAR pages are available
  IB/mlx4: Add support for memory management extensions and local DMA L_Key
  IB/mthca: Keep free count for MTT buddy allocator
  mlx4_core: Keep free count for MTT buddy allocator
  mlx4_code: Add missing FW status return code
  IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
  mlx4_core: Add module parameter to enable QoS support
  RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
  IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
  IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
  IB/ehca: Release mutex in error path of alloc_small_queue_page()
  IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
  IB/ehca: Filter PATH_MIG events if QP was never armed
  IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
  RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
  RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
2008-07-24 12:56:07 -07:00
Roland Dreier
2cc177364e Merge branches 'bkl-removal', 'cma', 'ehca', 'for-2.6.27', 'mlx4', 'mthca' and 'nes' into for-linus 2008-07-24 08:38:47 -07:00
Dotan Barak
1ca8d15619 RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
Remove IB_ACCESS_LOCAL_WRITE from qp.qp_access_flags because this
attribute is only used to set remote permissions.

Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
Ralph Campbell
64b784b583 IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
If update_sm_ah() fails, it leaves the port's sm_ah as NULL.  Then if
the device or module is removed, ib_sa_remove_one() will dereference a
NULL pointer when it calls kref_put().  Fix this by testing if sm_ah
is NULL before dropping the reference.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:33 -07:00
Amir Vadai
38ca83a588 RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state.  Report the timewait
event through the rdma_cm.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:23 -07:00
Or Gerlitz
dd5bdff83b RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
Add an RDMA_CM_EVENT_ADDR_CHANGE event can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does.  In the current code, this
does not happen when bonding is used and fail-over happened but the IB
link used by an already existing session is operating fine.

Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID.  The consumer can act on the
event or just ignore it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:22 -07:00
Greg Kroah-Hartman
110cf374a8 infiniband: make cm_device use a struct device and not a kobject.
This object really should be a struct device, or at least contain a
pointer to a struct device, as it is trying to create a separate device
tree outside of the main device tree.  This patch fixes this problem.

It is needed for the class core rework that is being done in the driver
core.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:49 -07:00
Greg Kroah-Hartman
d4c4196f24 infiniband: rename "device" to "ib_device" in cm_device
This pointer really is a struct ib_device, not a struct device, so name
it properly to help prevent confusion.

This makes the followon patch in this series much smaller and easier to
understand as well.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:49 -07:00
Or Gerlitz
de910bd921 RDMA/cma: Simplify locking needed for serialization of callbacks
The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.

This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable.  This means that cma_disable_remove()
now is more properly named to cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Or Gerlitz
64c5e613b9 RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Steve Wise
7f624d023b RDMA/core: Add iWARP protocol statistics attributes in sysfs
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device.  Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available.  These stats are per-device
and more importantly -not- per port.

Details:

- Add union rdma_protocol_stats in ib_verbs.h.  This union allows
  defining transport-specific stats.  Currently only iwarp stats are
  defined.

- Add struct iw_protocol_stats to define the current set of iwarp
  protocol stats.

- Add new ib_device method called get_proto_stats() to return protocol
  statistics.

- Add logic in core/sysfs.c to create iwarp protocol stats attributes
  if the device is an RNIC and has a get_proto_stats() method.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Roland Dreier
468f2239bc RDMA/cma: Add missing newlines to printk()s
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-07-14 23:48:47 -07:00
Ralph Campbell
e5a5e7d59a IB/core: Reset to error QP state transition is not allowed
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed.  This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.

Fix up the qp_state_table[] to make RESET->ERR not allowed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Steve Wise
00f7ec36c9 RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Roland Dreier
f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Moni Shoua
164ba0893c IB/sa: Fail requests made while creating new SM AH
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH).  When SM AH
becomes invalid and needs an update it is handled by the global
workqueue.  On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins.  Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.

This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.

The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.

For consumers, the patch doesn't make things worse.  Before the patch,
MADs are sent to the wrong SM so the request gets lost.  Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Sean Hefty
a947491709 RDMA: Fix license text
The license text for several files references a third software license
that was inadvertently copied in.  Update the license to what was
intended.  This update was based on a request from HP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Jonathan Corbet
2fceef397f Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
Roland Dreier
feae1ef116 IB/umad: BKL is not needed for ib_umad_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-11 16:40:58 -06:00
Roland Dreier
5b2d281acb IB/uverbs: BKL is not needed for ib_uverbs_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-04 10:32:28 -06:00
Arnd Bergmann
6b0ee363b2 infiniband-ucma: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-06-20 14:05:57 -06:00
Jonathan Corbet
f2b9857eee Add a bunch of cycle_kernel_lock() calls
All of the open() functions which don't need the BKL on their face may
still depend on its acquisition to serialize opens against driver
initialization.  So make those functions acquire then release the BKL to be
on the safe side.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:53 -06:00
Jonathan Corbet
057e7c7ff9 infiniband: more BKL pushdown
Be extra-cautious and protect the remaining open() functions.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:51 -06:00
Jonathan Corbet
d21c95c569 Add "no BKL needed" comments to several drivers
This documents the fact that somebody looked at the relevant open()
functions and concluded that, due to their trivial nature, no locking was
needed.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:50 -06:00
Jack Morgenstein
fb77bcef9f IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-18 15:36:38 -07:00
Roland Dreier
8079ffa0e1 IB/umem: Avoid sign problems when demoting npages to integer
On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely

	min_t(int, npages, PAGE_SIZE / sizeof (struct page *))

will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()").  This leads to an infinite loop in ib_umem_get(),
since the code boils down to:

	while (npages) {
		ret = get_user_pages(...);
		npages -= ret;
	}

Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.

The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief.  But it does
let buggy userspace code cause a kernel lock-up; for example I hit
this with code that passes a negative value into a memory registartion
function where it is promoted to a huge u64 value.

Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 21:38:37 -07:00
Linus Torvalds
c2448278e3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
  IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
  MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
  IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
  IB/mthca: Fix max_sge value returned by query_device
  RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
  IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
  IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
  IB/ipath: Fix printk format for ipath_sdma_status
2008-05-23 11:11:44 -07:00
Dave Olson
5a4f2b6752 IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly.  The fix is to
kfree the local data and break, rather than falling through.  This was
observed with the ipath driver, but could happen with any driver.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-23 10:52:59 -07:00
Greg Kroah-Hartman
6c06aec248 IB: fix race in device_create
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.

This patch fixes the problem by using the new function,
device_create_drvdata().

Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-20 13:31:55 -07:00
Arthur Kepner
cb9fbc5c37 IB: expand ib_umem_get() prototype
Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Linus Torvalds
e80ab411e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
  SCSI: convert struct class_device to struct device
  DRM: remove unused dev_class
  IB: rename "dev" to "srp_dev" in srp_host structure
  IB: convert struct class_device to struct device
  memstick: convert struct class_device to struct device
  driver core: replace remaining __FUNCTION__ occurrences
  sysfs: refill attribute buffer when reading from offset 0
  PM: Remove destroy_suspended_device()
  Firmware: add iSCSI iBFT Support
  PM: Remove legacy PM (fix)
  Kobject: Replace list_for_each() with list_for_each_entry().
  SYSFS: Explicitly include required header file slab.h.
  Driver core: make device_is_registered() work for class devices
  PM: Convert wakeup flag accessors to inline functions
  PM: Make wakeup flags available whenever CONFIG_PM is set
  PM: Fix misuse of wakeup flag accessors in serial core
  Driver core: Call device_pm_add() after bus_add_device() in device_add()
  PM: Handle device registrations during suspend/resume
  block: send disk "change" event for rescan_partitions()
  sysdev: detect multiple driver registrations
  ...

Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
2008-04-21 15:49:58 -07:00
Tony Jones
f4e91eb4a8 IB: convert struct class_device to struct device
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Matthew Wilcox
6188e10d38 Convert asm/semaphore.h users to linux/semaphore.h
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:22:54 -04:00
Eli Cohen
2dd5716227 IB/core: Add support for modify CQ
Add support for modifying CQ parameters for controlling event
generation moderation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Roland Dreier
0f39cf3d54 IB/core: Add support for "send with invalidate" work requests
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions.  Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated.  Add this new union to struct
ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().

Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.

Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit.  The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing).  Remove the flag from
all existing drivers that set it until we know which ones are OK.

The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Dotan Barak
7ce5eacb45 IB/core: Check optional verbs before using them
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Eli Cohen
b846f25aa2 IB/core: Add creation flags to struct ib_qp_init_attr
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO.  The create_flags member will also be useful for XRC
and ehca low-latency QP support.

Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Robert P. J. Day
157de22946 IB: Use shorter list_splice_init() for brevity
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Julia Lawall
10f32065a2 RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0
The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
  S else S1
//</smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Roland Dreier
a7dab9e887 IB/uverbs: Use alloc_file() instead of get_empty_filp()
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface.  Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
1ae5c187ac IB/uverbs: Don't store struct file * for event files
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not.  The only thing we
ever did with the value was check if it was NULL or not.  Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier
9cda779cc2 RDMA/ucma: Endian annotation
Add __force cast of node_guid to __u64, since we are sticking it into a
structure whose definition is shared with userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier
a88f488857 IB/cm: Endianness annotations
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-04-16 21:01:07 -07:00
Al Viro
1b90c137cc trivial endianness annotations: infiniband core
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:20:24 -07:00
Steve Wise
d7c1fbd660 RDMA/iwcm: Don't access a cm_id after dropping reference
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed.  The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference.  Then if it was set, free the cm_id on this thread.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:22:22 -07:00
Pete Wyckoff
331552925d IB/fmr_pool: Flush all dirty FMRs from ib_fmr_pool_flush()
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.

This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread.  Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:31:48 -08:00
Pete Wyckoff
35fb5340e3 Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"
This reverts commit a3cd7d9070.

The original commit breaks iSER reliably, making it complain:

    iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up.  This commit causes clean but used FMR entries also to be
purged.  During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:29:19 -08:00
Sean Hefty
84ba284cd7 IB/cm: Flush workqueue when removing device
When a CM MAD is received, it is queued to a CM workqueue for
processing.  The queued work item references the port and device on
which the MAD was received.  If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.

To fix this, flush the workqueue after unregistering to receive MAD,
and before the device is be freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:27:52 -08:00
Li Zefan
c7482b81c8 IB: Fix return value in ib_device_register_sysfs()
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-15 15:05:05 -08:00
Sean Hefty
ead595aeb0 RDMA/cma: Do not issue MRA if user rejects connection request
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm.  (The ib_cm will send an MRA if it receives a duplicate
REQ.)  The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id.  The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received.  When the backlog is full, we just
want to drop the REQ so that it is retried later.

Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 15:30:41 -08:00
Roland Dreier
7c7a9bccd2 IB/cm: Fix infiniband_cm class kobject ref counting
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device.  So
the reference count ended up too low, which leads to bad things.  Fix up
and simplify the reference counting to avoid this.

(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Roland Dreier
ab64b96067 IB/cm: Remove debug printk()s that snuck upstream
Pesky little devils, sneaking around...

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Or Gerlitz
1d96354e61 IB/fmr_pool: Allocate page list for pool FMRs only when caching enabled
Allocate memory for the page_list field of struct ib_pool_fmr only
when caching is enabled for the FMR pool, since the field is not used
otherwise.  This can save significant amounts of memory for large
pools with caching turned off.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Sean Hefty
3971c9f6db IB/cm: Add interim support for routed paths
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets.  Update the subnet local field in the CM REQ based
on the hop_limit value.  In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs.  This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.

This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed.  It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Denis V. Lunev
f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev
1ab352768f [NETNS]: Add namespace parameter to ip_dev_find.
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:04 -08:00
Joe Perches
6360a02af1 [IPV4] drivers/infiniband: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:17 -08:00
Olaf Kirch
a3cd7d9070 IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends
up on the free_list rather than the dirty_list (because we allow a
certain number of remappings before actually requiring a flush).

However, ib_fmr_batch_release() only looks at dirty_list when flushing
out old mappings.  This means that when ib_fmr_pool_flush() is used to
force a flush of the FMR pool, some dirty FMRs that have not reached
their maximum remap count will not actually be flushed.

Fix this by flushing all FMRs that have been used at least once in
ib_fmr_batch_release().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Olaf Kirch
a656eb758f IB/fmr_pool: Flush serial numbers can get out of sync
Normally, the serial numbers for flush requests and flushes executed
for an FMR pool should be in sync.

However, if the FMR pool flushes dirty FMRs because the
dirty_watermark was reached, we wake up the cleanup thread and let it
do its stuff.  As a side effect, the cleanup thread increments
pool->flush_ser, which leaves it one higher than pool->req_ser.  The
next time the user calls ib_flush_fmr_pool(), the cleanup thread will
be woken up, but ib_flush_fmr_pool() won't wait for the flush to
complete because flush_ser is already past req_ser.  This means the
FMRs that the user expects to be flushed may not have all been flushed
when the function returns.

Fix this by telling the cleanup thread to do work exclusively by
incrementing req_ser, and by moving the comparison of dirty_len and
dirty_watermark into ib_fmr_pool_unmap().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
2008-01-25 14:15:42 -08:00
Roland Dreier
2fe7e6f7c9 IB/umad: Simplify and fix locking
In addition to being overly complex, the locking in user_mad.c is
broken: there were multiple reports of deadlocks and lockdep warnings.
In particular it seems that a single thread may end up trying to take
the same rwsem for reading more than once, which is explicitly
forbidden in the comments in <linux/rwsem.h>.

To solve this, we change the locking to use plain mutexes instead of
rwsems.  There is one mutex per open file, which protects the contents
of the struct ib_umad_file, including the array of agents and list of
queued packets; and there is one mutex per struct ib_umad_port, which
protects the contents, including the list of open files.  We never
hold the file mutex across calls to functions like ib_unregister_mad_agent(),
which can call back into other ib_umad code to queue a packet, and we
always hold the port mutex as long as we need to make sure that a
device is not hot-unplugged from under us.

This even makes things nicer for users of the -rt patch, since we
remove calls to downgrade_write() (which is not implemented in -rt).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Sean Hefty
5851bb893e RDMA/cma: Override default responder_resources with user value
By default, the responder_resources parameter is set to that received
in a connection request.  The passive side may override this value
when accepting the connection.  Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request.  Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.

For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:41 -08:00
Rolf Manderscheid
a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Sean Hefty
88314e4dda RDMA/cma: add support for rdma_migrate_id()
This is based on user feedback from Doug Ledford at RedHat:

Events that occur on an rdma_cm_id are reported to userspace through an
event channel.  Connection request events are reported on the event
channel associated with the listen.  When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel.  This is suboptimal where the user only wants listen events on
that channel.

Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.

Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Vladimir Sokolovsky
45d9478da1 RDMA/cma: Reenable device removal on passive side
Enable conn_id remove on the passive side after connection
establishment.  This corrects an issue where the IB driver can't be
unloaded after running applications over RDS.  The 'dev_remove' counter
does not reach 0 for established connections on the passive side.

This problem is limited to device removal, and only occurs on the
passive side if there are established connections.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty
b61d92d8ae IB/mad: Fix incorrect access to items on local_list
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing.  However, the structures on these two
lists are not the same.  The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private.  Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private.  This leads to a system
crash when requests are moved from the local_list to the cancel_list.

Fix this by leaving local_list alone.  All requests on the local_list
have completed are just awaiting processing by a queued worker thread.

Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty
9af57b7a27 IB/cm: Add basic performance counters
Add performance/debug counters to track sent/received messages, retries,
and duplicates.  Counters are tracked per CM message type, per port.

The counters are always enabled, so intrusive state tracking is not done.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:30 -08:00
Sean Hefty
4fc8cd4919 IB/mad: Report number of times a mad was retried
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.

For RMPP mads, report the total number of times that the any portion
(send window) of the send operation was retried.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:30 -08:00
Sean Hefty
547af76521 IB/multicast: Report errors on multicast groups if P_key changes
P_key changes can invalidate multicast groups.  Report errors on all
multicast groups affected by a pkey change.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Steve Welch
727792da2b IB/mad: Enable loopback of DR SMP responses from userspace
The local loopback of an outgoing DR SMP response is limited to those
that originate at the driver specific SMA implementation during the
driver specific process_mad() function.  This patch enables a
returning DR SMP originating in userspace (or elsewhere) to be
delivered to the local managment stack.  In this specific case the
driver process_mad() function does not consume or process the MAD, so
a reponse mad has not be created and the original MAD must manually be
copied to the MAD buffer that is to be handed off to the local agent.

Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Ralph Campbell
3828ff457a IB/mad: Remove redundant NULL pointer check in ib_mad_recv_done_handler()
In ib_mad_recv_done_handler(), the response pointer is checked for
NULL after allocating it.  It is then checked again in the local
process_mad() path but there is no possibility of it changing in
between.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Steve Wise
8d8293cfb3 RDMA/iwcm: Set initiator depth and responder resources to device max values
Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Greg Kroah-Hartman
c10997f657 Kobject: convert drivers/* from kobject_unregister() to kobject_put()
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman
35be068198 Kobject: change drivers/infiniband to use kobject_init_and_add
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:26 -08:00
Linus Torvalds
53173920da Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/fmr_pool: Stop ib_fmr threads from contributing to load average
  IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument)
  IB/ipath: Limit length checksummed in eeprom
  IB/ipath: Fix a race where s_last is updated without lock held
  IB/mlx4: Lock SQ lock in mlx4_ib_post_send()
  IPoIB/cm: Fix receive QP cleanup
2007-10-30 15:26:56 -07:00
Anton Blanchard
3f776e8a25 IB/fmr_pool: Stop ib_fmr threads from contributing to load average
I noticed my machine was at a constant load average of 1. This was
because ib_create_fmr_pool calls kthread_create but does not
immediately wake the thread up.

Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set
TASK_INTERRUPTIBLE, then go to sleep.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 14:57:43 -07:00
Jens Axboe
642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Linus Torvalds
0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Jens Axboe
45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Roland Dreier
cbfb50e6e2 IB/uverbs: Fix checking of userspace object ownership
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().

Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.

Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 20:01:43 -07:00
Anton Arapov
a25de534f8 [INET]: Justification for local port range robustness.
There is a justifying patch for Stephen's patches. Stephen's patches
disallows using a port range of one single port and brakes the meaning
of the 'remaining' variable, in some places it has different meaning.
My patch gives back the sense of 'remaining' variable. It should mean
how many ports are remaining and nothing else. Also my patch allows
using a single port.

  I sure we must be able to use mentioned port range, this does not
restricted by documentation and does not brake current behavior.

usefull links:
Patches posted by Stephen Hemminger
  http://marc.info/?l=linux-netdev&m=119206106218187&w=2
  http://marc.info/?l=linux-netdev&m=119206109918235&w=2

Andrew Morton's comment
  http://marc.info/?l=linux-kernel&m=119248225007737&w=2

1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-18 22:00:17 -07:00
Sean Hefty
d02d1f5359 RDMA/cma: Fix deadlock destroying listen requests
Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.

A wildcard listen maintains per device listen requests for each
RDMA device in the system.  The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.

When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens.  It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id().  It does this while holding the device mutex.

However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex.  Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.

Fix this by re-working how per device listens are destroyed.  Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen().  Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:49 -07:00
Sean Hefty
c5483388bb RDMA/cma: Add locking around QP accesses
If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.)  While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.

This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:

"An incoming connection arrives and we decide to tear down the nascent
 connection.  The remote ends decides to do the same.  We start to shut
 down the connection, and call rdma_destroy_qp on our cm_id. ... Now
 apparently a 'connect reject' message comes in from the other host,
 and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
 It calls cma_modify_qp_err, which for some odd reason tries to modify
 the exact same QP we just destroyed."

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:00 -07:00
Kay Sievers
7eff2e7a8b Driver core: change add_uevent_var to use a struct
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.

Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:01 -07:00
Linus Torvalds
ce9d3c9a6a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
  mlx4_core: Fix section mismatches
  IPoIB: Allow setting policy to ignore multicast groups
  IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
  IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
  IB/ipath: Remove redundant link state checks
  IB/ipath: Fix IB_EVENT_PORT_ERR event
  IB/ipath: Better handling of unexpected GPIO interrupts
  IB/ipath: Maintain active time on all chips
  IB/ipath: Fix QHT7040 serial number check
  IB/ipath: Indicate a couple of chip bugs to userspace
  IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
  IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
  IB/ipath: Remove duplicate copy of LMC
  IB/ipath: Add ability to set the LMC via the sysfs debugging interface
  IB/ipath: Optimize completion queue entry insertion and polling
  IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
  IB/ipath: Generate flush CQE when QP is in error state
  IB/ipath: Remove redundant code
  IB/ipath: Future proof eeprom checksum code (contents reading)
  IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
  ...
2007-10-11 19:43:13 -07:00
Stephen Hemminger
227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Sean Hefty
dcb3f974da RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries
Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment.  This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Sean Hefty
de98b693e9 IB/cm: Modify interface to send MRAs in response to duplicate messages
The IB CM provides a message received acknowledged (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages.  In many cases, the application may not know how long it will
take to respond to a CM message, but the majority of the time, it will
usually respond before a retry has been sent.  Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.

This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent.  It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Roland Dreier
04d29b0ede IB/uverbs: Make ib_uverbs_release_event_file() static
ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file.  Also move the definition before the first use, so
a forward declaration is not needed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00