linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-18 00:24:39 +00:00

Author	SHA1	Message	Date
Dennis Dalessandro	f48ad614c1	IB/hfi1: Move driver out of staging The TODO list for the hfi1 driver was completed during 4.6. In addition other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to remove the driver from staging and into the drivers/infiniband sub-tree. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:35:14 -04:00
Jianxin Xiong	b583faf4dc	IB/hfi1: Fix bug that blocks process on exit after port bounce During the processing of a user SDMA request, if there was an error before the request counter was increased, the state of the packet queue could be updated incorrectly, causing the counter to underflow. As the result, the process could get stuck later since the counter could never get back to 0. This patch adds a condition to guard the packet queue update so that the counter is only decreased if it has been increased before the error happens. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:11 -04:00
Mitko Haralanov	9565c6a37a	IB/hfi1: Fix an interval RB node reference count leak Commit `e88c9271d9` ("IB/hfi1: Fix buffer cache corner case which may cause corruption") introduced a bug which may cause a reference count of a interval RB node to be leaked in the case where an SDMA transfer from that node completes at the same time as the node is being extended. If a node is being extended, it is first removed from the RB tree in order to be processed without the risk of an invalidation event removing the node at the same time. If a SDMA completion happens during that time, the completion handler will fail to find the node in the RB tree and, therefore, fail to correctly decrement its refcount. This leaves the node in the tree and its pages pinned for the duration of the user process. To prevent this from happening the io vector adds a reference to the RB node, which is used during the SDMA completion instead of looking up the node in the RB tree. This change adds a performance improvement as a side effect by avoiding the RB tree lookup. Fixes: `e88c9271d9` ("IB/hfi1: Fix buffer cache corner case which may cause corruption") Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-05-26 11:23:10 -04:00
Sebastian Sanchez	e38d1e4f50	IB/hfi1: Check P_KEY for all sent packets from user mode Add the P_KEY check for user-context mechanism for both PIO and SDMA. For PIO, the SendCtxtCheckEnable.DisallowKDETHPackets is set by default. When the P_KEY is set, SendCtxtCheckEnable.DisallowKDETHPackets is cleared. For SDMA, a software check was included. This change requires user processes to set the P_KEY before sending any packets, otherwise, the sent packet will fail. The original submission didn't have this check but it's required. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mikto Haralanov <mitko.haralanov@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:27 -04:00
Mitko Haralanov	e88c9271d9	IB/hfi1: Fix buffer cache races which may cause corruption There are two possible causes for node/memory corruption both of which are related to the cache eviction algorithm. One way to cause corruption is due to the asynchronous nature of the MMU invalidation and the locking used when invalidating node. The MMU invalidation routine would temporarily release the RB tree lock to avoid a deadlock. However, this would allow the eviction function to take the lock resulting in the removal of cache nodes. If the node being removed by the eviction code is the same as the node being invalidated, the result is use after free. The same is true in the other direction due to the temporary release of the eviction list lock in the eviction loop. Another corner case exists when dealing with the SDMA buffer cache that could cause memory corruption of kernel memory. The most common way, in which this corruption exhibits itself is a linked list node corruption. In that case, the kernel will complain that a node with poisoned pointers is being removed. The fact that the pointers are already poisoned means that the node has already been removed from the list. To root cause of this corruption was a mishandling of the eviction list maintained by the driver. In order for this to happen four conditions need to be satisfied: 1. A node describing a user buffer already exists in the interval RB tree, 2. The beginning of the current user buffer matches that node but is bigger. This will cause the node to be extended. 3. The amount of cached buffers is close or at the limit of the buffer cache size. 4. The node has dropped close to the end of the eviction list. This will cause the node to be considered for eviction. If all of the above conditions have been satisfied, it is possible for the eviction algorithm to evict the current node, which will free the node without the driver knowing. To solve both issues described above: - the locking around the MMU invalidation loop and cache eviction loop has been improved so locks are not released in the loop body, - a new RB function is introduced which will "atomically" find and remove the matching node from the RB tree, preventing the MMU invalidation loop from touching it, and - the node being extended by the pin_vector_pages() function is removed from the eviction list prior to calling the eviction function. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	f53af85e47	IB/hfi1: Extract and reinsert MMU RB node on lookup The page pinning function, which also maintains the pin cache, behaves one of two ways when an exact buffer match is not found: 1. If no node is not found (a buffer with the same starting address is not found in the cache), a new node is created, the buffer pages are pinned, and the node is inserted into the RB tree, or 2. If a node is found but the buffer in that node is a subset of the new user buffer, the node is extended with the new buffer pages. Both modes of operation require (re-)insertion into the interval RB tree. When the node being inserted is a new node, the operations are pretty simple. However, when the node is already existing and is being extended, special care must be taken. First, we want to guard against an asynchronous attempt to delete the node by the MMU invalidation notifier. The simplest way to do this is to remove the node from the RB tree, preventing the search algorithm from finding it. Second, the node needs to be re-inserted so it lands in the proper place in the tree and the tree is correctly re-balanced. This also requires the node to be removed from the RB tree. This commit adds the hfi1_mmu_rb_extract() function, which will search for a node in the interval RB tree matching an address and length and remove it from the RB tree if found. This allows for both of the above special cases be handled in a single step. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	de79093b28	IB/hfi1: Correctly compute node interval The computation of the interval of an interval RB node was incorrect leading to data corruption due to the RB search algorithm not properly finding the all RB nodes in an MMU invalidation interval. The problem stemmed from the fact that the beginning address of the node's range was being aligned to a page boundary. For certain buffer sizes, this would lead to a end address calculation that was off by 1 page. An important aspect of keeping the RB same is also updating the node's range in the case it's being extended. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	0ad2d3d05b	IB/hfi1: Fix memory leak in user ExpRcv and SDMA The driver had two memory leaks - one in the user expected receive code and one in SDMA buffer cache. The leak in the expected receive code only showed up when the user/admin had set ulimit sufficiently low and the driver did not have enough room in the cache before hitting the limit of allowed cachable memory. When this condition occurred, the driver returned early signaling userland that it needed to free some buffers to free up room in the cache. The bug was that the driver was not cleaning up allocated memory prior to returning early. The leak in the SDMA buffer cache could occur (even though it never did), when the insertion of a buffer node in the interval RB tree failed. In this case, the driver failed to unpin the pages of the node instead erroneously returning success. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	4787bc5e17	IB/hfi1: Don't remove list entries if they are not in a list The SDMA cache logic maintains an eviction list which is ordered by most recently used user buffers. Upon errors or buffer freeing, the list nodes were unconditionally being deleted. This would lead to list corruption warnings if the nodes were never inserted in the eviction list to begin with. This commit prevents this by checking that the nodes are already part of the eviction list. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 16:32:26 -04:00
Mitko Haralanov	849e3e9398	IB/hfi1: Prevent unpinning of wrong pages The routine used by the SDMA cache to handle already cached nodes can extend an already existing node. In its error handling code, the routine will unpin pages when not all pages of the buffer extension were pinned. There was a bug in that part of the routine, which would mistakenly unpin pages from the original set rather than the newly pinned pages. This commit fixes that bug by offsetting the page array to the proper place pointing at the beginning of the newly pinned pages. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:38 -04:00
Mitko Haralanov	f19bd643db	IB/hfi1: Prevent NULL pointer deferences in caching code There is a potential kernel crash when the MMU notifier calls the invalidation routines in the hfi1 pinned page caching code for sdma. The invalidation routine could call the remove callback for the node, which in turn ends up dereferencing the current task_struct to get a pointer to the mm_struct. However, the mm_struct pointer could be NULL resulting in the following backtrace: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1] 15 task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000 RIP: 0010:[<ffffffffa041f75a>] [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1] RSP: 0000:ffff88085c245878 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830 RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0 RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0 R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40 FS: 0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0 Stack: ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0 ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000 0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282 Call Trace: [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1] [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1] [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1] [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70 [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470 [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120 [<ffffffff81141fad>] try_to_unmap+0x4d/0x60 [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0 [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490 [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640 [<ffffffff81121641>] shrink_zone+0x31/0x100 [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0 [<ffffffff811229e3>] kswapd+0x403/0x7e0 [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0 [<ffffffff81068ac0>] kthread+0xc0/0xd0 [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40 [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40 To correct this, the mm_struct passed to us by the MMU notifier is used (which is what should have been done to begin with). This avoids the broken derefences and ensures that the correct mm_struct is used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-04-28 12:00:38 -04:00
Linus Torvalds	b8ba452683	Round two of 4.6 merge window patches - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW8HR9AAoJELgmozMOVy/dDYMP+wSBALhIdV/pqVzdLCGfIUbK H5agonm/3b/Oj74W30w2JYqXBFfZC2LGVJy6OwocJ3wK04v/KfZbA9G+QsOuh2hQ Db+tFn1eoltvzrcx3k/a7x6zHGC4YyxyH9OX2B3QfRsNHeE7PG9KGp5dfEs2OH1r WGp3jMLAsHf7o8uKpa0jyTEUEErATaTlG+YoaJ+BGHwurgCNy8ni+wAn+EAFiJ3w iEJhcXB6KY69vkLsrLYuT9xxJn4udFJ3QEk8xdPkpLKsu+6Ue5i/eNQ19VfbpZgR c6fTc8genfIv5S+fis+0P44u1oA7Kl2JT6IZYLi35gJ60ZmxTD+7GruWP3xX/wJ2 zuR3sTj5fjcFWenk087RSIU/EK87ONPD4g9QPdZpf3FtgleTVKk3YDlqwjqf8pgv cO6gQ1BcOBnixJvhjNFiX1c2hvNhb3CkgObly1JBwhcCzZhLkV7BNFPbZuDHAeAx VqzNEUse4hupkgiiuiGgudcJ4fsSxMW37kyfX9QC/qyk6YVuUDbrekcWI+MAKot7 5e5dHqFExpbn1Zgvc8yfvh88H2MUQAgaYwjanWF/qpppOPRd01nTisVQIOJn7s5C arcWzvocpQe0GL2UsvDoWwAABXznL3bnnAoCyTWOES2RhOOcw0Ibw46Jl8FQ8gnl 2IRxQ+ltNEscb2cwi5wE =t2Ko -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "Round two of 4.6 merge window patches. This is a monster pull request. I held off on the hfi1 driver updates (the hfi1 driver is intimately tied to the qib driver and the new rdmavt software library that was created to help both of them) in my first pull request. The hfi1/qib/rdmavt update is probably 90% of this pull request. The hfi1 driver is being left in staging so that it can be fixed up in regards to the API that Al and yourself didn't like. Intel has agreed to do the work, but in the meantime, this clears out 300+ patches in the backlog queue and brings my tree and their tree closer to sync. This also includes about 10 patches to the core and a few to mlx5 to create an infrastructure for configuring SRIOV ports on IB devices. That series includes one patch to the net core that we sent to netdev@ and Dave Miller with each of the three revisions to the series. We didn't get any response to the patch, so we took that as implicit approval. Finally, this series includes Intel's new iWARP driver for their x722 cards. It's not nearly the beast as the hfi1 driver. It also has a linux-next merge issue, but that has been resolved and it now passes just fine. Summary: - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits) IB/ipoib: Allow mcast packets from other VFs IB/mlx5: Implement callbacks for manipulating VFs net/mlx5_core: Implement modify HCA vport command net/mlx5_core: Add VF param when querying vport counter IB/ipoib: Add ndo operations for configuring VFs IB/core: Add interfaces to control VF attributes IB/core: Support accessing SA in virtualized environment IB/core: Add subnet prefix to port info IB/mlx5: Fix decision on using MAD_IFC net/core: Add support for configuring VF GUIDs IB/{core, ulp} Support above 32 possible device capability flags IB/core: Replace setting the zero values in ib_uverbs_ex_query_device net/mlx5_core: Introduce offload arithmetic hardware capabilities net/mlx5_core: Refactor device capability function net/mlx5_core: Fix caching ATOMIC endian mode capability ib_srpt: fix a WARN_ON() message i40iw: Replace the obsolete crypto hash interface with shash IB/hfi1: Add SDMA cache eviction algorithm IB/hfi1: Switch to using the pin query function IB/hfi1: Specify mm when releasing pages ...	2016-03-22 15:48:44 -07:00
Mitko Haralanov	5511d78107	IB/hfi1: Add SDMA cache eviction algorithm This commit adds a cache eviction algorithm for the SDMA user buffer cache. Besides the interval RB tree used for node lookup, the cache nodes are also arranged in a doubly-linked list. When a node is used, it is put at the beginning of the list. Less frequently used nodes naturally move to the tail of the list. When the cache limit is reached, the eviction code starts traversing the linked list in reverse, freeing buffers until enough space has been freed to fit the new user buffer. This guarantees that only the least used cache nodes will be removed from the cache. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:25 -04:00
Mitko Haralanov	bd3a8947de	IB/hfi1: Specify mm when releasing pages This change adds a pointer to the process mm_struct when calling hfi1_release_user_pages(). Previously, the function used the mm_struct of the current process to adjust the number of pinned pages. However, is some cases, namely when unpinning pages due to a MMU notifier call, we want to drop into that code block as it will cause a deadlock (the MMU notifiers take the process' mmap_sem prior to calling the callbacks). By allowing to caller to specify the pointer to the mm_struct, the caller has finer control over that part of hfi1_release_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:25 -04:00
Mitko Haralanov	5cd3a88d7f	IB/hfi1: Implement SDMA-side buffer caching Add support for caching of user buffers used for SDMA transfers. This change improves performance by avoiding repeatedly pinning the pages of buffers, which are being re-used by the application. While the cost of the pinning operation has been made heavier by adding the extra code to search the cache tree, re-allocate pages arrays, and future cache evictions, that cost will be amortized against the savings when the same buffer is re-used. It is also worth noting that in most cases, the cost of pinning should be much lower due to the buffer already being in the cache. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 15:55:24 -04:00
Amitoj Kaur Chawla	8444991755	staging: rdma: hfi1: Replace ALIGN with PAGE_ALIGN mm.h contains a helper function PAGE_ALIGN which aligns the pointer to the page boundary instead of using ALIGN(expression, PAGE_SIZE) This change was made with the help of the following Coccinelle semantic patch: //<smpl> @@ expression e; symbol PAGE_SIZE; @@ ( - ALIGN(e, PAGE_SIZE) + PAGE_ALIGN(e) \| - IS_ALIGNED(e, PAGE_SIZE) + PAGE_ALIGNED(e) ) //</smpl> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-11 22:09:09 -08:00
Janani Ravichandran	16ccad0475	staging: rdma: hfi1: user_sdma.c: Drop void pointer cast Void pointers need not be cast to other pointer types. Semantic patch used: @r@ expression x; void e; type T; identifier f; @@ ( ((T )e) \| ((T )x) [...] \| ((T )x)->f \| - (T ) e ) Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-11 22:09:09 -08:00
Jubin John	05d6ac1d82	staging/rdma/hfi1: Fix header Fix the header by moving the copyright notice out of the license text and to the top of the header. Also, update the copyright date. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:42 -05:00
Jubin John	e490974e67	staging/rdma/hfi1: Add braces on all arms of statement Add braces on all arms of statements to fix checkpatch check: CHECK: braces {} should be used on all arms of this statement Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:42 -05:00
Jubin John	17fb4f2923	staging/rdma/hfi1: Fix code alignment Fix code alignment to fix checkpatch check: CHECK: Alignment should match open parenthesis Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:41 -05:00
Jubin John	4d114fdd90	staging/rdma/hfi1: Fix block comments Fix block comments with proper formatting to fix checkpatch warnings: WARNING: Block comments use * on subsequent lines WARNING: Block comments use a trailing */ on a separate line Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:41 -05:00
Jubin John	5161fc3ef6	staging/rdma/hfi1: Remove blank line before close brace Remove extra blank line before close brace to fix checkpatch check: CHECK: Blank lines aren't necessary before a close brace '}' Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:37 -05:00
Jubin John	50e5dcbed6	staging/rdma/hfi1: Remove space after cast Remove the space after a cast to fix checkpatch check: CHECK: No space is necessary after a cast Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:36 -05:00
Jubin John	8638b77f13	staging/rdma/hfi1: Add spaces around binary operators Add spaces around binary operators. Fixes checkpatch check: CHECK: spaces preferred around that 'x' where x is a binary operator Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:45:33 -05:00
Mike Marciniszyn	a545f5308b	staging/rdma/hfi: fix CQ completion order issue The current implementation of the sdma_wait variable has a timing hole that can cause a completion Q entry to be returned from a pio send prior to an older sdma packets completion queue entry. The sdma_wait variable used to be decremented prior to calling the packet complete routine. The hole is between decrement and the verbs completion where send engine using pio could return a out of order completion in that window. This patch closes the hole by allowing an API option to specify an sdma_drained callback. The atomic dec is positioned after the complete callback to avoid the window as long as the pio path doesn't execute when there is a non-zero sdma count. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:38:14 -05:00
Mitko Haralanov	a402d6ab40	staging/rdma/hfi1: Fix bug that could block the process on context exit A race was discovred in the user SDMA code, which could result in an process being stuck in the kernel call indefinitely in certain error conditions. If, during the processing of a user SDMA request, there was an error and all outstanding SDMA descriptor had been completed by the time the that error case was handled in the calling function, the state of the packet queue would not get correctly updated resulting in the process subsequently getting stuck, thinking that there are more descriptors to be completed. To handle this scenario, the driver now checks the submitted packet count vs. the completed. If all submitted packets have also been completed, the driver can safely free the request and signal user level. Otherwise, this will be handled by the completion callback. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:38:00 -05:00
Mitko Haralanov	0840aea98c	staging/rdma/hfi1: Improve performance of user SDMA To facilitate locked page counting, the user SDMA routines would maintain a list of io vectors, which were freed in the completion callback and then unpin the associated pages during the next call into the kernel. Since the size of this list was unbounded, doing this was bad for performance because the driver ended up spending too much time freeing the io vectors. This commit changes how the io vector freeing is done by moving the actual page unpinning in the callback and maintaining a count of unpinned pages. This count can then be used during the next call into the kernel to update the mm->pinned_vm variable (since that requires process context and the ability to sleep.) Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:37:59 -05:00
Mitko Haralanov	c7cbf2fabb	staging/rdma/hfi1: Properly determine error status of SDMA slots To ensure correct operation between the driver and PSM with respect to managing the SDMA request ring, it is important that the status for a particular request slot is set at the correct time. Otherwise, PSM can get out of sync with the driver, which could lead to hangs or errors on new requests. Properly determining of when to set the error status of a SDMA slot depends on knowing exactly when the last txreq for that request has been completed. This in turn requires that the driver knows exactly how many requests have been generated and how many of those requests have been successfully submitted to the SDMA queue. The previous implementation of the mid-layer SDMA API did not provide a way for the caller of sdma_send_txlist() to know how many of the txreqs in the input list have actually been submitted without traversing the list and counting. Since sdma_send_txlist() already traverses the list in order to process it, requiring such traversal in the caller is completely unnecessary. Therefore, it is much easier to enhance sdma_send_txlist() to return the number of successfully submitted txreqs. This, in turn, allows the caller to accurately determine the progress of the SDMA request and, therefore, correctly set the error status at the right time. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:37:55 -05:00
Mitko Haralanov	0f2d87d282	staging/rdma/hfi1: Improve performance of SDMA transfers Commit `a0d406934a` ("staging/rdma/hfi1: Add page lock limit check for SDMA requests") added a mechanism to delay the clean-up of user SDMA requests in order to facilitate proper locked page counting. This delayed processing was done using a kernel workqueue, which meant that a kernel thread would have to spin up and take CPU cycles to do the clean-up. This proved detrimental to performance because now there are two execution threads (the kernel workqueue and the user process) needing cycles on the same CPU. Performance-wise, it is much better to do as much of the clean-up as can be done in interrupt context (during the callback) and do the remaining work in-line during subsequent calls of the user process into the driver. The changes required to implement the above also significantly simplify the entire SDMA completion processing code and eliminate a memory corruption causing the following observed crash: [ 2881.703362] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2881.703389] IP: [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x18e0 [hfi1] [ 2881.703422] PGD 7d4d25067 PUD 77d96d067 PMD 0 [ 2881.703427] Oops: 0000 [#1] SMP [ 2881.703431] Modules linked in: [ 2881.703504] CPU: 28 PID: 6668 Comm: mpi_stress Tainted: G OENX 3.12.28-4-default #1 [ 2881.703508] Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0044.090 [ 2881.703512] task: ffff88077da8e0c0 ti: ffff880856772000 task.ti: ffff880856772000 [ 2881.703515] RIP: 0010:[<ffffffffa02897e4>] [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x [ 2881.703529] RSP: 0018:ffff880856773c48 EFLAGS: 00010287 [ 2881.703531] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000002000 [ 2881.703534] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000002000 [ 2881.703537] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 2881.703540] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 2881.703543] R13: 0000000000000000 R14: ffff88071e782e68 R15: ffff8810532955c0 [ 2881.703546] FS: 00007f9c4375e700(0000) GS:ffff88107eec0000(0000) knlGS:0000000000000000 [ 2881.703549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2881.703551] CR2: 0000000000000000 CR3: 00000007d4cba000 CR4: 00000000003407e0 [ 2881.703554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2881.703556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2881.703558] Stack: [ 2881.703559] ffffffff00002000 ffff881000001800 ffffffff00000000 00000000000080d0 [ 2881.703570] 0000000000000000 0000200000000000 0000000000000000 ffff88071e782db8 [ 2881.703580] ffff8807d4d08d80 ffff881053295600 0000000000000008 ffff88071e782fc8 [ 2881.703589] Call Trace: [ 2881.703691] [<ffffffffa028b5da>] hfi1_user_sdma_process_request+0x84a/0xab0 [hfi1] [ 2881.703777] [<ffffffffa0255412>] hfi1_aio_write+0xd2/0x110 [hfi1] [ 2881.703828] [<ffffffff8119e3d8>] do_sync_readv_writev+0x48/0x80 [ 2881.703837] [<ffffffff8119f78b>] do_readv_writev+0xbb/0x230 [ 2881.703843] [<ffffffff8119fab8>] SyS_writev+0x48/0xc0 This commit also addresses issues related to notification of user processes of SDMA request slot availability. The slot should be cleaned up first before the user processes is notified of its availability. Reviewed-by: Arthur Kepner <arthur.kepner@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 20:37:55 -05:00
Amitoj Kaur Chawla	84d899e709	staging: rdma: hfi1: Remove header file Remove duplicate include file. Found using includecheck. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-22 12:03:22 -08:00
Amitoj Kaur Chawla	72a5f6a8df	staging: rdma: hfi1: Use offset_in_page macro Use offset_in_page macro instead of (var & ~PAGE_MASK) The Coccinelle semantic patch used to make this change is as follows: // <smpl> @@ unsigned long p; @@ - p & ~PAGE_MASK + offset_in_page(p) // </smpl> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-20 14:54:05 -08:00
Bhumika Goyal	a4d7d05b2d	Staging: rdma: hfi1: Delete NULL check before vfree The function vfree test whether the argument is NULL and return immediately. SO NULL test is not needed before vfree. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-14 16:44:06 -08:00
Mitko Haralanov	6a5464f224	staging/rdma/hfi1: Detect SDMA transmission error early It is possible for an SDMA transmission error to happen during the processing of an user SDMA transfer. In that case it is better to detect it early and abort any further attempts to send more packets. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-12-21 13:57:55 -08:00
Mitko Haralanov	faa98b8620	staging/rdma/hfi1: Clean-up unnecessary goto statements Clean-up unnecessary goto statements based on feedback from the mailing list on previous patch submissions. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-12-21 13:57:55 -08:00
Mitko Haralanov	a0d406934a	staging/rdma/hfi1: Add page lock limit check for SDMA requests The driver pins pages on behalf of user processes in two separate instances - when the process has submitted a SDMA transfer and when the process programs an expected receive buffer. When pinning pages, the driver is required to observe the locked page limit set by the system administrator and refuse to lock more pages than allowed. Such a check was done for expected receives but was missing from the SDMA transfer code path. This commit adds the missing check for SDMA transfers. As of this commit, user SDMA or expected receive requests will be rejected if the number of pages required to be pinned will exceed the set limit. Due to the fact that the driver needs to take the MM semaphore in order to update the locked page count (which can sleep), this cannot be done by the callback function as it [the callback] is executed in interrupt context. Therefore, it is necessary to put all the completed SDMA tx requests onto a separate list (txcmp) and offload the actual clean-up and unpinning work to a workqueue. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-12-21 13:57:55 -08:00
Sunny Kumar	cb32649d08	staging: rdma: hfi1 : Prefer using the BIT macro This patch replaces bit shifting on 1 with the BIT(x) macro Signed-off-by: Sunny Kumar <sunnyk@cdac.in> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-11-15 20:02:47 -08:00
Ira Weiny	9e10af4787	staging/rdma/hfi1: Remove file pointer macros Remove the following macros in favor of explicit use of struct hfi1_filedata and various sub structures. ctxt_fp subctxt_fp tidcursor_fp user_sdma_pkt_fp user_sdma_comp_fp Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-11-15 20:02:47 -08:00
Mitko Haralanov	b9fb6318d0	staging/rdma/hfi1: Prevent silent data corruption with user SDMA User SDMA keeps track of progress into the submitted IO vectors by tracking an offset into the vectors when packets are submitted. This offset is updated after a successful submission of a txreq to the SDMA engine. The same offset was used when determining whether an IO vector should be 'freed' (pages unpinned) in the SDMA callback functions. This was causing a silent data corruption in big jobs (> 2 nodes, 120 ranks each) on the receive side because the send side was mistakenly unpinning the vector pages before the HW has processed all descriptors referencing the vector. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-27 17:19:22 +09:00
Alison Schofield	806e6e1bec	staging: rdma: hfi1: remove unnecessary out of memory messages Out of memory messages are unnecssary in the drivers as they are reported by memory management. Addresses checkpatch.pl: WARNING: Possible unnecessary 'out of memory' message Signed-off-by: Alison Schofield <amsfield22@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-12 20:35:21 -07:00
Shraddha Barke	314fcc0d53	Staging: rdma: hfi1: Use kcalloc instead of kzalloc to allocate array The advantage of kcalloc is, that will prevent integer overflows which could result from the multiplication of number of elements and size and it is also a bit nicer to read. Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-12 20:35:21 -07:00
Julia Lawall	adad44d132	hfi1: drop null test before destroy functions Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\\|mempool_destroy\\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Mike Marciniszyn <infinipath@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-09-15 06:40:57 -07:00
Mike Marciniszyn	7724105686	IB/hfi1: add driver files Signed-off-by: Andrew Friedley <andrew.friedley@intel.com> Signed-off-by: Arthur Kepner <arthur.kepner@intel.com> Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jim Snow <jim.m.snow@intel.com> Signed-off-by: John Gregor <john.a.gregor@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Kevin Pine <kevin.pine@intel.com> Signed-off-by: Kyle Liddell <kyle.liddell@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-28 22:59:36 -04:00

42 commits