Commit graph

1116 commits

Author SHA1 Message Date
Christoph Lameter
e3b6d8cf8d IB/core: Do not require CAP_NET_ADMIN for packet sniffing
In the Ethernet/TCP world, CAP_NET_RAW is sufficient to allow a program
to listen to all incoming packets on a specific interface, and the
higher CAP_NET_ADMIN is required to set the interface into promiscuous
mode.  We want to emulate that same basic division of privilege in the
RDMA stack, so when dealing with Raw Ethernet QPs, allow apps with
CAP_NET_RAW to listen to all incoming flows (and direct them as they see
fit in their own listen stream).  Do not require CAP_NET_ADMIN just to
listen to traffic already incoming.  Reserve CAP_NET_ADMIN if we attempt
to set promiscuous mode.
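
The privilege split described above can be sketched in user-space C.
This is only a model: struct caller_caps and raw_listen_allowed() are
hypothetical names, the booleans stand in for CAP_NET_RAW and
CAP_NET_ADMIN, and it assumes promiscuous mode needs CAP_NET_ADMIN in
addition to CAP_NET_RAW.

```c
#include <stdbool.h>

/* Hypothetical model of a caller's capabilities; not a kernel structure. */
struct caller_caps {
    bool net_raw;   /* stands in for CAP_NET_RAW */
    bool net_admin; /* stands in for CAP_NET_ADMIN */
};

/*
 * Returns true if the request should be permitted under the split
 * described in the commit message. Assumption: promiscuous mode needs
 * CAP_NET_ADMIN on top of CAP_NET_RAW.
 */
static bool raw_listen_allowed(struct caller_caps caps, bool wants_promisc)
{
    if (!caps.net_raw)
        return false; /* listening at all requires CAP_NET_RAW */
    if (wants_promisc && !caps.net_admin)
        return false; /* promiscuous mode is reserved for CAP_NET_ADMIN */
    return true;
}
```

With this model, a caller holding only the CAP_NET_RAW stand-in may
listen to incoming flows but is refused promiscuous mode.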

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-18 10:31:58 -04:00
Doug Ledford
0651ec932a Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into k.o/for-4.7 2016-05-13 19:40:38 -04:00
Majd Dibbiny
b531b90948 IB/core: Add Scatter FCS create flag
Raw Packet QPs that were created with the Scatter FCS flag will scatter
the FCS into the receive buffers.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:40:28 -04:00
Majd Dibbiny
0b24e5ac93 IB/core: Add extended device capability flags
Since all the uverbs device_cap_flags are occupied, we need a place to
expose more device capabilities.

This patch adds a new 64 bit device_cap_flags_ex to expose new
device capabilities.

The lower 32 bits will be identical to the original device_cap_flags;
the upper 32 bits will hold new capabilities.
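
The layout of the 64 bit field can be illustrated with a small C
sketch. The helper names below are hypothetical and exist only for
illustration; only the lower/upper split comes from the description
above.

```c
#include <stdint.h>

/* Lower 32 bits: the original device_cap_flags value. */
static uint32_t legacy_cap_flags(uint64_t device_cap_flags_ex)
{
    return (uint32_t)(device_cap_flags_ex & 0xffffffffULL);
}

/* Upper 32 bits: capabilities that did not fit in the original field. */
static uint32_t extended_cap_flags(uint64_t device_cap_flags_ex)
{
    return (uint32_t)(device_cap_flags_ex >> 32);
}

/* Combine both halves into one 64 bit value (hypothetical helper). */
static uint64_t make_cap_flags_ex(uint32_t legacy, uint32_t extended)
{
    return ((uint64_t)extended << 32) | legacy;
}
```

A consumer that only understands the original field can keep reading
the lower half unchanged.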

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:40:27 -04:00
Mark Bloch
0f377d8625 IB/SA: Use correct free function
Fixes a direct call to kfree_skb when nlmsg_free should be used.

Fixes: 2ca546b92a ('IB/sa: Route SA pathrecord query through netlink')
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:40:02 -04:00
Mark Bloch
2fa2d4fb11 IB/core: Fix a potential array overrun in CMA and SA agent
Fix an array overrun when walking the callback table.
The declaration of the callback table does not specify a maximum size,
whereas the registration phase does.

There is a potential scenario where a new operation is added
that the current client does not support. Acceptance of such
an operation by ib_netlink will cause an array overrun.
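
A minimal sketch of the kind of bounds check that avoids such an
overrun, in user-space C. The types and names (nl_op_handler,
struct nl_client, lookup_handler, sample_handler) are hypothetical and
are not the kernel's definitions.

```c
#include <stddef.h>

typedef int (*nl_op_handler)(void); /* hypothetical handler type */

struct nl_client {                  /* hypothetical; not the kernel struct */
    const nl_op_handler *cb_table;  /* callback table supplied by the client */
    size_t nops;                    /* number of entries in cb_table */
};

/* Example handler used only to populate a table. */
static int sample_handler(void)
{
    return 0;
}

/* Returns the handler for op, or NULL if op lies outside the table. */
static nl_op_handler lookup_handler(const struct nl_client *client, size_t op)
{
    if (client == NULL || op >= client->nops)
        return NULL; /* reject operations the client never registered */
    return client->cb_table[op];
}
```

The point is that the size recorded at registration time is checked
before the table is indexed.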

Fixes: 809d5fc9bf ("infiniband: pass rdma_cm module to netlink_dump_start")
Fixes: b493d91d33 ("iwcm: common code for port mapper")
Fixes: 2ca546b92a ("IB/sa: Route SA pathrecord query through netlink")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:40:02 -04:00
Mark Bloch
1ae5ccc781 IB/core: Remove unnecessary check in ibnl_rcv_msg
RDMA_NL_GET_OP is defined like this: (type & ((1 << 10) - 1))
which means op (defined as an int) can never be a negative number.
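
The claim can be checked with a small user-space C sketch. The macro
body follows the definition quoted above (with the argument
parenthesized); get_op() is a hypothetical wrapper added for
illustration.

```c
/* Definition as quoted in the commit message, argument parenthesized. */
#define RDMA_NL_GET_OP(type) ((type) & ((1 << 10) - 1))

/* Hypothetical wrapper: the mask keeps only the low 10 bits. */
static int get_op(int type)
{
    return RDMA_NL_GET_OP(type);
}
```

Because only the low 10 bits survive the mask, the result always lies
in the range 0..1023, so a check for op < 0 can never fire.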

Fixes: b2cbae2c24 ('RDMA: Add netlink infrastructure')
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:40:01 -04:00
Mark Bloch
5ed935e861 IB/IWPM: Fix a potential skb leak
If ibnl_put_msg fails in send_nlmsg_done,
the function returns -ENOMEM without freeing the skb.

This patch fixes this behavior.

Fixes: 30dc5e63d6 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:40:01 -04:00
Bart Van Assche
825107a237 iwcm: Fix a sparse warning
Prevent sparse from complaining about the comparison of s_addr
with INADDR_ANY.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:39:59 -04:00
Bart Van Assche
9aa8b3217e IB/core: Enhance ib_map_mr_sg()
The SRP initiator allows max_sectors to be set to a value that exceeds
the largest amount of data that can be mapped at once with an mlx4
HCA using fast registration and a page size of 4 KB. Hence modify
ib_map_mr_sg() such that it can map partial sg-elements. If an
sg-element has been mapped partially, let the caller know
which fraction has been mapped by adjusting *sg_offset.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:57 -04:00
Christoph Hellwig
0e353e34e1 IB/core: add RW API support for signature MRs
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:20 -04:00
Christoph Hellwig
a060b5629a IB/core: generic RDMA READ/WRITE API
This supports both manual mapping of lots of SGEs, as well as using MRs
from the QP's MR pool, for iWarp or other cases where it's more optimal.
For now, MRs are only used for iWARP transports.  The user of the RDMA-RW
API must allocate the QP MR pool as well as size the SQ accordingly.

Thanks to Steve Wise for testing, fixing and rewriting the iWarp support,
and to Sagi Grimberg for ideas, reviews and fixes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:19 -04:00
Steve Wise
d4a85c309b IB/core: add a need_inval flag to struct ib_mr
This is the first step toward moving MR invalidation decisions
to the core.  It will be needed by the upcoming RW API.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:19 -04:00
Christoph Hellwig
fffb0383cf IB/core: add a simple MR pool
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:18 -04:00
Christoph Hellwig
04c41bf39f IB/core: refactor ib_create_qp
Split the XRC magic into a separate function, and return early on failure
to make the initialization code readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:18 -04:00
Christoph Hellwig
ff2ba99365 IB/core: Add passing an offset into the SG to ib_map_mr_sg
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 13:37:11 -04:00
Christoph Hellwig
0691a286d5 IB/cma: pass the port number to ib_create_qp
The new RW API will need this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-12 14:22:54 -04:00
Jason Gunthorpe
e6bd18f57a IB/security: Restrict use of the write() interface
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl().  This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.

For the immediate repair, detect and deny suspicious accesses to
the write API.

For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).

The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.

Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:03:16 -04:00
Sagi Grimberg
42235f80ab IB/core: Don't drain non-existent rq queue-pair
The drain_rq function expects a normal receive qp to drain.  A qp can
only have either a normal rq or an srq.  If there is an srq, there
is no rq to drain.  Until the API supports draining SRQs, simply
skip draining the rq when the qp has an srq attached.

Fixes: 765d67748b ("IB: new common API for draining queues")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-26 12:40:50 -04:00
Doug Ledford
f4e7de63ab IB/core: Fix oops in ib_cache_gid_set_default_gid
When we fail to find the default gid index, we can't continue
processing in this routine or else we will pass a negative
index to later routines resulting in invalid memory access
attempts and a kernel oops.

Fixes: 03db3a2d81 (IB/core: Add RoCE GID table management)
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-22 20:26:44 -04:00
Linus Torvalds
b8ba452683 Round two of 4.6 merge window patches
- A few minor core fixups needed for the next patch series
 - The IB SRIOV series.  This has bounced around for several versions.
   Of note is the fact that the first patch in this series affects
   the net core.  It was directed to netdev and DaveM for each iteration
   of the series (three versions total).  Dave did not object, but did
   not respond either.  I've taken this as permission to move forward
   with the series.
 - The new Intel X722 iWARP driver
 - A huge set of updates to the Intel hfi1 driver.  Of particular interest
   here is that we have left the driver in staging since it still has an
   API that people object to.  Intel is working on a fix, but getting
   these patches in now helps keep me sane as the upstream and Intel's
   trees were over 300 patches apart.

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "Round two of 4.6 merge window patches.

  This is a monster pull request.  I held off on the hfi1 driver updates
  (the hfi1 driver is intimately tied to the qib driver and the new
  rdmavt software library that was created to help both of them) in my
  first pull request.  The hfi1/qib/rdmavt update is probably 90% of
  this pull request.  The hfi1 driver is being left in staging so that
  it can be fixed up in regards to the API that Al and yourself didn't
  like.  Intel has agreed to do the work, but in the meantime, this
  clears out 300+ patches in the backlog queue and brings my tree and
  their tree closer to sync.

  This also includes about 10 patches to the core and a few to mlx5 to
  create an infrastructure for configuring SRIOV ports on IB devices.
  That series includes one patch to the net core that we sent to netdev@
  and Dave Miller with each of the three revisions to the series.  We
  didn't get any response to the patch, so we took that as implicit
  approval.

  Finally, this series includes Intel's new iWARP driver for their x722
  cards.  It's not nearly the beast that the hfi1 driver is.  It also has a
  linux-next merge issue, but that has been resolved and it now passes
  just fine.

  Summary:

   - A few minor core fixups needed for the next patch series

   - The IB SRIOV series.  This has bounced around for several versions.
     Of note is the fact that the first patch in this series affects the
     net core.  It was directed to netdev and DaveM for each iteration
     of the series (three versions total).  Dave did not object, but did
     not respond either.  I've taken this as permission to move forward
     with the series.

   - The new Intel X722 iWARP driver

   - A huge set of updates to the Intel hfi1 driver.  Of particular
     interest here is that we have left the driver in staging since it
     still has an API that people object to.  Intel is working on a fix,
     but getting these patches in now helps keep me sane as the upstream
     and Intel's trees were over 300 patches apart"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
  IB/ipoib: Allow mcast packets from other VFs
  IB/mlx5: Implement callbacks for manipulating VFs
  net/mlx5_core: Implement modify HCA vport command
  net/mlx5_core: Add VF param when querying vport counter
  IB/ipoib: Add ndo operations for configuring VFs
  IB/core: Add interfaces to control VF attributes
  IB/core: Support accessing SA in virtualized environment
  IB/core: Add subnet prefix to port info
  IB/mlx5: Fix decision on using MAD_IFC
  net/core: Add support for configuring VF GUIDs
  IB/{core, ulp} Support above 32 possible device capability flags
  IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
  net/mlx5_core: Introduce offload arithmetic hardware capabilities
  net/mlx5_core: Refactor device capability function
  net/mlx5_core: Fix caching ATOMIC endian mode capability
  ib_srpt: fix a WARN_ON() message
  i40iw: Replace the obsolete crypto hash interface with shash
  IB/hfi1: Add SDMA cache eviction algorithm
  IB/hfi1: Switch to using the pin query function
  IB/hfi1: Specify mm when releasing pages
  ...
2016-03-22 15:48:44 -07:00
Eli Cohen
50174a7f2c IB/core: Add interfaces to control VF attributes
Following the practice exercised for network devices which allow the PF
net device to configure attributes of its virtual functions, we
introduce the following functions to be used by IPoIB which is the
network driver implementation for IB devices.

ib_set_vf_link_state - set the policy for a VF link. More below.
ib_get_vf_config - read configuration information of a VF
ib_get_vf_stats - read VF statistics
ib_set_vf_guid - set the node or port GUID of a VF

Also add an indication in the device cap flags that indicates that this
IB device is based on a virtual function.

A VF shares the physical port with the PF and other VFs. When setting
the link state we have three options:

1. Auto - in this mode, the virtual port follows the state of the
   physical port and becomes active only if the physical port's state is
   active. In all other cases it remains in a Down state.
2. Down - sets the state of the virtual port to Down
3. Up - causes the virtual port to transition into Initialize state if
   it was not already in this state. A virtualization aware subnet manager
   can then bring the state of the port into the Active state.
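
The three link-state options above can be modelled roughly as follows.
The enum names and vf_state_after_policy() are hypothetical, the port
states are simplified to the ones named in the text, and the later
transition to Active by a subnet manager is not modelled.

```c
/* Hypothetical policy values mirroring the three options listed above. */
enum vf_link_policy { VF_LINK_AUTO, VF_LINK_DOWN, VF_LINK_UP };

/* Simplified port states; only the ones named in the text. */
enum port_state { PORT_DOWN, PORT_INITIALIZE, PORT_ACTIVE };

/* State of the virtual port right after the policy is applied. */
static enum port_state vf_state_after_policy(enum vf_link_policy policy,
                                             enum port_state physical)
{
    switch (policy) {
    case VF_LINK_AUTO:
        /* Follow the physical port: active only if it is active. */
        return physical == PORT_ACTIVE ? PORT_ACTIVE : PORT_DOWN;
    case VF_LINK_UP:
        /* Move to Initialize; a subnet manager may activate it later. */
        return PORT_INITIALIZE;
    case VF_LINK_DOWN:
    default:
        return PORT_DOWN;
    }
}
```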

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
a0c1b2a350 IB/core: Support accessing SA in virtualized environment
Per the ongoing standardisation process, when virtual HCAs are present
in a network, traffic is routed based on a destination GID. In order to
access the SA we use the well known SA GID.

We also add a GRH required boolean field to the port attributes which is
used to report to the verbs consumer whether this port is connected to a
virtual network. We use this field to realize whether we need to create
an address vector with GRH to access the subnet administrator. We clear
the port attributes struct before calling the hardware driver to make
sure the default remains that GRH is not required.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Eli Cohen
fad61ad4e7 IB/core: Add subnet prefix to port info
The subnet prefix is a part of the port_info MAD returned and should be
available at the ib_port_attr struct. We define it here and provide a
default implementation in case the hardware driver does not provide one.
The subnet prefix is required when creating the address vector to access
the SA in networks where GRH must be used.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Leon Romanovsky
fb532d6a79 IB/{core, ulp} Support above 32 possible device capability flags
The old bitwise device_cap_flags variable was limited to u32, which
has all bits already defined. To overcome this, we converted the
device_cap_flags variable to the u64 type.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:59 -04:00
Leon Romanovsky
2953f42513 IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
Zeroing during variable initialization eliminates the need to
explicitly zero variables and structures afterwards.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:36 -04:00
Linus Torvalds
643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks at runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Linus Torvalds
9ea4463520 Initial roundup of 4.6 merge window patches
- cxgb4 updates
 - nes updates
 - unification of iwarp portmapper code to core
 - add drain_cq API
 - various ib_core updates
 - minor ipoib updates
 - minor mlx4 updates
 - more significant mlx5 updates (including a minor merge conflict with
   net-next tree...merge is simple to resolve and Stephen's resolution was
   confirmed by Mellanox)
 - trivial net/9p rdma conversion
 - ocrdma RoCEv2 update
 - srpt updates

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "Initial roundup of 4.6 merge window patches.

  This is the first of two pull requests.  It is the smaller request,
  but touches far more different things (this is everything but what is
  in or going into staging).  The pull request for the code in
  staging/rdma is on hold until after we decide what to do on the
  write/writev API issue and may be partially deferred until 4.7 as a
  result.

  Summary:

   - cxgb4 updates
   - nes updates
   - unification of iwarp portmapper code to core
   - add drain_cq API
   - various ib_core updates
   - minor ipoib updates
   - minor mlx4 updates
   - more significant mlx5 updates (including a minor merge conflict
     with net-next tree...merge is simple to resolve and Stephen's
     resolution was confirmed by Mellanox)
   - trivial net/9p rdma conversion
   - ocrdma RoCEv2 update
   - srpt updates"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits)
  iwpm: crash fix for large connections test
  iw_cxgb3: support for iWARP port mapping
  iw_cxgb4: remove port mapper related code
  iw_nes: remove port mapper related code
  iwcm: common code for port mapper
  net/9p: convert to new CQ API
  IB/mlx5: Add support for don't trap rules
  net/mlx5_core: Introduce forward to next priority action
  net/mlx5_core: Create anchor of last flow table
  iser: Accept arbitrary sg lists mapping if the device supports it
  mlx5: Add arbitrary sg list support
  IB/core: Add arbitrary sg_list support
  IB/mlx5: Expose correct max_fast_reg_page_list_len
  IB/mlx5: Make coding style more consistent
  IB/mlx5: Convert UMR CQ to new CQ API
  IB/ocrdma: Skip using unneeded intermediate variable
  IB/ocrdma: Skip using unneeded intermediate variable
  IB/ocrdma: Delete unnecessary variable initialisations in 11 functions
  IB/core: Documentation fix in the MAD header file
  IB/core: trivial prink cleanup.
  ...
2016-03-18 09:39:22 -07:00
Linus Torvalds
364e8dd9d6 Configfs changes for the 4.6 merge window:
- A large patch from me to simplify setting up the list of default
    groups by actually implementing it as a list instead of an array.
  - a small Y2083 prep patch from Deepa Dinamani.  Probably doesn't matter
    on its own, but it seems like he is trying to get rid of all CURRENT_TIME
    uses in file systems, which is a worthwhile goal.

Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:

 - A large patch from me to simplify setting up the list of default
   groups by actually implementing it as a list instead of an array.

 - a small Y2083 prep patch from Deepa Dinamani.  Probably doesn't
   matter on its own, but it seems like he is trying to get rid of all
   CURRENT_TIME uses in file systems, which is a worthwhile goal.

* tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs:
  configfs: switch ->default groups to a linked list
  configfs: Replace CURRENT_TIME by current_fs_time()
2016-03-17 16:25:46 -07:00
Doug Ledford
082eaa5083 Merge branches 'nes', 'cxgb4' and 'iwpm' into k.o/for-4.6 2016-03-16 13:57:43 -04:00
Faisal Latif
dafb558717 iwpm: crash fix for large connections test
During a large connection test, there is a crash at wake_up() in the callback because waitq is
not yet initialized. The callback can run before iwpm_wait_complete_req() is called to
initialize waitq.
To resolve this, use a signaling semaphore instead of waitq.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Reviewed-by: Tatyana E Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-16 13:48:32 -04:00
Faisal Latif
b493d91d33 iwcm: common code for port mapper
Move port mapper related code from the drivers into common code.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Tatyana E. Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-16 13:47:52 -04:00
Doug Ledford
d2ad9cc759 Merge branches 'mlx4', 'mlx5' and 'ocrdma' into k.o/for-4.6 2016-03-16 13:38:28 -04:00
Doug Ledford
76b0640279 Merge branches 'ib_core', 'ib_ipoib', 'srpt', 'drain-cq-v4' and 'net/9p' into k.o/for-4.6 2016-03-14 17:42:57 -04:00
Christoph Hellwig
1ae1602de0 configfs: switch ->default groups to a linked list
Replace the current NULL-terminated array of default groups with a linked
list.  This gets rid of lots of nasty code to size and/or dynamically
allocate the array.

While we're at it also provide a convenient helper to remove the default
groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Felipe Balbi <balbi@kernel.org>		[drivers/usb/gadget]
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
2016-03-06 16:11:24 +01:00
Sagi Grimberg
f5aa9159a4 IB/core: Add arbitrary sg_list support
Devices that are capable of registering SG lists
with gaps can now expose it in the core to ULPs
using a new device capability IB_DEVICE_SG_GAPS_REG
(in a new field device_cap_flags_ex in the device attributes
as we ran out of bits), and a new mr_type IB_MR_TYPE_SG_GAPS_REG
which allocates a memory region which is capable of handling
SG lists with gaps.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-04 11:59:34 -05:00
Or Gerlitz
11d8d64534 IB/core: Use GRH when the path hop-limit > 0
According to IBTA spec v1.3 section 12.7.19, QPs should use GRH when
the path returned by the SA has hop-limit > 0. Currently, we do that
only for the > 1 case, fix that.
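
The difference between the old and the fixed condition can be shown
with a tiny C sketch. Both function names are hypothetical; only the
two comparisons come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: GRH only when the hop limit is greater than 1. */
static bool path_needs_grh_old(uint8_t hop_limit)
{
    return hop_limit > 1;
}

/* Fixed behaviour: GRH whenever the hop limit is greater than 0. */
static bool path_needs_grh(uint8_t hop_limit)
{
    return hop_limit > 0;
}
```

The two differ only for a hop limit of exactly 1, which is the case
the fix addresses.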

Fixes: 6d969a471b ('IB/sa: Add ib_init_ah_from_path()')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:52:58 -05:00
Parav Pandit
aba25a3e96 IB/core: trivial prink cleanup.
1. Replaced printk with appropriate pr_warn, pr_err, pr_info.
2. Removed unnecessary prints around memory allocation failures,
as reported by the checkpatch script.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:20:25 -05:00
Amitoj Kaur Chawla
db9314cd35 IB/core: Replace memset with eth_zero_addr
Use eth_zero_addr to assign the zero address to the given address
array instead of memset when the second argument is zero.

The Coccinelle semantic patch used to make this change is as follows:

// <smpl>
@eth_zero_addr@
expression e;
@@

-memset(e,0x00,ETH_ALEN);
+eth_zero_addr(e);
// </smpl>

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:19:41 -05:00
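
Illustration for db9314cd35 (not part of the commit): a minimal userspace sketch of what the semantic patch replaces. eth_zero_addr_model() is a hypothetical stand-in for the kernel helper, assumed here to zero exactly ETH_ALEN octets; mac_is_zero() is a helper added only for this example.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* length of an Ethernet MAC address in octets */

/* Stand-in for the kernel helper: names the intent instead of a raw memset. */
static void eth_zero_addr_model(uint8_t *addr)
{
	memset(addr, 0x00, ETH_ALEN);
}

/* Returns 1 if every octet of the address is zero, 0 otherwise. */
static int mac_is_zero(const uint8_t *addr)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(addr, zero, ETH_ALEN) == 0;
}
```

The result is the same as the open-coded memset; the gain is readability at the call site.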
Eli Cohen
eaebc7d21e IB/core: Modify conditional on ucontext existence
Since legacy verbs may be called through their extended counterparts,
the check on ucontext has to move up to a common area in case this verb
is ever extended.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:19:40 -05:00
Eli Cohen
2dbd5186a3 IB/core: Allow legacy verbs through extended interfaces
When an extended verb is an extension to a legacy verb, the original
functionality is preserved. Hence we do not require each hardware driver
to set the extended capability. This will allow the use of the extended
verb in its simple form with drivers that do not support the extended
capability.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:18:45 -05:00
Eli Cohen
74a0b0a5ea IB/core: Avoid duplicate code
Move the check on the validity of the command to a common area.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:18:44 -05:00
Majd Dibbiny
3d943c9d1c IB/{core, mlx5}: Fix input len in vendor part of create_qp/srq
Currently, the inlen field of the vendor's part of the command
doesn't match the command buffer. This happens because the inlen
accommodates ib_uverbs_cmd_hdr which is deducted from the in buffer.
This is problematic since the vendor function could be called either
from the legacy verb (where the input length mismatches the actual
length) or by the extended verb (where the length matches). The vendor
has no idea which function calls it and therefore has no way to know
how the length variable should be treated.

Fix this by aligning the inlen to the correct length.

All vendor drivers either assumed that inlen >= sizeof(vendor_uhw_cmd)
or failed incorrectly (mlx5); the latter is fixed in this patch.

Fixes: cfb5e088e2 ('IB/mlx5: Add CQE version 1 support to user QPs and SRQs')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03 10:00:18 -05:00
Matan Barak
b2a239df4e IB/core: Add vendor's specific data to alloc mw
Pass udata to the vendor's driver so that data can flow from the
user-space driver to the kernel-space driver. This data will be
used in downstream patches.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-01 11:18:53 -05:00
Haggai Eran
84424a7fc7 IB/cma: Print warning on different inner and header P_Keys
Commit 4c21b5bcef ("IB/cma: Add net_dev and private data checks to RDMA
CM") added checks for incoming RDMA CM requests that they can be matched to
a netdev based on the P_Key in the BTH of the request. This behavior was
reverted in commit ab3964ad2a ("IB/cma: Use inner P_Key to determine
netdev"), since the mlx5 and ipath drivers didn't send the correct value
in the BTH P_Key.

Since the ipath driver was removed, and the mlx5 driver can now send GSI
packets on different P_Keys, we could revert the patch to let the rdma_cm
module look at the BTH P_Key when deciding which netdev a packet belongs to.
However, that still breaks compatibility with the older drivers.

Change the behavior to print a warning when receiving a request that has a
different BTH P_Key and inner payload P_Key. In the future, after users
have seen the warnings and upgraded their setups, remove the warning and
block these requests.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-01 11:04:07 -05:00
Leon Romanovsky
5adebafb75 IB/core: Fix missed clean call in registration path
If the query function fails during IB device registration,
the IB cache must be cleaned up. That cleanup call was
missing.

This change adds it.

Fixes: 3e153a93a1 ('IB/core: Save the device attributes on the device structure')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 20:41:47 -05:00
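
Illustration for 5adebafb75 (not part of the commit): a minimal userspace sketch of the error-unwind pattern the message describes, assuming the cache is set up before the query step runs. The struct, its fields, and model_register() are hypothetical and do not mirror the kernel's registration code.

```c
#include <stdbool.h>

/* Hypothetical model of a device going through registration. */
struct model_device {
	bool cache_ready; /* true while the cache is set up */
	bool query_fails; /* test knob: force the query step to fail */
	bool registered;  /* true once registration has completed */
};

/* Returns 0 on success, a negative value on failure. */
static int model_register(struct model_device *dev)
{
	int ret;

	dev->cache_ready = true;         /* cache setup step */

	ret = dev->query_fails ? -1 : 0; /* query step */
	if (ret)
		goto err_cache;

	dev->registered = true;
	return 0;

err_cache:
	dev->cache_ready = false;        /* undo the cache setup on failure */
	return ret;
}
```

Without the err_cache step, a failed query would leave the cache set up for a device that was never registered.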
Marina Varshaver
a3100a7879 IB/core: Add don't trap flag to flow creation
The don't trap flag (i.e. IB_FLOW_ATTR_FLAGS_DONT_TRAP) indicates that the
QP will receive traffic, but will not steal it.

When a packet matches a flow steering rule that was created with
the don't trap flag, the QPs assigned to this rule will get this
packet, but matching will continue to other equal/lower priority
rules. This lets other QPs assigned to those rules get the
packet too.

If a don't trap rule and other rules have the same priority
and match the same packet, the behavior is undefined.

The don't trap flag can't be set with default rule types
(i.e. IB_FLOW_ATTR_ALL_DEFAULT, IB_FLOW_ATTR_MC_DEFAULT), as default rules
have no rules after them, so don't trap has no meaning there.

Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:11:40 -05:00
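
Illustration for a3100a7879 (not part of the commit): a minimal userspace sketch of the matching behaviour described above, assuming rules are already sorted from highest to lowest priority and no two matching rules share a priority. All model_* names and MODEL_* constants are hypothetical stand-ins, not the kernel's types.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DONT_TRAP 0x1u /* stands in for IB_FLOW_ATTR_FLAGS_DONT_TRAP */

enum model_rule_type { MODEL_NORMAL, MODEL_ALL_DEFAULT, MODEL_MC_DEFAULT };

struct model_rule {
	unsigned int flags;
	bool matches; /* whether the rule matches the packet under test */
	int qp;       /* QP number that receives the packet */
};

/* Don't trap is rejected on default rule types, as the message states. */
static bool model_flags_valid(enum model_rule_type type, unsigned int flags)
{
	bool is_default = type == MODEL_ALL_DEFAULT || type == MODEL_MC_DEFAULT;

	return !(is_default && (flags & MODEL_DONT_TRAP));
}

/* Walk rules in order; returns how many QP numbers were written to out[]. */
static size_t model_deliver(const struct model_rule *rules, size_t n,
			    int *out, size_t out_cap)
{
	size_t delivered = 0;

	for (size_t i = 0; i < n && delivered < out_cap; i++) {
		if (!rules[i].matches)
			continue;
		out[delivered++] = rules[i].qp;
		if (!(rules[i].flags & MODEL_DONT_TRAP))
			break; /* a normal rule steals the packet */
	}
	return delivered;
}
```

In this model a sniffer QP attached with the don't trap flag sees the packet, and the first later rule without the flag still receives and consumes it.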
Steve Wise
765d67748b IB: new common API for draining queues
Add provider-specific drain_sq/drain_rq functions for providers needing
special drain logic.

Add static functions __ib_drain_sq() and __ib_drain_rq() which post noop
WRs to the SQ or RQ and block until their completions are processed.
This ensures the application's completions for work requests posted prior
to the drain work request have all been processed.

Add API functions ib_drain_sq(), ib_drain_rq(), and ib_drain_qp().

For the drain logic to work, the caller must:

- ensure there is room in the CQ(s) and QP for the drain work request
  and completion.

- allocate the CQ using ib_alloc_cq(), with a CQ poll context other than
  IB_POLL_DIRECT.

- ensure that no other contexts are posting WRs concurrently.
  Otherwise the drain is not guaranteed.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:27 -05:00
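
Illustration for 765d67748b (not part of the commit): a minimal single-threaded userspace model of the drain idea, in which a marker request is posted behind all pending work and completions are processed until the marker itself completes. The queue, the marker id, and every model_* name are hypothetical; the real functions block on completions delivered by the CQ rather than polling a ring.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_DEPTH 8
#define MODEL_DRAIN_WR_ID (-1) /* hypothetical id for the drain marker */

struct model_queue {
	int wr[MODEL_QUEUE_DEPTH]; /* pending work request ids, oldest first */
	size_t head;               /* index of the oldest pending request */
	size_t count;              /* number of pending requests */
	size_t completed;          /* application completions processed */
};

/* Post a work request; fails when the queue has no room. */
static bool model_post(struct model_queue *q, int wr_id)
{
	if (q->count == MODEL_QUEUE_DEPTH)
		return false;
	q->wr[(q->head + q->count) % MODEL_QUEUE_DEPTH] = wr_id;
	q->count++;
	return true;
}

/* Complete the oldest pending request (caller ensures one exists). */
static int model_poll_one(struct model_queue *q)
{
	int wr_id = q->wr[q->head];

	q->head = (q->head + 1) % MODEL_QUEUE_DEPTH;
	q->count--;
	if (wr_id != MODEL_DRAIN_WR_ID)
		q->completed++;
	return wr_id;
}

/* Post the marker, then process completions until the marker completes. */
static bool model_drain(struct model_queue *q)
{
	if (!model_post(q, MODEL_DRAIN_WR_ID))
		return false; /* no room for the drain request */
	while (model_poll_one(q) != MODEL_DRAIN_WR_ID)
		;
	return true;
}
```

The model makes two of the listed caller requirements visible: the drain fails if there is no room for the marker, and it is only meaningful if nothing else posts requests behind the marker while it runs.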
Dave Hansen
d4edcf0d56 mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm
We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called.  For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)

This patch switches all callers of:

	get_user_pages()
	get_user_pages_unlocked()
	get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:12 +01:00
Dave Hansen
1e9877902d mm/gup: Introduce get_user_pages_remote()
For protection keys, we need to understand whether protections
should be enforced in software or not.  In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "remote" operations.

This patch introduces a new get_user_pages() variant:

        get_user_pages_remote()

It replaces calls to get_user_pages() that operate on a
non-current tsk/mm.

We also introduce a new gup flag: FOLL_REMOTE which can be used
for the "__" gup variants to get this new behavior.

The uprobes is_trap_at_addr() location holds mmap_sem and
calls get_user_pages(current->mm) on an instruction address.  This
makes it a pretty unique gup caller.  Being an instruction access
and also really originating from the kernel (vs. the app), I opted
to consider this a 'remote' access where protection keys will not
be enforced.

Without protection keys, this patch should not change any behavior.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:04:09 +01:00
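
Illustration for 1e9877902d (not part of the commit): a minimal userspace sketch of the "current" versus "remote" distinction the message draws for protection keys. MODEL_FOLL_REMOTE and model_access_allowed() are hypothetical; the flag value is arbitrary, and every permission check other than protection keys is deliberately left out.

```c
#include <stdbool.h>

#define MODEL_FOLL_REMOTE 0x1u /* stand-in for FOLL_REMOTE; value is arbitrary */

/*
 * pkey_denies: whether the protection key on the page forbids the access.
 * Returns true if the access is allowed as far as protection keys go.
 */
static bool model_access_allowed(unsigned int gup_flags, bool pkey_denies)
{
	if (gup_flags & MODEL_FOLL_REMOTE)
		return true;     /* remote: keys are not enforced in software */
	return !pkey_denies;     /* current: keys are enforced */
}
```

When no protection key denies the access, both paths give the same answer, which matches the statement that behavior is unchanged without protection keys.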