When the remote side is not up, we do not have all the context for the
transport, and that causes NULL ptr access. Have the debugfs reads check
to see if transport is up before we make access.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Currently the debugfs does not have files for all NTB transport queue
pairs. When there are multiple NTBs present in a system, the QP names
of the last transport clobber the names of previously added transport
QPs. Only the last added QPs can be observed via debugfs.
Create a directory per NTB transport to associate the QPs with that
transport. Name the directory the same as the PCI device.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
It was possible for a synchronous update of the RX index in the error
case to get ahead of the asynchronous RX index update in the normal
case. Change the RX processing to preserve an RX completion order.
There were two error cases. First, if a buffer is not present to
receive data, there would be no queue entry to preserve the RX
completion order. Instead of dropping the RX frame, leave the RX frame
in the ring. Schedule RX processing when RX entries are enqueued, in
case there are RX frames waiting in the ring to be received.
Second, if a buffer is too small to receive data, drop the frame in the
ring, mark the RX entry as done, and indicate the error in the RX entry
length. Check for a negative length in the receive callback in
ntb_netdev, and count occurrences as rx_length_errors.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Printouts driver name and version to indicate what is being loaded.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Benchmarking showed a significant performance increase with the MTU size
to 64k instead of 16k. Change the driver default to 64k.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Disable DMA usage by default, since the CPU provides much better
performance with write combining. Provide a module parameter to enable
DMA usage when offloading the memcpy is preferred.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Changing the memory window BAR mappings to write combining significantly
boosts the performance. We will also use memcpy that uses non-temporal
store, which showed performance improvement when doing non-cached
memcpys.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Allocate memory and request the DMA channel for the same NUMA node as
the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
When the ntb transport is connecting and waiting for the peer, the debug
console receives lots of debug level messages about the remote qp link
status being down. Rate limit those messages.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Reset the link stats when the link goes down. In particular, the TX and
RX index and count must be reset, or else the TX side will be sending
packets to the RX side where the RX side is not expecting them. Reset
all the stats, to be consistent.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
On link down, don't advance RX index to the next entry. The next entry
should never be valid after receiving the link down flag.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The same message "qp %d: Link Down\n" was printed at two locations in
ntb_transport. Change the messages so they are distinct.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The transport was writing and then reading the peer scratch pad,
essentially reading what it just wrote instead of exchanging any
information with the peer. The transport expects the peer values to be
the same as the local values, so this issue was not obvious.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
This patch only moves files to their new locations, before applying the
next two patches adding the NTB Abstraction layer. Splitting this patch
from the next is intended make distinct which code is changed only due
to moving the files, versus which are substantial code changes in adding
the NTB Abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The NTB translate register must have the value to be BAR size aligned.
This alignment check make sure that the DMA memory allocated has the
proper alignment. Another requirement for NTB to function properly with
memory window BAR size greater or equal to 4M is to use the CMA feature
in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
CONFIG_CMA_SIZE_MBYTES set.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The detection of an uneven number of queues on the given memory windows
was not correct. The mw_num is zero based and the mod should be
division to spread them evenly over the mw's.
Signed-off-by: Jon Mason <jon.mason@intel.com>
NTB-RP Link Up issue, Xeon Doorbell errata workaround, ntb_transport
link down race, and correct dmaengine_get/put usage. Also, clean-ups
to remove duplicate defines and document a hardware errata. Finally,
some changes to improve performance.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSj/dCAAoJEG5mS6x6i9IjCfAP/0pVA7VYAjQQGLtpUIf8RNIe
iNV58Q3h74+CSzyHkz51HYuJB+6uRC6c0ChKuZMUZk/ca3hqfW0FcC1V3SjYLzxP
Bdf687TaEE4WJOaTI5hnAzzDQ+gU9aE4DgaqvFz43M4jh4scSleuE+CQ0Pjm682x
71KespuPO7Z77EC5QPbWWV5KRSkF4FHuAuDcr8TYAmSnCb8VDGDxdnwYYwBTeY3j
DuPUwLGwvbE6X4gWmOYZSwDz6cArEs1CunpeP3EUFY4rjTZArc4MSLWdC6lYQtZp
T90khUDU/UklUqlOOuydJe30zT60uPBagcEL94JxyWFT+wa5b/ra+gvU+FE4NBzA
ZW6LVlXnYTQ5XJeO5HNDRBHPZ3zZaDI+TGDBM13EBS0wbcO3PojR6Z78nlkjPLeJ
/a1sIEfM3ts7u3Vpw8GpBbhl6+8tP6xbV/rDbzfE6mfq/VVwbfAa/k9oQVh8pfoJ
YyGq1Gr43aAFm0imPRF0qRNDnSDF2UiYrNHntWWyfhW7Vmq/EjT0H21AHhgxnP7b
3AaKQl7nuYoNzx6x8ZiEcSkDy+8qY84i2vUzd9Qll/SBVCOK7KvBoFmu9x7D+ZdN
0mS4qFNXSmHhow7ur2ffwzIaSkggVIhLbEiH4bYbPP+wXzhlTLTnnRb3lFmTs/5X
/HTR7QGxfDAzcjpYAzlS
=nOG2
-----END PGP SIGNATURE-----
Merge tag 'ntb-3.13' of git://github.com/jonmason/ntb
Pull non-transparent bridge updates from Jon Mason:
"NTB driver bug fixes to address a missed call to pci_enable_msix,
NTB-RP Link Up issue, Xeon Doorbell errata workaround, ntb_transport
link down race, and correct dmaengine_get/put usage.
Also, clean-ups to remove duplicate defines and document a hardware
errata. Finally, some changes to improve performance"
* tag 'ntb-3.13' of git://github.com/jonmason/ntb:
NTB: Disable interrupts and poll under high load
NTB: Enable Snoop on Primary Side
NTB: Document HW errata
NTB: remove duplicate defines
NTB: correct dmaengine_get/put usage
NTB: Fix ntb_transport link down race
ntb: Fix missed call to pci_enable_msix()
NTB: Fix NTB-RP Link Up
NTB: Xeon Doorbell errata workaround
dmaengine_get() causes the initialization of the per-cpu channel tables.
It needs to be called prior to dma_find_channel().
Initial version by Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jon Mason <jon.mason@intel.com>
A WARN_ON is being hit in ntb_qp_link_work due to the NTB transport link
being down while the ntb qp link is still active. This is caused by the
transport link being brought down prior to the qp link worker thread
being terminated. To correct this, shutdown the qp's prior to bringing
the transport link down. Also, only call the qp worker thread if it is
in interrupt context, otherwise call the function directly.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Use the generic unmap object to unmap dma buffers.
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Tomasz Figa <t.figa@samsung.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Jon Mason <jon.mason@intel.com>
[djbw: fix up unmap len, and GFP flags]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Many variable names in the NTB driver refer to the primary or secondary
side. However, these variables will be used to access the reverse case
when in NTB-RP mode. Make these names more generic in anticipation of
NTB-RP support.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Allocate and use a DMA engine channel to transmit and receive data over
NTB. If none is allocated, fall back to using the CPU to transfer data.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
There is a Xeon hardware errata related to writes to SDOORBELL or
B2BDOORBELL in conjunction with inbound access to NTB MMIO Space, which
may hang the system. To workaround this issue, use one of the memory
windows to access the interrupt and scratch pad registers on the remote
system. This bypasses the issue, but removes one of the memory windows
from use by the transport. This reduction of MWs necessitates adding
some logic to determine the number of available MWs.
Since some NTB usage methodologies may have unidirectional traffic, the
ability to disable the workaround via modparm has been added.
See BF113 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-c5500-c3500-spec-update.pdf
See BT119 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf
Signed-off-by: Jon Mason <jon.mason@intel.com>
Debugfs was setup in NTB to only have a single debugfs directory. This
resulted in the leaking of debugfs directories and files when multiple
NTB devices were present, due to each device stomping on the variables
containing the previous device's values (thus preventing them from being
freed on cleanup). Correct this by creating a secondary directory of
the PCI BDF for each device present, and nesting the previously existing
information in those directories.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Fix issue with adding multiple ntb client devices to the ntb virtual
bus. Previously, multiple devices would be added with the same name,
resulting in crashes. To get around this issue, add a unique number to
the device when it is added.
Signed-off-by: Jon Mason <jon.mason@intel.com>
The system will appear to lockup for long periods of time due to the NTB
driver spending too much time in memcpy. Avoid this by reducing the
number of packets that can be serviced on a given interrupt.
Signed-off-by: Jon Mason <jon.mason@intel.com>
The ring logic of the NTB receive buffer/transmit memory window requires
there to be at least 2 payload sized allotments. For the minimal size
case, split the buffer into two and set the transport_mtu to the
appropriate size.
Signed-off-by: Jon Mason <jon.mason@intel.com>
If the NTB link toggles, the driver could stop receiving due to the
tx_index not being set to 0 on the transmitting size on a link-up event.
This is due to the driver expecting the incoming data to start at the
beginning of the receive buffer and not at a random place.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Each link-up will allocate a new NTB receive buffer when the NTB
properties are negotiated with the remote system. These allocations did
not check for existing buffers and thus did not free them. Now, the
driver will check for an existing buffer and free it if not of the
correct size, before trying to alloc a new one.
Signed-off-by: Jon Mason <jon.mason@intel.com>
64bit BAR sizes are permissible with an NTB device. To support them
various modifications and clean-ups were required, most significantly
using 2 32bit scratch pad registers for each BAR.
Also, modify the driver to allow more than 2 Memory Windows.
Signed-off-by: Jon Mason <jon.mason@intel.com>
->remote_rx_info and ->rx_info are struct ntb_rx_info pointers. If we
add sizeof(struct ntb_rx_info) then it goes too far.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jon Mason <jon.mason@intel.com>
Correct instances of variable dereferencing before checking its value on
the functions exported to the client drivers. Also, add sanity checks
for all exported functions.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jon Mason <jon.mason@intel.com>
Address the sparse warnings and resulting fallout
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the NTB client driver enqueues the maximum number of rx buffers, it
will not be able to re-enqueue another in its callback handler due to a
lack of free entries. This can be avoided by adding the current entry
to the free queue prior to calling the client callback handler. With
this change, ntb_netdev will no longer encounter a rx error on its first
packet.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CPU reads across NTB are slow(er) and can hang the local system if an
ECC error is encountered on the remote. To work around the need for a
read, have the remote side write its current position in the rx buffer
to the local system's buffer and use that to see if there is room when
transmitting.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Correct gcc warning of using too much stack debugfs_read. This is done
by kmallocing the buffer instead of using the char array on stack.
Also, shrinking the buffer to something closer to what is currently
being used.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Declare ntb_bus_type static to remove it from name space, and remove
unused ntb_get_max_spads function. Found via `make namespacecheck`.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use simple_open for debugfs instead of recreating it in the NTB driver.
Caught by coccicheck.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move all cancel_delayed_work_sync to a work thread to prevent sleeping
in interrupt context (when the NTB link goes down). Caught via
'Sleep inside atomic section checking'
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since it is possible for the memory windows on the two NTB connected
systems to be different sizes, the divergent sizes must be accounted for
in the segmentation of the MW's on each side. Create separate size
variables and initialization as necessary.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mmiowb is not sufficient to flush the data and is causing data
corruption. Change to wmb and the data corruption is no more.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Attempts to probe client ntb drivers without ntb hardware present will
result in null pointer dereference due to the lack of the ntb bus device
being registers. Check to see if this is the case, and fail all calls
by the clients registering their drivers.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These are now gone from the kernel, so remove them from the newly-added
drivers before they start to cause build errors for people.
Cc: Jon Mason <jon.mason@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A PCI-Express non-transparent bridge (NTB) is a point-to-point PCIe bus
connecting 2 systems, providing electrical isolation between the two subsystems.
A non-transparent bridge is functionally similar to a transparent bridge except
that both sides of the bridge have their own independent address domains. The
host on one side of the bridge will not have the visibility of the complete
memory or I/O space on the other side of the bridge. To communicate across the
non-transparent bridge, each NTB endpoint has one (or more) apertures exposed to
the local system. Writes to these apertures are mirrored to memory on the
remote system. Communications can also occur through the use of doorbell
registers that initiate interrupts to the alternate domain, and scratch-pad
registers accessible from both sides.
The NTB device driver is needed to configure these memory windows, doorbell, and
scratch-pad registers as well as use them in such a way as they can be turned
into a viable communication channel to the remote system. ntb_hw.[ch]
determines the usage model (NTB to NTB or NTB to Root Port) and abstracts away
the underlying hardware to provide access and a common interface to the doorbell
registers, scratch pads, and memory windows. These hardware interfaces are
exported so that other, non-mainlined kernel drivers can access these.
ntb_transport.[ch] also uses the exported interfaces in ntb_hw.[ch] to setup a
communication channel(s) and provide a reliable way of transferring data from
one side to the other, which it then exports so that "client" drivers can access
them. These client drivers are used to provide a standard kernel interface
(i.e., Ethernet device) to NTB, such that Linux can transfer data from one
system to the other in a standard way.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>