No description
Find a file
Selvin Xavier 942c9b6ca8 RDMA/bnxt_re: Avoid Hard lockup during error CQE processing
Hitting the following hardlockup due to a race condition in
error CQE processing.

[26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req
[26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa
[26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
[26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded
[26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8,
[26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[26156.472639] Call Trace:
[26156.475379]  <NMI>  [<ffffffff98d0d722>] dump_stack+0x19/0x1b
[26156.481833]  [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140
[26156.489341]  [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100
[26156.496256]  [<ffffffff98787c24>] perf_event_overflow+0x14/0x20
[26156.502887]  [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510
[26156.509813]  [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50
[26156.516738]  [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150
[26156.523273]  [<ffffffff98d17be8>] do_nmi+0x218/0x460
[26156.528834]  [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e
[26156.534980]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.543268]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.551556]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.559842]  <EOE>  [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf
[26156.567555]  [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30
[26156.573696]  [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re]
[26156.581789]  [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re]
[26156.589493]  [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re]

The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to
put the error QP in flush list. Since SQ and RQ of each error qp are added
to two different flush list, we need to protect it using locks of
corresponding CQs. Difference in order of acquiring the lock in
SQ poll_cq and RQ poll_cq can cause a hard lockup.

Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock.
Instead of this lock, introduces qplib_cq.flush_lock to handle
addition/deletion of QPs in flush list. Also, always invoke the flush_lock
in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential
deadlock.

Other than the poll_cq context, the movement of QP to/from flush list can
be done in modify_qp context or from an async error event from HW.
Synchronize these operations using the bnxt_re verbs layer CQ locks.
To achieve this, adds a call back to the HW abstraction layer(qplib) to
bnxt_re ib_verbs layer in case of async error event. Also, removes the
buddy cq functions as it is no longer required.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
arch unify {de,}mangle_poll(), get rid of kernel-side POLL... 2018-02-11 14:37:22 -08:00
block vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
certs License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crypto vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
Documentation KVM changes for 4.16 2018-02-10 13:16:35 -08:00
drivers RDMA/bnxt_re: Avoid Hard lockup during error CQE processing 2018-03-06 20:08:39 -07:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
include IB/uverbs: Use __aligned_u64 for uapi headers 2018-02-15 14:59:45 -07:00
init membarrier: Provide core serializing command, *_SYNC_CORE 2018-02-05 21:35:03 +01:00
ipc vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
kernel vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
lib Kbuild updates for v4.16 (2nd) 2018-02-09 19:32:41 -08:00
LICENSES LICENSES: Add MPL-1.1 license 2018-01-06 10:59:44 -07:00
mm vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
net vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
samples sample/bpf: fix erspan metadata 2018-02-06 11:32:49 -05:00
scripts Kbuild updates for v4.16 (2nd) 2018-02-09 19:32:41 -08:00
security vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
sound vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-09 15:34:18 -08:00
usr initramfs: fix initramfs rebuilds w/ compression after disabling 2017-11-03 07:39:19 -07:00
virt vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore scripts/package: snap-pkg target 2017-12-13 00:00:18 +09:00
.mailmap mailmap: update Mark Yao's email address 2018-01-04 16:45:09 -08:00
COPYING
CREDITS MAINTAINERS: update TPM driver infrastructure changes 2017-11-09 17:58:40 -08:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
MAINTAINERS KVM changes for 4.16 2018-02-10 13:16:35 -08:00
Makefile Linux 4.16-rc1 2018-02-11 15:04:29 -08:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.