Håkon Bugge 4df76024c0 IB/mlx4: Fix starvation in paravirt mux/demux
[ Upstream commit 7fd1507df7 ]

The mlx4 driver will proxy MAD packets through the PF driver. A VM or an
instantiated VF will send its MAD packets to the PF driver using
loop-back. The PF driver will be informed by an interrupt, but defer the
handling and polling of CQEs to a worker thread running on an ordered
work-queue.

Consider the following scenario: within a short span of time, for example
due to a network event, the VMs send many MAD packets to the PF driver.
Let's say there are K VMs, each sending N packets.

The interrupt from the first VM will start the worker thread, which will
poll N CQEs. A common case here is where the PF driver will multiplex the
packets received from the VMs out on the wire QP.

But before the wire QP has returned a send CQE and associated interrupt,
the other K - 1 VMs have sent their N packets as well.

The PF driver has to multiplex K * N packets out on the wire QP. But the
send-queue on the wire QP has a finite capacity.

So, in this scenario, if K * N is larger than the send-queue capacity of
the wire QP, we will get MAD packets dropped on the floor with this
dynamic debug message:

mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)

and this despite the fact that the wire send-queue could have capacity,
but the PF driver isn't aware, because the wire send CQEs have not yet
been polled.

We can also have a similar scenario inbound, with a wire recv-queue larger
than the tunnel QP's send-queue. If many remote peers send MAD packets to
the very same VM, the PF driver could wrongly conclude that the tunnel
send-queue destined for that VM is full.

This starvation is fixed by introducing separate work queues for the wire
QPs vs. the tunnel QPs.
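
The starvation described above can be illustrated with a toy user-space
model. This is only a sketch under stated assumptions, not the driver
code: the names wire_sq, post_send, poll_send_cqes and simulate are
hypothetical, every posted send is assumed to complete in hardware at
once, and with separate work queues the wire send CQEs are assumed to be
polled between the batches from each VM.

```c
#include <assert.h>

/* Toy model of the wire QP send-queue as seen by the PF driver. */
struct wire_sq {
	int capacity;	/* send-queue depth */
	int in_flight;	/* posted sends whose CQEs have not been polled */
};

/* Post one send; returns -1 when the queue looks full (cf. -EAGAIN). */
static int post_send(struct wire_sq *sq)
{
	if (sq->in_flight >= sq->capacity)
		return -1;
	sq->in_flight++;
	return 0;
}

/* Polling the send CQEs makes the freed slots visible to the poster.
 * Assumption: all posted sends have already completed in hardware. */
static void poll_send_cqes(struct wire_sq *sq)
{
	sq->in_flight = 0;
}

/*
 * k VMs each send n packets that must be multiplexed onto the wire QP.
 * separate_wq == 0: one ordered work queue, so the wire completion work
 * is queued behind all k tunnel work items and runs too late.
 * separate_wq != 0: wire completions are handled on their own work
 * queue, modelled here as a poll after each VM's batch.
 * Returns the number of dropped packets.
 */
static int simulate(int k, int n, int capacity, int separate_wq)
{
	struct wire_sq sq = { capacity, 0 };
	int dropped = 0;

	assert(capacity > 0);

	for (int vm = 0; vm < k; vm++) {
		for (int i = 0; i < n; i++)
			if (post_send(&sq) < 0)
				dropped++;
		if (separate_wq)
			poll_send_cqes(&sq);
	}
	return dropped;
}
```

In this model, simulate(8, 100, 256, 0) drops the packets that exceed the
queue depth (K * N minus the capacity), while simulate(8, 100, 256, 1)
drops none, because each batch of N fits in the queue once the earlier
completions have been polled.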

With this fix, using a dual-ported HCA with 8 VFs instantiated, we could
run cmtime on each of the 18 interfaces towards a similarly configured
peer, each cmtime instance with 800 QPs (14400 QPs in all), without a
single CM packet getting lost.

Fixes: 3cf69cc8db ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:07:08 +01:00
arch x86/fpu: Allow multiple bits in clearcpuid= parameter 2020-10-29 09:07:00 +01:00
block block: ensure bdi->io_pages is always initialized 2020-09-12 13:39:11 +02:00
certs Replace magic for trusting the secondary keyring with #define 2018-09-09 19:55:54 +02:00
crypto crypto: algif_skcipher - EBUSY on aio should be an error 2020-10-29 09:07:01 +01:00
Documentation x86/fpu: Allow multiple bits in clearcpuid= parameter 2020-10-29 09:07:00 +01:00
drivers IB/mlx4: Fix starvation in paravirt mux/demux 2020-10-29 09:07:08 +01:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:20:30 +01:00
fs mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:07:08 +01:00
include mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:07:08 +01:00
init x86: Fix early boot crash on gcc-10, third try 2020-05-20 08:17:15 +02:00
ipc ipc/util.c: sysvipc_find_ipc() incorrectly updates position index 2020-05-20 08:17:07 +02:00
kernel mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:07:08 +01:00
lib Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts 2020-10-14 09:51:10 +02:00
mm mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:07:08 +01:00
net nl80211: fix non-split wiphy information 2020-10-29 09:07:07 +01:00
samples samples: bpf: Fix build error 2020-06-03 08:17:55 +02:00
scripts checkpatch: fix the usage of capture group ( ... ) 2020-09-09 19:03:13 +02:00
security ima: Don't ignore errors from crypto_shash_update() 2020-10-29 09:07:00 +01:00
sound ALSA: seq: oss: Avoid mutex lock for a long-time ioctl 2020-10-29 09:07:06 +01:00
tools perf top: Fix stdio interface input handling with glibc 2.28+ 2020-10-14 09:51:11 +02:00
usr initramfs: restore default compression behavior 2020-04-13 10:34:19 +02:00
virt KVM: fix overflow of zero page refcount with ksm running 2020-10-01 13:12:33 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: rpm-pkg: keep spec file until make mrproper 2018-02-13 10:19:46 +01:00
.mailmap .mailmap: Add Maciej W. Rozycki's Imagination e-mail address 2017-11-10 12:16:15 -08:00
COPYING
CREDITS MAINTAINERS: update TPM driver infrastructure changes 2017-11-09 17:58:40 -08:00
Kbuild License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
MAINTAINERS MAINTAINERS: Update drm/i915 bug filing URL 2020-02-28 16:36:12 +01:00
Makefile Linux 4.14.202 2020-10-17 10:29:55 +02:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.