Go to file
Jakub Kicinski 6025b9135f net: dqs: add NIC stall detector based on BQL
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or indication of scheduling problems. In practice
this statistic is very noisy and has hard to grasp units -
e.g. is 10 squeezes a second to be expected, or high?

Delaying network (NAPI) processing leads to drops on NIC queues
but also RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply have not been any packets received in given
period of time. Packet timestamps help a little bit, but
again we don't know if packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
On the Tx side we know exactly when packets get queued,
and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks, already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).

The algorithm takes one parameter - max delay AKA stall
threshold and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.

To be precise every time NAPI has not polled for at least
stall thrs we check if there were any Tx packets queued
between last NAPI run and now - stall_thrs/2.

Unlike the classic Tx watchdog this mechanism does not
ignore stalls caused by Tx being disabled, or loss of link.
I don't think the check is worth the complexity, and
stall is a stall, whether due to host overload, flow
control, link down... doesn't matter much to the application.

We have been running this detector in production at Meta
for 2 years, with the threshold of 8ms. It's the lowest
value where false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted), those who like
their stall metrics to be 0 may prefer higher value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:23:26 +00:00
Documentation net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
block block: sed-opal: handle empty atoms when parsing response 2024-02-16 15:52:45 -07:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto crypto: lskcipher - Copy IV in lskcipher glue code always 2024-02-24 08:37:24 +08:00
drivers net: chelsio: remove unused function calc_tx_descs 2024-03-08 10:19:35 +00:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
include net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
init bpf-next-for-netdev 2024-03-02 20:50:59 -08:00
io_uring io_uring/net: fix multishot accept overflow handling 2024-02-14 18:30:19 -07:00
ipc shm: Slim down dependencies 2023-12-20 19:26:31 -05:00
kernel net: move skbuff_cache(s) to net_hotdata 2024-03-07 21:12:42 -08:00
lib net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
mm net: introduce page_frag_cache_drain() 2024-03-05 11:38:14 +01:00
net net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
rust rust: phy: use VTABLE_DEFAULT_ERROR 2024-01-27 14:28:00 +00:00
samples bpf-next-for-netdev 2024-03-02 20:50:59 -08:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
sound ASoC: Fixes for v6.8 2024-02-29 08:29:04 +01:00
tools netdev: add queue stat for alloc failures 2024-03-07 21:13:26 -08:00
usr Kbuild updates for v6.8 2024-01-18 17:57:07 -08:00
virt Generic: 2024-01-17 13:03:37 -08:00
.clang-format clang-format: Update with v6.7-rc4's `for_each` macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.mailmap Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-29 14:24:56 -08:00
.rustfmt.toml rust: add `.rustfmt.toml` 2022-09-28 09:02:20 +02:00
COPYING
CREDITS MAINTAINERS: supplement of zswap maintainers update 2024-01-25 23:52:21 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-03-07 10:29:36 -08:00
Makefile Linux 6.8-rc7 2024-03-03 13:02:52 -08:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.