linux-stable/Documentation
Jakub Kicinski 6025b9135f net: dqs: add NIC stall detector based on BQL
softnet_data->time_squeeze is sometimes used as a proxy for
host overload or indication of scheduling problems. In practice
this statistic is very noisy and has hard to grasp units -
e.g. is 10 squeezes a second to be expected, or high?

Delaying network (NAPI) processing leads to drops on NIC queues
but also RTT bloat, impacting pacing and CA decisions.
Stalls are a little hard to detect on the Rx side, because
there may simply have not been any packets received in given
period of time. Packet timestamps help a little bit, but
again we don't know if packets are stale because we're
not keeping up or because someone (*cough* cgroups)
disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers
use combined Rx+Tx NAPIs so if Tx gets starved so will Rx.
On the Tx side we know exactly when packets get queued,
and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because
it's a convenient place to add such checks, already
called by most drivers, and it has copious free space
in its structures (this patch adds no extra cache
references or dirtying to the fast path).

The algorithm takes one parameter - max delay AKA stall
threshold and increments a counter whenever NAPI got delayed
for at least that amount of time. It also records the length
of the longest stall.

To be precise every time NAPI has not polled for at least
stall thrs we check if there were any Tx packets queued
between last NAPI run and now - stall_thrs/2.

Unlike the classic Tx watchdog this mechanism does not
ignore stalls caused by Tx being disabled, or loss of link.
I don't think the check is worth the complexity, and
stall is a stall, whether due to host overload, flow
control, link down... doesn't matter much to the application.

We have been running this detector in production at Meta
for 2 years, with the threshold of 8ms. It's the lowest
value where false positives become rare. There's still
a constant stream of reported stalls (especially without
the ksoftirqd deferral patches reverted), those who like
their stall metrics to be 0 may prefer higher value.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 10:23:26 +00:00
..
ABI net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
PCI docs: PCI: Fix typos 2023-12-28 17:37:36 -06:00
RAS
RCU doc: Mention address and data dependencies in rcu_dereference.rst 2023-12-14 01:16:28 +05:30
accel docs/accel: correct links to mailing list archives 2024-01-23 14:45:50 -07:00
accounting
admin-guide net: make SK_MEMORY_PCPU_RESERV tunable 2024-02-28 09:23:08 +00:00
arch x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key 2024-02-19 16:31:49 -08:00
block Documentation: block: ioprio: Update schedulers 2024-01-18 08:21:14 -07:00
bpf bpf: Replace bpf_lpm_trie_key 0-length array with flexible array 2024-02-29 22:52:43 +01:00
cdrom
core-api A handful of late-arriving documentation fixes. 2024-01-17 11:49:11 -08:00
cpu-freq
crypto Another moderately busy cycle for documentation, including: 2024-01-11 19:46:52 -08:00
dev-tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-01 15:12:37 -08:00
devicetree dt-bindings: net: dp83822: change ti,rmii-mode description 2024-03-07 20:25:28 -08:00
doc-guide docs: Raise the minimum Sphinx requirement to 2.4.4 2023-12-15 08:36:33 -07:00
driver-api dpll: move all dpll<>netdev helpers to dpll code 2024-03-05 18:36:42 -08:00
fault-injection
fb fbdev/intelfb: Remove driver 2024-01-12 12:38:37 +01:00
features riscv: Add support for BATCHED_UNMAP_TLB_FLUSH 2024-01-11 08:01:53 -08:00
filesystems ovl: mark xwhiteouts directory with overlay.opaque='x' 2024-01-23 12:39:48 +02:00
firmware-guide
firmware_class
fpga
gpu amd-drm-next-6.8-2024-01-05: 2024-01-09 09:07:50 +10:00
hid
hwmon hwmon: (lm75) Add AMS AS6200 temperature sensor 2024-01-02 08:44:57 -08:00
i2c Documentation/i2c: fix spelling error in i2c-address-translators 2023-12-27 20:05:44 +01:00
iio
images
infiniband
input
isdn
kbuild docs: kconfig: Fix grammar and formatting 2024-02-15 06:55:47 +09:00
kernel-hacking
leds
litmus-tests
livepatch
locking locking/mutex: Clarify that mutex_unlock(), and most other sleeping locks, can still use the lock object after it's unlocked 2024-01-08 09:55:31 +01:00
maintainer
mhi
misc-devices
mm mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED 2023-12-29 11:58:56 -08:00
netlabel
netlink netdev: add queue stat for alloc failures 2024-03-07 21:13:26 -08:00
networking netdev: add per-queue statistics 2024-03-07 21:13:25 -08:00
nvdimm
nvme
pcmcia
peci
power Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes 2023-12-19 21:14:32 +01:00
process Including fixes from bpf and netfilter. 2024-02-22 09:57:58 -08:00
rust LoongArch changes for v6.8 2024-01-19 13:30:49 -08:00
scheduler sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) 2023-12-23 15:59:56 +01:00
scsi
security
sound
sphinx docs: translations: use attribute to store current language 2024-02-21 13:41:37 -07:00
sphinx-static docs: translations: add translations links when they exist 2023-12-19 14:34:59 -07:00
spi
staging rpmsg updates for v6.8 2024-01-17 15:05:27 -08:00
target
tee
timers
tools
trace tracing updates for 6.8: 2024-01-18 14:35:29 -08:00
translations A handful of late-arriving documentation fixes. 2024-01-17 11:49:11 -08:00
usb usb: gadget: ncm: Fix indentations in documentation of NCM section 2024-01-27 16:27:58 -08:00
userspace-api eventpoll: Add epoll ioctl for epoll_params 2024-02-14 11:01:01 +00:00
virt Documentation: hyperv: Add overview of PCI pass-thru device support 2024-03-01 08:29:49 +00:00
w1
watchdog
wmi
.gitignore
Changes
CodingStyle
Kconfig
Makefile doc/netlink: Regenerate netlink .rst files if ynl-gen-rst changes 2023-12-18 14:39:44 -08:00
SubmittingPatches
atomic_bitops.txt
atomic_t.txt
conf.py docs: Instruct LaTeX to cope with deeper nesting 2024-02-20 14:51:42 -07:00
docutils.conf
dontdiff
index.rst
memory-barriers.txt doc: Clarify historical disclaimers in memory-barriers.txt 2023-12-14 01:16:28 +05:30
subsystem-apis.rst