linux-stable/include
Eric Dumazet 75c119afe1 tcp: implement rb-tree based retransmit queue
Using a linear list to store all skbs in write queue has been okay
for quite a while : O(N) is not too bad when N < 500.

Things get messy when N is the order of 100,000 : Modern TCP stacks
want 10Gbit+ of throughput even with 200 ms RTT flows.

40 ns per cache line miss means a full scan can use 4 ms,
blowing away CPU caches.

SACK processing often can use various hints to avoid parsing
whole retransmit queue. But with high packet losses and/or high
reordering, hints no longer work.

Sender has to process thousands of unfriendly SACK, accumulating
a huge socket backlog, burning a cpu and massively dropping packets.

Using an rb-tree for retransmit queue has been avoided for years
because it added complexity and overhead, but now is the time
to be more resistant and say no to quadratic behavior.

1) RTX queue is no longer part of the write queue : already sent skbs
are stored in one rb-tree.

2) Since reaching the head of write queue no longer needs
sk->sk_send_head, we added an union of sk_send_head and tcp_rtx_queue

Tested:

 On receiver :
 netem on ingress : delay 150ms 200us loss 1
 GRO disabled to force stress and SACK storms.

for f in `seq 1 10`
do
 ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'

Before patch :

323.87
351.48
339.59
338.62
306.72
204.07
304.93
291.88
202.47
176.88
   2840

After patch:

1700.83
2207.98
2070.17
1544.26
2114.76
2124.89
1693.14
1080.91
2216.82
1299.94
  18053

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 00:28:54 +01:00
..
acpi ACPI / bus: Make ACPI_HANDLE() work for non-GPL code again 2017-09-19 22:42:31 +02:00
asm-generic percpu: make this_cpu_generic_read() atomic w.r.t. interrupts 2017-09-26 07:37:33 -07:00
clocksource
crypto
drm lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
dt-bindings ARC: reset: remove the misleading v1 suffix all over 2017-09-18 13:02:03 +02:00
keys
kvm
linux net: add rb_to_skb() and other rb tree helpers 2017-10-07 00:28:53 +01:00
math-emu
media media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
memory
misc
net tcp: implement rb-tree based retransmit queue 2017-10-07 00:28:54 +01:00
pcmcia
ras
rdma IB: Correct MR length field to be 64-bit 2017-09-25 11:47:23 -04:00
scsi SCSI misc on 20170913 2017-09-13 10:47:14 -07:00
soc ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
sound sound fixes for 4.14-rc4 2017-10-05 10:39:29 -07:00
target
trace sched/debug: Add explicit TASK_PARKED printing 2017-09-29 11:02:57 +02:00
uapi VSOCK: add sock_diag interface 2017-10-05 18:44:17 -07:00
video
xen xen, arm64: drop dummy lookup_address() 2017-09-19 09:25:05 -04:00