linux-stable/net
Neal Cardwell 4720852ed9 tcp: fix delayed ACKs for MSS boundary condition
This commit fixes poor delayed ACK behavior that can cause poor TCP
latency in a particular boundary condition: when an application makes
a TCP socket write that is an exact multiple of the MSS size.

The problem is that there is painful boundary discontinuity in the
current delayed ACK behavior. With the current delayed ACK behavior,
we have:

(1) If an app reads data when > 1*MSS is unacknowledged, then
    tcp_cleanup_rbuf() ACKs immediately because of:

     tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||

(2) If an app reads all received data, and the packets were < 1*MSS,
    and either (a) the app is not ping-pong or (b) we received two
    packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause
    of:

     ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
      ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
       !inet_csk_in_pingpong_mode(sk))) &&

(3) *However*: if an app reads exactly 1*MSS of data,
    tcp_cleanup_rbuf() does not send an immediate ACK. This is true
    even if the app is not ping-pong and the 1*MSS of data had the PSH
    bit set, suggesting the sending application completed an
    application write.

Thus if the app is not ping-pong, we have this painful case where
>1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a
write whose last skb is an exact multiple of 1*MSS can get a 40ms
delayed ACK. This means that any app that transfers data in one
direction and takes care to align write size or packet size with MSS
can suffer this problem. With receive zero copy making 4KB MSS values
more common, it is becoming more common to have application writes
naturally align with MSS, and more applications are likely to
encounter this delayed ACK problem.

The fix in this commit is to refine the delayed ACK heuristics with a
simple check: immediately ACK a received 1*MSS skb with PSH bit set if
the app reads all data. Why? If an skb has a len of exactly 1*MSS and
has the PSH bit set then it is likely the end of an application
write. So more data may not be arriving soon, and yet the data sender
may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we
set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send
an ACK immediately if the app reads all of the data and is not
ping-pong. Note that this logic is also executed for the case where
len > MSS, but in that case this logic does not matter (and does not
hurt) because tcp_cleanup_rbuf() will always ACK immediately if the
app reads data and there is more than an MSS of unACKed data.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Guo <guoxin0309@gmail.com>
Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04 15:34:18 -07:00
..
6lowpan
9p net: annotate data-races around sock->ops 2023-08-09 15:32:43 -07:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
appletalk sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
atm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ax25 ax25: Kconfig: Update link for linux-ax25.org 2023-09-18 12:56:58 +01:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-24 10:51:39 -07:00
bluetooth Bluetooth: hci_codec: Fix leaking content of local_codecs 2023-09-20 11:03:11 -07:00
bpf bpf: Prevent inlining of bpf_fentry_test7() 2023-08-30 08:36:17 +02:00
bpfilter
bridge neighbour: fix data-races around n->output 2023-10-01 17:14:37 +01:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can net: annotate data-races around sk->sk_tsflags 2023-09-01 07:27:33 +01:00
ceph libceph: do not include crypto/algapi.h 2023-08-24 11:24:37 +02:00
core bpf-for-netdev 2023-10-04 08:28:07 -07:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-01 21:07:46 -07:00
dccp dccp: fix dccp_v4_err()/dccp_v6_err() again 2023-09-18 07:10:31 +01:00
devlink devlink: move devlink_notify_register/unregister() to dev.c 2023-08-28 08:02:24 -07:00
dns_resolver
dsa net: dsa: mark parsed interface mode for legacy switch drivers 2023-08-09 13:08:09 -07:00
ethernet
ethtool ethtool: plca: fix plca enable data type while parsing the value 2023-10-03 07:18:58 -07:00
handshake net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() 2023-09-20 11:54:49 +01:00
hsr net: hsr: Add __packed to struct hsr_sup_tlv. 2023-09-18 08:26:19 +01:00
ieee802154 sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
ife
ipv4 tcp: fix delayed ACKs for MSS boundary condition 2023-10-04 15:34:18 -07:00
ipv6 ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling 2023-10-03 09:49:24 +02:00
iucv
kcm kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). 2023-09-14 10:43:51 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
l2tp ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data() 2023-10-01 18:19:27 +01:00
l3mdev
lapb
llc net/llc/llc_conn.c: fix 4 instances of -Wmissing-variable-declarations 2023-08-09 15:34:28 -07:00
mac80211 wifi: mac80211: Create resources for disabled links 2023-09-26 09:12:47 +02:00
mac802154 Core WPAN changes: 2023-06-24 15:41:46 -07:00
mctp sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
mpls networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
mptcp mptcp: fix dangling connection hang-up 2023-09-18 12:47:56 +01:00
ncsi ncsi: Propagate carrier gain/loss events to the NCSI controller 2023-09-18 07:06:05 +01:00
netfilter netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure 2023-10-04 15:57:28 +02:00
netlabel netlabel: Remove unused declaration netlbl_cipsov4_doi_free() 2023-08-02 12:28:22 -07:00
netlink genetlink: add a family pointer to struct genl_info 2023-08-15 15:01:03 -07:00
netrom netrom: Deny concurrent connect(). 2023-08-28 06:58:46 +01:00
nfc net: nfc: llcp: Add lock when modifying device list 2023-10-03 07:39:31 -07:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-18 12:44:56 -07:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-08-10 14:10:53 -07:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample
qrtr net: qrtr: Handle IPCR control port format of older targets 2023-07-17 09:02:30 +01:00
rds net: prevent address rewrite in kernel_bind() 2023-10-01 19:31:29 +01:00
rfkill rfkill: sync before userspace visibility/changes 2023-09-18 09:36:57 +02:00
rose sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rxrpc Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sched net: sched: sch_qfq: Fix UAF in qfq_dequeue() 2023-09-05 08:54:12 +02:00
sctp Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
smc net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add 2023-09-10 19:31:42 +01:00
strparser
sunrpc NFSv4.1: fix pnfs MDS=DS session trunking 2023-09-13 11:51:11 -04:00
switchdev net: switchdev: Add a helper to replay objects on a bridge port 2023-07-21 08:54:03 +01:00
tipc tipc: fix a potential deadlock on &tx->lock 2023-10-04 13:24:12 -07:00
tls net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() 2023-09-12 09:51:49 +02:00
unix Including fixes from netfilter and bpf. 2023-09-07 18:33:07 -07:00
vmw_vsock vsock: Remove unused function declarations 2023-07-31 14:41:08 -07:00
wireless wifi: cfg80211: avoid leaking stack data into trace 2023-09-26 09:12:27 +02:00
x25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xdp xsk: Fix xsk_diag use-after-free error during socket cleanup 2023-08-31 13:21:11 +02:00
xfrm sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
compat.c
devres.c
Kconfig bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
Kconfig.debug
Makefile
socket.c net: prevent address rewrite in kernel_bind() 2023-10-01 19:31:29 +01:00
sysctl_net.c sysctl: Add size to register_net_sysctl function 2023-08-15 15:26:17 -07:00