linux-stable/net/ipv4
Salvatore Dipietro 7267e8dcad tcp: Add memory barrier to tcp_push()
On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.

Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e73 ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.

Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
        String s = "a".repeat(3096);
        osw.write(s,0,s.length());
        osw.flush();
    }
}

Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.

No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.91ms
 75.000%    1.13ms
 90.000%    1.46ms
 99.000%    1.74ms
 99.900%    1.89ms
 99.990%   41.95ms  <<< 40+ ms extra latency
 99.999%   48.32ms
100.000%   48.96ms

With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.90ms
 75.000%    1.13ms
 90.000%    1.45ms
 99.000%    1.72ms
 99.900%    1.83ms
 99.990%    2.11ms  <<< no 40+ ms extra latency
 99.999%    2.53ms
100.000%    2.62ms

Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.

Fixes: 7aa5470c2c ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-23 10:22:31 +01:00
..
netfilter netfilter: bridge: replace physindev with physinif in nf_bridge_info 2024-01-17 12:02:49 +01:00
af_inet.c tcp: make sure init the accept_queue's spinlocks once 2024-01-19 21:13:25 -08:00
ah4.c net: ipv4: stop checking crypto_ahash_alignmask 2023-10-27 18:04:29 +08:00
arp.c neighbour: annotate lockless accesses to n->nud_state 2023-03-15 00:37:32 -07:00
bpf_tcp_ca.c Revert BPF token-related functionality 2023-12-19 08:23:03 -08:00
cipso_ipv4.c inet: move inet->is_icsk to inet->inet_flags 2023-08-16 11:09:17 +01:00
datagram.c inet: implement lockless getsockopt(IP_MULTICAST_IF) 2023-10-01 19:39:19 +01:00
devinet.c net: ipv4: fix one memleak in __inet_del_ifa() 2023-09-08 08:02:17 +01:00
esp4.c net: ipv4: fix typo in comments 2023-10-25 10:38:07 +01:00
esp4_offload.c xfrm: Support GRO for IPv4 ESP in UDP encapsulation 2023-10-06 07:30:40 +02:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-16 17:26:31 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c fib: remove unnecessary input parameters in fib_default_rule_add 2024-01-03 16:42:48 -08:00
fib_semantics.c ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr 2023-10-18 18:11:31 -07:00
fib_trie.c Kill sched.h dependency on rcupdate.h 2023-12-27 11:50:20 -05:00
fou_bpf.c bpf: Add __bpf_kfunc_{start,end}_defs macros 2023-11-01 22:33:53 -07:00
fou_core.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_nl.c net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
fou_nl.h net: ynl: prefix uAPI header include with uapi/ 2023-05-26 10:30:14 +01:00
gre_demux.c
gre_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
icmp.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
igmp.c ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet 2023-11-24 15:25:56 +00:00
inet_connection_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-01-19 21:13:25 -08:00
inet_diag.c tcp: Link sk and twsk to tb2->owners using skc_bind_node. 2023-12-22 22:15:35 +00:00
inet_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT 2022-10-31 20:14:27 -07:00
inet_hashtables.c tcp: Remove dead code and fields for bhash2. 2023-12-22 22:15:35 +00:00
inet_timewait_sock.c tcp: Link sk and twsk to tb2->owners using skc_bind_node. 2023-12-22 22:15:35 +00:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
ip_forward.c net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check 2023-10-13 09:58:45 -07:00
ip_fragment.c networking: Update to register_net_sysctl_sz 2023-08-15 15:26:18 -07:00
ip_gre.c ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit() 2023-12-06 10:08:05 +01:00
ip_input.c ipv4: ignore dst hint for multipath routes 2023-09-01 08:11:51 +01:00
ip_options.c
ip_output.c net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams. 2023-10-20 12:01:00 +01:00
ip_sockglue.c bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
ip_tunnel.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
ip_tunnel_core.c tunnels: fix kasan splat when generating ipv4 pmtu error 2023-08-04 18:24:52 -07:00
ip_vti.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c net: ipconfig: move ic_nameservers_fallback into #ifdef block 2023-05-22 11:17:55 +01:00
ipip.c ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices 2023-04-12 16:40:39 -07:00
ipmr.c fib: remove unnecessary input parameters in fib_default_rule_add 2024-01-03 16:42:48 -08:00
ipmr_base.c ipmr: adopt rcu_read_lock() in mr_dump() 2022-06-24 11:34:38 +01:00
Kconfig net/tcp: Add TCP-AO config and structures 2023-10-27 10:35:44 +01:00
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c xfrm: pass struct net to xfrm_decode_session wrappers 2023-10-06 08:31:53 +02:00
netlink.c
nexthop.c nexthop: Do not increment dump sentinel at the end of the dump 2023-08-15 18:54:53 -07:00
ping.c bpf-next-for-netdev 2023-10-16 21:05:33 -07:00
proc.c net/tcp: Ignore specific ICMPs for TCP-AO connections 2023-10-27 10:35:45 +01:00
protocol.c
raw.c inet: implement lockless getsockopt(IP_MULTICAST_IF) 2023-10-01 19:39:19 +01:00
raw_diag.c net: fill in MODULE_DESCRIPTION()s for SOCK_DIAG modules 2023-11-19 20:09:13 +00:00
route.c ipv4: Correct/silence an endian warning in __ip_do_redirect 2023-11-21 12:55:22 +01:00
syncookies.c tcp: Factorise cookie-dependent fields initialisation in cookie_v[46]_check() 2023-11-29 20:16:38 -08:00
sysctl_net_ipv4.c Use READ/WRITE_ONCE() for IP local_port_range. 2023-12-08 10:44:42 -08:00
tcp.c tcp: Add memory barrier to tcp_push() 2024-01-23 10:22:31 +01:00
tcp_ao.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-12-07 17:53:17 -08:00
tcp_bbr.c net: implement lockless SO_MAX_PACING_RATE 2023-10-01 19:09:54 +01:00
tcp_bic.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-18 18:09:31 -07:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c net: Update an existing TCP congestion control algorithm. 2023-03-22 22:53:00 -07:00
tcp_cubic.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.h
tcp_diag.c net: fill in MODULE_DESCRIPTION()s for SOCK_DIAG modules 2023-11-19 20:09:13 +00:00
tcp_fastopen.c inet: move inet->defer_connect to inet->inet_flags 2023-08-16 11:09:18 +01:00
tcp_highspeed.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_htcp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_hybla.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_illinois.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-12-14 17:14:41 -08:00
tcp_ipv4.c tcp: Revert no longer abort SYN_SENT when receiving some ICMP 2024-01-08 19:08:51 -08:00
tcp_lp.c tcp: rename tcp_time_stamp() to tcp_time_stamp_ts() 2023-10-23 09:35:01 +01:00
tcp_metrics.c tcp_metrics: optimize tcp_metrics_flush_all() 2023-10-03 10:05:22 +02:00
tcp_minisocks.c net/tcp: Consistently align TCP-AO option in the header 2023-12-06 12:36:55 +01:00
tcp_nv.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
tcp_output.c net: Remove acked SYN flag from packet in the transmit queue correctly 2023-12-12 15:56:02 -08:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-28 13:02:01 -07:00
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-17 17:25:42 -07:00
tcp_scalable.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_sigpool.c net/tcp_sigpool: Use kref_get_unless_zero() 2024-01-01 14:42:05 +00:00
tcp_timer.c tcp: use tp->total_rto to track number of linear timeouts in SYN_SENT state 2023-11-16 23:35:12 +00:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_vegas.h
tcp_veno.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_westwood.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_yeah.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tunnel4.c
udp.c bpf: Avoid iter->offset making backward progress in bpf_iter_udp 2024-01-13 11:01:44 -08:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c net: fill in MODULE_DESCRIPTION()s for SOCK_DIAG modules 2023-11-19 20:09:13 +00:00
udp_impl.h sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
udp_offload.c udp: move udp->gro_enabled to udp->udp_flags 2023-09-14 16:16:36 +02:00
udp_tunnel_core.c ipv4: use tunnel flow flags for tunnel route lookups 2023-10-16 09:57:52 +01:00
udp_tunnel_nic.c udp_tunnel: Use flex array to simplify code 2023-10-03 11:39:34 +02:00
udp_tunnel_stub.c
udplite.c udplite: remove UDPLITE_BIT 2023-09-14 16:16:36 +02:00
xfrm4_input.c xfrm Fix use after free in __xfrm6_udp_encap_rcv. 2023-10-23 07:10:39 +02:00
xfrm4_output.c
xfrm4_policy.c sysctl-6.6-rc1 2023-08-29 17:39:15 -07:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-08 10:10:13 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00