linux-stable/net/ipv4
Salvatore Dipietro 90fba981ca tcp: Add memory barrier to tcp_push()
[ Upstream commit 7267e8dcad ]

On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.

Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e73 ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.

Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
        String s = "a".repeat(3096);
        osw.write(s,0,s.length());
        osw.flush();
    }
}

Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.

No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.91ms
 75.000%    1.13ms
 90.000%    1.46ms
 99.000%    1.74ms
 99.900%    1.89ms
 99.990%   41.95ms  <<< 40+ ms extra latency
 99.999%   48.32ms
100.000%   48.96ms

With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000  http://172.31.60.173:8080/hello/hello
  ...
 50.000%    0.90ms
 75.000%    1.13ms
 90.000%    1.45ms
 99.000%    1.72ms
 99.900%    1.83ms
 99.990%    2.11ms  <<< no 40+ ms extra latency
 99.999%    2.53ms
100.000%    2.62ms

Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.

Fixes: 7aa5470c2c ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:05 -08:00
..
bpfilter
netfilter netfilter: bridge: replace physindev with physinif in nf_bridge_info 2024-01-25 15:27:51 -08:00
Kconfig tcp: configurable source port perturb table size 2022-11-16 13:02:04 +00:00
Makefile
af_inet.c tcp: make sure init the accept_queue's spinlocks once 2024-01-31 16:17:03 -08:00
ah4.c xfrm: ah: add extack to ah_init_state, ah6_init_state 2022-09-29 07:17:59 +02:00
arp.c neighbour: annotate lockless accesses to n->nud_state 2023-10-10 22:00:42 +02:00
bpf_tcp_ca.c bpf: Use 0 instead of NOT_INIT for btf_struct_access() writes 2022-09-10 17:27:32 -07:00
cipso_ipv4.c cipso: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
datagram.c ipv4: fix data-races around inet->inet_id 2023-08-30 16:11:02 +02:00
devinet.c net: ipv4: fix one memleak in __inet_del_ifa() 2023-09-19 12:28:08 +02:00
esp4.c net: ipv4: fix return value check in esp_remove_trailer 2023-10-25 12:03:06 +02:00
esp4_offload.c xfrm: Linearize the skb after offloading if needed. 2023-06-28 11:12:29 +02:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-22 13:33:50 +01:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c
fib_rules.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
fib_semantics.c ipv4/fib: send notify when delete source address routes 2023-10-25 12:03:11 +02:00
fib_trie.c ipv4/fib: send notify when delete source address routes 2023-10-25 12:03:11 +02:00
fou.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
gre_demux.c
gre_offload.c net: gro: skb_gro_header helper function 2022-08-25 10:33:21 +02:00
icmp.c icmp: guard against too small mtu 2023-04-13 16:55:21 +02:00
igmp.c ipv4: igmp: fix refcnt uaf issue when receiving igmp query packet 2023-12-08 08:51:17 +01:00
inet_connection_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-01-31 16:17:03 -08:00
inet_diag.c net: annotate data-races around sk->sk_mark 2023-08-11 12:08:14 +02:00
inet_fragment.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
inet_hashtables.c net: set SOCK_RCU_FREE before inserting socket into hashtable 2023-11-28 17:07:04 +00:00
inet_timewait_sock.c Revert "tcp: avoid the lookup process failing to get sk in ehash table" 2023-07-27 08:50:45 +02:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
ip_forward.c ip: Fix data-races around sysctl_ip_fwd_update_priority. 2022-07-15 11:49:55 +01:00
ip_fragment.c net: ip: Handle delivery_time in ip defrag 2022-03-03 14:38:48 +00:00
ip_gre.c ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit() 2023-12-13 18:39:09 +01:00
ip_input.c ipv4: ignore dst hint for multipath routes 2023-09-19 12:28:01 +02:00
ip_options.c ipv4: drop fragmentation code from ip_options_build() 2022-01-29 17:53:07 +00:00
ip_output.c net: annotate data-races around sk->sk_tsflags 2024-01-10 17:10:23 +01:00
ip_sockglue.c net: annotate data-races around sk->sk_tsflags 2024-01-10 17:10:23 +01:00
ip_tunnel.c net: tunnels: annotate lockless accesses to dev->needed_headroom 2023-03-22 13:33:46 +01:00
ip_tunnel_core.c tunnels: fix kasan splat when generating ipv4 pmtu error 2023-08-16 18:27:27 +02:00
ip_vti.c ip_vti: fix potential slab-use-after-free in decode_session6 2023-08-23 17:52:32 +02:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c Driver core / kernfs changes for 6.0-rc1 2022-08-04 11:31:20 -07:00
ipip.c net: Add helper function to parse netlink msg of ip_tunnel_parm 2022-10-03 07:59:06 +01:00
ipmr.c ipmr: support IP_PKTINFO on cache report IGMP msg 2024-01-25 15:27:28 -08:00
ipmr_base.c ipmr: adopt rcu_read_lock() in mr_dump() 2022-06-24 11:34:38 +01:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-02-01 08:34:45 +01:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
netlink.c
nexthop.c neighbour: switch to standard rcu, instead of rcu_bh 2023-10-10 22:00:42 +02:00
ping.c ping: Fix potentail NULL deref for /proc/net/icmp. 2023-04-13 16:55:24 +02:00
proc.c tcp: Don't allocate tcp_death_row outside of struct netns_ipv4. 2022-09-20 10:21:49 -07:00
protocol.c
raw.c net: annotate data-races around sk->sk_priority 2023-08-11 12:08:15 +02:00
raw_diag.c raw: Fix NULL deref in raw_get_next(). 2023-04-13 16:55:23 +02:00
route.c ipv4: Correct/silence an endian warning in __ip_do_redirect 2023-12-03 07:32:07 +01:00
syncookies.c tcp: fix cookie_init_timestamp() overflows 2023-11-20 11:51:54 +01:00
sysctl_net_ipv4.c tcp: enforce receive buffer memory limits by allowing the tcp window to shrink 2023-10-19 23:08:54 +02:00
tcp.c tcp: Add memory barrier to tcp_push() 2024-01-31 16:17:05 -08:00
tcp_bbr.c bpf: Switch to new kfunc flags infrastructure 2022-07-21 20:59:42 -07:00
tcp_bic.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_bpf.c tcp_bpf: properly release resources on error paths 2023-10-25 12:03:13 +02:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c tcp: Add tracepoint for tcp_set_ca_state 2022-04-07 20:33:15 -07:00
tcp_cubic.c bpf: Switch to new kfunc flags infrastructure 2022-07-21 20:59:42 -07:00
tcp_dctcp.c bpf: Switch to new kfunc flags infrastructure 2022-07-21 20:59:42 -07:00
tcp_dctcp.h
tcp_diag.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
tcp_fastopen.c tcp: annotate data-races around fastopenq.max_qlen 2023-07-27 08:50:49 +02:00
tcp_highspeed.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_htcp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_hybla.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_illinois.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_input.c tcp: do not accept ACK of bytes we never sent 2023-12-13 18:39:11 +01:00
tcp_ipv4.c ipv4, ipv6: Use splice_eof() to flush 2024-01-10 17:10:27 +01:00
tcp_lp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_metrics.c tcp_metrics: do not create an entry from tcp_init_metrics() 2023-11-20 11:51:53 +01:00
tcp_minisocks.c tcp: annotate data-races around tcp_rsk(req)->ts_recent 2023-07-27 08:50:45 +02:00
tcp_nv.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_offload.c tcp: gso: really support BIG TCP 2023-06-14 11:15:20 +02:00
tcp_output.c net: Remove acked SYN flag from packet in the transmit queue correctly 2023-12-20 17:00:18 +01:00
tcp_rate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-28 13:02:01 -07:00
tcp_recovery.c tcp: fix excessive TLP and RACK timeouts from HZ rounding 2023-10-25 12:03:06 +02:00
tcp_scalable.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_timer.c net: tcp: fix unexcepted socket die when snd_wnd is 0 2023-09-13 09:42:32 +02:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-24 07:24:43 +01:00
tcp_vegas.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_vegas.h
tcp_veno.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_westwood.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_yeah.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tunnel4.c
udp.c udp: annotate data-races around up->pending 2024-01-25 15:27:49 -08:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-17 08:50:24 +01:00
udp_diag.c
udp_impl.h net: remove noblock parameter from recvmsg() entities 2022-04-12 15:00:25 +02:00
udp_offload.c udp: move udp->gro_enabled to udp->udp_flags 2024-01-10 17:10:28 +01:00
udp_tunnel_core.c udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO 2024-01-10 17:10:28 +01:00
udp_tunnel_nic.c udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() 2022-02-23 12:35:00 +00:00
udp_tunnel_stub.c
udplite.c udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). 2023-05-30 14:03:20 +01:00
xfrm4_input.c udp: annotate data-races around udp->encap_type 2024-01-10 17:10:28 +01:00
xfrm4_output.c
xfrm4_policy.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-08 10:10:13 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00