linux-stable/net
Matthieu Baerts (NGI0) 4fe707a297 tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
commit c166829268 upstream.

The 'Fixes' commit recently changed the behaviour of TCP by skipping the
processing of the 3rd ACK when a sk->sk_socket is set. The goal was to
skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an
unnecessary ACK in case of simultaneous connect(). Unfortunately, that
had an impact on TFO and MPTCP.

I started to look at the impact on MPTCP, because the MPTCP CI found
some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni
suggested me to look at the impact on TFO with "plain" TCP.

For MPTCP, when receiving the 3rd ACK of a request adding a new path
(MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that
has been created when the MPTCP connection got established before with
the first path. The newly added 'goto' will then skip the processing of
the segment text (step 7) and not go through tcp_data_queue() where the
MPTCP options are validated, and some actions are triggered, e.g.
sending the MPJ 4th ACK [2] as demonstrated by the new errors when
running a packetdrill test [3] establishing a second subflow.

This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
delayed. Still, we don't want to have this behaviour as it delays the
switch to the fully established mode, and invalid MPTCP options in this
3rd ACK will not be caught any more. This modification also affects the
MPTCP + TFO feature as well, and being the reason why the selftests
started to be unstable the last few days [4].

For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer
passing: if the 3rd ACK contains data, and the connection is accept()ed
before receiving them, these data would no longer be processed, and thus
not ACKed.

One last thing about MPTCP, in case of simultaneous connect(), a
fallback to TCP will be done, which seems fine:

  `../common/defaults.sh`

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
  +0 >  . 1:1(0) ack 1           <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>

Simultaneous SYN-data crossing is also not supported by TFO, see [6].

Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only:
that's a more generic solution than the one initially proposed, and
also enough to fix the issues described above.

Later on, Eric Dumazet mentioned that an ACK should still be sent in
reaction to the second SYN+ACK that is received: not sending a DUPACK
here seems wrong and could hurt:

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8>
  +0 < S  0:0(0)       win 1000 <mss 1000, sackOK, nop, nop>
  +0 > S. 0:0(0) ack 1          <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8>
  +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop>
  +0 >  . 1:1(0) ack 1          <nop, nop, sack 0:1>  // <== Here

So in this version, the 'goto consume' is dropped, to always send an ACK
when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be
seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of
simultaneous SYN crossing: that's what is expected here.

Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2]
Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3]
Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4]
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5]
Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6]
Fixes: 23e89e8ee7 ("tcp: Don't drop SYN+ACK for simultaneous connect().")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12 11:11:40 +02:00
..
6lowpan
9p net/9p: fix uninit-value in p9_client_rpc() 2024-06-16 13:47:41 +02:00
802
8021q net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb 2024-05-17 12:02:07 +02:00
appletalk appletalk: Fix Use-After-Free in atalk_ioctl 2023-12-20 17:01:50 +01:00
atm atm: Fix Use-After-Free in do_vcc_ioctl 2023-12-20 17:01:48 +01:00
ax25 ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put() 2024-06-21 14:38:14 +02:00
batman-adv batman-adv: Don't accept TT entries for out-of-spec VIDs 2024-07-05 09:34:04 +02:00
bluetooth Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT 2024-09-12 11:11:33 +02:00
bpf bpf: Set run context for rawtp test_run callback 2024-06-21 14:38:16 +02:00
bpfilter
bridge net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN 2024-09-12 11:11:35 +02:00
caif
can can: bcm: Remove proc entry when dev is unregistered. 2024-09-12 11:11:31 +02:00
ceph libceph: fix race between delayed_work() and ceph_monc_stop() 2024-07-18 13:21:22 +02:00
core net/socket: Break down __sys_getsockopt 2024-09-12 11:11:34 +02:00
dcb
dccp tcp/dccp: do not care about families in inet_twsk_purge() 2024-08-29 17:33:46 +02:00
devlink devlink: fix port new reply cmd type 2024-03-26 18:20:11 -04:00
dns_resolver keys, dns: Fix size check of V1 server-list header 2024-01-25 15:35:41 -08:00
dsa net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection 2024-08-29 17:33:45 +02:00
ethernet ethernet: Add helper for assigning packet type when dest address does not match device address 2024-05-02 16:32:46 +02:00
ethtool ethtool: check device is present when getting link settings 2024-09-04 13:28:26 +02:00
handshake net/handshake: Fix handshake_req_destroy_test1 2024-02-23 09:24:50 +01:00
hsr hsr: Simplify code for announcing HSR nodes timer setup 2024-05-17 12:02:24 +02:00
ieee802154
ife net: sched: ife: fix potential use-after-free 2024-01-01 12:42:30 +00:00
ipv4 tcp: process the 3rd ACK with sk_socket for TFO/MPTCP 2024-09-12 11:11:40 +02:00
ipv6 ila: call nf_unregister_net_hooks() sooner 2024-09-12 11:11:28 +02:00
iucv s390/iucv: fix receive buffer virtual vs physical address confusion 2024-08-29 17:33:39 +02:00
kcm kcm: Serialise kcm_sendmsg() for the same socket. 2024-08-29 17:33:46 +02:00
key
l2tp l2tp: fix lockdep splat 2024-08-14 13:58:40 +02:00
l3mdev
lapb
llc llc: call sock_orphan() at release time 2024-02-05 20:14:36 +00:00
mac80211 wifi: mac80211: check ieee80211_bss_info_change_notify() against MLD 2024-09-08 07:54:43 +02:00
mac802154 net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD() 2024-07-25 09:50:52 +02:00
mctp net: mctp: test: Use correct skb for route input check 2024-08-29 17:33:46 +02:00
mpls net: mpls: error out if inner headers are not set 2024-04-13 13:07:41 +02:00
mptcp mptcp: pr_debug: add missing \n at the end 2024-09-08 07:54:35 +02:00
ncsi net/ncsi: Fix the multi thread manner of NCSI driver 2024-06-21 14:38:14 +02:00
netfilter netfilter: nf_conncount: fix wrong variable type 2024-09-12 11:11:29 +02:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2024-01-25 15:35:14 -08:00
netlink netlink: hold nlk->cb_mutex longer in __netlink_dump_start() 2024-08-29 17:33:35 +02:00
netrom netrom: Fix a memory leak in nr_heartbeat_expiry() 2024-06-27 13:49:06 +02:00
nfc nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies 2024-08-19 06:04:28 +02:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-05-17 12:02:02 +02:00
openvswitch net: ovs: fix ovs_drop_reasons error 2024-08-29 17:33:50 +02:00
packet af_packet: Handle outgoing VLAN packets without hardware offloading 2024-08-03 08:54:13 +02:00
phonet phonet: fix rtm_phonet_notify() skb allocation 2024-05-17 12:02:22 +02:00
psample psample: Require 'CAP_NET_ADMIN' when joining "packets" group 2023-12-13 18:45:10 +01:00
qrtr net: qrtr: ns: Fix module refcnt 2024-06-12 11:12:12 +02:00
rds net:rds: Fix possible deadlock in rds_message_put 2024-08-19 06:04:27 +02:00
rfkill net: rfkill: gpio: set GPIO direction 2024-01-01 12:42:41 +00:00
rose net/rose: fix races in rose_kill_by_device() 2024-01-01 12:42:31 +00:00
rxrpc rxrpc: Don't pick values out of the wire header when setting up security 2024-08-29 17:33:36 +02:00
sched sched: sch_cake: fix bulk flow accounting logic for host fairness 2024-09-12 11:11:28 +02:00
sctp sctp: fix association labeling in the duplicate COOKIE-ECHO case 2024-09-04 13:28:27 +02:00
smc net/smc: add the max value of fallback reason count 2024-08-14 13:58:40 +02:00
strparser
sunrpc sunrpc: use the struct net as the svc proc private 2024-08-19 06:04:23 +02:00
switchdev net: bridge: switchdev: Skip MDB replays of deferred events on offload 2024-03-01 13:35:06 +01:00
tipc tipc: Return non-zero value from tipc_udp_addr2str() on error 2024-08-03 08:54:37 +02:00
tls tls: fix missing memory barrier in tls_init 2024-06-12 11:12:50 +02:00
unix af_unix: Remove put_pid()/put_cred() in copy_peercred(). 2024-09-12 11:11:29 +02:00
vmw_vsock vsock: fix recursive ->recvmsg calls 2024-08-29 17:33:21 +02:00
wireless wifi: cfg80211: make hash table duplicates more survivable 2024-09-08 07:54:47 +02:00
x25 net/x25: fix incorrect parameter validation in the x25_getsockopt() function 2024-03-26 18:19:41 -04:00
xdp xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING 2024-04-17 11:19:28 +02:00
xfrm xfrm: call xfrm_dev_policy_delete when kill policy 2024-08-03 08:53:42 +02:00
compat.c
devres.c
Kconfig
Kconfig.debug
Makefile
socket.c bpf, net: Fix a potential race in do_sock_getsockopt() 2024-09-12 11:11:34 +02:00
sysctl_net.c sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) 2024-08-11 12:47:13 +02:00