linux-stable/net
Florian Westphal ca66f66718 netfilter: conntrack: tcp: only close if RST matches exact sequence
[ Upstream commit be0502a3f2 ]

TCP resets cause instant transition from established to closed state
provided the reset is in-window.  Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that its a middlebox, i.e.  whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).

Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is
   changed, RST is subject to normal in-window check.

2. If the RSTs sequence number either matches exactly RCV.NXT,
   connection state moves to CLOSE.

3. The same applies if the RST sequence number aligns with a previous
   packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset.

If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.

If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001

// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001

// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R  2:2(0) ack 1001 win 260

// 2nd reset, but too far ahead sequence.  Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260

// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260

// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501

// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260

// ACK
0.300 < . 1001:1001(0) ack 1001 win 257

// more data could be exchanged here, connection
// is still established

// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002

// Close the connection without reading outstanding data
0.700 close(4) = 0

// so one more reset.  Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.

With patch, this generates following conntrack events:
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
   [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:33:00 +02:00
..
6lowpan 6lowpan: iphc: reset mac_header after decompress to fix panic 2018-07-06 12:32:12 +02:00
9p 9p/net: fix memory leak in p9_client_create 2019-03-23 20:09:38 +01:00
802
8021q net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
ax25 ax25: fix possible use-after-free 2019-02-23 09:07:27 +01:00
batman-adv batman-adv: release station info tidstats 2019-03-13 14:02:34 -07:00
bluetooth Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer 2019-04-03 06:26:14 +02:00
bpf bpf/test_run: support cgroup local storage 2018-08-03 00:47:32 +02:00
bpfilter net: bpfilter: use get_pid_task instead of pid_task 2018-10-17 22:03:40 -07:00
bridge netfilter: ebtables: remove BUGPRINT messages 2019-03-27 14:14:42 +09:00
caif Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
can can: bcm: check timer values before ktime conversion 2019-01-31 08:14:39 +01:00
ceph libceph: wait for latest osdmap in ceph_monc_blacklist_add() 2019-03-27 14:14:39 +09:00
core net-sysfs: call dev_hold if kobject_init_and_add success 2019-04-03 06:26:17 +02:00
dcb net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
dccp dccp: do not use ipv6 header for ipv4 flow 2019-04-03 06:26:15 +02:00
decnet decnet: fix using plain integer as NULL warning 2018-08-09 14:11:24 -07:00
dns_resolver net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
dsa net: dsa: slave: Don't propagate flag changes on down slave interfaces 2019-02-12 19:47:22 +01:00
ethernet
hsr net/hsr: fix possible crash in add_timer() 2019-03-19 13:12:38 +01:00
ieee802154 ieee802154: lowpan_header_create check must check daddr 2019-01-09 17:38:31 +01:00
ife
ipv4 netfilter: ipt_CLUSTERIP: fix warning unused variable cn 2019-03-23 20:09:57 +01:00
ipv6 ila: Fix rhashtable walker list corruption 2019-04-03 06:26:18 +02:00
iucv Revert "net: simplify sock_poll_wait" 2018-11-04 14:50:51 +01:00
kcm Revert "kcm: remove any offset before parsing messages" 2018-09-17 18:43:42 -07:00
key af_key: unconditionally clone on broadcast 2019-03-23 20:09:48 +01:00
l2tp l2tp: fix infoleak in l2tp_ip6_recvmsg() 2019-03-19 13:12:38 +01:00
l3mdev
lapb
llc llc: do not use sk_eat_skb() 2018-12-01 09:37:27 +01:00
mac80211 mac80211: Fix Tx aggregation session tear down with ITXQs 2019-03-23 20:09:45 +01:00
mac802154 net: mac802154: tx: expand tailroom if necessary 2018-08-06 11:21:37 +02:00
mpls mpls: Return error for RTA_GATEWAY attribute 2019-03-10 07:17:19 +01:00
ncsi net/ncsi: Fixup .dumpit message flags and ID check in Netlink handler 2018-08-22 21:39:08 -07:00
netfilter netfilter: conntrack: tcp: only close if RST matches exact sequence 2019-04-05 22:33:00 +02:00
netlabel netlabel: fix out-of-bounds memory accesses 2019-03-10 07:17:18 +01:00
netlink genetlink: Fix a memory leak on error path 2019-04-03 06:26:15 +02:00
netrom netrom: switch to sock timer API 2019-02-06 17:30:07 +01:00
nfc net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails 2019-03-10 07:17:18 +01:00
nsh nsh: set mac len based on inner packet 2018-07-12 16:55:29 -07:00
openvswitch openvswitch: Avoid OOB read when parsing flow nlattrs 2019-01-31 08:14:32 +01:00
packet packets: Always register packet sk in the same order 2019-04-03 06:26:17 +02:00
phonet phonet: fix building with clang 2019-03-23 20:09:51 +01:00
psample
qrtr net: qrtr: Reset the node and port ID of broadcast messages 2018-07-05 20:20:03 +09:00
rds rds: fix refcount bug in rds_sock_addref 2019-02-12 19:47:22 +01:00
rfkill Here are quite a large number of fixes, notably: 2018-09-03 22:12:02 -07:00
rose net: rose: fix a possible stack overflow 2019-04-03 06:26:17 +02:00
rxrpc rxrpc: Fix client call queueing, waiting for channel 2019-03-19 13:12:39 +01:00
sched net: sched: fix cleanup NULL pointer exception in act_mirr 2019-04-03 06:26:19 +02:00
sctp sctp: use memdup_user instead of vmemdup_user 2019-04-03 06:26:17 +02:00
smc net/smc: fix smc_poll in SMC_INIT state 2019-03-19 13:12:41 +01:00
strparser strparser: remove redundant variable 'rd_desc' 2018-08-01 10:00:06 -07:00
sunrpc svcrpc: fix UDP on servers with lots of threads 2019-03-23 20:10:10 +01:00
switchdev
tipc tipc: fix cancellation of topology subscriptions 2019-04-03 06:26:18 +02:00
tls net/tls: Init routines in create_ctx 2019-01-13 09:51:00 +01:00
unix missing barriers in some of unix_sock ->addr and ->path accesses 2019-03-19 13:12:41 +01:00
vmw_vsock vsock/virtio: reset connected sockets on device removal 2019-03-13 14:02:36 -07:00
wimax wimax: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
wireless cfg80211: extend range deviation for DMG 2019-03-05 17:58:52 +01:00
x25 net/x25: fix a race in x25_bind() 2019-03-19 13:12:40 +01:00
xdp xsk: do not call synchronize_net() under RCU read lock 2018-10-11 10:19:01 +02:00
xfrm xfrm: Fix inbound traffic via XFRM interfaces across network namespaces 2019-03-23 20:09:49 +01:00
compat.c sock: Make sock->sk_stamp thread-safe 2019-01-09 17:38:33 +01:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile
socket.c net: socket: set sock->sk to NULL after calling proto_ops::release() 2019-03-10 07:17:18 +01:00
sysctl_net.c