af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect().

[ Upstream commit 1ca27e0c8c ]

When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
to lock the two sockets to check their states in unix_stream_connect().

We use unix_state_lock() for the server and unix_state_lock_nested() for
client with tricky sk->sk_state check to avoid deadlock.

The possible deadlock scenario are the following:

  1) Self connect()
  2) Simultaneous connect()

The former is simple, attempt to grab the same lock, and the latter is
AB-BA deadlock.

After the server's unix_state_lock(), we check the server socket's state,
and if it's not TCP_LISTEN, connect() fails with -EINVAL.

Then, we avoid the former deadlock by checking the client's state before
unix_state_lock_nested().  If its state is not TCP_LISTEN, we can make
sure that the client and the server are not identical based on the state.

Also, the latter deadlock can be avoided in the same way.  Due to the
server sk->sk_state requirement, AB-BA deadlock could happen only with
TCP_LISTEN sockets.  So, if the client's state is TCP_LISTEN, we can
give up the second lock to avoid the deadlock.

  CPU 1                 CPU 2                  CPU 3
  connect(A -> B)       connect(B -> A)        listen(A)
  ---                   ---                    ---
  unix_state_lock(B)
  B->sk_state == TCP_LISTEN
  READ_ONCE(A->sk_state) == TCP_CLOSE
                            ^^^^^^^^^
                            ok, will lock A    unix_state_lock(A)
             .--------------'                  WRITE_ONCE(A->sk_state, TCP_LISTEN)
             |                                 unix_state_unlock(A)
             |
             |          unix_state_lock(A)
             |          A->sk_sk_state == TCP_LISTEN
             |          READ_ONCE(B->sk_state) == TCP_LISTEN
             v                                    ^^^^^^^^^^
  unix_state_lock_nested(A)                       Don't lock B !!

Currently, while checking the client's state, we also check if it's
TCP_ESTABLISHED, but this is unlikely and can be checked after we know
the state is not TCP_CLOSE.

Moreover, if it happens after the second lock, we now jump to the restart
label, but it's unlikely that the server is not found during the retry,
so the jump is mostly to revist the client state check.

Let's remove the retry logic and check the state against TCP_CLOSE first.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit is contained in:
Kuniyuki Iwashima 2024-06-20 13:56:15 -07:00 committed by Greg Kroah-Hartman
parent ce7172b0d5
commit bd0b5563ec

View file

@ -1461,6 +1461,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
struct unix_sock *u = unix_sk(sk), *newu, *otheru;
struct net *net = sock_net(sk);
struct sk_buff *skb = NULL;
unsigned char state;
long timeo;
int err;
@ -1505,7 +1506,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
goto out;
}
/* Latch state of peer */
unix_state_lock(other);
/* Apparently VFS overslept socket death. Retry. */
@ -1535,37 +1535,21 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
goto restart;
}
/* Latch our state.
It is tricky place. We need to grab our state lock and cannot
drop lock on peer. It is dangerous because deadlock is
possible. Connect to self case and simultaneous
attempt to connect are eliminated by checking socket
state. other is TCP_LISTEN, if sk is TCP_LISTEN we
check this before attempt to grab lock.
Well, and we have to recheck the state after socket locked.
/* self connect and simultaneous connect are eliminated
* by rejecting TCP_LISTEN socket to avoid deadlock.
*/
switch (READ_ONCE(sk->sk_state)) {
case TCP_CLOSE:
/* This is ok... continue with connect */
break;
case TCP_ESTABLISHED:
/* Socket is already connected */
err = -EISCONN;
goto out_unlock;
default:
err = -EINVAL;
state = READ_ONCE(sk->sk_state);
if (unlikely(state != TCP_CLOSE)) {
err = state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
goto out_unlock;
}
unix_state_lock_nested(sk, U_LOCK_SECOND);
if (sk->sk_state != TCP_CLOSE) {
if (unlikely(sk->sk_state != TCP_CLOSE)) {
err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
unix_state_unlock(sk);
unix_state_unlock(other);
sock_put(other);
goto restart;
goto out_unlock;
}
err = security_unix_stream_connect(sk, other, newsk);