No description
Find a file
Kuniyuki Iwashima 333bb73f62 tcp: Keep TCP_CLOSE sockets in the reuseport group.
When we close a listening socket, to migrate its connections to another
listener in the same reuseport group, we have to handle two kinds of child
sockets. One is that a listening socket has a reference to, and the other
is not.

The former is the TCP_ESTABLISHED/TCP_SYN_RECV sockets, and they are in the
accept queue of their listening socket. So we can pop them out and push
them into another listener's queue at close() or shutdown() syscalls. On
the other hand, the latter, the TCP_NEW_SYN_RECV socket is during the
three-way handshake and not in the accept queue. Thus, we cannot access
such sockets at close() or shutdown() syscalls. Accordingly, we have to
migrate immature sockets after their listening socket has been closed.

Currently, if their listening socket has been closed, TCP_NEW_SYN_RECV
sockets are freed at receiving the final ACK or retransmitting SYN+ACKs. At
that time, if we could select a new listener from the same reuseport group,
no connection would be aborted. However, we cannot do that because
reuseport_detach_sock() sets NULL to sk_reuseport_cb and forbids access to
the reuseport group from closed sockets.

This patch allows TCP_CLOSE sockets to remain in the reuseport group and
access it while any child socket references them. The point is that
reuseport_detach_sock() was called twice from inet_unhash() and
sk_destruct(). This patch replaces the first reuseport_detach_sock() with
reuseport_stop_listen_sock(), which checks if the reuseport group is
capable of migration. If capable, it decrements num_socks, moves the socket
backwards in socks[] and increments num_closed_socks. When all connections
are migrated, sk_destruct() calls reuseport_detach_sock() to remove the
socket from socks[], decrement num_closed_socks, and set NULL to
sk_reuseport_cb.

By this change, closed or shutdowned sockets can keep sk_reuseport_cb.
Consequently, calling listen() after shutdown() can cause EADDRINUSE or
EBUSY in inet_csk_bind_conflict() or reuseport_add_sock() which expects
such sockets not to have the reuseport group. Therefore, this patch also
loosens such validation rules so that a socket can listen again if it has a
reuseport group with num_closed_socks more than 0.

When such sockets listen again, we handle them in reuseport_resurrect(). If
there is an existing reuseport group (reuseport_add_sock() path), we move
the socket from the old group to the new one and free the old one if
necessary. If there is no existing group (reuseport_alloc() path), we
allocate a new reuseport group, detach sk from the old one, and free it if
necessary, not to break the current shutdown behaviour:

  - we cannot carry over the eBPF prog of shutdowned sockets
  - we cannot attach/detach an eBPF prog to/from listening sockets via
    shutdowned sockets

Note that when the number of sockets gets over U16_MAX, we try to detach a
closed socket randomly to make room for the new listening socket in
reuseport_grow().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20210612123224.12525-4-kuniyu@amazon.co.jp
2021-06-15 18:01:05 +02:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-05-19 12:58:29 -07:00
block block-5.13-2021-05-07 2021-05-07 11:35:12 -07:00
certs Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
crypto for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
Documentation net: Introduce net.ipv4.tcp_migrate_req. 2021-06-15 18:01:05 +02:00
drivers ethernet: ucc_geth: Use kmemdup() rather than kmalloc+memcpy 2021-05-23 18:51:42 -07:00
fs Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
include tcp: Keep TCP_CLOSE sockets in the reuseport group. 2021-06-15 18:01:05 +02:00
init Merge branch 'akpm' (patches from Andrew) 2021-05-07 00:34:51 -07:00
ipc ipc/sem.c: spelling fix 2021-05-07 00:26:34 -07:00
kernel bpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul 2021-06-01 13:34:15 +02:00
lib Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
LICENSES LICENSES: Add the CC-BY-4.0 license 2020-12-08 10:33:27 -07:00
mm mm: fix typos in comments 2021-05-07 00:26:35 -07:00
net tcp: Keep TCP_CLOSE sockets in the reuseport group. 2021-06-15 18:01:05 +02:00
samples sample/bpf: Add xdp_redirect_map_multi for redirect_map broadcast test 2021-05-26 09:46:16 +02:00
scripts Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
security Simple code cleanup 2021-05-05 12:08:06 -07:00
sound sound fixes for 5.13-rc1 2021-05-07 11:40:18 -07:00
tools libbpf: Set NLM_F_EXCL when creating qdisc 2021-06-15 14:00:30 +02:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt KVM: Boost vCPU candidate in user mode which is delivering interrupt 2021-04-21 12:20:03 -04:00
.clang-format cxl for 5.12 2021-02-24 09:38:36 -08:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap It's been a relatively busy cycle in docsland, though more than usually 2021-04-26 13:22:43 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: move Murali Karicheri to credits 2021-04-29 15:47:30 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS net: phy: add driver for Motorcomm yt8511 phy 2021-05-21 13:19:11 -07:00
Makefile Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.