linux-stable/net/ipv6
Lorenz Bauer 9c02bec959 bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-07-25 13:55:55 -07:00
..
ila ila: do not generate empty messages in ila_xlat_nl_cmd_get_mapping() 2023-03-01 08:48:46 +00:00
netfilter Revert "net: Remove low_thresh in ip defrag" 2023-05-16 20:46:30 -07:00
addrconf.c net: add sysctl accept_ra_min_rtr_lft 2023-07-23 11:51:24 +01:00
addrconf_core.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
addrlabel.c ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network 2022-11-07 12:26:15 +00:00
af_inet6.c ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
ah6.c net: ipv6: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
anycast.c
calipso.c cipso,calipso: resolve a number of problems with the DOI refcounts 2021-03-04 15:26:57 -08:00
datagram.c ipv6: Constify the sk parameter of several helper functions. 2023-07-14 08:27:33 +01:00
esp6.c net: ipv6: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
esp6_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-22 18:40:38 -07:00
exthdrs.c ipv6: rpl: Remove redundant skb_dst_drop(). 2023-07-12 17:12:29 -07:00
exthdrs_core.c ipv6: Fix out-of-bounds access in ipv6_find_tlv() 2023-05-24 08:43:39 +01:00
exthdrs_offload.c
fib6_notifier.c
fib6_rules.c ipv6: change fib6_rules_net_exit() to batch mode 2022-02-08 20:41:34 -08:00
fou6.c
icmp.c ipv6: Constify the sk parameter of several helper functions. 2023-07-14 08:27:33 +01:00
inet6_connection_sock.c net: annotate lockless accesses to sk->sk_err_soft 2023-03-17 08:25:05 +00:00
inet6_hashtables.c net: remove duplicate sk_lookup helpers 2023-07-25 13:51:44 -07:00
ioam6.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
ioam6_iptunnel.c ipv6: ioam: Insertion frequency in lwtunnel output 2022-02-04 20:24:45 -08:00
ip6_checksum.c
ip6_fib.c ipv6: remove nexthop_fib6_nh_bh() 2023-05-11 18:07:05 -07:00
ip6_flowlabel.c ipv6: flowlabel: do not disable BH where not needed 2023-03-21 21:32:18 -07:00
ip6_gre.c net:ipv6: check return value of pskb_trim() 2023-07-19 12:25:58 +01:00
ip6_icmp.c net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending 2021-02-23 11:29:52 -08:00
ip6_input.c netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
ip6_offload.c net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
ip6_offload.h
ip6_output.c ip, ip6: Fix splice to raw and ping sockets 2023-06-16 11:45:16 -07:00
ip6_tunnel.c net: tunnels: annotate lockless accesses to dev->needed_headroom 2023-03-15 00:04:04 -07:00
ip6_udp_tunnel.c
ip6_vti.c ipv6: tunnels: use DEV_STATS_INC() 2022-11-16 12:48:44 +00:00
ip6mr.c net: ioctl: Use kernel memory on protocol ioctl callbacks 2023-06-15 22:33:26 -07:00
ipcomp6.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipv6_sockglue.c net/ipv6: Initialise msg_control_is_user 2023-04-14 11:09:27 +01:00
Kconfig crypto: lib - make the sha1 library optional 2022-07-15 16:43:59 +08:00
Makefile net: ipv6: use ipv6-y directly instead of ipv6-objs 2021-09-28 13:13:40 +01:00
mcast.c ipv6: Constify the sk parameter of several helper functions. 2023-07-14 08:27:33 +01:00
mcast_snoop.c net: bridge: mcast: fix broken length + header check for MRDv6 Adv. 2021-04-27 14:02:06 -07:00
mip6.c xfrm: mip6: add extack to mip6_destopt_init_state, mip6_rthdr_init_state 2022-09-29 07:18:01 +02:00
ndisc.c net: add sysctl accept_ra_min_rtr_lft 2023-07-23 11:51:24 +01:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
output_core.c treewide: use get_random_u32_{above,below}() instead of manual loop 2022-11-18 02:15:22 +01:00
ping.c ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
proc.c icmp: Add counters for rate limits 2023-01-26 10:52:18 +01:00
protocol.c
raw.c ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
reassembly.c Revert "net: Remove low_thresh in ip defrag" 2023-05-16 20:46:30 -07:00
route.c ipv6: also use netdev_hold() in ip6_route_check_nh() 2023-06-18 14:27:45 +01:00
rpl.c ipv6: rpl: Remove pskb(_may)?_pull() in ipv6_rpl_srh_rcv(). 2023-06-19 11:32:58 -07:00
rpl_iptunnel.c ipv6: rpl: Remove redundant skb_dst_drop(). 2023-07-12 17:12:29 -07:00
seg6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-09-08 18:38:30 +02:00
seg6_hmac.c net: ipv6: unexport __init-annotated seg6_hmac_net_init() 2022-06-28 21:23:30 -07:00
seg6_iptunnel.c seg6: Cleanup duplicates of skb_dst_drop calls 2023-05-17 09:05:47 +01:00
seg6_local.c seg6: add PSP flavor support for SRv6 End behavior 2023-02-16 13:18:06 +01:00
sit.c sit: update dev->needed_headroom in ipip6_tunnel_bind_dev() 2023-04-28 09:48:14 +01:00
syncookies.c tcp: Fix data-races around sysctl_tcp_syncookies. 2022-07-18 12:21:54 +01:00
sysctl_net_ipv6.c net: sysctl: introduce sysctl SYSCTL_THREE 2022-05-03 10:15:06 +02:00
tcp_ipv6.c ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
tcpv6_offload.c net: Make gro complete function to return void 2023-05-31 09:50:17 +01:00
tunnel6.c
udp.c bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign 2023-07-25 13:55:55 -07:00
udp_impl.h tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
udp_offload.c gso: fix dodgy bit handling for GSO_UDP_L4 2023-07-14 10:29:20 +01:00
udplite.c ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
xfrm6_input.c xfrm: fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets 2023-06-09 08:16:34 +02:00
xfrm6_output.c xfrm: fix tunnel model fragmentation behavior 2022-03-01 12:08:40 +01:00
xfrm6_policy.c net: dst: fix missing initialization of rt_uncached 2023-04-21 20:26:56 -07:00
xfrm6_protocol.c
xfrm6_state.c
xfrm6_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00